Afunana — AI & Analysis Pipeline

Overview

Afunana reads the source that actually runs the business — COBOL, RPG, CL, DDS, embedded SQL on IBM i, and PL/SQL on Oracle — and reconstructs the knowledge buried inside it into documentation a business can read, a developer can navigate, and an auditor can trust. Large language models do the writing and the reasoning; a governed gateway decides which model, a deterministic pipeline decides what to analyze, and a knowledge graph keeps the results cross-referenced and current.

This page describes the AI machinery: the One AI Hub gateway that fronts every model call, the model advisor and presets that keep the model roster current, the five-stage build pipeline that turns a whole collection into documentation and a searchable index, and the close-the-loop code generation that produces validated change plans in the codebase's own style.

Two framing points matter throughout:

The moat is the knowledge graph, not the model. Afunana maintains a persistent, queryable map of program → program, program → file, and file → field relationships across the entire estate, kept current as the code changes. A chat window over a context window can explain a snippet shown to it; it cannot hold a cross-referenced map of an entire system. Everything below feeds and draws on that graph.
One gateway, many models, no lock-in. Every LLM call routes through a single governed hub. The organization controls cost, access, and data, and can switch providers or models live with no rebuild.

One AI Hub — the governed gateway

The LLM router (the One AI Hub) is the single gateway through which every AI operation in the platform passes. Nothing calls a provider SDK directly. This is what makes multi-model governance possible: cost telemetry, provider fallback, access control, and data residency are all enforced in one place.

How a call flows:

The caller names a role — a task type such as builder or chat_answer — never a model.
The router reads that role's fallback chain from configuration: an ordered list of provider/model pairs.
It invokes the first pair. On failure (timeout, rate limit, error, or an unsupported parameter it cannot strip) it falls through to the next pair in the chain.
It returns the result with per-call telemetry: provider and model actually used, input/output/thinking token counts, and latency — so cost is measurable per call, not estimated.
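
The flow above can be sketched in a few lines of Python. This is an illustrative sketch, not the router's actual code: `CallResult`, `ProviderError`, `CHAINS`, `route`, the `invoke` callable, and the model names are all hypothetical stand-ins for whatever the platform uses internally.

```python
import time
from dataclasses import dataclass


@dataclass
class CallResult:
    """Per-call telemetry returned with the text (field names are assumptions)."""
    text: str
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    thinking_tokens: int
    latency_ms: float


class ProviderError(Exception):
    """Hypothetical: a timeout, rate limit, or provider-side error."""


# Hypothetical configuration: role -> ordered (provider, model) pairs.
CHAINS = {
    "chat_answer": [("anthropic", "primary-model"), ("openai", "fallback-model")],
}


def route(role, prompt, invoke, chains=CHAINS):
    """Try each pair in the role's chain in order and return the first success."""
    errors = []
    for provider, model in chains[role]:
        start = time.perf_counter()
        try:
            text, usage = invoke(provider, model, prompt)
        except ProviderError as exc:
            errors.append(f"{provider}/{model}: {exc}")
            continue  # fall through to the next pair
        latency_ms = (time.perf_counter() - start) * 1000
        return CallResult(text, provider, model, usage["input"],
                          usage["output"], usage.get("thinking", 0), latency_ms)
    raise ProviderError("all pairs failed: " + "; ".join(errors))


def fake_invoke(provider, model, prompt):
    """Stand-in for a provider call: the primary is rate limited."""
    if provider == "anthropic":
        raise ProviderError("rate limited")
    return "ok", {"input": 12, "output": 3}


result = route("chat_answer", "Explain PGM001", fake_invoke)
print(result.provider, result.model)
```

The caller passes only the role name; which provider answered is visible afterwards in the returned telemetry.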

Because chains live in the database, an administrator repoints any role at a different provider or model live, with no restart — see Environment Configuration. (The provider API keys themselves are the exception: changing a key is flagged "restart required" in the admin UI.)

Providers

Afunana supports four providers. Any role's chain can mix them freely — for example an Anthropic primary with an OpenAI fallback, or a local Ollama model with an Azure fallback.

Provider	Notes
Anthropic	Prompt caching and extended thinking. Default primary for most roles.
OpenAI	Function calling and reproducible sampling. Common fallback.
Azure OpenAI	First-class provider with its own endpoint and key, via the Azure AI Foundry v1 OpenAI-compatible endpoint. Keeps all LLM traffic inside the customer's Azure tenant.
Ollama	Local, open-source models on the customer's own hardware. Nothing leaves the network — the foundation for fully air-gapped operation.

Azure is a distinct provider, not merely an alternate OpenAI URL. Roles are routed to a named Azure deployment.

LLM roles

The router defines 11 roles. Each has its own fallback chain, token limits, and behavioral parameters, all stored in configuration and editable at runtime. The defaults below pair an Anthropic primary with an OpenAI fallback; every one is fully configurable, and any role can be pointed at an Azure deployment or a local Ollama model instead.

Role	Purpose	Default primary	Notes
`builder`	Program documentation	Claude Sonnet	Streaming enabled; the heaviest consumer in a build.
`file_docs`	File and field documentation	Claude Sonnet
`sql_docs`	IBM i SQL member documentation	Claude Sonnet	RUNSQLSTM / QSQLSRC scripts: summary, statements, files/libraries in use, risk and migration notes.
`system_overview`	System narrative and structure	Claude Sonnet	Two-pass generation.
`chat_planner`	Query planning for the chat agent	Claude Sonnet	Emits the plan JSON (intents + tool calls).
`chat_answer`	Complex chat answers	Claude Opus	Extended-thinking model; manages its own thinking.
`chat_answer_simple`	Simple, direct answers	Claude Haiku	Fast, low-cost path.
`chat_classifier`	Intent classification (folded into the planner)	Claude Haiku	Smallest, cheapest model.
`code_developer`	Change-plan code generation	Claude Opus	Extended-thinking model; style-guarded (see below).
`spec_doc`	Specification documents and auto-tagging	Claude Sonnet
`attachment_ocr`	Vision OCR of chat attachments	Claude Haiku	Screenshots, scanned specs, spool/JCL printouts.

The seeded model identifiers are starting points in the DB, not hardcoded — the model advisor and presets (below) keep them current.

Adaptive parameter stripping

Models disagree on which optional request parameters they accept: some reject a temperature, a seed, or a thinking-budget setting. Rather than fail, the router invokes models adaptively — when a model rejects an optional field, it detects the rejected parameter, drops it, and retries the same call.

This is what lets one fallback chain span Anthropic, OpenAI, Azure, and Ollama with no per-model tuning: a local model that ignores extended-thinking budgets and a frontier model that uses them can sit in the same chain.
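
A minimal sketch of the strip-and-retry idea, assuming the provider layer can report which parameter was rejected. `UnsupportedParameter`, `OPTIONAL`, `invoke_adaptively`, and `picky_model` are hypothetical names used only for illustration; how the real router detects the rejected field is not described here.

```python
class UnsupportedParameter(Exception):
    """Hypothetical error carrying the name of the rejected request field."""

    def __init__(self, name):
        super().__init__(f"unsupported parameter: {name}")
        self.name = name


# Fields that may be dropped without changing what is being asked (assumption).
OPTIONAL = {"temperature", "seed", "thinking_budget"}


def invoke_adaptively(call, params):
    """Call the model, dropping rejected optional fields and retrying."""
    params = dict(params)
    stripped = []
    while True:
        try:
            return call(**params), stripped
        except UnsupportedParameter as exc:
            if exc.name not in OPTIONAL or exc.name not in params:
                raise  # cannot strip it: let the fallback chain take over
            del params[exc.name]
            stripped.append(exc.name)


def picky_model(prompt, **options):
    """Stand-in for a model that rejects a seed and a thinking budget."""
    for name in ("seed", "thinking_budget"):
        if name in options:
            raise UnsupportedParameter(name)
    return f"answer to {prompt!r} with {sorted(options)}"


answer, dropped = invoke_adaptively(
    picky_model,
    {"prompt": "hi", "temperature": 0.2, "seed": 7, "thinking_budget": 2048},
)
print(dropped)
```

Each retry removes one field, so the loop always terminates: it either succeeds or re-raises an error that stripping cannot fix.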

Extended thinking & prompt caching

Extended thinking / reasoning models are used for the roles that require multi-step reasoning — chat_answer and code_developer — for bug identification, multi-program analysis, and code generation. Those models manage their own thinking at their default effort; there is no per-role thinking-token budget applied. The adaptive layer keeps them interchangeable in a chain with models that do not reason internally.

Prompt caching (Anthropic) activates automatically when a system prompt exceeds ~4,000 characters. The build pipeline benefits heavily: every program in a batch shares the same large system prompt, so only the first call pays full prompt cost and the rest read from cache. Caching is transparent to callers — the router manages the cache headers.
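
As an illustration of the threshold rule, the sketch below builds the `system` blocks of an Anthropic Messages API request and marks a long prompt as cacheable with the API's `cache_control` field. `CACHE_THRESHOLD_CHARS` and `build_system_blocks` are hypothetical names, and the exact way the router marks cacheable content is an assumption.

```python
CACHE_THRESHOLD_CHARS = 4000  # approximate threshold quoted in the text


def build_system_blocks(system_prompt):
    """Return the system blocks, marking long prompts as cacheable."""
    block = {"type": "text", "text": system_prompt}
    if len(system_prompt) > CACHE_THRESHOLD_CHARS:
        # Ephemeral cache marker from the Anthropic Messages API.
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


short_blocks = build_system_blocks("You document COBOL programs.")
long_blocks = build_system_blocks("You document COBOL programs. " * 200)
print("cache_control" in short_blocks[0], "cache_control" in long_blocks[0])
```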

Model advisor

The model advisor keeps the model roster from going stale without guessing. Critically, it is not an LLM call — it queries each provider's live Models API using the organization's own key, so it never hallucinates a model name.

Discover: call the provider Models API for the models the org's key can actually reach.
Classify: sort discovered models into capability tiers.
Map: map tiers → roles (a top-tier model for chat_answer/code_developer, an economy tier for chat_classifier, etc.).
Diff: compare the recommendation against the live per-role configuration and flag when a newer model is worth adopting.

The advisor is read-only. It surfaces a recommendation and a diff; applying a change is a separate, deliberate action available to administrators.
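
The classify, map, and diff steps can be sketched as pure functions over a discovered model list. Everything here is hypothetical: the tier rules, the role-to-tier map, the model identifiers, and the assumption that the alphabetically last identifier in a tier is the newest. The discover step is replaced by a hard-coded list rather than a live Models API call.

```python
# Hypothetical rules: substring of the model id -> capability tier.
TIER_RULES = [("opus", "top"), ("sonnet", "standard"), ("haiku", "economy")]

# Hypothetical role -> tier map.
ROLE_TIERS = {
    "chat_answer": "top",
    "code_developer": "top",
    "builder": "standard",
    "chat_classifier": "economy",
}


def classify(model_ids):
    """Pick one model per tier; in sorted order the last matching id wins."""
    tiers = {}
    for model_id in sorted(model_ids):
        for needle, tier in TIER_RULES:
            if needle in model_id:
                tiers[tier] = model_id
                break
    return tiers


def recommend(tiers):
    """Map tiers to roles, skipping roles whose tier was not discovered."""
    return {role: tiers[tier] for role, tier in ROLE_TIERS.items() if tier in tiers}


def diff(live, recommended):
    """Report role -> (current, suggested) wherever they differ. Read-only."""
    return {role: (live.get(role), model)
            for role, model in recommended.items() if live.get(role) != model}


discovered = ["vendor-opus-2", "vendor-opus-1", "vendor-haiku-1"]
live = {"chat_answer": "vendor-opus-1", "code_developer": "vendor-opus-2",
        "builder": "vendor-sonnet-1", "chat_classifier": "vendor-haiku-1"}
changes = diff(live, recommend(classify(discovered)))
print(changes)
```

Nothing in the sketch writes to `live`, mirroring the read-only behavior: the output is a diff for an administrator to act on.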

Model presets

A preset is a named, full per-role model bundle stored in the database. Presets let an administrator capture and switch between complete model configurations in one action, for example shipped variants such as Anthropic+OpenAI, Anthropic-only, OpenAI-only, and Azure-only.

Snapshot: capture the current per-role assignments as a named preset.
Apply: replace the live per-role configuration with a saved preset. Apply is reversible — it snapshots the current configuration first, so a preset switch can be rolled back.
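
A small in-memory sketch of snapshot and reversible apply. `PresetStore`, its methods, and the backup preset name are hypothetical; the real presets live in the database.

```python
class PresetStore:
    """In-memory stand-in for the preset table (illustrative only)."""

    def __init__(self, live):
        self.live = dict(live)   # role -> model currently in effect
        self.presets = {}        # preset name -> saved role -> model bundle

    def snapshot(self, name):
        """Capture the current per-role assignments under a name."""
        self.presets[name] = dict(self.live)

    def apply(self, name, backup_name="pre-apply-backup"):
        """Switch to a saved preset, snapshotting the live config first."""
        target = self.presets[name]   # unknown name fails before any change
        self.snapshot(backup_name)    # this is what makes apply reversible
        self.live = dict(target)
        return backup_name


store = PresetStore({"builder": "model-a"})
store.snapshot("anthropic-only")
store.live["builder"] = "model-b"        # configuration drifts afterwards

backup = store.apply("anthropic-only")
after_apply = dict(store.live)

store.apply(backup)                      # roll the switch back
after_rollback = dict(store.live)
print(after_apply, after_rollback)
```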

Presets are managed by administrators. (Note: a separate "model_catalog" concept is not a live feature.)

Build pipeline

The build orchestrator drives an entire collection of legacy source through five stages. It handles IBM i and Oracle collections through the same knowledge-graph contract; the stages below describe the flow, with IBM i as the worked example.

Stage 1 — Program documentation (builder). Programs are ordered leaves → roots by call depth and documented level by level, in parallel within each level, so a program is documented after the subprograms it calls. Payloads are batched to stay within model context limits.

Stage 2 — File & SQL documentation. File and field docs (file_docs) and IBM i SQL member docs (sql_docs) are generated.

Stage 3 — Collection artifacts. In parallel: the system overview (system_overview, two passes), the data dictionary, and the cross-reference. Then change-impact injection, the build-quality report, and the parameter-check report are produced against the assembled graph.

Stage 4 — Embeddings & index. Chunks are embedded by a local model, intfloat/multilingual-e5-large (via HuggingFace), and stored in ChromaDB; a parallel BM25 lexical index is built alongside. There is no external embedding API — embeddings never leave the box, which is what makes air-gapped operation real.

Stage 5 — Auto-tagging (spec_doc). Up to ~15 business-concept labels per collection, ≤5 per program/file, are assigned automatically. Manual tags are preserved.

Chunking is section-based (by SEC), call-tree-ordered, with semantic slicing for large programs. Copybook rows are excluded from citeable ranges, and sequence numbers are mapped to display lines so that PROGRAM:LINE citations line up with what a developer sees in the source.
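
The leaves-to-roots ordering in Stage 1 can be sketched as a level-by-level walk of the call graph. `document_levels` and the program names are hypothetical, and the handling of recursive call cycles (emit the remaining programs together) is an assumption rather than documented behavior.

```python
def document_levels(calls):
    """Group programs into levels, leaves first.

    `calls` maps each program to the set of programs it calls. Programs in
    one level have no unmet dependencies and could be documented in parallel.
    """
    programs = set(calls) | {callee for callees in calls.values() for callee in callees}
    # Ignore self-calls so a directly recursive program can still be a leaf.
    remaining = {p: set(calls.get(p, ())) - {p} for p in programs}
    levels, done = [], set()
    while remaining:
        ready = sorted(p for p, deps in remaining.items() if deps <= done)
        if not ready:
            # Call cycle: no program is free, so emit the rest together.
            ready = sorted(remaining)
        levels.append(ready)
        done.update(ready)
        for program in ready:
            del remaining[program]
    return levels


calls = {
    "ORDMAIN": {"ORDVAL", "PRICECALC"},
    "ORDVAL": {"DATEUTIL"},
    "PRICECALC": {"DATEUTIL"},
}
levels = document_levels(calls)
print(levels)
```

With this ordering, the documentation of `DATEUTIL` already exists when `ORDVAL` and `PRICECALC` are documented, and both exist before `ORDMAIN`.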

Delta / incremental rebuild (the REFRESH phase)

Documentation does not go stale. The "changes" build mode MD5-hashes each member's source, rebuilds only the programs whose hash changed, and updates only those programs' entries in the Chroma vectors and the BM25 index. This content-hash delta rebuild is the REFRESH phase of Afunana's four-phase model: a code change re-analyzes what changed, not the whole estate.
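
The content-hash comparison can be sketched with Python's standard `hashlib`. The function names, member names, and the UTF-8 encoding are assumptions for illustration; how hashes are stored between builds is not covered here.

```python
import hashlib


def source_hash(source):
    """MD5 of a member's source text, used only as a change fingerprint."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


def changed_members(previous_hashes, current_sources):
    """Return (changed or new, removed, new hash table)."""
    current = {name: source_hash(src) for name, src in current_sources.items()}
    changed = sorted(n for n, h in current.items() if previous_hashes.get(n) != h)
    removed = sorted(set(previous_hashes) - set(current))
    return changed, removed, current


previous = {
    "ORDVAL": source_hash("MOVE A TO B."),
    "PRICECALC": source_hash("COMPUTE P = Q * R."),
    "OLDPGM": source_hash("STOP RUN."),
}
sources = {
    "ORDVAL": "MOVE A TO B.",               # unchanged
    "PRICECALC": "COMPUTE P = Q * R * 2.",  # edited
    "NEWPGM": "DISPLAY 'HI'.",              # added
}
changed, removed, hashes = changed_members(previous, sources)
print(changed, removed)
```

Only the members in `changed` would be re-documented and re-indexed; members in `removed` would have their index entries deleted.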

Local embeddings & hybrid index

The index the chat agent searches is built entirely on the box:

Vector: intfloat/multilingual-e5-large embeddings in ChromaDB — semantic, multilingual, no external service.
Lexical: a BM25 index for exact program names, field names, and identifiers.

The chat retriever fuses the two with weighted reciprocal-rank fusion. See Chat & RAG System for how retrieval, re-ranking, and the answer loop use this index.
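
Weighted reciprocal-rank fusion scores each chunk as the sum, over the rankings it appears in, of weight / (k + rank). The sketch below is a generic version of that formula; the weights, the constant `k = 60` (a conventional choice), and the chunk identifiers are assumptions, not Afunana's configured values.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists: score(doc) = sum of weight / (k + rank)."""
    scores = {}
    for name, ranked in rankings.items():
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + weights[name] / (k + rank)
    # Highest score first; ties broken by id so the output is stable.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


rankings = {
    "vector": ["ORDVAL:120", "ORDMAIN:40", "PRICECALC:10"],
    "bm25": ["PRICECALC:10", "ORDVAL:120"],
}
fused = weighted_rrf(rankings, {"vector": 1.0, "bm25": 1.0})
print(fused)
```

A chunk that both retrievers rank well beats one that only a single retriever found, which is why exact identifier matches from BM25 and semantic matches from the vector index reinforce each other.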

Silent-failure detection (deterministic, no LLM)

Alongside the LLM documentation, two deterministic checker families run with no model involved — they catch the costliest, best-hidden defects: the ones that produce no error message. Every check's severity is set per customer through a check catalog (Off / Note / Error); see Code Quality Analysis.

COBOL (IBM i):

Parameter / interface mismatch — the headline check, and the strongest. A real data-division parser (PIC, COMP-3, OCCURS, REDEFINES byte math) compares each CALL against the target program's expected parameters by count and byte-size/structure. Catches cross-program interface drift that compiles clean and fails silently.
MOVE truncation (move_size_loss) — assignments that silently lose data across mismatched field sizes.
Unsafe control flow — unreachable code after GOBACK, reads with no status handler, scope-terminator mismatches, GOTO into a section.
Dead field — unused 77-level items.
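
To make the byte math behind the parameter check concrete, here is a deliberately simplified sketch covering only elementary items with DISPLAY or COMP-3 usage. It ignores OCCURS, REDEFINES, group items, binary usages, and SIGN SEPARATE, all of which a real data-division parser has to handle. `picture_bytes` and `interface_mismatches` are hypothetical names.

```python
import re


def picture_bytes(picture, usage="DISPLAY"):
    """Storage size in bytes of a simple PIC clause."""
    # Expand repeat counts: 9(5) -> 99999, X(3) -> XXX.
    expanded = re.sub(r"([9XA])\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)), picture.upper())
    digits = expanded.count("9")
    chars = expanded.count("X") + expanded.count("A")
    if usage == "COMP-3":
        # Packed decimal: two digits per byte plus a sign nibble.
        return digits // 2 + 1
    # DISPLAY: one byte per position; S and V occupy no storage.
    return digits + chars


def interface_mismatches(passed, expected):
    """Compare CALL arguments with the target's parameters by count and size."""
    if len(passed) != len(expected):
        return [f"count: caller passes {len(passed)}, target expects {len(expected)}"]
    issues = []
    for pos, (sent, wanted) in enumerate(zip(passed, expected), start=1):
        sent_size, wanted_size = picture_bytes(*sent), picture_bytes(*wanted)
        if sent_size != wanted_size:
            issues.append(f"parameter {pos}: {sent_size} bytes passed, {wanted_size} expected")
    return issues


caller = [("X(10)", "DISPLAY"), ("S9(7)V99", "COMP-3")]
callee = [("X(10)", "DISPLAY"), ("S9(9)V99", "COMP-3")]
issues = interface_mismatches(caller, callee)
print(issues)
```

The second parameter differs by a single byte (5 versus 6), which is exactly the kind of drift that compiles cleanly on both sides.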

PL/SQL (Oracle): 14 rules, including empty WHEN OTHERS / swallowed exceptions, DML without a WHERE, dynamic-SQL injection, = NULL, hardcoded credentials, COMMIT in a loop, SELECT *, excessive complexity, and dead private routines.

These are deterministic and explainable, not model guesses — the foundation of the "silent-failure layer."

Spec document generation

On demand, Afunana generates specification documents for a single program or a whole collection using the spec_doc role, written for three audiences:

Business — what the program does, in business terms, for analysts and stakeholders.
Analyst — data flows, business rules, integration points.
Developer — data structures, technical detail, implementation notes.

Specs export to DOCX or PDF. This is distinct from Bring Your Own Documents (uploading existing PDFs/DOCX into the RAG index) — spec generation produces new documents; see Chat & RAG System.

Close-the-loop code generation

Beyond understanding, Afunana generates structured, validated change plans grounded in the live codebase and, when approved, can write the change back to the live system in the codebase's own style. The default posture is to stop at the plan and hand off to a developer; automated execution is available and human-approved.

Plan generation. The code_developer role produces a validated JSON plan — a title, a summary, and a list of typed steps (source edit, compile, file change, verify) — where each source-edit step carries the target member plus line-level diffs anchored by source sequence number, so edits land on exactly the right lines. Plans are persisted as pending_approval for review.
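
As an illustration of what such a plan might look like, the sketch below parses a small example and checks its structure. The field names (`type`, `member`, `diffs`, `sequence`, `old`, `new`, `status`), the step-type spellings, and the member name are all hypothetical; the real schema is not reproduced here.

```python
import json

# Step types named in the text, with hypothetical spellings.
STEP_TYPES = {"source_edit", "compile", "file_change", "verify"}


def validate_plan(plan):
    """Return a list of problems; an empty list means the structure is valid."""
    errors = [f"missing {key}" for key in ("title", "summary", "steps")
              if not plan.get(key)]
    for number, step in enumerate(plan.get("steps") or [], start=1):
        kind = step.get("type")
        if kind not in STEP_TYPES:
            errors.append(f"step {number}: unknown type {kind!r}")
        elif kind == "source_edit":
            if not step.get("member"):
                errors.append(f"step {number}: source edit needs a target member")
            diffs = step.get("diffs") or []
            if not diffs:
                errors.append(f"step {number}: source edit needs at least one diff")
            for diff in diffs:
                if not isinstance(diff.get("sequence"), int):
                    errors.append(f"step {number}: diff is not anchored to a sequence number")
    return errors


raw = """
{"title": "Widen customer name",
 "summary": "Grow CUSTNAME from 30 to 40 characters.",
 "status": "pending_approval",
 "steps": [
   {"type": "source_edit", "member": "CUSTMNT",
    "diffs": [{"sequence": 1200,
               "old": "05 CUSTNAME PIC X(30).",
               "new": "05 CUSTNAME PIC X(40)."}]},
   {"type": "compile", "member": "CUSTMNT"},
   {"type": "verify"}
 ]}
"""
plan = json.loads(raw)
problems = validate_plan(plan)
print(problems)
```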

Style matching by regeneration. A deterministic, fixed-format-aware COBOL style guard scans only the AI-written COBOL and feeds any violations back into the LLM retry loop. Conformance to the codebase's own conventions is enforced by regeneration, not by post-editing.
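
The regenerate-until-conformant loop can be sketched as follows. The two rules shown (no tab characters, no text past column 72) are hypothetical examples of fixed-format checks, not the guard's real rule set, and `generate` stands in for the LLM call.

```python
def style_violations(lines):
    """Deterministic checks on generated fixed-format COBOL (example rules)."""
    issues = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            issues.append(f"line {number}: tab character")
        if len(line.rstrip()) > 72:
            issues.append(f"line {number}: text past column 72")
    return issues


def generate_with_guard(generate, max_attempts=3):
    """Regenerate until the guard passes, feeding violations back each time."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        lines = generate(feedback)
        feedback = style_violations(lines)
        if not feedback:
            return lines, attempt
    raise ValueError(f"still non-conforming after {max_attempts} attempts: {feedback}")


def fake_generate(feedback):
    """Stand-in for the model: the first draft uses a tab, the retry fixes it."""
    if not feedback:
        return ["\tMOVE A TO B."]
    return ["           MOVE A TO B."]


lines, attempts = generate_with_guard(fake_generate)
print(attempts, lines)
```

The guard never edits the output itself; it only reports violations, and the fix comes from the next generation attempt.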

Approval-gated execution. From the admin web UI, an approved plan is executed with live SSE step events: the executor applies the diffs into a per-plan workspace library on the IBM i, and swaps to production only if all steps succeed, with backup. The whole surface is gated by configuration (off on the public production instance).

Honest caveats. The compile step is currently stubbed. Execution targets the live IBM i, so it is an available, approval-gated capability — not the assumed default. And the VS Code extension renders change plans but does not apply them; execution happens only through this approval-gated admin workflow.

See Chat & RAG System for how a plan is produced from a chat request, and the API Overview for programmatic access.

Prompt templates

Prompt templates live as editable text files and cover program analysis, file/field docs, SQL member docs, the system-overview narrative, query planning and answer composition, code generation with style enforcement, specification formatting, and OCR extraction. Administrators can view and edit them through the admin UI.

Cost & usage metering

Every model call's token usage and cost are recorded. Per-model prices live in a database pricing table that is the single source of truth for costing and is editable in the admin UI — so when a provider changes prices, an administrator updates the table and every subsequent call is costed correctly. A call whose model has no configured price is flagged as unpriced rather than silently costed at zero, so gaps are visible. A Costs view reports spend broken down by build, by query, and by model, turning the per-call telemetry into an accountable ledger.
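
A minimal sketch of table-driven costing with an explicit unpriced flag. The pricing layout (USD per million tokens, separate input and output rates), the model name, and `cost_call` are assumptions; how thinking tokens are priced is left out because it is not specified.

```python
from decimal import Decimal

# Hypothetical pricing table: USD per million tokens, keyed by model.
PRICES = {
    "model-a": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}


def cost_call(model, input_tokens, output_tokens, prices=PRICES):
    """Cost one call from the table; flag it instead of charging zero."""
    price = prices.get(model)
    if price is None:
        return {"model": model, "cost": None, "unpriced": True}
    cost = (price["input"] * input_tokens
            + price["output"] * output_tokens) / Decimal(1_000_000)
    return {"model": model, "cost": cost, "unpriced": False}


priced = cost_call("model-a", input_tokens=1000, output_tokens=200)
unpriced = cost_call("model-b", input_tokens=1000, output_tokens=200)
print(priced["cost"], unpriced["unpriced"])
```

Returning `None` with a flag keeps unpriced calls out of the totals while still making them countable, so a missing price shows up as a gap rather than as free usage.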

Audit trail

Every LLM interaction is logged. Chat calls are recorded with the role and model used, the full prompt and response, input/output/thinking token counts, latency, and session/message IDs — supporting debugging, cost tracking, and compliance. Combined with the router's per-call telemetry, the organization can see exactly what each model was asked and what it cost.