LLM Provider Routing
Supported Providers
| Provider | Models | Use Case |
|---|---|---|
| Anthropic | Claude Sonnet 4.6, Claude Haiku 4.5 | Primary provider for all roles |
| OpenAI | GPT-4.1, GPT-4.1-mini, GPT-4o | Fallback or alternative provider |
| Ollama | Any local model | Air-gapped environments |
Role-Based Model Assignment
Each AI task has a "role" with its own model configuration:
| Role | Default Model | Fallback | Purpose |
|---|---|---|---|
| builder | Claude Sonnet 4.6 | GPT-4.1 | Program documentation generation |
| file_docs | Claude Sonnet 4.6 | GPT-4.1-mini | File/field documentation |
| system_overview | Claude Sonnet 4.6 | GPT-4.1 | Architecture analysis |
| chat_planner | Claude Haiku 4.5 | GPT-4.1-mini | Query intent detection, context planning |
| chat_answer | Claude Sonnet 4.6 | GPT-4.1 | Full chat responses with source code |
| chat_answer_simple | Claude Haiku 4.5 | GPT-4.1-mini | Quick answers without code analysis |
| chat_classifier | Claude Haiku 4.5 | GPT-4.1-mini | Intent classification |
| code_developer | Claude Sonnet 4.6 | GPT-4.1 | COBOL code generation from specs |
| spec_doc | Claude Haiku 4.5 | GPT-4.1-mini | Specification document generation |
| attachment_ocr | Claude Haiku 4.5 | GPT-4o | Image text extraction (OCR) |
Fallback & Extended Thinking
Each role has a primary and fallback model. If the primary fails (rate limit, timeout, error), the fallback is tried automatically.
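The fallback behavior can be sketched as follows. This is a minimal illustration, not the actual implementation: `RoleConfig`, `ROLES`, and `call_with_fallback` are hypothetical names, and `call_model` stands in for whatever provider client is used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RoleConfig:
    primary: str
    fallback: str


# Two rows from the role table; the remaining roles follow the same shape.
ROLES = {
    "builder": RoleConfig("Claude Sonnet 4.6", "GPT-4.1"),
    "chat_planner": RoleConfig("Claude Haiku 4.5", "GPT-4.1-mini"),
}


def call_with_fallback(role: str, prompt: str,
                       call_model: Callable[[str, str], str]) -> str:
    """Try the role's primary model; on any error, retry once on the fallback."""
    cfg = ROLES[role]
    try:
        return call_model(cfg.primary, prompt)
    except Exception:  # rate limit, timeout, or any other provider error
        return call_model(cfg.fallback, prompt)
```

If the fallback also fails, this sketch lets the exception propagate to the caller.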
For complex tasks, "extended thinking" lets the model reason internally before responding. The token budget is configurable per role (e.g., BUILDER_THINKING_BUDGET=8000), and the feature requires the Anthropic provider.
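A per-role budget could be turned into request parameters roughly like this. The sketch assumes budgets are read from environment-style variables named `{ROLE}_THINKING_BUDGET` (consistent with the example above, but an assumption); `thinking_params` is a hypothetical helper. The `thinking` field shape follows the Anthropic Messages API.

```python
import os


def thinking_params(role: str, provider: str, env=os.environ) -> dict:
    """Return extra request parameters enabling extended thinking, or {}."""
    if provider != "anthropic":
        return {}  # extended thinking requires the Anthropic provider
    budget = int(env.get(f"{role.upper()}_THINKING_BUDGET", "0"))
    if budget <= 0:
        return {}  # no budget configured for this role
    return {"thinking": {"type": "enabled", "budget_tokens": budget}}
```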
Prompt Caching
For Anthropic models, large system prompts (>4000 chars) are automatically marked for ephemeral caching, reducing cost on repeated calls with the same system prompt.
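As an illustration of the caching rule, the helper below builds the `system` field of an Anthropic request and adds an ephemeral `cache_control` marker only when the prompt exceeds the 4000-character threshold. `system_blocks` and `CACHE_MIN_CHARS` are hypothetical names; the actual marking logic may differ.

```python
CACHE_MIN_CHARS = 4000  # threshold stated in the text


def system_blocks(system_prompt: str) -> list[dict]:
    """Build the `system` content blocks, marking large prompts for caching."""
    block = {"type": "text", "text": system_prompt}
    if len(system_prompt) > CACHE_MIN_CHARS:
        block["cache_control"] = {"type": "ephemeral"}
    return [block]
```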
Build Pipeline
Stage 1: Parse
Reads CSV metadata files from the AS/400 extraction: TTPGMOUT.csv (program list), TTPGM2PGM.csv (call relationships), TTPGM2FIL.csv (file access), TTFIL2FLD.csv (field definitions).
Stage 2: Build Program Documents
For each program, the LLM analyzes the source code and generates structured documentation.
- Programs batched by size (max 18 sections, 80K chars per batch)
- Large programs (>2500 lines) get smaller batches
- Up to 6 concurrent LLM calls (MAX_PARALLEL_LLM)
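The batching and concurrency limits above can be sketched as follows. This is a simplified illustration under assumptions: sections are plain strings, batching is greedy, and `batch_sections` and `run_batches` are hypothetical names. The smaller batch size for large programs is not modeled because its value is not stated.

```python
import asyncio

MAX_PARALLEL_LLM = 6


def batch_sections(sections: list[str], max_sections: int = 18,
                   max_chars: int = 80_000) -> list[list[str]]:
    """Greedily group sections so no batch exceeds either limit."""
    batches, current, size = [], [], 0
    for section in sections:
        full = len(current) >= max_sections or size + len(section) > max_chars
        if current and full:
            batches.append(current)
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        batches.append(current)
    return batches


async def run_batches(batches, process):
    """Run `process(batch)` for every batch, at most MAX_PARALLEL_LLM at once."""
    sem = asyncio.Semaphore(MAX_PARALLEL_LLM)

    async def guarded(batch):
        async with sem:
            return await process(batch)

    return await asyncio.gather(*(guarded(b) for b in batches))
```

A single section larger than `max_chars` still gets its own batch in this sketch rather than being split.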
Output per program (output_program/{PROGRAM}.json):
{
"documentMetadata": { "programName", "generatedAt", "language" },
"programInfo": { "purpose", "businessContext", "module", "description" },
"io": { "inputs": [...], "outputs": [...] },
"fileAccess": [{ "fileName", "accessType", "fields": [...] }],
"callGraph": { "callsThisPgm": [...], "thisPgmCalls": [...] },
"errorHandling": [{ "code", "description" }],
"changeImpactReport": { ... }
}
Stage 3: System Overview
The LLM analyzes the complete call graph, file dependencies, and program documentation to generate a system-level architecture document with narrative description, module identification, key business flows, risk assessment, and statistics.
Stage 4: Embedding & Indexing
- ChromaDB: Stores semantic vector embeddings of program and file descriptions for RAG search
- BM25: Full-text index of source code, field names, and descriptions for exact-match queries
Stage 5: Auto-Tagging
The AI classifies each program by business function and assigns appropriate tags from the collection's tag set.
Progress Tracking
- build_progress.json: Structured progress for UI (stage, done/total, last_activity)
- build_log.txt: Human-readable log (appended in real-time)
- Admin panel shows live progress with floating log window
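A progress file of this kind might be written as shown below. The key names (`stage`, `done`, `total`, `last_activity`) are taken from the description above, but the exact schema is an assumption, as is the write-then-rename step used so that a UI polling the file never reads a half-written document. `write_progress` is a hypothetical helper.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_progress(path: str, stage: str, done: int, total: int) -> None:
    """Atomically replace the progress file with the latest state."""
    payload = {
        "stage": stage,
        "done": done,
        "total": total,
        "last_activity": datetime.now(timezone.utc).isoformat(),
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp, path)  # atomic rename on the same filesystem
```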
Specification Document Generation
On-demand generation of formal specification documents at three audience levels:
| Audience | Title | Content Focus |
|---|---|---|
| business | Business Specification | Business purpose, rules, stakeholder impact |
| analyst | Systems Analysis Document | Data flows, interfaces, business logic |
| programmer | Program Specs | IO structures, call graph, error handling, change impact |
Documents are cached in the generated_spec_docs table. Stale detection compares each document's generation timestamp with the modification time of the program JSON. Manual regeneration is available via the "Regenerate" button.
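The stale check can be illustrated with a small helper. It assumes a cached document counts as stale when the program JSON was modified after the document was generated; `is_stale` is a hypothetical name and the real comparison may include a tolerance.

```python
import os
from datetime import datetime, timezone


def is_stale(generated_at: datetime, program_json_path: str) -> bool:
    """True if the program JSON changed after the spec document was generated.

    `generated_at` must be timezone-aware (UTC).
    """
    mtime = os.path.getmtime(program_json_path)
    modified = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return modified > generated_at
```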
Chat System
Afunana's chat interface allows users to ask natural language questions about their codebase using a Retrieval-Augmented Generation (RAG) pipeline.
| Mode | Purpose | API |
|---|---|---|
| Ask | Q&A about the codebase — no changes | /alerts or /alerts/v2 |
| Plan | Generate change plans with approval workflow | /alerts/v2 with planning |
Query Processing Pipeline
Step 1: Intent Classification
The chat_classifier role (Claude Haiku) determines the query type: bug analysis, feature request, design question, documentation lookup, or code generation.
Step 2: Query Planning
The chat_planner role analyzes the query and determines what context is needed: which programs are relevant, which files to examine, whether source code is needed, and what search queries to run.
Step 3: Context Retrieval (Hybrid Search)
Two search systems run in parallel:
- ChromaDB (Semantic) — Searches embedded descriptions, returns top K results by vector similarity. Good for conceptual queries.
- BM25 (Full-Text) — Searches source code, field names, descriptions by term frequency. Good for exact matches.
Hybrid Search Score Weighting
| Source | Weight |
|---|---|
| Source code match | 0.50 |
| Documentation hit | 0.35 |
| Code keyword match | 0.25 |
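One way such weights could be combined is sketched below. How the scores are actually merged is not specified here, so this is only an assumption: each hit carries a raw score in [0, 1], is scaled by its source weight, and the weighted scores are summed per program. The source labels and `rank` are hypothetical.

```python
from collections import defaultdict

# Weights from the table above; the keys are illustrative labels.
WEIGHTS = {"source_code": 0.50, "documentation": 0.35, "code_keyword": 0.25}


def rank(hits):
    """Rank programs by summed weighted score.

    hits: iterable of (program, source, raw_score) with raw_score in [0, 1].
    """
    totals = defaultdict(float)
    for program, source, raw in hits:
        totals[program] += WEIGHTS[source] * raw
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```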
Step 4: Source Code Selection
| Scenario | Max Lines |
|---|---|
| Normal query | 500 lines |
| Bug analysis | 1,500 lines |
| Large program threshold | 500 lines |
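The line limits can be read as a cap chosen by query intent. The sketch below assumes the cap is applied by keeping the first N lines; the real selection may pick the most relevant regions instead. The intent label `"bug_analysis"` and `select_source` are hypothetical.

```python
def select_source(lines: list[str], intent: str,
                  normal_max: int = 500, bug_max: int = 1500) -> list[str]:
    """Cap the source lines sent as context, with a higher cap for bug analysis."""
    limit = bug_max if intent == "bug_analysis" else normal_max
    return lines[:limit]
```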
Step 5: Response Generation
The chat_answer role (Claude Sonnet) generates the response with extended thinking enabled (10,000 token budget by default). The response includes markdown formatting, code snippets, and citations that link to specific programs and line numbers.
Step 6: Streaming
Responses stream token-by-token to the frontend, so users see the answer as it is generated. A Stop button cancels generation midway.
Chat Session Management
- Sessions stored in dbo.chat_sessions, messages in dbo.chat_messages
- Grouped by user and collection
- Sessions listed in sidebar with title (derived from first message), grouped by date
- Rename and delete individual sessions; clear all sessions option
- Previous messages sent as context with configurable window size
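The context window for previous messages might be applied like this. The sketch assumes the window counts messages (not tokens) and keeps the most recent ones; `history_window` is a hypothetical helper.

```python
def history_window(messages: list[dict], window: int) -> list[dict]:
    """Return the last `window` messages, oldest first.

    The explicit guard matters: messages[-0:] would return the whole list.
    """
    return messages[-window:] if window > 0 else []
```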
Chat Attachments
| Type | Formats | Processing |
|---|---|---|
| Images | PNG, JPEG, WebP, GIF | OCR via vision LLM |
| Text | TXT, CSV, LOG | Direct text extraction |
Limits: 10 MB per file and 5 files per message. Extracted text is added to the query context.
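A validation step for these limits could look like the following. It is a sketch under assumptions: 10 MB is taken as 10 × 1024 × 1024 bytes, the type is decided by file extension, and `classify_attachments` is a hypothetical name.

```python
import os

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumes binary megabytes
MAX_FILES = 5
IMAGE_EXT = {".png", ".jpg", ".jpeg", ".webp", ".gif"}
TEXT_EXT = {".txt", ".csv", ".log"}


def classify_attachments(files: list[tuple[str, int]]) -> list[str]:
    """Validate (name, size_in_bytes) pairs and return 'ocr' or 'text' per file."""
    if len(files) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files per message")
    kinds = []
    for name, size in files:
        if size > MAX_FILE_BYTES:
            raise ValueError(f"{name} exceeds 10 MB")
        ext = os.path.splitext(name)[1].lower()
        if ext in IMAGE_EXT:
            kinds.append("ocr")   # routed to the vision LLM
        elif ext in TEXT_EXT:
            kinds.append("text")  # read directly
        else:
            raise ValueError(f"unsupported format: {name}")
    return kinds
```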
MCP Tools (Model Context Protocol)
Afunana exposes 7 MCP tools for integration with Claude Desktop and other MCP clients:
| Tool | Purpose |
|---|---|
| get_alerts | Chat query with RAG retrieval |
| build_all | Trigger full collection rebuild |
| get_info_pgm | Fetch program metadata |
| get_all_pgms | List all programs |
| get_all_fils | List all files |
| tool_get_html_tre | Get call tree HTML |
| add_doc_to_docs | Upload documentation |
Chat Configuration
| Setting | Default | Description |
|---|---|---|
| CHROMA_SEARCH_K | 10 | Semantic search result count |
| BM25_TOP_K | 15 | Full-text search result count |
| CHAT_ANSWER_MAX_TOKENS | 12000 | Max response tokens |
| CHAT_ANSWER_THINKING_BUDGET | 10000 | Extended reasoning budget |
| CHAT_SOURCE_CODE_MAX_LINES | 500 | Max source lines in context |
| CHAT_SOURCE_CODE_BUG_MAX_LINES | 1500 | Max lines for bug analysis |
| CHAT_CACHE_ENABLED | false | Cache similar responses |
| CHAT_CACHE_THRESHOLD | 0.82 | Similarity threshold for cache hit |
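If these settings are supplied as environment variables (an assumption; the configuration mechanism is not stated here), they could be loaded with typed defaults as in this sketch. `DEFAULTS` mirrors the table above and `load_chat_config` is a hypothetical helper.

```python
import os

DEFAULTS = {
    "CHROMA_SEARCH_K": 10,
    "BM25_TOP_K": 15,
    "CHAT_ANSWER_MAX_TOKENS": 12000,
    "CHAT_ANSWER_THINKING_BUDGET": 10000,
    "CHAT_SOURCE_CODE_MAX_LINES": 500,
    "CHAT_SOURCE_CODE_BUG_MAX_LINES": 1500,
    "CHAT_CACHE_ENABLED": False,
    "CHAT_CACHE_THRESHOLD": 0.82,
}


def load_chat_config(env=os.environ) -> dict:
    """Read each setting from `env`, falling back to the documented default."""
    cfg = {}
    for key, default in DEFAULTS.items():
        raw = env.get(key)
        if raw is None:
            cfg[key] = default
        elif isinstance(default, bool):  # check bool first: bool is an int subclass
            cfg[key] = raw.strip().lower() in {"1", "true", "yes"}
        else:
            cfg[key] = type(default)(raw)  # int or float, matching the default
    return cfg
```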
Privacy
Logging of chat content to the audit trail is configurable (AUDIT_LOG_CHAT_CONTENT). When it is disabled, only the event type and metadata are logged. Chat sessions are per-user and per-collection, so users cannot see each other's conversations.
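The effect of the content-logging switch can be sketched as below. The record layout and the `audit_record` helper are hypothetical; only the rule that content is omitted when logging is disabled comes from the text.

```python
def audit_record(event_type: str, metadata: dict, content: str,
                 log_content: bool) -> dict:
    """Build an audit entry; chat content is included only when enabled."""
    record = {"event_type": event_type, "metadata": metadata}
    if log_content:  # mirrors AUDIT_LOG_CHAT_CONTENT
        record["content"] = content
    return record
```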