High-Level Architecture
+------------------+     +------------------+     +------------------+
|                  |     |                  |     |                  |
|  IBM i / AS400   |<--->|   Afunana App    |<--->|    SQL Server    |
|  (Source System) |     |     (Docker)     |     |     (Docker)     |
|                  |     |                  |     |                  |
+------------------+     +--------+---------+     +------------------+
                                  |
                         +--------+---------+
                         |                  |
                         |   Caddy Proxy    |
                         |   (HTTPS/TLS)    |
                         |                  |
                         +--------+---------+
                                  |
                    +-------------+-------------+
                    |                           |
              +-----+------+             +------+------+
              |            |             |             |
              |  Browser   |             |   VS Code   |
              |  (React)   |             |  Extension  |
              |            |             |             |
              +------------+             +-------------+
Components
1. Frontend (React SPA)
- Framework: React 18 + TypeScript + Vite
- UI Library: Shadcn UI (Radix primitives + Tailwind CSS)
- Routing: React Router v6 with lazy-loaded pages
- State: React Context (language, theme) + TanStack Query (server state)
- Internationalization: Custom LanguageContext with 500+ translation keys, full RTL support
- Theme: Light/dark mode with system preference detection
Key pages: Programs, Files, Tree, System Overview, Data Dictionary, Cross Reference, Chat, Tools, Admin (12+ sub-pages).
2. Backend (FastAPI)
- Framework: Python FastAPI + Uvicorn
- Auth: JWT (HS256, 15-min idle, 8-hour max session)
- Database: SQL Server via pyodbc
- AI: Multi-provider LLM routing (Anthropic Claude Sonnet 4.6/Haiku 4.5, OpenAI GPT-4.1/4.1-mini, Ollama)
- Search: ChromaDB (semantic embeddings) + BM25 (full-text)
- AS/400: JDBC via jaydebeapi + jt400.jar
- MCP: Model Context Protocol server (7 tools for Claude Desktop integration)
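
The Auth bullet above names two separate limits: a 15-minute idle timeout and an 8-hour maximum session. The following is a minimal sketch of how those two checks could be combined, not the application's actual middleware. The `Session` class and the functions `session_is_valid` and `touch` are hypothetical names introduced for illustration; token signing and verification are left out.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60       # 15-minute idle limit (from the Auth bullet)
MAX_SESSION_S = 8 * 60 * 60    # 8-hour absolute session limit (from the Auth bullet)


@dataclass
class Session:
    """Hypothetical session record; field names are assumptions."""
    started_at: float     # epoch seconds when the user logged in
    last_seen_at: float   # epoch seconds of the most recent request


def session_is_valid(session: Session, now: float) -> bool:
    """Return True only if neither the idle nor the absolute limit is exceeded."""
    idle_ok = (now - session.last_seen_at) <= IDLE_TIMEOUT_S
    age_ok = (now - session.started_at) <= MAX_SESSION_S
    return idle_ok and age_ok


def touch(session: Session, now: float) -> bool:
    """Refresh the idle timer on activity; refuse if the session has expired."""
    if not session_is_valid(session, now):
        return False
    session.last_seen_at = now
    return True
```

Keeping the two limits separate means steady activity can extend the idle window but never pushes a session past the 8-hour cap.
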
3. Database (SQL Server 2022)
- Containerized or external instance
- Tables: users, collections, config, audit log, chat sessions, tags, build history, spec docs, token revocation
- TDE encryption at rest (AES-256)
- Automated daily backups with 7-day retention
4. Caddy (Reverse Proxy)
- Automatic HTTPS via Let's Encrypt
- Routes: /api/* to backend, everything else to React SPA
- Security headers (HSTS, CSP, X-Frame-Options, etc.)
- Gzip compression
- Serves static landing site at root domain
5. IBM i Connector
- JDBC connection via jt400.jar
- Extraction job submission (SBMJOB to TTDOCPGM1)
- IFS file download (FTP binary mode)
- Source member read/write for plan execution
Data Flow
Source Extraction Flow
IBM i -> TTDOCPGM1 batch job -> IFS output files -> FTP download -> Data/{collection}/
Extracted files:
- TTPGMOUT.csv: Program list with metadata
- TTPGM2PGM.csv: Call relationships (caller to called)
- TTPGM2FIL.csv: Program-to-file usage (read/write/update)
- TTFIL2FLD.csv: File-to-field definitions
- TTFLDKEY.csv: Key field definitions
- Source members, written to the programs/ directory (converted from EBCDIC to UTF-8)
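
Because the files arrive over FTP in binary mode, the EBCDIC to UTF-8 conversion has to happen on the application side. A rough sketch of that step follows. The function name `ebcdic_to_utf8` is hypothetical, and the code page is an assumption: `cp037` (US/Canada EBCDIC) is used here only as a default, and the real system's CCSID may differ.

```python
def ebcdic_to_utf8(raw: bytes, codepage: str = "cp037") -> bytes:
    """Decode EBCDIC bytes and re-encode them as UTF-8.

    The default code page is an assumption; pass the code page that
    matches the source system's CCSID.
    """
    text = raw.decode(codepage)   # EBCDIC bytes -> Python str
    return text.encode("utf-8")   # Python str -> UTF-8 bytes


# The bytes C8 C5 D3 D3 D6 spell "HELLO" in code page 037.
sample = b"\xc8\xc5\xd3\xd3\xd6"
converted = ebcdic_to_utf8(sample)
```

Python ships several EBCDIC codecs (for example `cp037`, `cp500`, `cp1140`), so supporting another code page is a matter of changing the `codepage` argument.
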
Build Pipeline Flow
CSV files -> Parse -> LLM Analysis (batched) -> JSON output -> Embeddings -> Search indices
Stages:
- Parse — CSV to structured data (programs, files, relationships)
- Build Documents — LLM generates program JSON (purpose, IO, calls, files, errors)
- System Overview — LLM generates architecture analysis
- Embeddings — ChromaDB stores semantic vectors for RAG
- Auto-Tagging — Classify programs by business function
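
To make the Parse stage concrete, here is a small sketch that turns call-relationship rows, such as those in TTPGM2PGM.csv, into a caller-to-callees mapping. The column names `CALLER` and `CALLED` and the function `build_call_graph` are assumptions made for illustration; the real file layout is not specified here.

```python
import csv
import io
from collections import defaultdict


def build_call_graph(csv_text: str) -> dict[str, set[str]]:
    """Map each calling program to the set of programs it calls.

    Assumes a header row with CALLER and CALLED columns (hypothetical names).
    """
    graph: dict[str, set[str]] = defaultdict(set)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        caller = row["CALLER"].strip()
        called = row["CALLED"].strip()
        if caller and called:          # skip incomplete rows
            graph[caller].add(called)
    return dict(graph)


sample_csv = "CALLER,CALLED\nPGMA,PGMB\nPGMA,PGMC\nPGMB,PGMC\n"
call_graph = build_call_graph(sample_csv)
```

Using sets removes duplicate caller/called pairs, which is convenient if the same call appears more than once in the extract.
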
Request Flow
Browser -> Caddy (HTTPS) -> FastAPI -> Auth middleware -> Route handler -> Response
|
SQL Server (user/collection validation)
|
Data/{collection}/ (program JSON, source code)
|
LLM Provider (if chat or doc generation)
Data Storage
Per-Collection Directory
Data/{collection_name}/
|-- system_overview.json # Architecture analysis
|-- output_program/ # One JSON per program (structured metadata)
|-- programs/ # COBOL/RPG/CL source text
|-- programs_csv/ # Call relationship data
|-- info-from-as400/ # Raw metadata CSVs from extraction
|-- chroma_store/ # ChromaDB vector embeddings
|-- bm25_store/ # BM25 full-text indices
|-- build_progress.json # Live build status
+-- generated_docs/ # User-uploaded documentation
Database Tables
| Table | Purpose |
|---|---|
| AFUNANA_USERS | User accounts (username, email, password hash, role, status) |
| user_collections | User-to-collection access mapping |
| col_packs | Collection metadata (name, language, status, AS/400 config) |
| app_config | Runtime configuration (200+ keys, categorized) |
| security_event_log | Immutable audit trail with hash chain |
| build_history | Build status, timing, error tracking |
| chat_sessions / chat_messages | Persistent chat history |
| generated_spec_docs | Cached AI-generated specification documents |
| collection_tags / entity_tags | Tag definitions and assignments |
| revoked_tokens / revoked_sessions | Token/session invalidation |
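
The security_event_log row mentions a hash chain. As an illustration of the general technique only, the sketch below links each entry to its predecessor by hashing the previous hash together with the event payload. The choice of SHA-256, the JSON serialization, the all-zero genesis value, and the names `entry_hash`, `append_event` and `verify_chain` are all assumptions, not the schema actually used by the table.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64   # assumed starting value for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event payload."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the last entry in the log."""
    prev = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS_HASH
    for row in log:
        if row["prev_hash"] != prev or row["hash"] != entry_hash(prev, row["event"]):
            return False
        prev = row["hash"]
    return True
```

Changing any stored event alters its recomputed hash, so `verify_chain` returns False from that entry onward unless every later hash is also rewritten.
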
Network Architecture
| Port | Service | Access |
|---|---|---|
| 443 | Caddy (HTTPS) | Public |
| 80 | Caddy (HTTP redirect) | Public |
| 8001 | FastAPI | Internal (127.0.0.1 only) |
| 1433 | SQL Server | Internal (container network) |
| 8080 | Adminer (DB GUI) | Internal (127.0.0.1 only) |
| 9000 | Deploy Receiver | Internal (Docker bridge) |
Scalability Considerations
- LLM Processing: Configurable parallelism (MAX_PARALLEL_LLM, default 6 concurrent calls)
- Build Batching: Programs processed in batches (18 sections, 80K chars per batch)
- Caching: In-memory caches for config, programs, collections; session storage for UI state
- Database: Connection pooling, parameterized queries, indexed lookups
- Frontend: Code splitting, lazy loading, React Query cache (5-min stale, 30-min GC)
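
The first two bullets above, bounded LLM parallelism and size-limited batches, can be sketched together. This is one possible reading rather than the build pipeline's actual code: it treats "18 sections, 80K chars per batch" as two caps applied to each batch, and the names `make_batches`, `run_batches` and the `call_llm` callback are hypothetical.

```python
import asyncio

MAX_PARALLEL_LLM = 6            # default concurrency from the list above
MAX_SECTIONS_PER_BATCH = 18     # assumed meaning of "18 sections"
MAX_CHARS_PER_BATCH = 80_000    # assumed meaning of "80K chars per batch"


def make_batches(sections: list[str]) -> list[list[str]]:
    """Greedily pack sections into batches that respect both caps.

    A single section larger than the character cap still gets its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in sections:
        too_many = len(current) >= MAX_SECTIONS_PER_BATCH
        too_big = bool(current) and size + len(text) > MAX_CHARS_PER_BATCH
        if too_many or too_big:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


async def run_batches(batches, call_llm, limit: int = MAX_PARALLEL_LLM):
    """Run call_llm on every batch with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(batch):
        async with semaphore:       # blocks while `limit` calls are running
            return await call_llm(batch)

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(b) for b in batches))
```

A caller would pass an async function that sends one batch to the chosen provider, for example `asyncio.run(run_batches(make_batches(sections), call_llm))`.
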