Fully local, offline RAG system with layout-aware parsing, heading-aware chunking, two-stage retrieval (embedding + reranking), and an evaluation module.
Built a fully local, offline-capable Retrieval-Augmented Generation (RAG) system using layout-aware document parsing, heading-aware chunking, semantic retrieval, and reranking to generate accurate, source-grounded answers with explicit citations.
Layout-Aware Document Parsing
Docling extracts headings, paragraphs, and tables from the PDF with structural fidelity, including scanned elements via EasyOCR.
Heading-Aware Sliding Window Chunking
Chunks are sized by token count, split on sentence boundaries (via spaCy), and carry their section heading as context, which prevents information loss across section breaks.
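A minimal sketch of the chunking idea, not the project's actual code. It assumes a naive regex sentence splitter and whitespace token counts as stand-ins for spaCy and a real tokenizer. The names `Chunk`, `split_sentences`, and `chunk_section` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # section heading carried along with the chunk
    text: str     # whole sentences only, never a cut-off fragment


def split_sentences(text: str) -> list[str]:
    # Naive stand-in for spaCy sentence segmentation (assumption).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_section(heading: str, text: str,
                  max_tokens: int = 40,
                  overlap_sentences: int = 1) -> list[Chunk]:
    """Slide a window of whole sentences over one section."""
    sentences = split_sentences(text)
    chunks: list[Chunk] = []
    start = 0
    while start < len(sentences):
        end, n_tokens = start, 0
        # Grow the window sentence by sentence until the token budget is hit.
        # The first sentence is always taken, even if it exceeds the budget.
        while end < len(sentences):
            length = len(sentences[end].split())  # whitespace tokens (assumption)
            if end > start and n_tokens + length > max_tokens:
                break
            n_tokens += length
            end += 1
        chunks.append(Chunk(heading, " ".join(sentences[start:end])))
        if end >= len(sentences):
            break
        # Step back by the overlap, but always move forward by at least one sentence.
        start = max(end - overlap_sentences, start + 1)
    return chunks
```

Because the heading travels with every chunk, it can be prepended to the chunk text before embedding, so a chunk from the middle of a long section still records which section it came from.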
Two-Stage Retrieval
Fast ANN search via Qdrant surfaces a broad candidate set; a Cross-Encoder Reranker then scores and re-ranks for contextual relevance.
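The two stages can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: brute-force cosine similarity stands in for Qdrant's ANN search, and `rerank_score` is a hypothetical callable standing in for the cross-encoder.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_text: str,
    query_vec: Sequence[float],
    docs: Sequence[tuple[str, Sequence[float]]],  # (text, embedding) pairs
    rerank_score: Callable[[str, str], float],    # stand-in for a cross-encoder
    candidates: int = 20,
    top_k: int = 5,
) -> list[str]:
    # Stage 1: cheap vector similarity picks a broad candidate set.
    # (Exhaustive search here; a vector database would do this approximately.)
    stage1 = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    stage1 = stage1[:candidates]

    # Stage 2: a slower scorer that reads query and passage together
    # reorders only the candidates.
    scored = [(rerank_score(query_text, text), text) for text, _ in stage1]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

The split keeps the expensive scorer off the full corpus: it only sees the `candidates` passages that survive the first stage.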
Citation Grounding
The LLM answers strictly from retrieved context, with every response citing source page numbers and section headings for verification.
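One way to set up citation grounding is to label each retrieved chunk with its page and section in the prompt. The sketch below assumes that approach; the prompt wording and the names `RetrievedChunk` and `build_grounded_prompt` are hypothetical, not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    text: str
    page: int
    heading: str


def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    # Label every chunk with its source so the model can cite it.
    blocks = [
        f'[{i}] (page {c.page}, section "{c.heading}")\n{c.text}'
        for i, c in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer using only the context below. "
        "Cite the page number and section heading for every claim. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Putting the page and heading next to the text they describe lets a reader check each cited claim against the original document.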
Quantitative Evaluation
Both retrieval and generation components are scored using RAGAS metrics and custom LLM-as-a-judge scripts. Achieved 91% Context Precision, 73% Faithfulness, and 70% Context Recall on a custom-curated dataset.
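To show what the two retrieval scores measure, here is a simplified sketch. It assumes the relevance and attribution judgments are already available as booleans. In RAGAS those judgments come from an LLM, and the library's implementation may differ in detail. The function names are illustrative.

```python
def context_precision(relevance: list[bool]) -> float:
    """Rank-weighted precision over retrieved chunks, in rank order.

    relevance[k] says whether the chunk at rank k+1 was judged relevant.
    Relevant chunks that appear early contribute more than late ones.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def context_recall(attributed: list[bool]) -> float:
    """Share of reference-answer statements supported by the retrieved context."""
    return sum(attributed) / len(attributed) if attributed else 0.0
```

For example, relevance judgments of relevant, irrelevant, relevant give (1/1 + 2/3) / 2, roughly 0.83. Moving the irrelevant chunk to the last rank would raise the score to 1.0.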
Parsing
Retrieval
LLM (Local)
Evaluation
Working with sensitive documentation or need a fully local AI pipeline? Let's discuss designing a privacy-preserving RAG system for your domain.
✉ Get in Touch