F-16 Technical Manual RAG Pipeline

Fully local, offline RAG system with layout-aware parsing, heading-aware chunking, two-stage retrieval (embedding + reranking), and an evaluation module.

Context Precision: 91%

Faithfulness: 73%

Context Recall: 70%

Overview

Built a fully local, offline-capable Retrieval-Augmented Generation (RAG) system using layout-aware document parsing, heading-aware chunking, semantic retrieval, and reranking to generate accurate, source-grounded answers with explicit citations.

Designed and implemented end-to-end retrieval, evaluation, and deployment workflows.

System Demonstration

Pipeline Architecture

Core Pipeline Features

Layout-Aware Document Parsing

Docling extracts headings, paragraphs, and tables from the PDF with structural fidelity, including scanned elements via EasyOCR.

Heading-Aware Sliding Window Chunking

Token-based chunks that respect sentence boundaries (via spaCy) and carry their section heading as context, which prevents information loss across section breaks.
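A minimal sketch of the idea, not the project's actual code: whole sentences are packed into windows under a token budget, the last sentence is repeated in the next window as overlap, and every chunk keeps its section heading. To stay dependency-free, a regex stands in for spaCy sentence segmentation and a whitespace word count stands in for a tiktoken token count. The names Chunk, chunk_section, max_tokens, and overlap_sentences are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # section heading the chunk was cut from
    text: str     # one or more whole sentences

    def embed_text(self) -> str:
        # Prepend the heading so the chunk is self-contained when embedded.
        return f"{self.heading}\n{self.text}"


def split_sentences(text: str) -> list[str]:
    # Assumption: naive stand-in for spaCy sentence segmentation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for tiktoken.
    return len(text.split())


def chunk_section(heading: str, body: str, max_tokens: int = 40,
                  overlap_sentences: int = 1) -> list[Chunk]:
    """Pack whole sentences into windows of roughly max_tokens.

    A window can exceed the budget when a single sentence (plus the
    carried-over overlap) is itself longer than max_tokens; sentences
    are never split.
    """
    chunks: list[Chunk] = []
    window: list[str] = []
    for sentence in split_sentences(body):
        candidate = " ".join(window + [sentence])
        if window and count_tokens(candidate) > max_tokens:
            chunks.append(Chunk(heading, " ".join(window)))
            # Slide the window: keep only the trailing sentences as overlap.
            window = window[-overlap_sentences:] if overlap_sentences else []
        window.append(sentence)
    if window:
        chunks.append(Chunk(heading, " ".join(window)))
    return chunks
```

Running chunk_section once per parsed section, rather than over the whole document, is what keeps a window from straddling a section break.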

Two-Stage Retrieval

Fast ANN search via Qdrant surfaces a broad candidate set; a Cross-Encoder Reranker then scores and re-ranks for contextual relevance.
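The two stages can be sketched as follows, under loud assumptions: an exhaustive cosine-similarity scan stands in for Qdrant's ANN search, and a toy word-overlap function stands in for the cross-encoder. The function names and the candidates and top_k parameters are hypothetical. Only the shape of the flow (broad shortlist first, joint query-passage scoring second) is the point.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 20,
    top_k: int = 5,
) -> list[str]:
    # Stage 1: broad recall by vector similarity.
    # (Exhaustive scan here; a vector database would use an ANN index.)
    by_similarity = sorted(
        index, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    shortlist = [text for text, _ in by_similarity[:candidates]]
    # Stage 2: score each (query, passage) pair jointly and keep the best.
    reranked = sorted(
        shortlist, key=lambda text: rerank_score(query, text), reverse=True
    )
    return reranked[:top_k]


def word_overlap(query: str, passage: str) -> float:
    # Assumption: toy stand-in for a cross-encoder relevance score.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

The first stage compares precomputed vectors, which is cheap. The second stage reads the query and the passage together, which is slower but can separate passages that sit close together in embedding space. That cost difference is why the reranker only sees the shortlist.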

Citation Grounding

The LLM answers strictly from retrieved context, with every response citing source page numbers and section headings for verification.
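One common way to get this behavior is to number each retrieved chunk, label it with its page and section, and instruct the model to answer only from those sources. The sketch below illustrates that pattern and is not the project's actual prompt; RetrievedChunk, build_grounded_prompt, and the instruction wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    page: int
    heading: str
    text: str


def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    # Number each source and attach its page and section so the model
    # has something concrete to cite.
    sources = "\n\n".join(
        f"[{i}] (page {c.page}, section: {c.heading})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite the source number, page, and section for every claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string can be sent to any local chat or completion endpoint. Because page and heading travel with each chunk, the citations in the answer can be checked against the original PDF.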

Quantitative Evaluation

Both retrieval and generation components are scored using RAGAS metrics and custom LLM-as-a-judge scripts. The pipeline achieved 91% Context Precision, 73% Faithfulness, and 70% Context Recall on a custom, curated dataset.
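To make the headline retrieval metric concrete, here is a small sketch of context precision as a rank-weighted score, following the formula given in the RAGAS documentation as I understand it: the mean of precision@k over the ranks k that hold a relevant chunk. In RAGAS the per-chunk relevance verdicts come from an LLM judge; here they are passed in as booleans, and the function name is hypothetical.

```python
def context_precision(relevance: list[bool]) -> float:
    """Rank-weighted precision over retrieved chunks.

    relevance[i] is True when the chunk at rank i + 1 was judged
    relevant to the question. Returns 0.0 when nothing is relevant.
    """
    hits = 0
    weighted = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / rank  # precision@rank, counted only at hits
    return weighted / hits if hits else 0.0
```

For example, relevant chunks at ranks 1 and 3 out of three give (1/1 + 2/3) / 2, about 0.83, while the same two relevant chunks at ranks 1 and 2 give 1.0. The metric therefore rewards putting relevant chunks first, which is the job of the reranking stage.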

Key Learnings

Reranking significantly improves retrieval quality: Cross-encoder rerankers filter vector-space near-misses that embedding similarity alone cannot reliably distinguish.
Fully local pipeline stacks are viable for sensitive domains: Local parsing, embedding, retrieval, and inference eliminate external API dependencies and improve deployability for proprietary document systems.
Heading-aware chunking is crucial for structured documents: Splitting by token length alone orphans critical context. Preserving section hierarchy ensures the LLM receives coherent, self-contained chunks.
Source citation is a trust requirement: In critical or safety-sensitive applications, grounding every answer in a specific document, page, and section is essential.
Evaluation replaces guesswork: RAGAS and LLM-as-a-judge workflows enabled measurable optimization of retrieval precision, grounding quality, and chunking strategies.

Tech Stack

Parsing

Docling, EasyOCR, spaCy, tiktoken

Retrieval

Qdrant, Cross-Encoder Reranker, ANN Search

LLM (Local)

Ollama, nomic-embed-text, llama3.1:8b

Evaluation

RAGAS, MLflow, Streamlit, Plotly

Want to Work Together?

Working with sensitive documentation or need a fully local AI pipeline? Let's discuss designing a privacy-preserving RAG system for your domain.
