
F-16 Technical Manual RAG Pipeline

Fully local, offline RAG system with layout-aware parsing, heading-aware chunking, two-stage retrieval (embedding + reranking), and an evaluation module.

Context Precision: 91%
Faithfulness: 73%
Context Recall: 70%

Overview

Built a fully local, offline-capable Retrieval-Augmented Generation (RAG) system using layout-aware document parsing, heading-aware chunking, semantic retrieval, and reranking to generate accurate, source-grounded answers with explicit citations.

Designed and implemented end-to-end retrieval, evaluation, and deployment workflows.


Core Pipeline Features

1. Layout-Aware Document Parsing

Docling extracts headings, paragraphs, and tables from the PDF with structural fidelity, including scanned elements via EasyOCR.
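Docling can export a parsed document to Markdown, where headings survive as `#` lines. Assuming that export path (an assumption; the pipeline may instead read Docling's structured document object directly), the sketch below groups body text under its full heading path so later stages know which section each passage belongs to. `Section` and `split_markdown_sections` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches ATX-style Markdown headings such as "## 3.2 Some Section".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    """A block of body text together with its full heading path."""
    heading_path: list[str]
    text: str


def split_markdown_sections(markdown: str) -> list[Section]:
    """Group body lines under the most recent heading, tracking the heading hierarchy."""
    sections: list[Section] = []
    stack: list[tuple[int, str]] = []  # (level, title) of the currently open headings
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            sections.append(Section([title for _, title in stack], body))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level before opening this one.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return sections
```

Keeping the whole path (not just the nearest heading) lets a subsection such as "Pumps" stay attached to its parent chapter.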

2. Heading-Aware Sliding Window Chunking

Token-based chunks that respect sentence boundaries (via spaCy) and carry their section heading context, preventing information loss across section breaks.
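The sliding-window idea can be sketched with the standard library alone. This is a hypothetical illustration, not the project's code: a regex splitter stands in for spaCy's sentence segmentation, whitespace word counts stand in for tiktoken token counts, and `chunk_section`, `max_tokens`, and `overlap_sentences` are illustrative names and defaults.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive stand-in for spaCy sentence segmentation: break after ., ! or ? plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer such as tiktoken: one whitespace-separated word = one token.
    return len(text.split())


def chunk_section(heading_path: list[str], text: str,
                  max_tokens: int = 256, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into windows of at most max_tokens, each prefixed with its heading path."""
    heading = " > ".join(heading_path)
    # Reserve part of the budget for the heading prefix carried by every chunk.
    budget = max(1, max_tokens - count_tokens(heading))
    sentences = split_sentences(text)
    chunks: list[str] = []
    start = 0
    while start < len(sentences):
        end, used = start, 0
        # Greedily add whole sentences; a single over-long sentence still becomes its own chunk.
        while end < len(sentences):
            n = count_tokens(sentences[end])
            if used + n > budget and end > start:
                break
            used += n
            end += 1
        chunks.append(f"[{heading}]\n" + " ".join(sentences[start:end]))
        if end == len(sentences):
            break
        # Slide the window forward, re-using the last overlap_sentences sentences as shared context.
        start = max(end - overlap_sentences, start + 1)
    return chunks
```

Because the heading path is prefixed to every chunk, a window that starts mid-section still tells both the retriever and the LLM where it came from.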

3. Two-Stage Retrieval

Fast approximate nearest-neighbor (ANN) search in Qdrant surfaces a broad candidate set; a cross-encoder reranker then scores each query-chunk pair and reorders the candidates by contextual relevance.
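A minimal, dependency-free sketch of the two-stage pattern: stage 1 ranks by cosine similarity (an exact scan standing in for Qdrant's ANN index), and stage 2 re-sorts only the shortlist with a pair scorer. In the real pipeline that scorer would be a cross-encoder; `overlap_score` here is a hypothetical word-overlap stand-in so the example runs without a model, and all function names are illustrative.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query: str, text: str) -> float:
    # Hypothetical stand-in for a cross-encoder: counts shared lowercase words.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def two_stage_retrieve(query_vec: list[float], query_text: str, corpus: list[dict],
                       score_pair: Callable[[str, str], float],
                       k_candidates: int = 20, k_final: int = 5) -> list[dict]:
    # Stage 1: cheap, broad recall by vector similarity over the whole corpus.
    candidates = sorted(corpus, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)[:k_candidates]
    # Stage 2: score each (query, chunk) pair jointly and keep only the best few.
    reranked = sorted(candidates, key=lambda d: score_pair(query_text, d["text"]), reverse=True)
    return reranked[:k_final]
```

The design choice is cost: the expensive pair scorer only sees `k_candidates` chunks, while the cheap vector search covers the whole collection.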

4. Citation Grounding

The LLM answers strictly from retrieved context, with every response citing source page numbers and section headings for verification.
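One way to enforce this kind of grounding is to label each retrieved chunk with its page and section inside the prompt, then check that the answer only cites pages that were actually retrieved. The prompt wording, the `(page N, section S)` citation format, and both function names are assumptions for illustration.

```python
import re


def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        # Tag every chunk with its page and section so the model has something concrete to cite.
        blocks.append(f"[{i}] (page {chunk['page']}, section {chunk['section']})\n{chunk['text']}")
    context = "\n\n".join(blocks)
    return (
        "Answer using ONLY the context below. If the context does not contain the answer, "
        "say that you cannot find it. Cite every claim as (page N, section S).\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def unsupported_citations(answer: str, chunks: list[dict]) -> list[int]:
    """Return cited page numbers that do not belong to any retrieved chunk."""
    cited = {int(p) for p in re.findall(r"\(page (\d+)", answer)}
    retrieved = {int(c["page"]) for c in chunks}
    return sorted(cited - retrieved)
```

A non-empty result from `unsupported_citations` flags an answer whose citations point outside the retrieved context, which is worth rejecting or regenerating.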

5. Quantitative Evaluation

Both retrieval and generation are scored using RAGAS metrics and custom LLM-as-a-judge scripts. On a custom-curated evaluation dataset, the pipeline achieved 91% Context Precision, 73% Faithfulness, and 70% Context Recall.
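In RAGAS, an LLM judge produces per-item verdicts (is this retrieved chunk relevant? is this answer claim supported by the context?), and the metrics aggregate those verdicts. The sketch below covers only the aggregation step, written from the metric definitions in the RAGAS documentation; the judge calls are omitted, the function names are hypothetical, and the library's internals may differ in detail.

```python
def context_precision(verdicts: list[int]) -> float:
    """Context Precision@K = sum_k(precision@k * v_k) / (relevant chunks in top K).

    verdicts[k-1] is v_k: 1 if the chunk at rank k was judged relevant, else 0.
    """
    hits, total = 0, 0.0
    for k, v in enumerate(verdicts, start=1):
        if v:
            hits += 1
            total += hits / k  # precision@k, only counted at ranks holding a relevant chunk
    return total / hits if hits else 0.0


def claim_ratio(supported: int, total: int) -> float:
    """Shared shape of two metrics:
    Faithfulness   = answer claims supported by the retrieved context / all answer claims
    Context Recall = reference-answer claims attributable to the context / all reference claims
    """
    return supported / total if total else 0.0
```

For example, verdicts `[1, 0, 1]` score (1/1 + 2/3) / 2 ≈ 0.83, while `[0, 1, 1]` score (1/2 + 2/3) / 2 ≈ 0.58, because the first relevant chunk arrives later.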

Key Learnings

  • Reranking significantly improves retrieval quality: Cross-encoder rerankers filter vector-space near-misses that embedding similarity alone cannot reliably distinguish.
  • Fully local pipeline stacks are viable for sensitive domains: Running parsing, embedding, retrieval, and inference locally eliminates external API dependencies and improves deployability for proprietary document systems.
  • Heading-aware chunking is crucial for structured documents: Splitting purely by token length orphans critical context from its section. Preserving the section hierarchy gives the LLM coherent, self-contained chunks.
  • Source citation is a trust requirement: In critical or safety-sensitive applications, grounding every answer in a specific document, page, and section is essential.
  • Evaluation replaces guesswork: RAGAS and LLM-as-a-judge workflows enabled measurable optimization of retrieval precision, grounding quality, and chunking strategies.

Tech Stack

Parsing

Docling, EasyOCR, spaCy, tiktoken

Retrieval

Qdrant, Cross-Encoder Reranker, ANN Search

LLM (Local)

Ollama, nomic-embed-text, llama3.1:8b

Evaluation

RAGAS, MLflow, Streamlit, Plotly
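Because generation runs through Ollama, the LLM call needs nothing beyond a local HTTP request. A minimal stdlib sketch against Ollama's `/api/generate` endpoint on its default port 11434, assuming a running `ollama serve` with `llama3.1:8b` already pulled; `build_generate_request` and `generate` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    # stream=False asks Ollama for a single JSON object instead of a stream of partial responses.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b", timeout: float = 120.0) -> str:
    # Requires a local Ollama server; nothing leaves the machine.
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log without a running server.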

Want to Work Together?

Working with sensitive documentation or need a fully local AI pipeline? Let's discuss designing a privacy-preserving RAG system for your domain.
