Case Study

CV RAG Chatbot

A live Retrieval-Augmented Generation (RAG) chatbot that answers questions about my professional background, built with LangChain and FAISS.

LangChain LCEL · FAISS · FastEmbed (ONNX) · Groq Llama 3.3 · Streamlit · RAGAS · 2024


The Problem

My GitHub profile doesn't tell the full story. Most of my best work is under NDA, and a static CV can't answer follow-up questions. I needed a way for recruiters and potential clients to explore my background interactively: ask specific questions and get context-aware answers.

It is also a live demonstration of RAG architecture, showing that I can build, evaluate, and deploy LLM applications, not just talk about them.

The Architecture

graph TB
    A[User Question] --> B[Streamlit UI]
    B --> C[LangChain LCEL Orchestrator]
    C --> D[FAISS Vector Store]
    C --> E[Groq Llama 3.3]
    D --> F[CV + Project Docs]
    E --> G[Generated Answer]
    G --> B
    H[RAGAS Evaluation] --> I[Faithfulness]
    H --> J[Answer Relevancy]
    H --> K[Context Precision]
    style C fill:#6c5ce7,stroke:#7c6df0,color:#fff
    style D fill:#00d2ff,stroke:#00b8e6,color:#0a0a0f
    style H fill:#f7c948,stroke:#e0b830,color:#0a0a0f
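The diagram can be made concrete with a dependency-free toy of the same retrieve, prompt, generate flow. The `Step` class and its `|` operator only imitate the shape of an LCEL chain. In the real stack, a retriever, a prompt template and a chat model would be piped together. `DOCS`, `toy_retrieve`, `build_prompt` and `fake_llm` are hypothetical stand-ins for the FAISS retriever, the prompt and the Groq-hosted Llama 3.3 call. None of them is the project's code.

```python
"""Toy retrieve -> prompt -> generate pipeline (no LangChain involved).

The `|` operator below only imitates the shape of an LCEL chain; every
component is a hypothetical stand-in for illustration.
"""
import re
from typing import Any, Callable


class Step:
    """Wraps a function so steps can be chained left-to-right with `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose: run self first, then feed its output into `other`.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical chunks standing in for the CV + project docs.
DOCS = [
    "Experience: built data pipelines in Python and Spark.",
    "Education: MSc in Computer Science.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question: str, k: int = 1) -> dict:
    # Stand-in retriever: rank chunks by word overlap, keep the top k.
    q = tokens(question)
    ranked = sorted(DOCS, key=lambda d: len(q & tokens(d)), reverse=True)
    return {"question": question, "context": "\n".join(ranked[:k])}


def build_prompt(inputs: dict) -> str:
    return (
        "Answer using only the context below.\n"
        f"Context:\n{inputs['context']}\n"
        f"Question: {inputs['question']}"
    )


def fake_llm(prompt: str) -> str:
    # Stand-in for the LLM call: echo the first context line.
    context_line = prompt.split("Context:\n", 1)[1].split("\n", 1)[0]
    return f"Based on the CV: {context_line}"


chain = Step(toy_retrieve) | Step(build_prompt) | Step(fake_llm)
answer = chain.invoke("Which Python data pipelines did you build?")
print(answer)
```

Each stand-in can be swapped for a real component without changing the chain definition, which is the main appeal of declarative composition.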

Why It's Hard

Retrieval quality is everything. If the retriever pulls irrelevant chunks, the LLM has nothing accurate to ground its answer in and tends to hallucinate. I used RAGAS to measure faithfulness, answer relevancy, and context precision, iterating on chunking strategy and embedding models until the metrics were solid.
Chunking strategy matters. CV data is structured (sections, bullet points). Naive fixed-size chunking can cut a role description or a bullet list in half, breaking semantic coherence. I implemented section-aware chunking that respects document structure.
Evaluation-driven development. Instead of guessing whether the chatbot was good, I built a RAGAS evaluation pipeline with synthetic test questions. Every change to chunking, prompting, or retrieval was measured.
Deployment on Streamlit Community Cloud. The free tier means no server management, but cold starts and memory limits required optimizing the FAISS index size and model loading.
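Here is a minimal sketch of section-aware chunking. It assumes a hypothetical plain-text CV where headings are lines ending in `:` and bullets start with `- `. The project's actual parsing rules aren't described, so `chunk_by_section`, the heading convention and the `max_chars` budget are all illustrative. Each chunk stays inside one section, is prefixed with its heading, and never splits a bullet.

```python
"""Section-aware chunker for a plain-text CV (hypothetical format:
headings end with ':' and bullets start with '- ')."""


def chunk_by_section(text: str, max_chars: int = 300) -> list[dict]:
    chunks: list[dict] = []
    section = "General"
    buffer: list[str] = []

    def flush() -> None:
        # Emit buffered lines as one chunk, prefixed with the section heading.
        if buffer:
            body = "\n".join(buffer)
            chunks.append({"section": section, "text": f"{section}\n{body}"})
            buffer.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and not line.startswith("- "):
            flush()  # a new heading closes the previous section's chunk
            section = line.rstrip(":")
            continue
        # Start a new chunk in the same section if this line would exceed
        # the budget; a single over-long line still becomes its own chunk.
        if buffer and len("\n".join(buffer + [line])) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks


# Hypothetical CV excerpt for illustration.
cv = """
Experience:
- Data engineer at Acme, built Spark pipelines
- Led migration to Airflow
Skills:
- Python, SQL, LangChain
"""

for c in chunk_by_section(cv):
    print(repr(c["text"]))
```

Prefixing each chunk with its heading also gives the embedding model the section context. A bare bullet like "Led migration to Airflow" would lack it.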
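The context-precision metric and the evaluation loop can be sketched without RAGAS itself. As I understand the RAGAS definition, the metric averages precision@k over the ranks that hold a relevant chunk, so relevant chunks ranked higher score better. RAGAS judges relevance with an LLM. Here the 0/1 labels are supplied by hand. `context_precision`, `runs`, the configuration names and the label values are hypothetical, not results from this project.

```python
"""Context precision over ranked retrieval results, plus a tiny eval loop.

Relevance labels are hand-supplied 0/1 values (RAGAS uses an LLM judge).
"""
from statistics import mean


def context_precision(relevance: list[int]) -> float:
    """relevance[i] is 1 if the i-th retrieved chunk (rank order) is relevant."""
    hits = 0
    score = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / hits if hits else 0.0


# Hypothetical labels: three synthetic questions, top-3 chunks each,
# under two retrieval configurations being compared.
runs = {
    "config_a": [[0, 1, 1], [1, 0, 0], [0, 0, 1]],
    "config_b": [[1, 1, 0], [1, 0, 1], [1, 0, 0]],
}

for name, labels in runs.items():
    print(name, round(mean(context_precision(r) for r in labels), 3))
```

Running every chunking or prompt change through a loop like this turns "seems better" into a number that can be compared across configurations.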
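Here are two back-of-the-envelope pieces for the cold-start and memory problem, as a sketch.

- **Index size.** A flat (uncompressed) FAISS index stores every vector as float32, so its raw size is roughly n × d × 4 bytes. The 500 chunks and 384 dimensions below are assumptions, not the project's figures. 384 is the output size of small embedding models such as BAAI/bge-small-en-v1.5.
- **Model loading.** The loader uses `functools.lru_cache` to stay dependency-free. In a Streamlit app, `st.cache_resource` plays the same role: it loads the model and index once per process instead of on every rerun. `load_resources` is hypothetical.

```python
"""Rough memory estimate for a flat float32 index, and load-once caching."""
from functools import lru_cache


def flat_index_bytes(n_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw vector storage of an uncompressed float32 index: n * d * 4 bytes."""
    return n_vectors * dim * bytes_per_float


@lru_cache(maxsize=1)
def load_resources() -> dict:
    # Hypothetical loader: this is where the embedding model and the FAISS
    # index would be read from disk. The cache makes later calls reuse them.
    print("loading model and index...")
    return {"model": object(), "index": object()}


size = flat_index_bytes(n_vectors=500, dim=384)
print(f"{size:,} bytes = {size / 1024**2:.2f} MiB")  # 768,000 bytes = 0.73 MiB

first = load_resources()   # runs the loader body
second = load_resources()  # served from the cache, loader body does not rerun
print(first is second)
```

At this scale the vectors themselves are small. Model loading and library imports usually dominate cold-start time, which is why caching the loader matters.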

Technical Stack

LangChain (LCEL): document loading, splitting, retrieval, and prompt management, composed declaratively with the LangChain Expression Language
FAISS: vector store for semantic search over CV and project documents
FastEmbed (ONNX-based): local embeddings with no API dependency and fast cold starts
Groq + Llama 3.3: LLM answer generation via Groq's high-throughput inference API
Streamlit: UI framework, deployed on Streamlit Community Cloud
RAGAS: evaluation framework measuring faithfulness, answer relevancy, and context precision
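For intuition about what the vector store does at query time, here is exhaustive top-k search in plain Python. It assumes a flat inner-product index over L2-normalized vectors, the equivalent of FAISS `IndexFlatIP`, where inner product equals cosine similarity. The project's actual index type isn't stated, and the 3-d vectors are made up. FAISS does the same scoring with optimized native code.

```python
"""Brute-force cosine top-k search, as a flat inner-product index computes it."""
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def search(index: list[list[float]], query: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Score every stored vector against the query; return the top k (id, score)."""
    q = normalize(query)
    scores = [(i, sum(a * b for a, b in zip(q, vec))) for i, vec in enumerate(index)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


# Hypothetical 3-d "embeddings"; real FastEmbed vectors have hundreds of dimensions.
index = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0])]
print(search(index, [1.0, 0.1, 0.0]))
```

Exhaustive search is exact and, for a corpus the size of one CV plus project docs, fast enough. Approximate indexes only pay off at far larger scales.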

What I'd Do Differently

Add hybrid search (BM25 + semantic). Pure semantic search sometimes misses exact keyword matches. A hybrid approach would improve recall for specific skill/technology queries.
Implement query rewriting. User questions are often vague. Rewriting them into more specific queries before retrieval would improve context precision.
Add source citations in the UI. Showing which CV section each answer came from builds trust. Users should be able to click through to the source.
Experiment with reranking. A cross-encoder reranker after initial retrieval could further improve context precision, especially for ambiguous queries.
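One common way to implement the hybrid search idea is Reciprocal Rank Fusion (RRF). Each document scores Σ 1/(k + rank) across the keyword (BM25) and semantic result lists, usually with k = 60. This technique is swapped in for illustration; the project does not use it. The document ids and rankings are hypothetical. LangChain's `EnsembleRetriever` offers a similar rank-based fusion of multiple retrievers.

```python
"""Reciprocal Rank Fusion of a keyword ranking and a semantic ranking."""


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical results for a query like "Kubernetes".
bm25_hits = ["skills", "project_infra", "education"]      # exact keyword match wins
vector_hits = ["project_infra", "experience", "skills"]   # semantic neighbours
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc for doc, _ in fused])
```

`project_infra` ranks near the top of both lists, so it edges out `skills`, which only one retriever ranked first. RRF needs only ranks, not comparable scores, which is why it suits mixing BM25 and cosine results.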
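Here is a minimal sketch for source citations. It assumes each retrieved chunk carries its CV section in metadata. The `section` key is hypothetical; with LangChain, it would live in `Document.metadata`. `format_with_sources` appends the deduplicated section names in retrieval order, so the UI can show or link them.

```python
"""Append deduplicated source sections to a generated answer."""


def format_with_sources(answer: str, retrieved: list[dict]) -> str:
    # dict.fromkeys deduplicates while preserving retrieval order.
    sections = list(dict.fromkeys(c["section"] for c in retrieved))
    if not sections:
        return answer
    return answer + "\n\nSources: " + ", ".join(sections)


# Hypothetical retrieved chunks with section metadata.
retrieved = [
    {"section": "Experience", "text": "Built Spark pipelines"},
    {"section": "Skills", "text": "Python, SQL"},
    {"section": "Experience", "text": "Led Airflow migration"},
]
print(format_with_sources("Yes, I have production Spark experience.", retrieved))
```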

Key Takeaways

Building a RAG chatbot is easy. Building one that's actually useful is hard. The difference is evaluation — measuring retrieval quality, iterating on chunking, and being honest about when the system doesn't know something. RAGAS made this measurable instead of subjective.
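One simple way to be honest about unknowns is to abstain when retrieval is weak. The sketch assumes the retriever returns `(chunk, similarity)` pairs sorted best-first. `answer_or_abstain`, `FALLBACK`, `stub_llm` and the 0.5 threshold are hypothetical. A real threshold would need tuning against the evaluation set for the chosen embedding model.

```python
"""Abstain instead of answering when the best retrieved chunk is a weak match."""
from typing import Callable

FALLBACK = "I don't have information about that in my CV or project docs."


def answer_or_abstain(
    question: str,
    hits: list[tuple[str, float]],
    generate: Callable[[str, str], str],
    min_score: float = 0.5,
) -> str:
    """Call the LLM only if the best retrieved chunk clears `min_score`."""
    if not hits or hits[0][1] < min_score:
        return FALLBACK  # weak retrieval: say so rather than guess
    # Keep only chunks above the threshold as context for the answer.
    context = "\n".join(text for text, score in hits if score >= min_score)
    return generate(question, context)


def stub_llm(question: str, context: str) -> str:
    # Hypothetical stand-in for the real LLM call.
    return f"(answer grounded in: {context})"


print(answer_or_abstain("What's your favourite film?", [("Skills: Python", 0.21)], stub_llm))
print(answer_or_abstain("Do you know Spark?", [("Experience: Spark pipelines", 0.82)], stub_llm))
```

A score gate like this complements, rather than replaces, a prompt instruction telling the model to say when the context doesn't contain the answer.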
