Case Study
CV RAG Chatbot
A live Retrieval-Augmented Generation (RAG) chatbot: ask questions about my professional background, powered by LangChain + FAISS.
LangChain LCEL
FAISS
FastEmbed (ONNX)
Groq Llama 3.3
Streamlit
RAGAS
2024
The Problem
My GitHub profile doesn't tell the full story. Most of my best work is under NDA, and a static CV can't answer follow-up questions. I needed a way for recruiters and potential clients to explore my background interactively: ask specific questions and get context-aware answers.
This is also a live demonstration of RAG architecture, showing that I can build, evaluate, and deploy LLM applications, not just talk about them.
The Architecture
graph TB
A[User Question] --> B[Streamlit UI]
B --> C[LangChain LCEL Orchestrator]
C --> D[FAISS Vector Store]
C --> E[Groq Llama 3.3]
D --> F[CV + Project Docs]
E --> G[Generated Answer]
G --> B
H[RAGAS Evaluation] --> I[Faithfulness]
H --> J[Answer Relevancy]
H --> K[Context Precision]
style C fill:#6c5ce7,stroke:#7c6df0,color:#fff
style D fill:#00d2ff,stroke:#00b8e6,color:#0a0a0f
style H fill:#f7c948,stroke:#e0b830,color:#0a0a0f
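As a rough illustration of the query path in the diagram (question, retrieval, prompt, generation), here is a toy Python sketch of pipe-style composition. The `Step` class only mimics the way LCEL chains runnables with `|`. It is not LangChain's API, and `DOCS`, `retrieve`, `build_prompt`, and `fake_llm` are hypothetical stubs standing in for FAISS retrieval and the Groq call.

```python
import re
from typing import Any, Callable


class Step:
    """Toy analogue of an LCEL runnable: wraps one function, composes with `|`."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) runs a on x, then feeds the result into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# Hypothetical in-memory documents standing in for the CV + project docs
DOCS = [
    "Experience: built data pipelines in Python.",
    "Skills: LangChain, FAISS, Streamlit.",
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str) -> dict:
    # Stub retriever: keyword overlap instead of FAISS vector search
    hits = [d for d in DOCS if tokens(question) & tokens(d)]
    return {"question": question, "context": hits}


def build_prompt(inputs: dict) -> str:
    context = "\n".join(inputs["context"]) or "(no context)"
    return f"Answer using only this context:\n{context}\n\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> str:
    # Stub for the LLM call: quotes the first context line, or admits it doesn't know
    first_context_line = prompt.splitlines()[1]
    if first_context_line == "(no context)":
        return "I don't know based on the CV."
    return f"According to the CV: {first_context_line}"


chain = Step(retrieve) | Step(build_prompt) | Step(fake_llm)
print(chain.invoke("Which skills are listed?"))
# According to the CV: Skills: LangChain, FAISS, Streamlit.
```

The point of the pipe style is that each stage can be swapped independently (a different retriever, prompt, or model) without touching the others, which is what makes the per-change evaluation described below practical.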
Why It's Hard
- Retrieval quality is everything. If the retriever pulls irrelevant chunks, the LLM has nothing relevant to ground its answer in and tends to hallucinate. I used RAGAS to measure faithfulness, answer relevancy, and context precision, iterating on chunking strategy and embedding models until the metrics were solid.
- Chunking strategy matters. CV data is structured (sections, bullet points), and naive fixed-size chunking cuts across those boundaries, breaking semantic coherence. I implemented section-aware chunking that respects document structure.
- Evaluation-driven development. Instead of guessing whether the chatbot was good, I built a RAGAS evaluation pipeline with synthetic test questions. Every change to chunking, prompting, or retrieval was measured.
- Deployment on Streamlit Cloud. Free tier, no server management. But cold starts and memory limits required optimizing the FAISS index size and model loading.
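A minimal sketch of the section-aware idea, assuming (hypothetically) that the CV is stored as Markdown with `## ` section headings and one bullet per line. Chunks never cross a heading, bullets are never split, and each chunk is prefixed with its section title so it stays self-describing once embedded. The function name and the `max_chars` default are illustrative, not taken from the project.

```python
def section_chunks(text: str, max_chars: int = 200) -> list[dict]:
    """Split Markdown-style CV text into chunks that respect section boundaries."""
    chunks: list[dict] = []
    section = "Intro"  # label for any text before the first heading
    buf: list[str] = []

    def flush() -> None:
        if buf:
            # Prefix the section title so the chunk carries its context when embedded
            chunks.append({"section": section, "text": f"{section}\n" + "\n".join(buf)})
            buf.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            flush()  # a heading always closes the current chunk
            section = line[3:].strip()
            continue
        # Start a new chunk within the same section if this line would overflow it
        if buf and sum(len(b) + 1 for b in buf) + len(line) > max_chars:
            flush()
        buf.append(line)  # whole lines (bullets) are kept intact
    flush()
    return chunks


cv = """## Experience
- Built RAG chatbot with LangChain
- Deployed on Streamlit Cloud
## Skills
- Python, FAISS
"""
for chunk in section_chunks(cv):
    print(chunk["section"], "|", chunk["text"].replace("\n", " / "))
```

Lowering `max_chars` only splits a section into more chunks; it never merges text from two sections, which is the property naive fixed-size splitting lacks.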
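Context precision is rank-aware: it rewards a retriever for putting the useful chunks first. The RAGAS documentation defines it as the sum of precision@k at each relevant rank k, divided by the number of relevant chunks in the top K. RAGAS obtains those relevance labels from an LLM judge; the stdlib sketch below takes them as a hand-labelled list instead, and `mean_context_precision` is a hypothetical helper illustrating how one configuration is scored across a set of synthetic questions.

```python
from statistics import mean


def context_precision(relevant: list[bool]) -> float:
    """Rank-aware precision over one question's retrieved chunks (top K).

    relevant[i] is True if the chunk at rank i + 1 was useful for the answer.
    """
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


def mean_context_precision(runs: list[list[bool]]) -> float:
    # One relevance list per test question; average to score a whole configuration
    return mean(context_precision(r) for r in runs)


# Same two relevant chunks, different order: ranking them first scores higher
print(context_precision([True, True, False]))  # 1.0
print(context_precision([False, True, True]))  # ~0.583
```

Comparing `mean_context_precision` before and after a change to chunking or retrieval turns "does this feel better?" into a number that can go up or down.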
Technical Stack
- LangChain (LCEL for declarative chain composition): document loading, splitting, retrieval, and prompt management
- FAISS: vector store for semantic search over CV and project documents
- FastEmbed (ONNX-based): local embeddings with no API dependency and fast cold starts
- Groq + Llama 3.3: LLM for answer generation via Groq's high-throughput inference API
- Streamlit: UI framework, deployed on Streamlit Community Cloud
- RAGAS: evaluation framework measuring faithfulness, answer relevancy, and context precision
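Conceptually, the vector store does nearest-neighbour search over embedding vectors. For an exact ("flat") index over L2-normalized vectors, that reduces to ranking chunks by cosine similarity, which is what FAISS's `IndexFlatIP` computes in optimized native code. The stdlib sketch below shows only that core idea: the class is hypothetical and not the FAISS API, and the three-dimensional vectors are made-up stand-ins for real FastEmbed output.

```python
import math


class FlatCosineIndex:
    """Exact brute-force search; a stand-in for a flat inner-product index."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.payloads: list[str] = []

    @staticmethod
    def _normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm else list(v)

    def add(self, vector: list[float], payload: str) -> None:
        # Normalizing at insert time makes the inner product equal cosine similarity
        self.vectors.append(self._normalize(vector))
        self.payloads.append(payload)

    def search(self, query: list[float], k: int = 2) -> list[tuple[float, str]]:
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), payload)
            for v, payload in zip(self.vectors, self.payloads)
        ]
        scored.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
        return scored[:k]


index = FlatCosineIndex()
index.add([1.0, 0.0, 0.0], "Python experience")
index.add([0.0, 1.0, 0.0], "Streamlit deployment")
index.add([0.9, 0.1, 0.0], "Data pipelines in Python")
print([payload for _, payload in index.search([1.0, 0.0, 0.0], k=2)])
# ['Python experience', 'Data pipelines in Python']
```

A flat index keeps every vector, so its memory grows linearly with chunk count and embedding size: at 4 bytes per float32 component, for example, 1,000 chunks of 384-dimensional embeddings take about 1.5 MB. That is why index size is worth watching on a memory-limited free tier.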
What I'd Do Differently
- Add hybrid search (BM25 + semantic). Pure semantic search sometimes misses exact keyword matches. A hybrid approach would improve recall for specific skill/technology queries.
- Implement query rewriting. User questions are often vague. Rewriting them into more specific queries before retrieval would improve context precision.
- Add source citations in the UI. Showing which CV section each answer came from builds trust. Users should be able to click through to the source.
- Experiment with reranking. A cross-encoder reranker after initial retrieval could further improve context precision, especially for ambiguous queries.
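One common way to build the hybrid search described above is to rank chunks separately with BM25 and with the vector store, then merge the two rankings with reciprocal rank fusion (RRF), which scores each chunk as the sum of 1 / (k + rank) over the rankings it appears in. RRF with the conventional k = 60 is an assumed choice here, not necessarily what the final system would use. The BM25 scorer is a stdlib sketch with typical defaults (k1 = 1.5, b = 0.75), and the semantic ranking in the usage lines is hard-coded to stand in for vector-search results.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many docs contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # The "+ 1" inside the log keeps IDF positive even for very common terms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # document-length normalization
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of doc ids; docs ranked high in any list rise to the top."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "Skilled in Kubernetes and Docker",
    "Built RAG chatbots with LangChain",
    "Container orchestration experience",
]
scores = bm25_scores("Kubernetes", docs)
# Keep only docs that actually contain a query term, best first
keyword_ranking = [
    i for i in sorted(range(len(docs)), key=lambda i: scores[i], reverse=True) if scores[i] > 0
]
semantic_ranking = [2, 0, 1]  # hypothetical vector-search order
print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))  # [0, 2, 1]
```

In this toy case the semantic ranking prefers the loosely related "container orchestration" chunk, but the exact keyword match from BM25 pulls the Kubernetes chunk back to the top, which is the recall gain the hybrid approach is meant to deliver.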
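For source citations, it would be enough for each retrieved chunk to carry the CV section it came from. The hypothetical sketch below assumes each chunk is a dict with a `section` key and renders a numbered source list under the answer, citing each section once in retrieval order. Making the entries clickable would be a UI concern and is not shown.

```python
def format_with_citations(answer: str, chunks: list[dict]) -> str:
    """Append a numbered list of the CV sections the answer was grounded in."""
    sections: list[str] = []
    for chunk in chunks:  # chunks arrive in retrieval order
        if chunk["section"] not in sections:  # cite each section only once
            sections.append(chunk["section"])
    if not sections:
        return answer  # nothing retrieved, nothing to cite
    source_lines = "\n".join(f"[{i}] {name}" for i, name in enumerate(sections, start=1))
    return f"{answer}\n\nSources:\n{source_lines}"


retrieved = [
    {"section": "Experience", "text": "Built a RAG chatbot with LangChain"},
    {"section": "Skills", "text": "Python, FAISS"},
    {"section": "Experience", "text": "Deployed on Streamlit Cloud"},
]
print(format_with_citations("Example answer.", retrieved))
```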
Key Takeaways
Building a RAG chatbot is easy. Building one that's actually useful is hard. The difference is evaluation: measuring retrieval quality, iterating on chunking, and being honest about when the system doesn't know something. RAGAS made this measurable instead of subjective.