2026 · shipped

Cannondale BI Agent

Ask a question a sales rep would actually ask. Get an answer they can actually use.

Role
Sole engineer — DataScience GenAI Bootcamp project
Stack
LangGraph · Advanced RAG · ChromaDB · GPT-4o-mini · Python · Streamlit
Problem

BI tools answer questions you already knew to ask. A sales rep asking 'which of our road bikes have the highest dealer margin in the Pacific region this quarter?' is not asking a SQL question — they're asking a business question. The hard part isn't text-to-SQL; it's building a system that's trustworthy enough for production: no hallucinated tables, no stale schema assumptions, confidence scores a non-technical user can interpret.

Approach
  1. RAG pipeline over ChromaDB embeddings of product catalog, dealer data, and pricing — retrieval runs before any SQL generation so the model sees grounded context, not just a schema dump.
  2. LangGraph for multi-turn memory: the agent tracks conversation state so follow-up questions ('what about Q3?') don't require the user to re-specify the full context.
  3. Human feedback and confidence scoring on every response — the agent returns a confidence level alongside the answer; low-confidence responses surface the underlying retrieval chunks for inspection.
  4. AI explainability layer: for each answer, the agent generates a brief explanation of which data points drove the conclusion, so a rep can sanity-check the reasoning, not just the number.
  5. RAG evaluation suite with RAGAS metrics — precision, recall, and faithfulness tracked per query type to catch retrieval drift as the product catalog updates.
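The flow above — retrieve grounded context, generate, attach a confidence level, surface evidence when confidence is low — can be sketched in plain Python. The retrieval and generation here are deliberate stubs standing in for the ChromaDB queries and GPT-4o-mini calls that run as LangGraph nodes in the actual agent; all function names are illustrative.

```python
# Minimal sketch of the retrieve -> generate -> score flow described above.
# Keyword overlap stands in for embedding similarity; the "answer" is a
# placeholder for the model's structured generation.

def retrieve_context(question: str, store: dict[str, str]) -> list[str]:
    """Naive keyword retrieval standing in for ChromaDB similarity search."""
    words = set(question.lower().split())
    return [chunk for chunk in store.values()
            if words & set(chunk.lower().split())]

def answer(question: str, history: list[dict], store: dict[str, str]) -> dict:
    context = retrieve_context(question, store)
    confidence = "high" if len(context) >= 2 else "low"
    response = {
        "answer": f"(generated over {len(context)} grounded chunks)",
        "confidence": confidence,
        # Low-confidence answers surface the retrieval chunks for inspection.
        "evidence": context if confidence == "low" else [],
    }
    # Conversation state lets follow-ups ("what about Q3?") reuse context.
    history.append({"question": question, "response": response})
    return response

store = {
    "pricing": "dealer margin by region and quarter",
    "catalog": "road bikes product catalog",
}
history: list[dict] = []
result = answer("road bikes dealer margin by region", history, store)
```

In the deployed agent each of these steps is a node in a LangGraph graph, with the history carried in graph state rather than a bare list.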
Architecture
Key decisions

RAG before SQL, not SQL before retrieval

An early version went straight to SQL generation. It hallucinated column names when the schema had ambiguous naming (two tables each had a 'region' column). Retrieving relevant schema chunks and sample rows first gave the model enough grounding to write correct SQL on the first attempt ~90% of the time.
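The fix is easy to illustrate: render every retrieved schema chunk with table-qualified columns plus a sample row, so the two 'region' columns can't be confused. The table and column names below are illustrative, not the actual warehouse schema.

```python
# Sketch of grounding SQL generation with retrieved schema chunks.
# Schema and sample rows are made up for illustration.

SCHEMA_CHUNKS = [
    {"table": "dealers", "columns": ["dealer_id", "region", "state"],
     "sample": "('D-101', 'Pacific', 'CA')"},
    {"table": "sales", "columns": ["sale_id", "dealer_id", "region", "margin"],
     "sample": "('S-900', 'D-101', 'Pacific', 0.34)"},
]

def schema_prompt(chunks: list[dict]) -> str:
    """Render chunks so every column is table-qualified, removing the
    ambiguity that caused hallucinated column names."""
    lines = []
    for c in chunks:
        cols = ", ".join(f"{c['table']}.{col}" for col in c["columns"])
        lines.append(f"TABLE {c['table']} ({cols}) e.g. {c['sample']}")
    return "\n".join(lines)

prompt = schema_prompt(SCHEMA_CHUNKS)
```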

GPT-4o-mini over GPT-4o

For structured generation over retrieved context, mini matched 4o's accuracy at 20x lower cost. The quality ceiling was the retrieval, not the model — investing in better chunking and metadata filtering paid off more than upgrading the model.
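Metadata filtering is the cheaper of those two investments to show: restrict candidates by chunk type before ranking by similarity. In the real pipeline this is a ChromaDB `where` filter; the plain-Python sketch below uses made-up chunks and field names.

```python
# Sketch of metadata-filtered retrieval: narrow by chunk type first,
# then rank by similarity score. Chunk data is illustrative.

CHUNKS = [
    {"text": "SuperSix EVO pricing tiers", "type": "pricing", "score": 0.91},
    {"text": "SuperSix EVO geometry",      "type": "catalog", "score": 0.95},
    {"text": "Pacific dealer margins",     "type": "pricing", "score": 0.88},
]

def retrieve(query_type: str, k: int = 2) -> list[dict]:
    """Filter to the matching chunk type, then take the top-k by score."""
    candidates = [c for c in CHUNKS if c["type"] == query_type]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]

top = retrieve("pricing")
```

Without the filter, the highest-scoring chunk here would be a catalog entry irrelevant to a pricing question — exactly the retrieval noise the filter removes.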

Confidence scores surfaced to the user

Most BI agents hide uncertainty behind a confident-sounding answer. That's dangerous in a sales context where reps act on numbers. I exposed confidence as a first-class output so users know when to dig in rather than blindly trust.
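One way to make confidence first-class is to map raw retrieval similarities onto a small set of labels a non-technical user can interpret. The thresholds below are illustrative, not the calibrated production values.

```python
# Sketch of mapping retrieval similarity scores to a user-facing
# confidence label. Thresholds are illustrative.

def confidence_label(similarities: list[float]) -> str:
    if not similarities:
        return "low"  # nothing retrieved: never sound confident
    top = max(similarities)
    avg = sum(similarities) / len(similarities)
    if top >= 0.85 and avg >= 0.75:
        return "high"
    if top >= 0.70:
        return "medium"
    return "low"
```

"low" is the trigger for surfacing the underlying retrieval chunks alongside the answer.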

Result

Conversational BI over Cannondale's product catalog deployed on Streamlit. Handles multi-turn sessions, confidence-scored answers, and per-response explainability. RAGAS eval suite tracking retrieval quality across schema updates.

What I'd change

The RAG evaluation suite revealed that precision degrades noticeably when new products are added without re-embedding the full catalog. I'd build an incremental embedding pipeline rather than full re-index — the current approach works but gets slower as the catalog grows.
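The incremental pipeline I'd build could hinge on content hashing: fingerprint each product record and re-embed only the records whose hash changed. A minimal sketch, with hypothetical function names:

```python
import hashlib

# Sketch of incremental re-embedding: hash each record and re-embed only
# new or changed ones instead of re-indexing the full catalog.

def fingerprint(record: dict) -> str:
    payload = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(payload.encode()).hexdigest()

def records_to_embed(catalog: list[dict], seen: dict[str, str]) -> list[dict]:
    """Return records whose content hash differs from the stored one."""
    stale = []
    for rec in catalog:
        h = fingerprint(rec)
        if seen.get(rec["sku"]) != h:
            stale.append(rec)
            seen[rec["sku"]] = h
    return stale

seen: dict[str, str] = {}
catalog = [{"sku": "C-1", "name": "SuperSix"}, {"sku": "C-2", "name": "Topstone"}]
first = records_to_embed(catalog, seen)   # first run: everything is new
catalog[1]["name"] = "Topstone Carbon"
second = records_to_embed(catalog, seen)  # later run: only the changed record
```

Re-embedding cost then scales with the size of the change set rather than the catalog, which is what the current full re-index gets wrong.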
