2026 · shipped

DataScienceGPT

End-to-end ML exploration in minutes, not days — for the analyst who shouldn't need to write model code.

Role
Sole engineer — personal project
Stack
LangGraph · RAG · ChromaDB · FLAML AutoML · OpenAI · Tool Calling · Python · Streamlit
Problem

A data scientist with a new dataset spends 60–80% of their time on undifferentiated work: EDA, feature engineering, model selection, hyperparameter search, evaluation. That's before any actual insight. For an analyst without ML background, this pipeline is a black box. The goal was a system where 'drop in a dataset, describe your outcome variable' produces a defensible ML solution with explanations — not a magic button, but a force multiplier.

Approach
  1. Decomposed the DS workflow into a team of specialist agents — EDA, feature engineering, AutoML, and explainability — orchestrated by a LangGraph planner that decides which agents run and in what order based on dataset characteristics.
  2. FLAML AutoML handles model selection and hyperparameter search automatically; the AutoML agent interprets results and selects the best model with a rationale the user can read.
  3. Human-in-the-loop checkpoints at EDA summary and model selection: the user can redirect the agent (exclude a feature, change the objective metric) before expensive training runs start.
  4. ChromaDB RAG context for domain knowledge — the system retrieves relevant ML best practices and dataset-specific guidance to inform agent decisions rather than relying solely on the model's parametric knowledge.
  5. Model explainability via SHAP values: the explainability agent surfaces feature importance and local explanations in plain language alongside the technical output.
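The orchestration above can be sketched as a routing function. This is a minimal stdlib sketch of the planner's decision logic, not the actual LangGraph graph; `DatasetProfile` and the skip heuristic are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class DatasetProfile:
    """Illustrative summary the planner inspects; field names are assumptions."""
    n_rows: int
    n_features: int
    missing_ratio: float = 0.0

def plan_agents(profile: DatasetProfile) -> list[str]:
    """Decide which specialist agents run, and in what order.
    Mirrors the EDA -> FE -> AutoML -> XAI ordering, with feature
    engineering skipped for tiny, already-clean datasets."""
    steps = ["eda"]
    # Feature engineering only pays off when there is something to engineer.
    if profile.n_features > 2 or profile.missing_ratio > 0:
        steps.append("feature_engineering")
    steps += ["automl", "explainability"]
    return steps
```

In the real system each step name would map to a LangGraph node, with the planner choosing edges from the profile instead of a hardcoded chain.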
Architecture
Key decisions

FLAML over manual model search

Writing model selection logic that's actually good (a proper CV strategy, class-imbalance handling, a search space tuned to dataset size) is a solved problem. FLAML solves it better than a hand-rolled loop and integrates cleanly as a tool the AutoML agent calls.
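A sketch of how the AutoML agent might derive FLAML settings from dataset size before handing off to `AutoML().fit(...)`. The thresholds and the `flaml_settings` helper are illustrative, not the project's actual code:

```python
def flaml_settings(n_rows: int, task: str = "classification") -> dict:
    """Scale the search budget with dataset size: small data gets a short
    budget and a wide estimator list, large data the reverse."""
    if n_rows < 10_000:
        budget, estimators = 60, ["lgbm", "xgboost", "rf", "lrl1"]
    else:
        budget, estimators = 600, ["lgbm", "xgboost"]
    return {
        "task": task,
        "time_budget": budget,        # seconds of search
        "estimator_list": estimators,
        "eval_method": "cv",          # cross-validation, per the CV-strategy point
    }

# The agent's tool call would then be (assuming flaml is installed):
#   from flaml import AutoML
#   automl = AutoML()
#   automl.fit(X_train, y_train, **flaml_settings(len(X_train)))
```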

Human checkpoints before expensive steps

An early version ran straight through without pauses, so users would realize they'd set the wrong objective metric only after a 10-minute training run. Two checkpoints, one after the EDA summary and one after model selection, let the user course-correct without re-running expensive steps.
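One way to implement those pauses is to run the pipeline as a generator that yields at each checkpoint, so the caller can adjust the run config before resuming. A stdlib sketch with stand-in agent steps, not the project's actual implementation (LangGraph has its own interrupt mechanism):

```python
def run_pipeline(config: dict):
    """Yield at each human checkpoint; the caller may mutate `config`
    (e.g. exclude a feature, change the metric) between yields."""
    eda_summary = {"features": config["features"]}      # stand-in for the EDA agent
    yield ("eda_summary", eda_summary)                  # checkpoint 1: before training
    kept = [f for f in config["features"] if f not in config.get("exclude", [])]
    candidate = {"model": "lgbm",
                 "metric": config.get("metric", "roc_auc"),
                 "features": kept}                      # stand-in for AutoML selection
    yield ("model_selection", candidate)                # checkpoint 2: before final fit
    yield ("done", candidate)

# Usage: the UI drives the generator and applies user edits between yields.
config = {"features": ["age", "income", "zip"]}
gen = run_pipeline(config)
stage, _ = next(gen)             # paused at EDA summary
config["exclude"] = ["zip"]      # user excludes a feature before training starts
stage, selected = next(gen)      # paused at model selection
```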

Multi-model support from the start

Locking to a single LLM backend meant users on OpenAI quotas hit rate limits mid-session. Making the agent layer model-agnostic (swap provider via config) cost a week of refactoring but made the system usable in practice.
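The provider swap can be as simple as a registry mapping a config key to a client factory. A minimal sketch with a stub backend; the provider names and the callable shape are assumptions, and real entries would wrap the OpenAI (or other) SDK clients:

```python
from typing import Callable

# provider name -> factory(model) -> completion callable; adding a backend is one entry
PROVIDERS: dict[str, Callable[[str], Callable[[str], str]]] = {}

def register(name: str):
    def wrap(factory):
        PROVIDERS[name] = factory
        return factory
    return wrap

@register("echo")  # stub backend for the sketch; real entries would wrap OpenAI etc.
def make_echo(model: str) -> Callable[[str], str]:
    return lambda prompt: f"[{model}] {prompt}"

def get_llm(config: dict) -> Callable[[str], str]:
    """Swap providers via config, with no changes to agent code."""
    return PROVIDERS[config["provider"]](config["model"])

llm = get_llm({"provider": "echo", "model": "stub-1"})
```

The agent layer only ever sees the returned callable, which is what makes the week of refactoring pay off: a rate-limited session can fall back to another backend by changing one config value.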

Result

End-to-end ML pipeline from raw CSV to explainable model in under 10 minutes for typical tabular datasets. Deployed on Streamlit Community Cloud with multi-model support.

What I'd change

The planner's ordering is fixed — EDA → FE → AutoML → XAI. That works for tabular classification but breaks for time-series or text inputs. A more dynamic planner that adapts the agent graph to dataset type is the obvious next step.
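A dynamic planner could start from a rough dataset-type check and pick a pipeline per type. The detection heuristics, type names, and step names here are illustrative, not a design the project has committed to:

```python
def detect_dataset_type(columns: dict[str, str]) -> str:
    """Very rough type detection from column dtypes; heuristics are illustrative."""
    dtypes = set(columns.values())
    if "datetime" in dtypes:
        return "time_series"
    if "text" in dtypes:
        return "text"
    return "tabular"

# One agent ordering per dataset type, instead of a single fixed chain.
PIPELINES = {
    "tabular":     ["eda", "feature_engineering", "automl", "explainability"],
    "time_series": ["eda", "lag_features", "automl", "explainability"],
    "text":        ["eda", "embed", "automl", "explainability"],
}

def plan(columns: dict[str, str]) -> list[str]:
    return PIPELINES[detect_dataset_type(columns)]
```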
