ML Engineering

GlucoPred

Degree thesis

Forecasting blood glucose 15-60 minutes ahead for a person with Type 1 Diabetes, trained on ~2.5 years of my child's CGM and insulin-pump data. An automated pipeline builds a 45-column feature matrix; an LSTM beats classical baselines (1.28 mmol/L RMSE at +30 min), and Clarke Error Grid analysis shows where accuracy and clinical safety diverge.

Python PyTorch statsmodels Pandas NumPy DuckDB FastAPI React TypeScript Vite Tailwind CSS Recharts Streamlit Vercel Data Pipeline Time-Series Forecasting
~ when Mar 2026 → Ongoing
~ status Active
~ kind School Project

GlucoPred is my degree thesis (examensarbete): forecasting blood glucose 15-60 minutes ahead for a person with Type 1 Diabetes, using ~2.5 years of my child's continuous glucose monitor (CGM) and insulin-pump data (Dexcom via Glooko, 5-minute intervals). All values are in mmol/L (1 mmol/L ≈ 18 mg/dL).

Pipeline

  • Automated, idempotent ZIP→DuckDB ingestion of raw CGM and pump exports (re-running it on the same export adds no duplicate rows), with language-aware parsing of Swedish and English exports
  • CGM cleaning, sensor-session detection, and gap interpolation
  • 45-column engineered feature matrix (lags, rolling statistics, time encodings, and a NovoRapid pharmacokinetic insulin-on-board model covering bolus and basal insulin) with prediction targets at +15, +30, and +60 min; a CI test verifies that no feature uses data from after the prediction time
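A minimal pandas sketch of the kind of past-only features and future targets the bullets describe, assuming a gap-free, regular 5-minute glucose series in mmol/L with a `DatetimeIndex`. The lag set, rolling windows, and the names `build_features` and `assert_no_future_leakage` are hypothetical, and the sketch yields far fewer than the 45 real columns (it has no insulin features). The leakage check shows one way such a CI test could work: perturb every reading after a cutoff and assert that no feature at or before the cutoff changes.

```python
import numpy as np
import pandas as pd

STEP_MIN = 5                 # CGM sampling interval in minutes
HORIZONS_MIN = (15, 30, 60)  # forecast horizons from the text
LAGS = (1, 2, 3, 6, 12)      # hypothetical lags, in 5-min steps
WINDOWS = (3, 6, 12)         # hypothetical rolling windows: 15/30/60 min


def build_features(glucose: pd.Series) -> pd.DataFrame:
    """Past-only features plus future targets from a regular 5-min series (DatetimeIndex)."""
    df = pd.DataFrame({"glucose": glucose})
    for k in LAGS:
        # shift(k) moves values forward in time: row t sees the reading from t - k steps
        df[f"lag_{k * STEP_MIN}m"] = glucose.shift(k)
    for w in WINDOWS:
        # rolling windows end at the current row, so they cover only past and present
        roll = glucose.rolling(w, min_periods=w)
        df[f"mean_{w * STEP_MIN}m"] = roll.mean()
        df[f"std_{w * STEP_MIN}m"] = roll.std()
    # cyclical time-of-day encoding, so 23:55 and 00:00 end up close together
    minutes = (glucose.index.hour * 60 + glucose.index.minute).to_numpy()
    df["tod_sin"] = np.sin(2 * np.pi * minutes / 1440)
    df["tod_cos"] = np.cos(2 * np.pi * minutes / 1440)
    for h in HORIZONS_MIN:
        # targets are the only columns allowed to look ahead: shift(-n) pulls t + n back to t
        df[f"target_{h}m"] = glucose.shift(-(h // STEP_MIN))
    return df


def assert_no_future_leakage(glucose: pd.Series, cutoff: int) -> None:
    """Perturb every reading after position `cutoff`; no feature at or before it may change."""
    base = build_features(glucose)
    perturbed = glucose.copy()
    perturbed.iloc[cutoff + 1:] += 100.0
    changed = build_features(perturbed)
    feature_cols = [c for c in base.columns if not c.startswith("target_")]
    pd.testing.assert_frame_equal(
        base.iloc[: cutoff + 1][feature_cols],
        changed.iloc[: cutoff + 1][feature_cols],
    )
```

Running `assert_no_future_leakage` at several cutoffs on a test fixture is cheap enough for CI; a feature that accidentally used `shift(-1)` or a centered window would make it fail.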

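The NovoRapid parameters used in the thesis are not given here, so the following is only a sketch of one common way to model insulin on board: the exponential insulin-activity curve used in open-source closed-loop projects such as Loop and OpenAPS, with an assumed 75-minute peak and 6-hour duration of action (in line with rapid-acting presets in those projects, not values taken from this work). Basal delivery is treated as a small dose at each 5-minute step, so a single causal convolution covers bolus and basal alike. The function names are hypothetical.

```python
import numpy as np


def exponential_iob_fraction(t_min, peak_min=75.0, duration_min=360.0):
    """Fraction of a dose still on board t >= 0 minutes after delivery (exponential curve)."""
    t = np.asarray(t_min, dtype=float)
    # curve shape parameters derived from the peak time and duration of action
    tau = peak_min * (1 - peak_min / duration_min) / (1 - 2 * peak_min / duration_min)
    a = 2 * tau / duration_min
    s = 1 / (1 - a + (1 + a) * np.exp(-duration_min / tau))
    iob = 1 - s * (1 - a) * (
        (t**2 / (tau * duration_min * (1 - a)) - t / tau - 1) * np.exp(-t / tau) + 1
    )
    # after the duration of action the dose counts as fully absorbed
    return np.where(t >= duration_min, 0.0, iob)


def insulin_on_board(doses_u, step_min=5.0, peak_min=75.0, duration_min=360.0):
    """IOB in units at each step of a regular grid, given insulin delivered per step."""
    doses = np.asarray(doses_u, dtype=float)
    n_kernel = int(duration_min // step_min) + 1
    kernel = exponential_iob_fraction(np.arange(n_kernel) * step_min, peak_min, duration_min)
    # causal convolution: IOB[t] = sum_k doses[t - k] * kernel[k], past and present doses only
    return np.convolve(doses, kernel)[: len(doses)]
```

For example, a 4 U bolus at step 0 on top of a 0.8 U/h basal rate becomes `doses = np.full(144, 0.8 * 5 / 60); doses[0] += 4.0`, and `insulin_on_board(doses)` gives a 12-hour IOB trace. Because the result at step t depends only on doses up to t, it can be used as a feature without leaking future information.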
Models & evaluation

Baselines (persistence, moving average, linear extrapolation, AR(2)) and an LSTM are scored with RMSE/MAE, time-in-range, and Clarke Error Grid zone analysis (the share of predictions in zones A-E, per model and horizon). Evaluated on a held-out set of the most recent data, the LSTM beats every baseline at all horizons: 1.28 mmol/L RMSE at +30 min vs 1.45 for AR(2).
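The classical baselines and error metrics are simple enough to sketch in NumPy. This is a hedged illustration, not the thesis code: `evaluate` walks forward through a series and lets each forecaster see only readings up to its anchor point; AR(2) is fitted here by ordinary least squares and iterated out to the horizon, rather than through statsmodels (whose `AutoReg` class would be the library route); the 3-reading moving-average window is an assumption; and all function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err**2)))


def mae(y_true, y_pred):
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


# Each forecaster takes the history up to "now" and the horizon in 5-min steps.
def persistence(history, steps):
    return history[-1]  # the future equals the last reading


def moving_average(history, steps, window=3):
    return float(np.mean(history[-window:]))  # assumed 3-reading (15-min) window


def linear_extrapolation(history, steps):
    return history[-1] + steps * (history[-1] - history[-2])  # continue the last slope


def fit_ar2(train):
    """Least-squares fit of y[t] = c + phi1 * y[t-1] + phi2 * y[t-2]; returns (c, phi1, phi2)."""
    y = np.asarray(train, dtype=float)
    X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coef


def ar2_forecaster(coef):
    c, phi1, phi2 = coef

    def forecast(history, steps):
        prev2, prev1 = history[-2], history[-1]
        for _ in range(steps):  # iterate one-step predictions out to the horizon
            prev2, prev1 = prev1, c + phi1 * prev1 + phi2 * prev2
        return prev1

    return forecast


def evaluate(series, forecaster, horizon_steps, min_history=3):
    """Walk-forward (RMSE, MAE): each prediction sees only readings up to its anchor."""
    y = np.asarray(series, dtype=float)
    preds, actual = [], []
    for t in range(min_history - 1, len(y) - horizon_steps):
        preds.append(forecaster(y[: t + 1], horizon_steps))
        actual.append(y[t + horizon_steps])
    return rmse(actual, preds), mae(actual, preds)
```

To mirror a held-out recent test set, fit on the earlier part only, e.g. `coef = fit_ar2(series[:split])`, then score +30 min (6 steps) on the most recent part with `evaluate(series[split:], ar2_forecaster(coef), 6)`.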

The most interesting finding is a divergence between accuracy and clinical safety: although the LSTM is the most accurate model, a plain persistence baseline produces a lower fraction of dangerous-zone errors at longer horizons. The model with the lowest average error is not automatically the safest near hypoglycemia, and surfacing that, rather than just the RMSE headline, is central to the work.
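The zone assignment behind Clarke Error Grid analysis can be written as a short rule cascade. The sketch below follows the zone boundaries as they are commonly implemented in open-source Python versions of the grid; the boundaries are defined in mg/dL, so mmol/L values are converted with the ≈18 factor quoted above. Conventionally, zone A is clinically accurate and B benign, while C (overcorrection), D (failure to detect) and E (erroneous treatment) are the clinically significant zones; exactly which zones the thesis counts as "dangerous" is not stated here, and the function names are hypothetical.

```python
from collections import Counter

MGDL_PER_MMOL = 18.0  # approximate conversion factor quoted in the text


def clarke_zone(ref, pred):
    """Clarke Error Grid zone for one (reference, prediction) pair, both in mg/dL."""
    if (ref <= 70 and pred <= 70) or (0.8 * ref <= pred <= 1.2 * ref):
        return "A"  # within 20% of the reference, or both hypoglycemic
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"  # hypo read as hyper, or hyper read as hypo
    if (70 <= ref <= 290 and pred >= ref + 110) or (
        130 <= ref <= 180 and pred <= 7 / 5 * ref - 182
    ):
        return "C"  # would prompt an overcorrection
    if (
        (ref >= 240 and 70 <= pred <= 180)
        or (ref <= 175 / 3 and 70 <= pred <= 180)
        or (175 / 3 <= ref <= 70 and pred >= 6 / 5 * ref)
    ):
        return "D"  # fails to detect hypo- or hyperglycemia
    return "B"  # off by more than 20%, but benign


def zone_fractions(ref_mmol, pred_mmol):
    """Share of predictions in each zone A-E, from paired mmol/L values."""
    zones = [
        clarke_zone(r * MGDL_PER_MMOL, p * MGDL_PER_MMOL)
        for r, p in zip(ref_mmol, pred_mmol)
    ]
    counts = Counter(zones)
    return {zone: counts[zone] / len(zones) for zone in "ABCDE"}
```

Calling `zone_fractions` per model and horizon gives exactly the kind of table in which a model with higher RMSE can still have a smaller C-E share than a more accurate one.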

Interface

A React + Recharts UI renders actual-vs-predicted traces with model and horizon selectors. Because the real app runs against personal medical data, it is never hosted - instead, a public demo built on entirely synthetic data is deployed separately, so the interface can be explored without exposing any real record.
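How the public demo's synthetic data is generated is not described, so this is a hypothetical sketch of one way to do it: a seeded, reproducible trace built from a daily rhythm, meal-shaped bumps at assumed meal times, and smooth AR(1) noise, clipped to 2.2-22.2 mmol/L (about 40-400 mg/dL), plus a persistence "prediction" and JSON-ready records of the kind a line chart can consume. The function names, every numeric parameter, and the record schema (`minute`, `actual`, `predicted`) are assumptions.

```python
import json

import numpy as np


def synthetic_cgm(n_steps=288, seed=0, step_min=5):
    """Hypothetical synthetic CGM-like trace in mmol/L (default: one day of 5-min readings)."""
    rng = np.random.default_rng(seed)  # seeded, so the demo data is reproducible
    minutes = np.arange(n_steps) * step_min
    trace = 6.5 + np.sin(2 * np.pi * minutes / 1440)  # assumed baseline plus daily rhythm
    for meal_min in (7 * 60, 12 * 60, 18 * 60):  # assumed meal times
        dt = (minutes - meal_min) / 60.0  # hours since the meal
        after = dt >= 0
        # bump shaped x * exp(1 - x): peaks at +3 mmol/L one hour after the meal
        trace[after] += 3.0 * dt[after] * np.exp(1 - dt[after])
    noise = np.zeros(n_steps)
    for i in range(1, n_steps):
        noise[i] = 0.9 * noise[i - 1] + rng.normal(0.0, 0.15)  # smooth AR(1) noise
    return np.clip(trace + noise, 2.2, 22.2)


def persistence_trace(actual, horizon_steps=6):
    """Prediction for time t = the reading at t - horizon; None where no forecast exists yet."""
    return [None] * horizon_steps + [float(v) for v in actual[:-horizon_steps]]


def chart_records(actual, predicted, step_min=5):
    """One dict per time step, ready for json.dumps and a line chart."""
    return [
        {
            "minute": i * step_min,
            "actual": round(float(a), 2),
            "predicted": None if p is None else round(float(p), 2),
        }
        for i, (a, p) in enumerate(zip(actual, predicted))
    ]
```

Usage: `trace = synthetic_cgm()` followed by `json.dumps(chart_records(trace, persistence_trace(trace)))` produces a static file the demo can serve, with no real record involved at any step.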

Next steps

Gradient-boosting models, hyperparameter tuning, and a deeper clinical error analysis (pinpointing where and when the unsafe predictions occur). Figures reflect the current state of an evolving thesis project.

Chart: focus by functional pillar (axes: Core Logic, AI & ML Modeling, Data Engineering, Infrastructure & API, Frontend & UI)
About this chart
Each axis is a functional pillar; the orange area shows where my focus went, and the purple area shows how much of that work was AI-augmented. The AI layer marks where tools sped up implementation; architecture, code review, and the quality bar stay mine. I treat AI as a precision tool with strict conventions, not an autopilot.