— ML Engineering
Taxi Price Prediction
Predicts taxi fares from ride details plus live distance, traffic, and weather data, using a GradientBoosting model served by a FastAPI backend with a Streamlit frontend.
Scikit-learn Python FastAPI Streamlit GradientBoosting
~ when Sep 2025 – Oct 2025
~ status Archived
~ kind School Project
An end-to-end ML application that predicts taxi fares by integrating ride parameters with real-time external data. Built as coursework for OPA24 AI Engineering.
Data & Features
- Google Places API for address lookup and autocomplete
- Google Routes API for distance calculation with departure-time congestion prediction
- Weather integration via Google Weather API - maps conditions to pricing multipliers (Clear: 1.0x, Rain: 1.15x, Snow: 1.3x)
- Traffic multipliers - Low: 1.0x, Medium: 1.1x, High: 1.25x
- 97.7% of the original dataset retained after data cleaning
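
A minimal sketch of how the condition labels above could be turned into model inputs. The multiplier values are taken from the list; the function name `condition_features`, the feature names, the kilometre unit, and the choice to multiply distance by both multipliers are assumptions for illustration, not the project's actual feature code.

```python
# Multiplier tables copied from the bullet list above.
WEATHER_MULTIPLIERS = {"Clear": 1.0, "Rain": 1.15, "Snow": 1.3}
TRAFFIC_MULTIPLIERS = {"Low": 1.0, "Medium": 1.1, "High": 1.25}


def condition_features(distance_km: float, weather: str, traffic: str) -> dict[str, float]:
    """Build hypothetical model inputs from a distance and two condition labels."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    # Unknown labels raise KeyError instead of silently defaulting to 1.0.
    weather_mult = WEATHER_MULTIPLIERS[weather]
    traffic_mult = TRAFFIC_MULTIPLIERS[traffic]
    return {
        "distance_km": distance_km,
        "weather_multiplier": weather_mult,
        "traffic_multiplier": traffic_mult,
        # Assumed interaction feature: distance scaled by both conditions.
        "distance_x_conditions": distance_km * weather_mult * traffic_mult,
    }


features = condition_features(10.0, "Rain", "High")
print(features)
```

Failing loudly on an unknown label is a deliberate choice in this sketch: a new condition string from an upstream API would otherwise pass through as a neutral multiplier without anyone noticing.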
Model Performance
- GradientBoosting achieved $15.56 MAE with 0.828 R² on 196 test samples
- Outperformed LinearRegression ($17.00 MAE) and RandomForest ($15.91 MAE)
- Distance-based features dominate with 61% of feature importance, followed by interaction features (distance × conditions) at 37%
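
For reference, the two metrics reported above can be computed as in the sketch below. It is written in plain Python to show the formulas; scikit-learn provides equivalents as `mean_absolute_error` and `r2_score` in `sklearn.metrics`. The fares and predictions are made-up numbers, not the project's test set, and the helper names `mae` and `r_squared` are hypothetical.

```python
def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: average size of the prediction error, in fare units."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Illustrative values only.
true_fares = [20.0, 40.0, 60.0, 80.0]
predicted_fares = [25.0, 35.0, 65.0, 75.0]

print(f"MAE: {mae(true_fares, predicted_fares):.2f}")
print(f"R²:  {r_squared(true_fares, predicted_fares):.3f}")
```

MAE reads directly in dollars, which makes it easy to compare the three models, while R² indicates how much of the fare variance the model explains.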
Architecture
- FastAPI backend serving the trained model via REST endpoint with Pydantic validation
- Streamlit frontend with multi-page dashboard - performance metrics, dataset exploration, and prediction breakdown
- Serialized model via joblib for fast inference
About this chart
Each axis is a functional pillar; the orange area shows where my focus went, and the purple area shows how much of that work was AI-augmented. That AI layer is where tools sped up implementation; architecture, code review, and the quality bar stay mine. I treat AI as a precision tool with strict conventions, not an autopilot.