ML Engineering

Taxi Price Prediction

Predicts taxi fares from ride details combined with live distance, traffic, and weather data, using a GradientBoosting model served through a FastAPI backend and a Streamlit frontend.

Scikit-learn · Python · FastAPI · Streamlit · GradientBoosting
~ when Sep 2025 – Oct 2025
~ status Archived
~ kind School Project

An end-to-end ML application that predicts taxi fares by integrating ride parameters with real-time external data. Built as coursework for OPA24 AI Engineering.

Data & Features

  • Google Places API for address lookup and autocomplete
  • Google Routes API for distance calculation with departure-time congestion prediction
  • Weather integration via Google Weather API - maps conditions to pricing multipliers (Clear: 1.0x, Rain: 1.15x, Snow: 1.3x)
  • Traffic multipliers - Low: 1.0x, Medium: 1.1x, High: 1.25x
  • Data cleaning retained 97.7% of the original dataset
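The weather and traffic multipliers above can be turned into model inputs roughly like this. This is a minimal sketch, not the project's code: the multiplier values come from the list above, but the function names, feature names, and the choice of "distance × multiplier" as the interaction term are assumptions.

```python
# Hypothetical sketch: map ride conditions to pricing multipliers and build
# distance-interaction features. Multiplier values are from the project
# description; names and the interaction formula are assumptions.

WEATHER_MULTIPLIERS = {"clear": 1.0, "rain": 1.15, "snow": 1.3}
TRAFFIC_MULTIPLIERS = {"low": 1.0, "medium": 1.1, "high": 1.25}


def condition_multipliers(weather: str, traffic: str) -> tuple[float, float]:
    """Look up the multipliers for a weather condition and a traffic level."""
    try:
        weather_mult = WEATHER_MULTIPLIERS[weather.lower()]
        traffic_mult = TRAFFIC_MULTIPLIERS[traffic.lower()]
    except KeyError as exc:
        raise ValueError(f"unknown condition: {exc.args[0]!r}") from None
    return weather_mult, traffic_mult


def build_features(distance_km: float, weather: str, traffic: str) -> dict[str, float]:
    """Build one feature row: raw distance, multipliers, and distance interactions."""
    weather_mult, traffic_mult = condition_multipliers(weather, traffic)
    return {
        "distance_km": distance_km,
        "weather_mult": weather_mult,
        "traffic_mult": traffic_mult,
        # Interaction terms let the model scale the distance effect by conditions.
        "distance_x_weather": distance_km * weather_mult,
        "distance_x_traffic": distance_km * traffic_mult,
        "distance_x_conditions": distance_km * weather_mult * traffic_mult,
    }


if __name__ == "__main__":
    print(build_features(10.0, "Rain", "High"))
```

Interaction columns like these give a model a direct signal that a long ride in snow or heavy traffic costs proportionally more, rather than adding a fixed surcharge.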

Model Performance

  • GradientBoosting achieved $15.56 MAE with 0.828 R² on 196 test samples
  • Outperformed LinearRegression ($17.00 MAE) and RandomForest ($15.91 MAE)
  • Distance-based features account for 61% of total feature importance, and interaction features (distance × conditions) for 37%
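For reference, the two metrics and the grouped importance shares can be computed as below. This is an illustrative plain-Python sketch; the project most likely used scikit-learn's built-in metrics and the model's `feature_importances_`, and the feature names and prefixes here are hypothetical.

```python
# Plain-Python re-implementations of MAE and R², plus a helper that sums
# per-feature importances into named groups. Feature names are hypothetical.


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between actual and predicted fares."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def group_importances(
    importances: dict[str, float], groups: dict[str, tuple[str, ...]]
) -> dict[str, float]:
    """Sum importances by feature-name prefix; unmatched features go to 'other'.

    Groups are checked in insertion order, so list more specific prefixes
    (e.g. "distance_x_") before broader ones (e.g. "distance").
    """
    totals = {name: 0.0 for name in groups}
    totals["other"] = 0.0
    for feature, value in importances.items():
        for group, prefixes in groups.items():
            if feature.startswith(prefixes):
                totals[group] += value
                break
        else:
            totals["other"] += value
    return totals
```

MAE is reported in dollars, so it reads directly as "average fare error", while R² measures how much of the fare variance the model explains relative to always predicting the mean.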

Architecture

  • FastAPI backend serving the trained model via REST endpoint with Pydantic validation
  • Streamlit frontend with multi-page dashboard - performance metrics, dataset exploration, and prediction breakdown
  • Trained model serialized with joblib for fast inference
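The request → validate → predict flow described above can be sketched without any third-party dependencies. In the app, Pydantic, joblib, and FastAPI fill these roles. Here a dataclass stands in for the Pydantic model and `pickle` for joblib, and every field name, the toy model, and the handler are hypothetical.

```python
# Dependency-free sketch of the serving flow. A dataclass stands in for the
# Pydantic request model and pickle for joblib; names are hypothetical.
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class RideRequest:
    """Validated prediction request (stand-in for a Pydantic BaseModel)."""
    distance_km: float
    weather: str
    traffic: str

    def __post_init__(self) -> None:
        if self.distance_km <= 0:
            raise ValueError("distance_km must be positive")
        if self.weather not in {"Clear", "Rain", "Snow"}:
            raise ValueError(f"unsupported weather: {self.weather}")
        if self.traffic not in {"Low", "Medium", "High"}:
            raise ValueError(f"unsupported traffic level: {self.traffic}")


class ToyFareModel:
    """Placeholder with a scikit-learn-style predict(); not the trained model."""

    def __init__(self, base: float, per_km: float) -> None:
        self.base = base
        self.per_km = per_km

    def predict(self, rows: list[list[float]]) -> list[float]:
        return [self.base + self.per_km * row[0] for row in rows]


def serialize(model: object) -> bytes:
    """Persist the model (joblib.dump plays this role in the project)."""
    return pickle.dumps(model)


def load(blob: bytes) -> object:
    """Restore the model. Only unpickle files you trust; the same applies to joblib."""
    return pickle.loads(blob)


def handle_predict(model, request: RideRequest) -> dict[str, float]:
    """Body of a hypothetical POST /predict handler.

    The real feature row would also carry the weather and traffic features;
    the toy model only uses distance.
    """
    fare = model.predict([[request.distance_km]])[0]
    return {"predicted_fare": round(fare, 2)}
```

With FastAPI, declaring the request parameter as a Pydantic `BaseModel` moves the checks in `__post_init__` into the schema, and invalid requests are rejected with a 422 response before the handler runs. Loading the model once at startup, rather than per request, is what keeps inference fast.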
[Chart: focus by functional pillar - AI & ML Modeling, Core Logic, Infrastructure & API, Frontend & UI]
About this chart
Each axis is a functional pillar. The orange area shows where my focus went, and the purple area shows how much of that work was AI-augmented. The AI layer is where tools sped up implementation, while architecture, code review, and the quality bar stay mine. I treat AI as a precision tool with strict conventions, not autopilot.