— AI / RAG Solutions

DocBird / MCP Doc Reader

A RAG assistant that answers questions about open-source libraries, grounded in their official documentation.

PydanticAI, OpenAI API, LanceDB, Embeddings, FastMCP, DuckDB, Python
~ when Feb 2026 – Mar 2026
~ status Paused
~ team collaborative
~ kind School Project

An AI-powered documentation search engine that ingests docs from multiple open-source projects, chunks and embeds them into a vector database, and lets users ask natural-language questions via a streaming chat UI. Built as a school project, collaboratively with Andreas and Ludvig.

RAG Pipeline

  • Ingestion fetches documentation from 8+ sources (Pydantic AI, FastMCP, LanceDB, etc.), splits it on Markdown headers within minimum and maximum token chunk sizes, embeds the chunks via OpenAI text-embedding-3-small, and loads them into LanceDB via dlt
  • Retrieval is exposed as an MCP tool (find_information) over HTTP; the API agent consumes it via FastMCPToolset, which cleanly separates retrieval from orchestration
  • Generation via gpt-4o-mini with the last N messages as conversation context
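
The header-based chunking step in the ingestion bullet can be sketched roughly as below. This is a minimal illustration and not the project's actual code: the function names, the default limits, and the whitespace word count standing in for a real tokenizer are all assumptions made for the example.

```python
import re

# A Markdown ATX header: one to six '#' characters followed by text.
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def count_tokens(text: str) -> int:
    """Assumption: whitespace word count as a stand-in for a real tokenizer."""
    return len(text.split())


def split_by_headers(markdown: str) -> list[str]:
    """Split a document into sections, each starting at a header line."""
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if HEADER_RE.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def chunk_markdown(markdown: str, min_tokens: int = 5, max_tokens: int = 50) -> list[str]:
    """Hypothetical chunker: merge undersized sections, window oversized ones."""
    chunks: list[str] = []
    buffer = ""
    for section in split_by_headers(markdown):
        candidate = f"{buffer}\n\n{section}" if buffer else section
        if count_tokens(candidate) < min_tokens:
            # Too small to embed on its own: carry it into the next section.
            buffer = candidate
            continue
        buffer = ""
        if count_tokens(candidate) <= max_tokens:
            chunks.append(candidate)
        else:
            # Too large: cut into fixed-size word windows (this flattens whitespace).
            words = candidate.split()
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
    if buffer:
        # A trailing undersized section is emitted as its own chunk.
        chunks.append(buffer)
    return chunks
```

Merging small sections forward keeps a short header together with the content that follows it, which tends to give the embedding more context than a header-only chunk would.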

Architecture

  • 4 Python microservices (API, ingestion, serving, inspector) + React frontend, each independently containerized via Docker Compose
  • Streaming chat via SSE with abort and edit-and-resend support
  • Conversation persistence in DuckDB with full CRUD
  • Observability via Logfire + OpenTelemetry
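
To make the SSE streaming bullet concrete, here is a small sketch of how tokens could be framed as Server-Sent Events with a server-side abort check. The event names (token, done), the payload fields, and the use of threading.Event as the abort signal are assumptions for illustration; the project may well handle abort differently, for example by the client closing the connection.

```python
import json
import threading
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Event: optional 'event:' line, a 'data:' line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_reply(tokens: Iterable[str], abort: threading.Event) -> Iterator[str]:
    """Yield SSE frames for each token, stopping early if abort is set."""
    for token in tokens:
        if abort.is_set():
            yield sse_event({"reason": "aborted"}, event="done")
            return
        yield sse_event({"text": token}, event="token")
    yield sse_event({"reason": "complete"}, event="done")
```

A web framework's streaming response could consume stream_reply directly with the text/event-stream content type, and the final done event gives the frontend a clear signal to stop rendering.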

Deployment

  • Azure Container Apps via Bicep IaC
  • GitHub Actions CI/CD with OIDC federated identity (no long-lived Azure credentials)
  • Managed identity for Azure Container Registry access
Chart axes: Agentic Logic, AI & ML Modeling, Infrastructure & API, Data Engineering, Core Logic
About this chart
Each axis is a functional pillar; the orange area is where my focus went, and the purple area shows how much of that work was AI-augmented. That AI layer is where tools sped up implementation; architecture, code review, and the quality bar stay mine. I treat AI as a precision tool with strict conventions, not an autopilot.