— AI / RAG Solutions
DocBird / MCP Doc Reader
A RAG assistant that answers questions about open-source libraries, grounded in their official documentation.
PydanticAI OpenAI API LanceDB Embeddings FastMCP DuckDB Python
~ when Feb 2026 – Mar 2026
~ status Paused
~ team collaborative
~ kind School Project
An AI-powered documentation search engine that ingests docs from multiple open-source projects, chunks and embeds them into a vector database, and lets users ask natural-language questions via a streaming chat UI. Built collaboratively with Andreas and Ludvig as a school project.
RAG Pipeline
- Ingestion fetches documentation from 8+ sources (Pydantic AI, FastMCP, LanceDB, etc.), splits by markdown headers with min/max token chunk sizes, embeds via OpenAI text-embedding-3-small, and loads into LanceDB via dlt
- Retrieval exposed as an MCP tool (find_information) over HTTP - the API agent consumes it via FastMCPToolset, cleanly separating retrieval from orchestration
- Generation via gpt-4o-mini with the last N messages as conversation context
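The header-based chunking step can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the default limits, and the use of whitespace-separated words as a stand-in for real tokenizer tokens are all assumptions made for the example.

```python
import re
from typing import List

# Matches ATX-style markdown headers such as "# Title" or "### Section".
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def split_by_headers(markdown: str) -> List[str]:
    """Split a document into sections, each starting at a header line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if HEADER_RE.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def split_oversized(text: str, max_tokens: int) -> List[str]:
    """Cut an oversized section into fixed-size word windows (drops line breaks)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def chunk_markdown(markdown: str, min_tokens: int = 20, max_tokens: int = 200) -> List[str]:
    """Merge sections below min_tokens forward; split those above max_tokens."""
    chunks: List[str] = []
    pending = ""
    for section in split_by_headers(markdown):
        pending = f"{pending}\n\n{section}" if pending else section
        if count_tokens(pending) < min_tokens:
            continue  # too small on its own: merge with the next section
        if count_tokens(pending) > max_tokens:
            chunks.extend(split_oversized(pending, max_tokens))
        else:
            chunks.append(pending)
        pending = ""
    if pending:
        chunks.append(pending)  # the trailing remainder may fall below min_tokens
    return chunks
```

Merging small sections forward keeps a short heading together with the text that follows it, while the upper bound keeps each chunk within a size that embeds well. A production version would likely count tokens with the embedding model's tokenizer and skip header-like lines inside fenced code blocks.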
Architecture
- 4 Python microservices (API, ingestion, serving, inspector) + React frontend, each independently containerized via Docker Compose
- Streaming chat via SSE with abort and edit-and-resend support
- Conversation persistence in DuckDB with full CRUD
- Observability via Logfire + OpenTelemetry
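As an illustration of the streaming item above, here is a minimal sketch of how answer tokens could be framed as Server-Sent Events with an abort check. The event names (token, done), the payload keys, and the is_aborted callback are hypothetical; the project's real wire format is not described here.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: optional event name, one data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(
    tokens: Iterable[str],
    is_aborted: Callable[[], bool] = lambda: False,
) -> Iterator[str]:
    """Yield one frame per token, stopping early if the client aborted."""
    for token in tokens:
        if is_aborted():
            yield sse_event({"reason": "aborted"}, event="done")
            return
        yield sse_event({"delta": token}, event="token")
    yield sse_event({"reason": "complete"}, event="done")
```

A web framework would wrap this generator in a response with the text/event-stream content type. In practice the abort signal would usually come from detecting that the client closed the connection.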
Deployment
- Azure Container Apps via Bicep IaC
- GitHub Actions CI/CD with OIDC federated identity (no long-lived Azure credentials)
- Managed identity for Azure Container Registry access
About this chart
Each axis is a functional pillar. The orange area shows where my focus went, and the purple area shows how much of that work was AI-augmented. The AI layer is where tools sped up implementation; architecture, code review, and the quality bar stay mine. I treat AI as a precision tool with strict conventions, not an autopilot.