Overview
GovIntel is a local-first federal procurement intelligence system. It imports public USAspending contract awards, stores them in PostgreSQL, builds a Chroma-backed retrieval index, and generates citation-grounded market briefs through a FastAPI API and Streamlit UI.
I built GovIntel as a reviewable RAG application with ingestion, indexing, retrieval, analytics, generation, citation checks, Dockerized local setup, CI, tests, and a walkthrough.
Status
GovIntel provides a reviewable end-to-end path: seed contract data, build the retrieval index, run the FastAPI service, open the Streamlit UI, ask a procurement question, and inspect cited award evidence.
I focused this portfolio version on engineering completeness (local setup, retrieval, generation, citation checks, tests, and a walkthrough) rather than on production hosting or benchmark claims.
Review Path
The walkthrough shows agency and NAICS inputs, a 1-10 year range control, a DHS cybersecurity question, a generated brief, and cited contract evidence. For hands-on review, the repository documents local setup through make db-up, make db-seed, make index, make run, and make ui.
Problem
Federal procurement data is useful but hard to scan quickly. GovIntel turns contract records into structured intelligence briefs that cover contractor patterns, agency spending questions, quarterly spend trends, market concentration, and strategic implications drawn from contractor rankings, each backed by cited award evidence.
What I Built
- Async USAspending award ingestion with pagination and idempotent PostgreSQL upserts
- Typed Pydantic models for awards, analysis requests, contractor summaries, retrieved evidence, and generated briefs
- Chroma-backed vector indexing with sentence-transformer embeddings
- BM25 keyword retrieval, vector retrieval, hybrid merge, and cross-encoder reranking
- SQL analytics for top contractors, quarterly spend trends, and market concentration
- Versioned prompt templates and structured JSON generation
- Fail-closed citation validation before returning a brief
- FastAPI /api/v1/analyze endpoint with optional X-API-Key protection
- Streamlit UI for choosing filters, generating briefs, and inspecting cited contract evidence
- Docker Compose stack for PostgreSQL, the API, and the UI
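To make the hybrid merge item above more concrete, here is a minimal sketch of how two ranked result lists could be fused and deduplicated. It assumes reciprocal rank fusion (RRF) as the merge rule; the list above does not say which rule GovIntel uses, and the function name, the constant k=60, and the award IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of award IDs into one deduplicated ranking.

    Assumption: RRF is used here for illustration only; it is not confirmed
    to be the merge rule in GovIntel.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Earlier ranks contribute more; k dampens the gap between ranks.
        for rank, award_id in enumerate(ranking, start=1):
            scores[award_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda award_id: (-scores[award_id], award_id))


bm25_hits = ["A-3", "A-1", "A-7"]    # hypothetical keyword results
vector_hits = ["A-1", "A-9", "A-3"]  # hypothetical vector results
merged = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(merged)  # ['A-1', 'A-3', 'A-9', 'A-7']
```

Awards returned by both retrievers rise to the top, and each ID appears once, so a cross-encoder reranker would score a single candidate list afterward.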
Tech Stack
- Backend: Python, FastAPI, Uvicorn, Pydantic v2
- Data: PostgreSQL 16, SQLAlchemy asyncio, asyncpg
- Retrieval: ChromaDB, sentence-transformers, BM25, hybrid retrieval, cross-encoder reranking
- Generation: Gemini provider path, structured JSON prompts, optional Hugging Face provider path
- Frontend: Streamlit
- Operations: Docker Compose, GitHub Actions, Ruff, mypy, pytest, pytest-cov
- Optional extensions: Langfuse tracing, Pinecone mirroring, offline evaluation and QLoRA utilities
Architecture
The system flow is:
- Pull bounded USAspending award data into normalized contract models.
- Persist contract rows in PostgreSQL.
- Build local retrieval indexes with Chroma vector search and BM25 keyword search.
- Merge and rerank retrieval candidates.
- Compute SQL analytics for contractor ranking, spend trend, and market concentration.
- Render a procurement-intelligence prompt with retrieved context and analytics.
- Generate a structured brief through the selected provider.
- Validate citations against retrieved contract evidence before returning the answer.
- Serve results through FastAPI and the Streamlit UI.
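As one illustration of the market concentration step in the flow above, the sketch below computes a Herfindahl-Hirschman Index (HHI) from per-contractor spend. Using HHI is an assumption: the flow names market concentration but not a specific metric, and GovIntel computes its analytics in SQL rather than in Python. The function name and vendor figures are hypothetical.

```python
def herfindahl_index(spend_by_contractor):
    """Return HHI on a 0-10,000 scale from a contractor -> dollars mapping.

    Assumption: HHI stands in for whatever concentration metric the
    project actually uses.
    """
    total = sum(spend_by_contractor.values())
    if total <= 0:
        # No positive spend means there are no market shares to measure.
        return 0.0
    # Sum of squared percentage shares across contractors.
    return sum((100.0 * amount / total) ** 2
               for amount in spend_by_contractor.values())


spend = {"Vendor A": 50.0, "Vendor B": 30.0, "Vendor C": 20.0}  # hypothetical
hhi = herfindahl_index(spend)
print(round(hhi))  # 3800
```

A single contractor holding all spend yields 10,000, while spend spread evenly across many contractors pushes the value toward zero.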
Engineering Highlights
- End-to-end application path with ingestion, indexing, API, Streamlit UI, evidence inspection, and regression tests
- Guarded RAG design that rejects briefs whose citation list references award IDs outside the retrieved contract evidence
- Hybrid retrieval stack combining lexical recall, vector retrieval, deduplication, and reranking
- Structured analytics layer for contractor rankings, spend trends, and market concentration
- Provider boundaries that keep external LLM use explicit and configurable
- Quality gates through Ruff, strict mypy, and pytest with a 90% coverage threshold across API, retrieval, generation, ingestion, frontend, evaluation, observability, and training tests
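The guarded RAG highlight above can be sketched as a small fail-closed check: a brief is returned only if every cited award ID appears in the retrieved evidence. This is a simplified illustration, not the project's actual validator; the names CitationError and validate_citations and the sample IDs are hypothetical.

```python
class CitationError(ValueError):
    """Raised when a brief cites an award that was never retrieved."""


def validate_citations(cited_ids, retrieved_ids):
    """Return the citations unchanged, or raise if any is outside the evidence."""
    allowed = set(retrieved_ids)
    unknown = [cid for cid in cited_ids if cid not in allowed]
    if unknown:
        # Fail closed: reject the whole brief rather than drop bad citations.
        raise CitationError(f"citations not in retrieved evidence: {unknown}")
    return list(cited_ids)


retrieved = ["A-1", "A-2", "A-3"]               # hypothetical retrieved award IDs
ok = validate_citations(["A-1", "A-3"], retrieved)

try:
    validate_citations(["A-1", "Z-9"], retrieved)
except CitationError as exc:
    rejected = str(exc)                          # message names the unknown ID
```

Raising instead of silently filtering keeps the failure visible to the API layer, which can then return an error rather than a partially grounded brief. How an empty citation list should be treated is a separate policy choice that this sketch leaves open.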
Evaluation and Quality
GovIntel includes evaluation fixtures and an ablation harness; I use them as methodology evidence rather than headline quality claims.
Why It Matters
GovIntel demonstrates my RAG and data-systems work: retrieval, structured generation, analytics, citation validation, API design, local deployment, and evaluation scaffolding over real procurement data.
