GovIntel

Overview

GovIntel is a local-first federal procurement intelligence system. It imports public USAspending contract awards, stores them in PostgreSQL, builds a Chroma-backed retrieval index, and generates citation-grounded market briefs through a FastAPI service and a Streamlit UI.

I built GovIntel as a reviewable RAG application with ingestion, indexing, retrieval, analytics, generation, citation checks, Dockerized local setup, CI, tests, and a walkthrough.

Status

GovIntel provides a reviewable end-to-end path: seed contract data, build the retrieval index, run the FastAPI service, open the Streamlit UI, ask a procurement question, and inspect cited award evidence.

I focused this portfolio version on engineering completeness (local setup, retrieval, generation, citation checks, tests, and a walkthrough) rather than on production hosting or benchmark claims.

Review Path

The walkthrough shows agency and NAICS inputs, a 1-to-10-year range control, a DHS cybersecurity question, a generated brief, and cited contract evidence. For hands-on review, the repository documents local setup through make db-up, make db-seed, make index, make run, and make ui.
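For reviewers who prefer to script the same walkthrough, here is a minimal sketch of how a request to the /api/v1/analyze endpoint might be built. The host and port, the POST method, the payload field names (question, agency, naics, years), and the example values are all assumptions for illustration; the repository's actual request schema may differ.

```python
import json
from urllib.request import Request

# Assumption: the API runs locally on port 8000. Adjust to the real setup.
BASE_URL = "http://localhost:8000"


def build_analyze_request(question, agency, naics, years, api_key=None):
    """Build (but do not send) a POST request for /api/v1/analyze.

    The payload field names are hypothetical, not the project's schema.
    """
    # Mirror the 1-to-10-year range control shown in the walkthrough.
    if not 1 <= years <= 10:
        raise ValueError("years must be between 1 and 10")
    payload = {
        "question": question,
        "agency": agency,
        "naics": naics,
        "years": years,
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The key is optional, so the header is only sent when one is given.
        headers["X-API-Key"] = api_key
    return Request(
        f"{BASE_URL}/api/v1/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_analyze_request(
    "Which contractors lead DHS cybersecurity awards?",
    agency="DHS",
    naics="541512",  # example NAICS code, chosen for illustration
    years=5,
    api_key="local-dev-key",
)
```

Once the API is running, the request can be sent with urllib.request.urlopen(request) and the response body parsed as JSON.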

Problem

Federal procurement data is useful but hard to scan quickly. The goal of GovIntel is to turn contract records into structured intelligence briefs that answer agency spending questions and cover contractor patterns, strategic implications from contractor rankings, quarterly spend trends, market concentration, and cited award evidence.

What I Built

Async USAspending award ingestion with pagination and idempotent PostgreSQL upserts
Typed Pydantic models for awards, analysis requests, contractor summaries, retrieved evidence, and generated briefs
Chroma-backed vector indexing with sentence-transformer embeddings
BM25 keyword retrieval, vector retrieval, hybrid merge, and cross-encoder reranking
SQL analytics for top contractors, quarterly spend trends, and market concentration
Versioned prompt templates and structured JSON generation
Fail-closed citation validation before returning a brief
FastAPI /api/v1/analyze endpoint with optional X-API-Key protection
Streamlit UI for choosing filters, generating briefs, and inspecting cited contract evidence
Docker Compose stack for PostgreSQL, the API, and the UI
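To illustrate what an idempotent upsert means for the ingestion item above, here is a minimal sketch. It swaps in SQLite from the standard library so it runs without a database server; the project itself uses PostgreSQL, whose INSERT ... ON CONFLICT ... DO UPDATE clause has the same shape. The table and column names are hypothetical.

```python
import sqlite3

# Stand-in schema: names are hypothetical, and SQLite replaces PostgreSQL
# so the sketch runs without a server.
SCHEMA = """
CREATE TABLE awards (
    award_id TEXT PRIMARY KEY,
    recipient TEXT NOT NULL,
    amount REAL NOT NULL
)
"""

UPSERT = """
INSERT INTO awards (award_id, recipient, amount)
VALUES (:award_id, :recipient, :amount)
ON CONFLICT (award_id) DO UPDATE SET
    recipient = excluded.recipient,
    amount = excluded.amount
"""


def upsert_awards(conn, rows):
    """Insert new awards and overwrite existing ones, keyed on award_id."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

page = [
    {"award_id": "A-1", "recipient": "Example Corp", "amount": 100.0},
    {"award_id": "A-2", "recipient": "Sample LLC", "amount": 250.0},
]
upsert_awards(conn, page)
upsert_awards(conn, page)  # re-importing the same page adds no duplicates
upsert_awards(
    conn, [{"award_id": "A-1", "recipient": "Example Corp", "amount": 175.0}]
)  # a changed record updates the existing row in place

rows = conn.execute(
    "SELECT award_id, amount FROM awards ORDER BY award_id"
).fetchall()
```

Because the write is keyed on the award ID, re-running an import over the same pages leaves the row count unchanged and only refreshes values that differ.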

Tech Stack

Backend: Python, FastAPI, Uvicorn, Pydantic v2
Data: PostgreSQL 16, SQLAlchemy asyncio, asyncpg
Retrieval: ChromaDB, sentence-transformers, BM25, hybrid retrieval, cross-encoder reranking
Generation: Gemini provider path, structured JSON prompts, optional Hugging Face provider path
Frontend: Streamlit
Operations: Docker Compose, GitHub Actions, Ruff, mypy, pytest, pytest-cov
Optional extensions: Langfuse tracing, Pinecone mirroring, offline evaluation and QLoRA utilities

Architecture

The system flow is:

Pull bounded USAspending award data into normalized contract models.
Persist contract rows in PostgreSQL.
Build local retrieval indexes with Chroma vector search and BM25 keyword search.
Merge and rerank retrieval candidates.
Compute SQL analytics for contractor ranking, spend trend, and market concentration.
Render a procurement-intelligence prompt with retrieved context and analytics.
Generate a structured brief through the selected provider.
Validate citations against retrieved contract evidence before returning the answer.
Serve results through FastAPI and the Streamlit UI.
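The merge step in the flow above can be sketched as follows. This example uses reciprocal rank fusion, a common way to combine keyword and vector rankings; the write-up does not say which merge method GovIntel uses, so treat the technique, the constant k=60, and the award IDs as assumptions for illustration. A cross-encoder reranker would then rescore the merged candidates.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each appearance adds 1 / (k + rank).

    Assumes each input ranking lists an ID at most once.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Keying scores by ID deduplicates candidates found by both retrievers.
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["A-3", "A-1", "A-7"]    # hypothetical keyword results
vector_hits = ["A-1", "A-9", "A-3"]  # hypothetical vector results
merged = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Rank-based fusion avoids having to normalize BM25 scores and embedding similarities onto a shared scale, which is one reason it is a frequent default for hybrid retrieval.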

Engineering Highlights

End-to-end application path with ingestion, indexing, API, Streamlit UI, evidence inspection, and regression tests
Guarded RAG design that rejects briefs whose citation list references award IDs outside the retrieved contract evidence
Hybrid retrieval stack combining lexical recall, vector retrieval, deduplication, and reranking
Structured analytics layer for contractor rankings, spend trends, and market concentration
Provider boundaries that keep external LLM use explicit and configurable
Quality gates through Ruff, strict mypy, and pytest with a 90% coverage threshold across API, retrieval, generation, ingestion, frontend, evaluation, observability, and training tests
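The guarded RAG behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: the Brief class, the CitationError exception, and the choice to also reject a brief with no citations are hypothetical.

```python
from dataclasses import dataclass


class CitationError(ValueError):
    """Raised when a brief cites an award that was not retrieved."""


@dataclass(frozen=True)
class Brief:
    summary: str
    citations: tuple[str, ...]  # award IDs the brief claims as evidence


def validate_citations(brief, retrieved_ids):
    """Return the brief only if every citation is in the retrieved set."""
    # Assumption: a brief with no citations is also rejected.
    if not brief.citations:
        raise CitationError("brief has no citations")
    unknown = sorted(set(brief.citations) - set(retrieved_ids))
    if unknown:
        # Fail closed: raise instead of returning a partially grounded brief.
        raise CitationError(f"citations outside retrieved evidence: {unknown}")
    return brief


retrieved = {"A-1", "A-3"}  # hypothetical retrieved award IDs
ok = validate_citations(Brief("DHS spend is concentrated.", ("A-1",)), retrieved)

try:
    validate_citations(Brief("Unsupported claim.", ("A-1", "A-42")), retrieved)
    rejected = False
except CitationError:
    rejected = True
```

Raising an error rather than silently dropping the unknown citation keeps the failure visible to the API layer, which can then return an error response instead of an ungrounded brief.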

Evaluation and Quality

GovIntel includes evaluation fixtures and an ablation harness; I use them as methodology evidence rather than headline quality claims.

Why It Matters

GovIntel demonstrates my RAG and data-systems work: retrieval, structured generation, analytics, citation validation, API design, local deployment, and evaluation scaffolding over real procurement data.