What I Built

Maintenance-Eye is a real-time multimodal maintenance copilot for field technicians. A technician can point a phone camera at equipment, speak naturally, and get back live visual analysis, voice responses, equipment lookup, maintenance context, and work-order support.

This project was built for the Google Gemini Live Agent Challenge and is deployed as a working web application.

Why This Workflow Matters

Maintenance work happens in noisy, high-stakes environments where technicians are moving, inspecting, and handling tools. Traditional maintenance software assumes the user can stop, type, search, and document everything manually.

I wanted to explore a different interface: an AI system that can see, listen, speak, and help the technician act without forcing a context switch away from the inspection itself.

What Is Live Today

The deployed system currently supports:

  • live camera and microphone input from a mobile web app
  • real-time multimodal reasoning with Gemini 2.5 Flash Live
  • voice interaction with barge-in interruption
  • tool-based lookup across assets, work orders, inspection history, safety protocols, and knowledge-base content
  • human-in-the-loop confirmation for critical actions
  • deployment on Google Cloud Run with Firestore as the backing store

System Architecture

The frontend is a mobile-friendly web app that streams camera frames and microphone audio over WebSocket to a FastAPI backend. The backend uses Google ADK to manage the agent loop and pass real-time media to the Gemini Live API. Agent tools handle structured retrieval and work-order operations, and responses stream back as audio, text, and confirmation UI cards.
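To make the streaming path more concrete, here is a minimal sketch of how camera frames, audio chunks, and text could be wrapped into WebSocket text frames. The envelope shape (`kind` and `data` fields), the `MediaMessage` class, and both function names are assumptions for illustration; the actual wire format used by Maintenance-Eye is not described in this post.

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical message kinds; the real app may use different ones.
KINDS = {"audio", "video", "text"}


@dataclass(frozen=True)
class MediaMessage:
    kind: str       # one of KINDS
    payload: bytes  # e.g. a raw audio chunk, an encoded camera frame, or UTF-8 text


def encode_message(msg: MediaMessage) -> str:
    """Serialize a message into a JSON string with a base64-encoded payload."""
    if msg.kind not in KINDS:
        raise ValueError(f"unknown kind: {msg.kind!r}")
    data = base64.b64encode(msg.payload).decode("ascii")
    return json.dumps({"kind": msg.kind, "data": data})


def decode_message(frame: str) -> MediaMessage:
    """Parse and validate a frame produced by encode_message."""
    obj = json.loads(frame)
    if not isinstance(obj, dict) or obj.get("kind") not in KINDS:
        raise ValueError("malformed frame: missing or unknown kind")
    data = obj.get("data")
    if not isinstance(data, str):
        raise ValueError("malformed frame: missing data")
    # validate=True rejects characters outside the base64 alphabet.
    return MediaMessage(obj["kind"], base64.b64decode(data, validate=True))
```

Base64 inside JSON keeps the sketch simple and easy to debug. A production system might prefer binary WebSocket frames to avoid the size overhead.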

Agent Tools and Safety Controls

This is not a single-prompt demo. The agent uses dedicated tools for asset lookup, inspection history, knowledge retrieval, work-order support, safety protocol access, and action confirmation.
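As a rough illustration of what a dedicated tool looks like, the sketch below registers a plain Python function and dispatches calls to it by name. It is framework-free on purpose: in the real system Google ADK handles tool registration and invocation, and Firestore holds the data. The `tool` decorator, `call_tool`, `lookup_asset`, and the sample asset record are all hypothetical.

```python
from typing import Any, Callable, Dict

# In-memory stand-in for the asset store; the sample record is made up.
_ASSETS: Dict[str, Dict[str, Any]] = {
    "P-101": {"name": "Coolant pump", "location": "Bay 3"},
}

# Registry mapping tool names to callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_asset(asset_id: str) -> Dict[str, Any]:
    """Return a structured result so the model never has to guess at failures."""
    asset = _ASSETS.get(asset_id)
    if asset is None:
        return {"status": "not_found", "asset_id": asset_id}
    return {"status": "ok", "asset_id": asset_id, **asset}


def call_tool(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Dispatch a tool call by name, reporting unknown tools as data, not exceptions."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"status": "error", "message": f"unknown tool: {name}"}
    return fn(**kwargs)
```

Returning an explicit `status` field for every outcome gives the agent something concrete to say to the technician when a lookup fails.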

For safety-sensitive actions, the system keeps a human in the loop: the agent can propose an action, but the technician must explicitly confirm it before execution.
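The propose-then-confirm pattern can be sketched as a small gate that stores proposed actions and only runs them after an explicit confirmation. This is an assumption about one way to structure it, not the project's actual code; `ConfirmationGate`, `PendingAction`, and the method names are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PendingAction:
    description: str             # text shown to the technician on the confirmation card
    execute: Callable[[], str]   # deferred side effect, run only after confirmation


class ConfirmationGate:
    """Holds agent-proposed actions until a human confirms or rejects them."""

    def __init__(self) -> None:
        self._pending: Dict[str, PendingAction] = {}

    def propose(self, description: str, execute: Callable[[], str]) -> str:
        """Record a proposal and return its ID. Nothing is executed here."""
        action_id = uuid.uuid4().hex
        self._pending[action_id] = PendingAction(description, execute)
        return action_id

    def confirm(self, action_id: str) -> str:
        """Run the action once. Raises KeyError if unknown or already resolved."""
        action = self._pending.pop(action_id)
        return action.execute()

    def reject(self, action_id: str) -> None:
        """Discard the proposal without running it."""
        self._pending.pop(action_id)


# Usage: the agent proposes, the technician's tap on the UI card confirms.
gate = ConfirmationGate()
log = []
proposal_id = gate.propose(
    "Create work order for P-101",
    lambda: log.append("work order created") or "done",
)
```

Because `confirm` removes the entry before executing it, a duplicated confirmation message cannot run the same action twice.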

Technical Challenges

The hardest engineering problems had little to do with model selection. They were systems problems:

  • handling bidirectional real-time streaming reliably
  • making voice interaction usable with interruption
  • normalizing noisy ASR output for equipment IDs and maintenance terms
  • designing tool flows that help the user act without letting the model overstep safety boundaries
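As an example of the ASR normalization problem in the list above, the sketch below turns a spoken ID fragment such as "p dash one oh one" into a canonical form. The ID format (one to three letters, a hyphen, two to five digits) and the function name are assumptions chosen for illustration; the project's real ID scheme and normalization rules are not described here.

```python
import re
from typing import Optional

# Spoken tokens that commonly stand in for characters in an ID.
_SPOKEN = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9", "dash": "-", "hyphen": "-",
}

# Hypothetical canonical format: 1-3 letters, optional hyphen, 2-5 digits.
_ID_SHAPE = re.compile(r"([A-Z]{1,3})-?(\d{2,5})")


def normalize_equipment_id(spoken: str) -> Optional[str]:
    """Map a spoken ID fragment to 'LETTERS-DIGITS', or None if it does not fit.

    Expects only the ID portion of the transcript, not a full sentence.
    """
    tokens = re.findall(r"[a-z0-9]+", spoken.lower())
    joined = "".join(_SPOKEN.get(t, t) for t in tokens).upper()
    match = _ID_SHAPE.fullmatch(joined)
    if match is None:
        return None
    return f"{match.group(1)}-{match.group(2)}"
```

Returning `None` instead of a best guess lets the agent ask the technician to repeat the ID rather than look up the wrong asset.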

Demo and Repository

What I’d Improve Next

Next steps would be tighter integration with enterprise asset management (EAM) systems, stronger offline behavior for low-connectivity environments, and more domain-specific agent flows for each maintenance discipline.