Overview
Built for the Google Gemini AI Challenge 2026, this multimodal AI agent enables maintenance technicians to diagnose equipment issues by pointing a camera at a device and describing the problem through natural conversation.
Problem
Maintenance technicians often spend significant time diagnosing equipment issues: cross-referencing manuals, searching past incident reports, and consulting senior colleagues. This slows repairs and increases the cost of equipment downtime.
Solution
An AI agent that combines:
- Computer vision to analyze equipment images and identify visual anomalies
- Natural language understanding to process technician descriptions of symptoms
- Knowledge retrieval to surface related past incidents and recommended repair procedures
- Work order generation to streamline the documentation process
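The following is a minimal sketch of how the components listed above could be chained into one diagnosis step. It is not the project's code. Every stage is a hypothetical stub: `analyze_image` stands in for the vision model call, and `retrieve_incidents` uses naive keyword overlap instead of real knowledge retrieval. The names `Diagnosis`, `diagnose`, and the recommendation rule are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical container for the agent's combined output."""
    visual_findings: list[str]
    symptoms: str
    related_incidents: list[str] = field(default_factory=list)
    recommendation: str = ""


def analyze_image(image_bytes: bytes) -> list[str]:
    """Stub for the computer-vision stage; a real system would call a multimodal model here."""
    return ["visible leak at pump seal"] if image_bytes else []


def retrieve_incidents(query: str, history: dict[str, str]) -> list[str]:
    """Stub for knowledge retrieval: keyword overlap (words longer than 3 chars), not vector search."""
    words = {w for w in query.lower().split() if len(w) > 3}
    return [text for text in history.values() if words & set(text.lower().split())]


def diagnose(image_bytes: bytes, symptoms: str, history: dict[str, str]) -> Diagnosis:
    """Combine visual findings and the technician's description, then look up past incidents."""
    findings = analyze_image(image_bytes)
    query = " ".join(findings + [symptoms])
    incidents = retrieve_incidents(query, history)
    # Placeholder rule; the real agent would have the model generate this recommendation.
    recommendation = (
        f"Review {len(incidents)} related incident(s) before disassembly"
        if incidents
        else "No similar incidents found; escalate to a senior technician"
    )
    return Diagnosis(findings, symptoms, incidents, recommendation)


if __name__ == "__main__":
    toy_history = {
        "INC-1": "Replaced worn pump seal after oil leak",
        "INC-2": "Recalibrated conveyor speed sensor",
    }
    print(diagnose(b"fake-image-bytes", "pump is leaking oil", toy_history))
```

Keeping each stage behind its own function means any stub can later be swapped for a real model or database call without changing `diagnose`.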
Tech Stack
- Python
- Google Gemini (multimodal API)
- TensorFlow
- Docker
Links
- GitHub Repository (coming soon)
Architecture
The system uses Google Gemini’s multimodal capabilities to process both camera input and voice/text descriptions simultaneously. It retrieves relevant maintenance history from a vector database and generates actionable diagnostic recommendations.
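The retrieval step described above ranks stored incidents by vector similarity to the current query. This sketch shows that ranking with brute-force cosine similarity over an in-memory dictionary. It makes two assumptions: tiny hand-written 3-dimensional vectors stand in for real embeddings, and a plain `dict` stands in for the vector database, whose product the write-up does not name. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored incidents most similar to query_vec."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [incident_id for incident_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy embeddings for illustration only; a real index would hold model-generated vectors.
    toy_index = {
        "INC-1": [0.9, 0.1, 0.0],
        "INC-2": [0.0, 1.0, 0.0],
        "INC-3": [0.7, 0.3, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], toy_index))  # ['INC-1', 'INC-3']
```

A dedicated vector database replaces the linear scan with an approximate nearest-neighbour index, but the ranking criterion is the same idea.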
Key Features
- Real-time visual analysis: point the camera at equipment and get instant diagnostic suggestions
- Conversational interface: describe symptoms naturally and ask follow-up questions
- Historical context: automatically surfaces related past incidents and their solutions
- Work order generation: creates structured maintenance work orders from the diagnosis
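Work order generation turns a free-form diagnosis into a fixed, machine-readable record. Below is a minimal sketch under assumptions: the `WorkOrder` fields, the `PRIORITIES` values, and `work_order_from_diagnosis` are all hypothetical, because the project's actual work-order schema is not documented. The input is assumed to be a dictionary, for example JSON parsed from the model's response.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date

# Hypothetical priority levels; a real CMMS would define its own.
PRIORITIES = ("low", "medium", "high", "critical")


@dataclass
class WorkOrder:
    """Hypothetical work-order schema."""
    equipment_id: str
    summary: str
    priority: str
    steps: list[str] = field(default_factory=list)
    created: str = field(default_factory=lambda: date.today().isoformat())

    def __post_init__(self) -> None:
        # Reject unknown priorities so downstream systems only receive valid values.
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}, got {self.priority!r}")


def work_order_from_diagnosis(diagnosis: dict) -> WorkOrder:
    """Map a diagnosis dictionary onto the work-order schema, defaulting priority to 'medium'."""
    return WorkOrder(
        equipment_id=diagnosis["equipment_id"],
        summary=diagnosis["summary"],
        priority=diagnosis.get("priority", "medium"),
        steps=list(diagnosis.get("steps", [])),
    )


if __name__ == "__main__":
    order = work_order_from_diagnosis({
        "equipment_id": "PUMP-07",
        "summary": "Oil leak at shaft seal",
        "priority": "high",
        "steps": ["Isolate pump", "Replace seal"],
    })
    print(json.dumps(asdict(order), indent=2))
```

Validating in `__post_init__` means a malformed diagnosis fails loudly at creation time rather than producing an incomplete work order.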