Vision Maintenance AI Agent

Overview

Built for the Google Gemini AI Challenge 2026, this multimodal AI agent enables maintenance technicians to diagnose equipment issues by pointing a camera at a device and describing the problem through natural conversation.

Problem

Maintenance technicians often spend significant time diagnosing equipment issues: cross-referencing manuals, searching past incident reports, and consulting senior colleagues. This lengthens repairs and increases downtime costs.

Solution

An AI agent that combines:

Computer vision to analyze equipment images and identify visual anomalies
Natural language understanding to process technician descriptions of symptoms
Knowledge retrieval to surface related past incidents and recommended repair procedures
Work order generation to streamline the documentation process
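
As a rough illustration of how these four capabilities might fit together, the sketch below wires them into one function. It is not the project's actual code: the names `Diagnosis`, `run_agent`, and the three injected callables are hypothetical, and the model calls are passed in as plain functions so the flow can be read (and tested) without any API access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Diagnosis:
    """Hypothetical container for the result of one diagnostic pass."""
    anomalies: List[str]
    symptoms: str
    related_incidents: List[str] = field(default_factory=list)
    recommendation: str = ""


def run_agent(
    image_bytes: bytes,
    description: str,
    analyze_image: Callable[[bytes], List[str]],
    retrieve_incidents: Callable[[str], List[str]],
    recommend: Callable[[List[str], str, List[str]], str],
) -> Diagnosis:
    """Run vision, retrieval, and recommendation in sequence.

    Each stage is injected, so a real deployment could back them with
    Gemini calls and a vector database (assumed, not shown here).
    """
    # 1. Computer vision: list the visual anomalies found in the image.
    anomalies = analyze_image(image_bytes)
    # 2. Combine the technician's words with the visual findings.
    query = (description + " " + " ".join(anomalies)).strip()
    # 3. Knowledge retrieval: look up related past incidents.
    incidents = retrieve_incidents(query)
    # 4. Produce a recommendation from everything gathered so far.
    recommendation = recommend(anomalies, description, incidents)
    return Diagnosis(anomalies, description, incidents, recommendation)
```

Keeping the stages as injected functions is only one possible design; it makes each step replaceable with a stub during testing.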

Tech Stack

Python
Google Gemini (multimodal API)
TensorFlow
Docker

Architecture

The system uses Google Gemini’s multimodal capabilities to process both camera input and voice/text descriptions simultaneously. It retrieves relevant maintenance history from a vector database and generates actionable diagnostic recommendations.
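
The retrieval step can be pictured with a small in-memory example. This is a minimal sketch, assuming incidents are stored alongside embedding vectors and ranked by cosine similarity; the README does not say which vector database or similarity measure is used. The incident texts and the three-dimensional vectors below are invented toy data, and `cosine_similarity` and `top_k_incidents` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_incidents(
    query_vec: Sequence[float],
    incidents: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k incidents most similar to the query."""
    ranked = sorted(
        incidents,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy history with hand-written embeddings (illustrative only).
HISTORY = [
    ("Pump P-101: bearing overheated, replaced bearing", [0.9, 0.1, 0.0]),
    ("Conveyor C-7: belt misaligned, re-tensioned", [0.0, 0.2, 0.9]),
    ("Pump P-204: seal leak, replaced mechanical seal", [0.7, 0.6, 0.1]),
]

if __name__ == "__main__":
    # A query vector that points mostly along the first dimension.
    for text in top_k_incidents([1.0, 0.2, 0.0], HISTORY, k=2):
        print(text)
```

A real vector database would replace the linear scan with an index, but the ranking idea is the same.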

Key Features

Real-time visual analysis: Point the camera at equipment and get instant diagnostic suggestions
Conversational interface: Describe symptoms naturally and ask follow-up questions
Historical context: Automatically surfaces related past incidents and their solutions
Work order generation: Creates structured maintenance work orders from the diagnosis
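
To make "structured work order" concrete, here is one possible shape for such a record. The field names, the priority levels, and the `build_work_order` helper are assumptions made for illustration; the README does not define the actual work order schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Sequence

# Assumed priority levels; the real system may use a different scale.
ALLOWED_PRIORITIES = {"low", "medium", "high"}


@dataclass
class WorkOrder:
    """Hypothetical work order schema."""
    equipment_id: str
    summary: str
    priority: str
    steps: List[str]
    related_incidents: List[str] = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def build_work_order(
    equipment_id: str,
    summary: str,
    steps: Sequence[str],
    related_incidents: Sequence[str] = (),
    priority: str = "medium",
) -> WorkOrder:
    """Validate the inputs and return a WorkOrder ready for serialization."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    if not steps:
        raise ValueError("a work order needs at least one repair step")
    return WorkOrder(
        equipment_id=equipment_id,
        summary=summary,
        priority=priority,
        steps=list(steps),
        related_incidents=list(related_incidents),
    )


if __name__ == "__main__":
    order = build_work_order(
        "P-101",  # made-up equipment ID
        "Suspected bearing wear",
        ["Lock out pump", "Inspect bearing", "Replace if worn"],
        priority="high",
    )
    # asdict() turns the dataclass into a plain dict that json can serialize.
    print(json.dumps(asdict(order), indent=2))
```

Validating the record before it is saved keeps malformed model output from reaching the maintenance system.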