Overview

Built for the Google Gemini AI Challenge 2026, this multimodal AI agent helps maintenance technicians diagnose equipment issues: they point a camera at the device and describe the problem in natural conversation.

Problem

Maintenance technicians often spend significant time diagnosing equipment issues by cross-referencing manuals, searching past incident reports, and consulting senior colleagues. This slows repairs and increases the cost of equipment downtime.

Solution

An AI agent that combines:

  • Computer vision to analyze equipment images and identify visual anomalies
  • Natural language understanding to process technician descriptions of symptoms
  • Knowledge retrieval to surface related past incidents and recommended repair procedures
  • Work order generation to streamline the documentation process
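The four capabilities above can be pictured as stages in one pipeline. The Python sketch below is illustrative only. The stage names (analyze_image, parse_symptoms, retrieve_incidents) and the Diagnosis type are hypothetical, and each stage is passed in as a plain function so that a Gemini call, a TensorFlow model, or a test stub could fill it. Work order generation would then consume the resulting Diagnosis.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not the project's actual API).
AnalyzeImage = Callable[[bytes], list[str]]       # camera frame -> visual anomalies
ParseSymptoms = Callable[[str], list[str]]        # technician description -> symptom keywords
RetrieveIncidents = Callable[[str], list[str]]    # search query -> related past incidents


@dataclass
class Diagnosis:
    anomalies: list[str]
    symptoms: list[str]
    related_incidents: list[str]


def diagnose(
    image: bytes,
    description: str,
    analyze_image: AnalyzeImage,
    parse_symptoms: ParseSymptoms,
    retrieve_incidents: RetrieveIncidents,
) -> Diagnosis:
    """Run the vision, language, and retrieval stages and merge their outputs."""
    anomalies = analyze_image(image)
    symptoms = parse_symptoms(description)
    # Search maintenance history using both what the camera saw and what the technician said.
    query = " ".join(anomalies + symptoms)
    incidents = retrieve_incidents(query)
    return Diagnosis(anomalies, symptoms, incidents)


if __name__ == "__main__":
    # Stub stages stand in for real models so the flow can be exercised offline.
    result = diagnose(
        b"<jpeg bytes>",
        "Pump is vibrating and leaking",
        analyze_image=lambda img: ["oil residue"],
        parse_symptoms=lambda text: [w for w in text.lower().split() if w in {"vibrating", "leaking"}],
        retrieve_incidents=lambda q: [f"incident matching: {q}"],
    )
    print(result)
```

Injecting the stages as functions keeps the orchestration independent of any particular model, so individual stages can be swapped or tested in isolation.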

Tech Stack

  • Python
  • Google Gemini (multimodal API)
  • TensorFlow
  • Docker

Architecture

The system uses Google Gemini’s multimodal capabilities to process both camera input and voice/text descriptions simultaneously. It retrieves relevant maintenance history from a vector database and generates actionable diagnostic recommendations.
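The retrieval step can be sketched as a nearest-neighbour search over embeddings of past incidents, followed by a prompt that pairs the technician's description with the retrieved history. This is a minimal sketch under stated assumptions: an in-memory list ranked by cosine similarity stands in for the vector database (which this README does not name), incident embeddings are assumed to be precomputed, and the Incident, top_k_incidents, and build_prompt names are hypothetical. The resulting prompt would accompany the camera image in the request to Gemini; the SDK call itself is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    summary: str
    embedding: list[float]  # assumed precomputed by some embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_incidents(query_embedding: list[float], incidents: list[Incident], k: int = 3) -> list[Incident]:
    """Linear scan: score every stored incident and keep the k most similar."""
    scored = [(cosine_similarity(query_embedding, inc.embedding), inc) for inc in incidents]
    # Sort on the score only, so Incident objects are never compared directly.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [inc for _, inc in scored[:k]]


def build_prompt(description: str, incidents: list[Incident]) -> str:
    """Combine the technician's description with retrieved history as model context."""
    history = "\n".join(f"- [{inc.incident_id}] {inc.summary}" for inc in incidents)
    return (
        "Technician description:\n"
        f"{description}\n\n"
        "Related past incidents:\n"
        f"{history}\n\n"
        "Suggest likely causes and repair steps."
    )


if __name__ == "__main__":
    history = [
        Incident("INC-1", "Bearing wear on pump P-101", [1.0, 0.0]),
        Incident("INC-2", "Loose wiring in control panel", [0.0, 1.0]),
        Incident("INC-3", "Seal leak on pump P-102", [0.9, 0.1]),
    ]
    nearest = top_k_incidents([1.0, 0.0], history, k=2)
    print(build_prompt("Pump vibrates under load", nearest))
```

A dedicated vector database typically replaces the linear scan with an approximate nearest-neighbour index, but the ranking idea is the same.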

Key Features

  1. Real-time visual analysis: point the camera at equipment and get immediate diagnostic suggestions
  2. Conversational interface: describe symptoms naturally and ask follow-up questions
  3. Historical context: automatically surfaces related past incidents and their solutions
  4. Work order generation: creates structured maintenance work orders from the diagnosis
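Work order generation is easier to validate when the model returns structured data rather than free text. The sketch below assumes Gemini has been prompted to return a JSON object; the field names (equipment_id, problem_summary, probable_cause, recommended_steps, priority, related_incidents) and the priority levels are hypothetical, not a schema defined by this project. The diagnosis is checked for required fields before a WorkOrder is created and serialized.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical priority levels; adjust to the site's maintenance system.
ALLOWED_PRIORITIES = {"low", "medium", "high", "critical"}


@dataclass
class WorkOrder:
    equipment_id: str
    problem_summary: str
    probable_cause: str
    recommended_steps: list[str]
    priority: str = "medium"
    related_incidents: list[str] = field(default_factory=list)
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


def work_order_from_diagnosis(diagnosis: dict) -> WorkOrder:
    """Validate a model-produced diagnosis dict and convert it into a WorkOrder."""
    required = ("equipment_id", "problem_summary", "probable_cause", "recommended_steps")
    # Missing keys and empty values (e.g. an empty step list) are both rejected.
    missing = [key for key in required if not diagnosis.get(key)]
    if missing:
        raise ValueError(f"diagnosis is missing fields: {missing}")
    priority = diagnosis.get("priority", "medium")
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    return WorkOrder(
        equipment_id=diagnosis["equipment_id"],
        problem_summary=diagnosis["problem_summary"],
        probable_cause=diagnosis["probable_cause"],
        recommended_steps=list(diagnosis["recommended_steps"]),
        priority=priority,
        related_incidents=list(diagnosis.get("related_incidents", [])),
    )


if __name__ == "__main__":
    # Example model output (hypothetical), parsed from JSON text.
    raw = (
        '{"equipment_id": "P-101", "problem_summary": "Excess vibration", '
        '"probable_cause": "Bearing wear", '
        '"recommended_steps": ["Lock out pump", "Replace bearing"], "priority": "high"}'
    )
    print(work_order_from_diagnosis(json.loads(raw)).to_json())
```

Rejecting incomplete or malformed output at this boundary keeps a bad model response from turning into an unusable work order.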