Leveraging Multimodal AI for Traffic Light Detection

We benchmarked multimodal models on recognizing the color of distant traffic lights in complex street scenes, comparing SEAL’s LLM-guided visual search (V*) against GPT-4V and YOLO baselines on curated data drawn from S2TLD and Google Street View. The study probes small-object recognition, multi-light scenes, and “no-light” negatives, and explores hybrid pipelines that pair precise visual grounding with a stronger VQA model.

Highlights

SEAL leads on small or distant lights: guided search gives it the best accuracy on Small Lights and on high-resolution Street View images.
GPT-4V excels on easy cases: 100% on Big Lights, where the signal is large and clear.
Hybrid idea: using SEAL for localization plus a stronger VQA model (e.g., GPT-4o-mini) raises “No Light” accuracy from 0% to 100% while staying competitive elsewhere.

Methods

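SEAL (V*): LLM-guided visual search crops candidate regions, then SEAL’s own VQA picks the color.
SEAL + GPT-4o-mini: the same SEAL crops, with GPT-4o-mini as the stronger VQA model selecting among the answer options.
YOLO + GPT-4o-mini: a YOLO object detector produces the crops, and GPT-4o-mini answers from the crops plus the full scene.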
GPT-4V direct: single-pass VQA on the whole image.
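
The crop-then-ask structure shared by the hybrid methods can be sketched as below. This is a minimal illustration, not the project's code: `run_pipeline`, the `Prediction` type, the option list, and the toy stand-ins are all hypothetical names, and the fallback of asking the VQA model about the whole scene when no region is found is an assumption. In a real run, `localize` would wrap SEAL or YOLO and `choose` would wrap a VQA model call.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); assumed convention
OPTIONS = ("red", "yellow", "green", "no light")  # assumed answer options


@dataclass
class Prediction:
    answer: str
    box: Optional[Box]  # None when the answer came from the full scene


def run_pipeline(image, localize, choose, crop, options=OPTIONS) -> Prediction:
    """Localize candidate regions, crop the first one, and ask the VQA step."""
    boxes = localize(image)
    if not boxes:
        # Assumption: with no candidate region, query the VQA step on the
        # whole image so it can still answer "no light".
        return Prediction(choose(image, options), None)
    box = boxes[0]
    return Prediction(choose(crop(image, box), options), box)


# Toy stand-ins so the sketch runs without any model: an "image" is a list
# of strings, and the letters R, Y, G mark a lit signal.
def toy_localize(grid):
    return [
        (x, y, x + 1, y + 1)
        for y, row in enumerate(grid)
        for x, cell in enumerate(row)
        if cell in "RYG"
    ]


def toy_crop(grid, box):
    left, top, right, bottom = box
    return [row[left:right] for row in grid[top:bottom]]


def toy_choose(grid, options):
    names = {"R": "red", "Y": "yellow", "G": "green"}
    for row in grid:
        for cell in row:
            if names.get(cell) in options:
                return names[cell]
    return "no light"


if __name__ == "__main__":
    scene = ["....", "..G.", "...."]
    print(run_pipeline(scene, toy_localize, toy_choose, toy_crop))
```

Passing the localizer and the VQA step as plain callables keeps the two stages swappable, which is the property the SEAL + GPT-4o-mini and YOLO + GPT-4o-mini variants rely on.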

Results (Snapshot)

Big Lights: GPT-4V = 100% accuracy; SEAL ≈ 93% (misses tied to VQA confidence).
Small Lights: SEAL = 100% accuracy; GPT-4V ≈ 87%.
Google Street View HD: SEAL ≈ 45% vs GPT-4o-mini ≈ 27%.
No Light: SEAL + GPT-4o-mini = 100% (vs 0% with SEAL’s original VQA).
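
Per-subset accuracies like those above can be computed from a log of individual trials. The sketch below assumes a hypothetical record layout (`subset`, `method`, `truth`, `predicted`) and uses made-up toy records purely to exercise the function; none of the values are the study's data.

```python
from collections import defaultdict


def accuracy_by_subset(records):
    """Return {(subset, method): fraction of trials where predicted == truth}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        key = (rec["subset"], rec["method"])
        totals[key] += 1
        correct[key] += rec["predicted"] == rec["truth"]  # bool counts as 0 or 1
    return {key: correct[key] / totals[key] for key in totals}


# Toy records for illustration only (not results from the study).
TOY_RECORDS = [
    {"subset": "big", "method": "A", "truth": "red", "predicted": "red"},
    {"subset": "big", "method": "A", "truth": "green", "predicted": "red"},
    {"subset": "small", "method": "A", "truth": "green", "predicted": "green"},
    {"subset": "none", "method": "B", "truth": "no light", "predicted": "no light"},
]

if __name__ == "__main__":
    for (subset, method), acc in sorted(accuracy_by_subset(TOY_RECORDS).items()):
        print(f"{subset:>6} {method}: {acc:.0%}")
```

Keying on the (subset, method) pair keeps "No Light" negatives separate from the color subsets, so a method that never answers "no light" shows up as a low score on that subset rather than being averaged away.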

Role & Contributions

Co-designed the study and prepared portions of the benchmark datasets (S2TLD splits & Google Street View subsets); labeled and verified ground-truth options.
Implemented Python tooling to run and log VQA trials; evaluated GPT-4V across all datasets and documented outcomes.
Built data and experiment scaffolding for hybrid pipelines (SEAL crops → GPT-4o-mini option selection) and compared them against YOLO + GPT-4o-mini baselines.
Attempted LLaVA fine-tuning for SEAL’s VQA head; documented GPU/checkpoint constraints and laid out next-step training plans.
Co-authored proposal and final report; summarized findings and future work on guided search + stronger VQA integration.

Tech Stack

Python
PyTorch
OpenAI GPT-4V / GPT-4o-mini
YOLO
pandas
Matplotlib