Leveraging Multimodal AI for Traffic Light Detection
We benchmarked multimodal models on recognizing the color of distant traffic lights in complex street scenes, comparing SEAL’s LLM-guided visual search (V*) against GPT-4V and YOLO baselines on curated S2TLD splits and Google Street View subsets. The study probes small-object recognition, multi-light scenes, and “no-light” negatives, and explores hybrid pipelines that pair precise visual grounding with stronger VQA inference.
Highlights
- SEAL leads on small/far lights: guided search gives the best accuracy on Small Lights and on high-resolution street-view images.
- GPT-4V excels on easy cases: 100% on Big Lights where the signal is large and clear.
- Hybrid idea: pairing SEAL localization with a stronger VQA model (e.g., GPT-4o-mini) raises “No Light” accuracy from 0% to 100% while staying competitive elsewhere.
Methods
- SEAL (V*): LLM-guided visual search to crop candidate regions → VQA for color choice.
- SEAL + GPT-4o-mini: stronger VQA for option selection over SEAL crops.
- YOLO + GPT-4o-mini: an object detector proposes crops; VQA then runs on the crops and the full scene.
- GPT-4V direct: single-pass VQA on the whole image.
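The localize-then-answer pattern shared by the SEAL and YOLO pipelines above can be sketched as follows. This is a minimal illustration, not the study's code: the "image" is a toy grid of color names, and `classify_light`, `toy_localize`, and `toy_answer` are hypothetical stand-ins for a real localizer (SEAL's guided search or a YOLO detector) and a real VQA model.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Image = List[List[str]]           # toy image: rows of per-pixel color names
OPTIONS = ("red", "yellow", "green", "no light")


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of the toy image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def classify_light(
    image: Image,
    localize: Callable[[Image], List[Box]],
    answer: Callable[[List[Image], Sequence[str]], str],
    options: Sequence[str] = OPTIONS,
) -> str:
    """Stage 1: propose regions. Stage 2: let the VQA step pick one option."""
    crops = [crop(image, box) for box in localize(image)]
    choice = answer(crops, options)
    if choice not in options:
        raise ValueError(f"answer {choice!r} is not one of the options")
    return choice


def toy_localize(image: Image) -> List[Box]:
    """Hypothetical localizer: one tight box around all non-background pixels."""
    hits = [(x, y) for y, row in enumerate(image)
            for x, px in enumerate(row) if px != "bg"]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_answer(crops: List[Image], options: Sequence[str]) -> str:
    """Hypothetical VQA step: majority color in the crops, else 'no light'."""
    counts = Counter(px for c in crops for row in c for px in row if px in options)
    return counts.most_common(1)[0][0] if counts else "no light"


scene = [["bg"] * 4 for _ in range(3)]
scene[1][2] = "red"
empty_scene = [["bg"] * 4 for _ in range(3)]
print(classify_light(scene, toy_localize, toy_answer))
print(classify_light(empty_scene, toy_localize, toy_answer))
```

Keeping the localizer and the answering step as separate callables is what makes it cheap to swap one VQA model for another over the same crops.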
Results (Snapshot)
- Big Lights: GPT-4V = 100% accuracy; SEAL ≈ 93% (misses tied to VQA confidence).
- Small Lights: SEAL = 100% accuracy; GPT-4V ≈ 87%.
- Google Street View HD: SEAL ≈ 45% vs GPT-4o-mini ≈ 27%.
- No Light: SEAL + GPT-4o-mini = 100% (vs 0% with SEAL’s original VQA).
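Per-subset accuracies like those above are the fraction of trials in which the predicted option matches the labeled one. A minimal sketch of that bookkeeping is shown here, assuming each trial is a dict with `subset`, `method`, `expected`, and `predicted` keys; the field names, the `accuracy_by_group` helper, and the toy records are illustrative assumptions and do not come from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def accuracy_by_group(
    trials: Iterable[Mapping[str, str]],
) -> Dict[Tuple[str, str], float]:
    """Return accuracy per (subset, method) as a fraction between 0 and 1."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for trial in trials:
        key = (trial["subset"], trial["method"])
        totals[key][0] += int(trial["predicted"] == trial["expected"])
        totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in totals.items()}


# Made-up records for illustration only (not data from the benchmark).
toy_trials = [
    {"subset": "Small Lights", "method": "method_a", "expected": "red", "predicted": "red"},
    {"subset": "Small Lights", "method": "method_a", "expected": "green", "predicted": "red"},
    {"subset": "No Light", "method": "method_b", "expected": "no light", "predicted": "no light"},
]

for (subset, method), acc in accuracy_by_group(toy_trials).items():
    print(f"{subset} / {method}: {acc:.0%}")
```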
Role & Contributions
- Co-designed the study and prepared portions of the benchmark datasets (S2TLD splits & Google Street View subsets); labeled and verified ground-truth options.
- Implemented Python tooling to run and log VQA trials; evaluated GPT-4V across all datasets and documented outcomes.
- Built data and experiment scaffolding for hybrid pipelines (SEAL crops → GPT-4o-mini option choosing) and compared against YOLO + GPT baselines.
- Attempted LLaVA fine-tuning for SEAL’s VQA head; documented GPU/checkpoint constraints and laid out next-step training plans.
- Co-authored proposal and final report; summarized findings and future work on guided search + stronger VQA integration.
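As an illustration of the kind of tooling used to run and log VQA trials, the sketch below loops over labeled samples, asks a model for a color, and writes one CSV row per trial. It is an assumption-laden example rather than the project's actual script: `run_and_log`, the column names, and the `ask` callable (which would wrap a real model call) are all hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, TextIO, Tuple

FIELDS = ["image_id", "expected", "predicted", "correct"]


def run_and_log(
    samples: Iterable[Tuple[str, str]],
    ask: Callable[[str], str],
    out: TextIO,
) -> int:
    """Query `ask` once per (image_id, expected) pair and log each trial as CSV.

    Returns the number of correct predictions.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    n_correct = 0
    for image_id, expected in samples:
        predicted = ask(image_id)  # hypothetical model call
        correct = predicted == expected
        writer.writerow({
            "image_id": image_id,
            "expected": expected,
            "predicted": predicted,
            "correct": int(correct),
        })
        n_correct += int(correct)
    return n_correct


# Usage with an in-memory buffer and a stub model that always answers "red".
buffer = io.StringIO()
hits = run_and_log([("img_a", "red"), ("img_b", "green")], lambda _id: "red", buffer)
print(hits)
print(buffer.getvalue())
```

Writing one row per trial, rather than only a final score, keeps individual misses available for later inspection.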
Tech Stack
- Python
- PyTorch
- OpenAI GPT-4V / GPT-4o-mini
- YOLO
- pandas
- Matplotlib