Leveraging Multimodal AI for Traffic Light Detection

Research · New York University

We benchmarked multimodal models on recognizing the color of distant traffic lights in complex street scenes, comparing SEAL’s LLM-guided visual search (V*) with GPT-4V and YOLO baselines on curated datasets (S2TLD and Google Street View). The study probes small-object recognition, multi-light scenes, and “no-light” negatives, and explores hybrid pipelines that pair precise visual grounding with stronger VQA inference.
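
A minimal sketch of such a hybrid pipeline (not the project’s actual code): ground the lights with an off-the-shelf YOLO detector, crop each detection, and ask a VQA model for the color. The `ultralytics` package, the `yolov8n.pt` checkpoint, the `gpt-4o` model name, and all file and function names here are illustrative assumptions.

```python
import base64
import io

from openai import OpenAI
from PIL import Image
from ultralytics import YOLO

TRAFFIC_LIGHT_CLASS = 9  # COCO class id for "traffic light"


def detect_traffic_lights(image_path: str) -> list[tuple[int, int, int, int]]:
    """Stage 1 (grounding): localize traffic lights with a YOLO detector."""
    model = YOLO("yolov8n.pt")  # illustrative checkpoint
    result = model(image_path)[0]
    boxes = []
    for box, cls in zip(result.boxes.xyxy, result.boxes.cls):
        if int(cls) == TRAFFIC_LIGHT_CLASS:
            x1, y1, x2, y2 = map(int, box.tolist())
            boxes.append((x1, y1, x2, y2))
    return boxes


def classify_light_color(image_path: str, box, client: OpenAI) -> str:
    """Stage 2 (VQA): crop the grounded region and ask a VLM for its color."""
    crop = Image.open(image_path).crop(box)
    buf = io.BytesIO()
    crop.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; the study evaluated GPT-4V
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What color is this traffic light? Answer red, yellow, or green."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip().lower()


if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    boxes = detect_traffic_lights("street_scene.jpg")  # placeholder image path
    if not boxes:
        print("no light")  # the "no-light" negative case
    for b in boxes:
        print(b, classify_light_color("street_scene.jpg", b, client))
```

The design point of the hybrid: grounding first hands the VQA model a legible crop instead of a few pixels in a wide scene, which is where small-object recognition otherwise fails.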

Code

Highlights

Methods

Results (Snapshot)

Role & Contributions

Tech Stack