Glossary term
Multimodal AI
Object detection: the computer vision task of identifying and localising objects in images, reporting each object as a bounding box with a class label.
Tesla Autopilot uses a real-time object detection pipeline (HydraNet) to detect vehicles, pedestrians, cyclists, and lane markings across 8 cameras at 36 FPS on the vehicle's onboard computer.
A retail loss-prevention system uses YOLOv8 (Ultralytics) to detect shoplifting behaviour in real time across 500 store cameras, reducing false alarms by 60% compared with rule-based motion detection.
Detic (Facebook Research) detects 20,000+ object categories by leveraging image-classification datasets that carry only image-level labels, with no bounding-box annotations. It is used in long-tail detection tasks where annotated detection datasets don't exist.
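As a minimal sketch of what the definition above means in practice, a single detection can be represented as a class label plus a bounding box, and two boxes can be compared by intersection over union (IoU), a common measure of how well a predicted box localises an object. The `Detection` class, its field names, and the pixel-corner box convention are assumptions made for illustration; they are not taken from any of the systems named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a class label and an axis-aligned bounding box.

    Hypothetical structure for illustration. The box is stored as pixel
    corners: (x_min, y_min) is top-left, (x_max, y_max) is bottom-right.
    """
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a degenerate (inverted) box has no area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two boxes, a value in [0, 1]."""
    inter_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    inter_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # The boxes do not overlap.
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


predicted = Detection("pedestrian", 0, 0, 10, 10)
ground_truth = Detection("pedestrian", 5, 5, 15, 15)
overlap = iou(predicted, ground_truth)  # 25 / 175, roughly 0.14
```

An IoU of 1.0 means the two boxes coincide exactly and 0.0 means they do not overlap. Evaluation protocols typically count a prediction as correct only when the label matches and the IoU exceeds a chosen threshold.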