Vision language models for physical reasoning about sports video. Ball tracking, scoring decisions, and chain-of-thought analysis powered by a DeepStream pipeline, VLM reasoning, and video search.
View Live Project →
RefereAI applies vision language models to the problem of sports officiating. The system watches video, tracks the ball, and reasons through scoring decisions with chain-of-thought analysis, the same way a human referee would, but at machine speed.
A DeepStream pipeline handles real-time video ingestion and object detection, while VLM reasoning layers interpret what's happening on the field. Video search lets you query specific plays and moments across entire games.
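To make the flow from detections to a decision concrete, here is a minimal sketch in Python. It is not RefereAI's implementation: a simple rule-based check stands in for the VLM reasoning step, and the names `BallDetection`, `Decision`, `decide_goal`, and `goal_line_x` are hypothetical. It assumes an upstream detector already supplies per-frame ball positions in pixel coordinates, and that a score requires the whole ball to pass a vertical goal line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BallDetection:
    """One per-frame ball detection (hypothetical shape, pixel units)."""
    frame: int
    x: float       # ball centre, horizontal
    y: float       # ball centre, vertical
    radius: float


@dataclass
class Decision:
    scored: bool
    frame: Optional[int]   # frame at which the score was confirmed, if any
    reasoning: List[str]   # step-by-step trace, in the spirit of chain-of-thought


def decide_goal(track: List[BallDetection], goal_line_x: float) -> Decision:
    """Rule-based stand-in for the reasoning layer.

    Assumption: the ball travels toward increasing x, and a score counts
    only once the trailing edge of the ball (x - radius) is past the line.
    """
    reasoning: List[str] = []
    for det in track:
        trailing_edge = det.x - det.radius
        if trailing_edge > goal_line_x:
            reasoning.append(
                f"frame {det.frame}: trailing edge {trailing_edge:.1f} "
                f"is past line {goal_line_x:.1f}"
            )
            reasoning.append("whole ball crossed the line, so this is a score")
            return Decision(True, det.frame, reasoning)
        reasoning.append(
            f"frame {det.frame}: trailing edge {trailing_edge:.1f} "
            f"has not passed line {goal_line_x:.1f}"
        )
    reasoning.append("ball never fully crossed the line, so no score")
    return Decision(False, None, reasoning)


if __name__ == "__main__":
    track = [
        BallDetection(frame=0, x=90.0, y=50.0, radius=5.0),
        BallDetection(frame=1, x=103.0, y=50.0, radius=5.0),  # centre over, edge not
        BallDetection(frame=2, x=108.0, y=50.0, radius=5.0),
    ]
    decision = decide_goal(track, goal_line_x=100.0)
    for step in decision.reasoning:
        print(step)
```

Keeping the reasoning trace alongside the verdict means each call can be reviewed frame by frame, which is the property the chain-of-thought analysis is meant to provide.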
RefereAI is built on NVIDIA's Cosmos Reason 2 for spatial understanding, DeepStream 7.1 for video pipeline processing, and LLaVA for vision-language inference, all running on a Jetson AGX Orin at the edge.
We build AI systems that watch, reason, and decide — on real-world video, at the edge.
Book a Free Call →