Hugging Face's Eyas is an offline AI security camera agent that turns raw CCTV footage into structured security event logs using a chain of small, local models. The system runs entirely on-device with no cloud APIs, so potential shoplifting incidents can be detected and acted on in real time. The team built it in response to the problems small retail businesses face with theft.

Eyas chains three models: YOLO11n for object detection, MiniCPM-V 4.6 for visual analysis, and Nemotron 3 Nano 4B for structured event-log reasoning. The team initially considered a single vision-language model (VLM) but found it too slow and its output too unstructured. They opted instead for a three-stage pipeline in which each model handles one narrow task. A heuristic structurer sits between the VLM and the LLM: it converts the VLM's observations into structured JSON before the LLM processes them, which makes the system more reliable and consistent.
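The structurer step can be illustrated with a minimal sketch. The keyword rules, field names, and the `structure_observation` function below are hypothetical, since the source does not describe the actual heuristics; the sketch only shows the general idea of turning a free-text VLM observation into a fixed JSON shape before an LLM sees it.

```python
import json
import re

# Hypothetical keyword rules; the real Eyas heuristics are not given in the source.
RULES = {
    "concealment": re.compile(r"\b(conceal\w*|hid\w*|pocket\w*|tuck\w*|stuff\w*)\b", re.I),
    "item_pickup": re.compile(r"\b(pick\w*|grab\w*|tak\w*|took)\b", re.I),
    "looking_around": re.compile(r"\b(look\w* around|glanc\w*)\b", re.I),
}


def structure_observation(text: str, frame_index: int, zone: str) -> dict:
    """Map a free-text VLM observation onto a fixed, JSON-serializable schema."""
    # Collect every behavior whose pattern appears in the observation text.
    behaviors = [name for name, pattern in RULES.items() if pattern.search(text)]
    # Assumed heuristic: a pickup combined with concealment is flagged as
    # suspicious even if the VLM never uses the word "theft".
    suspicious = "item_pickup" in behaviors and "concealment" in behaviors
    return {
        "frame": frame_index,
        "zone": zone,
        "behaviors": behaviors,
        "suspicious": suspicious,
        "raw_observation": text,
    }


event = structure_observation(
    "A person picks up a bottle and tucks it into their jacket.", 120, "aisle_3"
)
print(json.dumps(event, indent=2))
```

Because the downstream LLM always receives the same keys, its prompt can refer to `behaviors` and `suspicious` directly instead of parsing prose.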

The team faced several challenges during development, including deciding which frames to send to the VLM and making sure the model could interpret CCTV footage accurately. MiniCPM-V 4.6, although not trained on security footage, performed reasonably well at identifying suspicious behavior. However, it rarely confirmed theft explicitly unless it was confident, so the team came to rely more on heuristics than on the model's explicit judgment. The system also assigns zones using configurable polygons looked up by video filename, which lets it work with arbitrary video feeds.
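Polygon-based zone assignment can be sketched as follows. This assumes the polygons are stored in a config keyed by video filename and that a detection is placed by the bottom-center of its bounding box; the `ZONES_BY_FILE` layout, the zone names, and the `assign_zone` helper are illustrative assumptions, not the project's actual code. The containment test is the standard ray-casting algorithm.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

# Hypothetical config: zone polygons in pixel coordinates, keyed by video filename.
ZONES_BY_FILE: Dict[str, Dict[str, Polygon]] = {
    "store_cam1.mp4": {
        "shelf": [(0, 0), (400, 0), (400, 300), (0, 300)],
        "exit": [(400, 0), (640, 0), (640, 300), (400, 300)],
    }
}


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count edge crossings of a horizontal ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(filename: str, box: Tuple[float, float, float, float]) -> Optional[str]:
    """Return the first zone containing the box's bottom-center point, if any."""
    x1, y1, x2, y2 = box
    anchor = ((x1 + x2) / 2, y2)  # assumed anchor: roughly where a person stands
    for name, polygon in ZONES_BY_FILE.get(filename, {}).items():
        if point_in_polygon(anchor, polygon):
            return name
    return None


print(assign_zone("store_cam1.mp4", (100, 50, 200, 250)))
```

Keeping the polygons in data rather than code is what would let the same pipeline run on a new camera: only the config entry for that file changes.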

Source: huggingface