Researchers at Tsinghua University and other institutions have developed 'Count Anything', an AI model that counts and labels objects across a wide variety of image types, from satellite imagery and medical scans to everyday photos, using only a text prompt. To improve accuracy, the system combines two approaches: it draws boxes around large objects and places points on small, densely packed targets, then merges the results without double counting. According to the paper, the model outperforms many competing models in tests but still struggles with ambiguous terms and extremely dense scenes. Source: thedecoder
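The article does not explain how the two kinds of predictions are reconciled. The following is a minimal sketch under one assumed rule: each box contributes its center point, and a point from the dense branch is discarded if it falls inside any predicted box. The `Box` class, the `merge_predictions` function, and the containment rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box from the (hypothetical) box-detection branch."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, p: tuple[float, float]) -> bool:
        x, y = p
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def merge_predictions(boxes, points):
    """Merge box and point predictions into one final point set.

    Assumed rule (not from the paper): every box contributes its center,
    and a dense-branch point is kept only if no box contains it, so an
    object detected by both branches is counted once.
    """
    merged = [b.center() for b in boxes]
    for p in points:
        if not any(b.contains(p) for b in boxes):
            merged.append(p)
    return merged


boxes = [Box(0, 0, 10, 10), Box(20, 20, 40, 30)]
points = [(5.0, 5.0), (50.0, 50.0), (51.0, 52.0)]
merged = merge_predictions(boxes, points)
print(len(merged))  # 4: two box centers plus the two points outside every box
```

Containment is a deliberately crude test: a genuinely separate small object lying inside a large box would be dropped, so a real system would likely need a finer matching criterion, such as a distance threshold between points.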

The key idea behind 'Count Anything' is combining two complementary approaches. One handles large, clearly visible objects by drawing bounding boxes; the other handles small, densely packed objects by placing a dot on each detected target. The model then merges both sets of predictions into a single final point set, making sure no object is counted twice. Rather than training a model from scratch, the system builds on SAM3, a pretrained model from Meta, and adds small adapter components for the counting task. This design lets it handle a wide range of image types.
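To illustrate what "small adapter components" on a frozen pretrained model can look like, here is a minimal pure-Python sketch of a residual bottleneck adapter, a common adapter design. The `frozen_backbone` stand-in, the `BottleneckAdapter` class, and the dimensions are assumptions for illustration only; the article does not say which adapter architecture the authors use or where it attaches to SAM3.

```python
import random


def frozen_backbone(x):
    """Hypothetical stand-in for a pretrained encoder such as SAM3.

    A fixed elementwise transform; its parameters are never updated.
    """
    return [2.0 * v + 1.0 for v in x]


class BottleneckAdapter:
    """Residual bottleneck adapter: out = h + W_up @ relu(W_down @ h).

    Only these two small matrices would be trained; the backbone stays frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: dim -> bottleneck, small random initial weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection: bottleneck -> dim, zero-initialized so the adapter
        # starts as an identity map over the frozen features.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def num_params(self):
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)

    def __call__(self, h):
        z = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in self.w_down]
        delta = [sum(w * v for w, v in zip(row, z)) for row in self.w_up]
        return [a + b for a, b in zip(h, delta)]


adapter = BottleneckAdapter(dim=256, bottleneck=16)
features = frozen_backbone([0.5] * 256)
out = adapter(features)
print(out == features)        # True: the zero-initialized adapter starts as identity
print(adapter.num_params())   # 8192 = 2 * 256 * 16
```

With a 256-dimensional feature and a bottleneck of 16, the adapter adds 8,192 weights, compared with 65,536 for a single full 256-by-256 layer, which is why adapters are cheap to train. Zero-initializing the up-projection is a common choice that leaves the pretrained features unchanged at the start of training.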

The researchers also built a custom dataset called CLOC, which spans six image domains: everyday photos, satellite imagery, medical tissue samples, microscopic cell images, agricultural images, and bacterial culture photos. It contains about 220,000 images, 619 categories, and 15 million labeled objects. Both of the paper's error metrics drop sharply as the amount of CLOC training data grows, showing the value of large cross-domain counting datasets.
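The article does not name the two error metrics. Counting work commonly reports the mean absolute error (MAE) and root mean squared error (RMSE) of per-image counts, so this sketch assumes those two; the counts below are made-up example values, not results from the paper.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    if len(pred_counts) != len(true_counts):
        raise ValueError("count lists must have the same length")
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def rmse(pred_counts, true_counts):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    if len(pred_counts) != len(true_counts):
        raise ValueError("count lists must have the same length")
    sq = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(sq / len(true_counts))


# Hypothetical per-image counts for four test images.
true_counts = [12, 340, 57, 1020]
pred_counts = [11, 352, 57, 980]
print(mae(pred_counts, true_counts))  # 13.25
print(rmse(pred_counts, true_counts))
```

Because RMSE squares each error, the single large miss on the densest image (40 objects) dominates it more than it dominates MAE, which makes RMSE especially sensitive to the extremely dense scenes the article mentions.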