AWS and Vexcel have developed a system that turns large aerial imagery libraries into knowledge bases that can be searched with natural language queries. Traditional approaches required manual inspection or a custom model for each task. The new approach combines multimodal embeddings, large language models, and vector search so that the data is indexed once and then queried efficiently. The system was tested on multi-view aerial imagery collected across 45+ countries, offering a faster and more scalable way to analyze geospatial data.
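The "index once, query many times" idea can be made concrete with a minimal sketch of embedding-based retrieval. It assumes that a multimodal embedding model has already turned each tile into a fixed-length vector and that a text query is embedded into the same vector space. The three-dimensional vectors and the `TileIndex` class are hypothetical stand-ins. Brute-force cosine similarity replaces the approximate k-NN search that a vector store such as Amazon OpenSearch Serverless would run at scale.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class TileIndex:
    """Hypothetical in-memory index: tile embeddings are normalized and stored once."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, tile_id, embedding):
        # Ingestion step: runs once per tile, not once per query.
        self.ids.append(tile_id)
        self.vectors.append(normalize(embedding))

    def search(self, query_embedding, k=3):
        # Query step: brute-force cosine similarity against every stored tile.
        q = normalize(query_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), tile_id)
            for tile_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tile_id for _, tile_id in scored[:k]]


# Toy vectors standing in for real multimodal embeddings (assumption).
index = TileIndex()
index.add("tile_a", [0.9, 0.1, 0.0])
index.add("tile_b", [0.1, 0.9, 0.0])
index.add("tile_c", [0.7, 0.7, 0.0])

# Pretend this is the embedding of a natural language query.
results = index.search([1.0, 0.0, 0.0], k=2)
print(results)  # ['tile_a', 'tile_c']
```

Because the stored vectors are normalized at ingestion, each query costs one dot product per tile and no imagery is reprocessed. A production index would replace the linear scan with an approximate nearest-neighbor structure.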

The collaboration evaluated embedding models, fusion strategies, caption integration, and search methods to measure how each choice affects search accuracy. Amazon Nova Multimodal Embeddings delivered the highest F1 scores across the benchmark queries. The system was built on Amazon Bedrock and Amazon OpenSearch Serverless, enabling efficient ingestion, processing, and retrieval of large-scale aerial data. This work led to Vexcel Intelligence, a product now in preview that lets users query their imagery library in natural language.
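The article reports F1 scores but does not say exactly how they were computed. Assume that each benchmark query has a set of ground-truth tiles and that search returns a set of tiles. Under that assumption, per-query F1 and a macro average over queries could be calculated as in the sketch below. The tile IDs and query strings are invented for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query (assumed definition)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_f1(per_query):
    """Average F1 over queries; per_query maps query text -> (retrieved, relevant)."""
    scores = [precision_recall_f1(ret, rel)[2] for ret, rel in per_query.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical benchmark: search results versus ground-truth tile IDs.
benchmark = {
    "solar panels on rooftops": (["t1", "t2", "t3", "t4"], ["t2", "t3", "t5"]),
    "swimming pools": (["t7"], ["t7"]),
}

p, r, f = precision_recall_f1(*benchmark["solar panels on rooftops"])
print(p, round(r, 4), round(f, 4))  # 0.5 0.6667 0.5714
print(round(macro_f1(benchmark), 4))  # 0.7857
```

F1 rewards a configuration only when it returns most of the correct tiles (recall) without padding the results with wrong ones (precision). This makes it a reasonable single number for comparing embedding models and fusion strategies.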

The project addressed challenges specific to geospatial imagery search, which differs from consumer image search because the data is multi-view. A single map tile includes seven complementary perspectives, each capturing different details of the same location, so accurate results depend on a strategy for combining those views. The evaluation framework used OpenStreetMap as its ground truth source, which highlighted the need for a clear definition of what counts as a correct result in this context.
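The article does not describe which view-fusion strategies were compared. As a hypothetical illustration, the sketch below contrasts two common options. Early fusion averages a tile's per-view embeddings into a single vector. Late fusion scores every view against the query and keeps the best score. The two-dimensional vectors are toy values, and only three views are used instead of seven for brevity.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity of two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def early_fusion(view_embeddings):
    """Hypothetical early fusion: average unit-normalized views into one tile vector."""
    normed = [normalize(v) for v in view_embeddings]
    dim = len(normed[0])
    mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return normalize(mean)


def late_fusion_score(query, view_embeddings):
    """Hypothetical late fusion: score each view separately and keep the best match."""
    return max(cosine(query, v) for v in view_embeddings)


# Toy tile: the query matches only the first view.
views = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [1.0, 0.0]

early = cosine(query, early_fusion(views))
late = late_fusion_score(query, views)
print(round(early, 3), late)  # 0.447 1.0
```

Here the query matches only one view. Late fusion keeps that match at full strength, while early fusion dilutes it to about 0.447 because the two unrelated views pull the averaged vector away from the query. Which behavior is preferable depends on whether a feature visible in a single perspective should count as a hit. Early fusion stores one vector per tile, whereas late fusion stores one vector per view and so costs more storage and query time.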

Source: awsml