
AWS Uses Multimodal AI to Enable Searchable Aerial Imagery

AWS and Vexcel developed a system that makes aerial imagery searchable with natural language; among the embedding models tested, Amazon Nova Multimodal Embeddings achieved the highest F1 scores.

AWS and Vexcel have developed a system that turns large aerial imagery libraries into knowledge bases that can be searched with natural language queries. Traditional methods required manual inspection or a custom model for each task. The new approach instead combines multimodal embeddings, large language models, and vector search, so the data is indexed once and can then be queried repeatedly. The system was tested on multi-view aerial imagery collected across more than 45 countries and is presented as a faster, more scalable approach to geospatial data analysis.
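The "index once, query many times" pattern can be illustrated with a toy in-memory index. This is a minimal sketch under stated assumptions: the `TileIndex` class and the 3-dimensional vectors are hypothetical stand-ins for real model embeddings and for the Amazon OpenSearch Serverless vector store, no Amazon Bedrock call is shown, and the search is exact brute-force cosine similarity rather than the approximate nearest-neighbor search a production vector store would typically use.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

class TileIndex:
    """Hypothetical in-memory stand-in for a vector store (exact brute-force search)."""

    def __init__(self):
        self.ids = []
        self.vectors = []

    def add(self, tile_id, embedding):
        # Ingestion: each tile embedding is computed and stored once.
        self.ids.append(tile_id)
        self.vectors.append(normalize(embedding))

    def search(self, query_embedding, k=3):
        # Query time: the text query is embedded (not shown) and compared against
        # the stored vectors; no imagery is re-processed.
        q = normalize(query_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, v)), tile_id)
            for tile_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(reverse=True)
        return [tile_id for _, tile_id in scored[:k]]

# Hypothetical 3-dimensional vectors standing in for real multimodal embeddings.
index = TileIndex()
index.add("tile_pool", [0.9, 0.1, 0.0])
index.add("tile_solar", [0.1, 0.9, 0.2])
index.add("tile_field", [0.0, 0.2, 0.9])
print(index.search([0.8, 0.2, 0.1], k=1))  # ['tile_pool']
```

The expensive step (embedding every tile) happens at ingestion; each later query costs one text embedding plus a similarity search, which is what makes the approach reusable across many questions without training a task-specific model.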

The collaboration focused on evaluating embedding models, fusion strategies, caption integration, and search methods. AWS worked with Vexcel to assess how different approaches affect search accuracy, ultimately finding that Amazon Nova Multimodal Embeddings delivered the highest F1 scores across benchmark queries. The system was built on Amazon Bedrock and Amazon OpenSearch Serverless, enabling efficient ingestion, processing, and retrieval of large-scale aerial data. This work led to the creation of Vexcel Intelligence, a product now in preview that allows users to query their imagery library using natural language.
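For readers unfamiliar with the metric, F1 is the harmonic mean of precision (the share of returned tiles that are correct) and recall (the share of correct tiles that were returned). The sketch below computes it per query over sets of tile IDs; the benchmark data is invented for illustration, and averaging per-query F1 across queries is an assumption, since the article does not say how scores were aggregated.

```python
def f1_score(retrieved, relevant):
    """F1 for one query: harmonic mean of precision and recall over tile IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(retrieved)
    recall = true_pos / len(relevant)
    return 2 * precision * recall / (precision + recall)

def mean_f1(results):
    """Average per-query F1 over a benchmark (an assumed aggregation)."""
    scores = [f1_score(retrieved, relevant) for retrieved, relevant in results.values()]
    return sum(scores) / len(scores)

# Hypothetical benchmark: query -> (tiles returned by search, ground-truth tiles).
benchmark = {
    "swimming pools": (["t1", "t2", "t3", "t4"], ["t1", "t2", "t5"]),
    "solar panels": (["t7", "t8"], ["t7", "t8"]),
}
print(round(f1_score(*benchmark["swimming pools"]), 3))  # 0.571
print(round(mean_f1(benchmark), 3))  # 0.786
```

Because F1 penalizes both missed tiles and spurious ones, it rewards a configuration (embedding model, fusion strategy, captions) that balances the two rather than one that simply returns more results.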

The project addressed challenges specific to geospatial imagery search, which differs from consumer image search because of the complexity of multi-view data. A single map tile includes seven complementary perspectives, each capturing different details about the same location, so the system needs a strategy for combining these views to return accurate results. The evaluation framework used OpenStreetMap as its ground-truth source, which underscores the need to define clearly what counts as a correct result in this context.
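Two common ways to combine per-view embeddings are sketched below. The article does not say which fusion strategy AWS and Vexcel chose, so both functions are illustrative assumptions: `mean_fusion` averages the seven view vectors into one tile vector (early fusion), while `max_view_score` scores each view separately and keeps the best match (late fusion). The toy vectors are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_fusion(view_embeddings):
    """Early fusion (assumed strategy): average unit view vectors into one tile vector."""
    views = [normalize(v) for v in view_embeddings]
    dim = len(views[0])
    mean = [sum(v[i] for v in views) / len(views) for i in range(dim)]
    return normalize(mean)

def max_view_score(query, view_embeddings):
    """Late fusion (assumed strategy): score every view and keep the best match."""
    q = normalize(query)
    return max(dot(q, normalize(v)) for v in view_embeddings)

# Hypothetical tile with seven views: the queried feature appears in only one of them.
views = [[1.0, 0.0, 0.0]] * 6 + [[0.0, 1.0, 0.0]]
query = [0.0, 1.0, 0.0]
print(round(dot(normalize(query), mean_fusion(views)), 3))  # 0.164
print(max_view_score(query, views))  # 1.0
```

The trade-off is visible in the output: averaging stores one vector per tile but dilutes a detail seen from a single perspective, while per-view scoring preserves that detail at the cost of storing and searching seven vectors per tile.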

Source: AWS Machine Learning

Key points

AWS and Vexcel developed a system to make aerial imagery searchable using natural language.
The system uses multimodal embeddings, large language models, and vector search to index data once and query it efficiently.
AWS tested Amazon Nova Multimodal Embeddings, Amazon Titan Multimodal Embeddings G1, and Cohere Embed v4, finding that Amazon Nova delivered the highest F1 scores across benchmark queries.
The system was built on Amazon Bedrock and Amazon OpenSearch Serverless, enabling efficient ingestion, processing, and retrieval of large-scale aerial data.
The collaboration led to the creation of Vexcel Intelligence, a product now in preview that allows users to query their imagery library using natural language.
A single map tile includes seven complementary perspectives, each capturing different details about the same location.


WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.
