Amazon and Anthropic have introduced a cost-optimized solution for digitizing scanned documents that combines Amazon Nova 2 Lite with Claude Sonnet 4.6. The system processes scanned yearbook pages, extracting names and matching them to faces using spatial reasoning. According to the source, the pipeline handled 336 pages and produced 3,122 name-to-face associations, 93 percent of which scored at or above 0.95 confidence. The approach costs about two-thirds less per page than a single-model alternative. It is designed to cope with complex document layouts while keeping costs scalable and predictable for large-scale processing. Source: awsml

The pipeline consists of two stages, each handled by a different model. In the first stage, Amazon Nova 2 Lite performs native multimodal extraction: it detects photos with bounding boxes, extracts visible names with approximate positions, and returns page-level metadata such as titles and categories. In the second stage, Claude Sonnet 4.6 uses spatial reasoning to match names to faces based on page layout. Tests across all 336 pages showed no meaningful accuracy difference between the LOW, MEDIUM, and HIGH reasoning levels for structured extraction, and the source states that setting reasoning to LOW is the cheapest option for this task. Source: awsml
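To make the hand-off between the two stages concrete, the following is a minimal sketch of what the intermediate data might look like. Everything here is an assumption for illustration: the `Photo` and `NameLabel` types, their field names, and the coordinate convention are hypothetical and not taken from the source. The matching function is a plain nearest-center heuristic that stands in for the second stage; it is not how Claude Sonnet 4.6 performs spatial reasoning, and no model is called.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class Photo:
    """Hypothetical stage-one record: one detected photo on a page."""
    photo_id: str
    # Bounding box as (left, top, right, bottom) in page coordinates.
    box: tuple[float, float, float, float]

    @property
    def center(self) -> tuple[float, float]:
        left, top, right, bottom = self.box
        return ((left + right) / 2, (top + bottom) / 2)


@dataclass
class NameLabel:
    """Hypothetical stage-one record: one extracted name and its position."""
    text: str
    x: float
    y: float


def match_names_to_photos(
    photos: list[Photo], names: list[NameLabel]
) -> dict[str, str]:
    """Stand-in for stage two: assign each name to the nearest photo center.

    A real layout-aware matcher would also consider captions, rows, and
    reading order; this heuristic only illustrates the input/output shape.
    """
    matches: dict[str, str] = {}
    for name in names:
        nearest = min(
            photos,
            key=lambda p: hypot(p.center[0] - name.x, p.center[1] - name.y),
        )
        matches[name.text] = nearest.photo_id
    return matches


# Made-up page: two portraits side by side, each with a name printed below.
photos = [
    Photo("p1", (0, 0, 100, 100)),
    Photo("p2", (200, 0, 300, 100)),
]
names = [
    NameLabel("Ada Example", 50, 110),
    NameLabel("Grace Example", 250, 110),
]
matches = match_names_to_photos(photos, names)
```

On a simple grid layout like this one, proximity alone is enough. Pages with group photos or captions listed in a separate column are where a layout-aware second stage would be expected to matter.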

Amazon Nova 2 Lite bills image and document-page inputs at a fixed per-image rate, regardless of resolution or file size, which makes cost forecasting straightforward for large-scale document processing. For a full page extraction, including prompt and output, the per-page cost breaks down into image tokens, prompt tokens, and output tokens, and totals approximately $0.0027 at published input-token rates. Because image input cost scales linearly with page count and is independent of resolution, cost projections for yearbook-scale workloads are simple to produce. Source: awsml
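As a rough illustration of that linear scaling, the sketch below multiplies the approximate per-page figure quoted above by a page count. The function name and the 10,000-page example are hypothetical; the only numbers taken from the source are the $0.0027 per-page estimate and the 336-page test set. It covers the first-stage extraction cost only and assumes the per-page figure holds at any volume.

```python
# Approximate per-page extraction cost quoted in the text (USD).
COST_PER_PAGE_USD = 0.0027


def project_extraction_cost(
    pages: int, cost_per_page: float = COST_PER_PAGE_USD
) -> float:
    """Return the projected extraction cost in USD for a number of pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * cost_per_page


# The 336-page test set from the text, plus a hypothetical larger archive.
for page_count in (336, 10_000):
    cost = project_extraction_cost(page_count)
    print(f"{page_count} pages: ${cost:.2f}")
```

Under these assumptions the 336-page set comes to roughly $0.91 for extraction, and a hypothetical 10,000-page archive to about $27.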