Amazon Bedrock Data Automation (BDA) now offers blueprint instruction optimization, a feature that improves the accuracy of data extraction from unstructured documents. It automatically refines the extraction instructions in a blueprint, reducing the time spent on manual tuning. Users upload three to ten example documents with their expected values, and BDA rewrites the instructions to better match the document formats and business requirements. This helps with common sources of extraction error such as varying field labels, differing document layouts, and edge cases.
The traditional approach to improving extraction accuracy involved manual iteration, where users tested different instruction phrasings, added context, and refined descriptions through trial and error. This process could take weeks per document type, especially for organizations handling documents from hundreds of vendors. With blueprint instruction optimization, BDA automates this refinement loop, analyzing the differences between its extraction results and the provided ground truth. The system then refines natural language instructions for each field, delivering optimized instructions in minutes instead of weeks. This approach improves accuracy metrics such as F1 score and exact match rate, which are critical for measuring extraction quality.
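To make the two metrics concrete, here is a minimal Python sketch of how exact match rate and a field-level F1 score could be computed for one document. The summary does not give BDA's exact metric definitions, so the normalization rule, the counting of false positives and false negatives, and the field names (`po_number`, `vendor_name`, `total`, `ship_date`) are all assumptions made for illustration.

```python
from typing import Dict, Optional

Fields = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and lowercase; treat empty values as missing (assumed rule)."""
    if value is None:
        return None
    return " ".join(str(value).split()).lower() or None


def exact_match_rate(predicted: Fields, expected: Fields) -> float:
    """Fraction of ground-truth fields whose extracted value matches after normalization."""
    if not expected:
        return 0.0
    hits = sum(
        1
        for name, want in expected.items()
        if normalize(predicted.get(name)) == normalize(want)
    )
    return hits / len(expected)


def field_f1(predicted: Fields, expected: Fields) -> float:
    """Field-level F1: a wrong value counts as both a false positive and a false negative."""
    tp = fp = fn = 0
    for name in set(predicted) | set(expected):
        got = normalize(predicted.get(name))
        want = normalize(expected.get(name))
        if got is not None and got == want:
            tp += 1
        else:
            if got is not None:
                fp += 1  # extracted something that is not in the ground truth
            if want is not None:
                fn += 1  # ground-truth value was missed or extracted incorrectly
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and extraction result for one purchase order.
expected = {
    "po_number": "PO-1001",
    "vendor_name": "Acme Corp",
    "total": "1250.00",
    "ship_date": "2024-03-01",
}
predicted = {
    "po_number": "PO-1001",
    "vendor_name": "ACME  Corp",  # differs only in case and spacing
    "total": "1205.00",           # wrong value
    # ship_date was not extracted
}

em = exact_match_rate(predicted, expected)
f1 = field_f1(predicted, expected)
print(f"exact match rate: {em:.3f}, F1: {f1:.3f}")
```

In this example two of the four expected fields match, so the exact match rate is 0.5. The F1 score comes out higher (4/7) because the missing `ship_date` lowers recall but not precision.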
Blueprint instruction optimization refines the instruction value for each field while leaving its type and inferenceType unchanged. The full purchase order schema, available in the GitHub repository, lists each field with its type, inference type, and instruction. Optimization typically completes in minutes and reports metrics such as per-file exact match and aggregate exact match, which show how much accuracy improved. The feature is available through the Amazon Bedrock console or API, so users can apply it to their own documents or try the sample solution provided.
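The rule that only the instruction text changes can be checked mechanically. The Python sketch below compares a set of fields before and after optimization and confirms that nothing other than `instruction` differs. The keys `type`, `inferenceType`, and `instruction` come from the description above. The field names, the sample values, and the flat dictionary layout are hypothetical and are not taken from the actual purchase order schema.

```python
from typing import Dict, List

Field = Dict[str, str]
Schema = Dict[str, Field]


def changed_keys(before: Field, after: Field) -> List[str]:
    """Return the keys whose values differ between two versions of one field."""
    return sorted(
        key for key in set(before) | set(after) if before.get(key) != after.get(key)
    )


def only_instructions_changed(before: Schema, after: Schema) -> bool:
    """True if both schemas have the same fields and differ at most in 'instruction'."""
    if set(before) != set(after):
        return False
    return all(
        set(changed_keys(before[name], after[name])) <= {"instruction"}
        for name in before
    )


# Hypothetical fields before optimization (values are illustrative only).
original: Schema = {
    "po_number": {
        "type": "string",
        "inferenceType": "explicit",
        "instruction": "The purchase order number",
    },
    "total_amount": {
        "type": "number",
        "inferenceType": "explicit",
        "instruction": "The total amount",
    },
}

# Hypothetical result after optimization: only the instruction text is rewritten.
optimized: Schema = {
    "po_number": {
        "type": "string",
        "inferenceType": "explicit",
        "instruction": "The purchase order number, often labeled 'PO #' or 'Order No.'",
    },
    "total_amount": {
        "type": "number",
        "inferenceType": "explicit",
        "instruction": "The grand total including tax, not the subtotal",
    },
}

print(only_instructions_changed(original, optimized))
```

A check like this could run before adopting an optimized blueprint, so that downstream code relying on field names and types is not surprised by an unintended schema change.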
Source: awsml