Amazon SageMaker AI Async Inference now supports inline request payloads, so customers can send inference data directly in the API request body. This removes the need to upload input data to Amazon S3 before invoking the endpoint, which saves a network round-trip and simplifies client-side code. The feature is aimed at workloads with small payloads.
The update introduces a new Body parameter in the InvokeEndpointAsync API, which accepts raw payload data up to 128,000 bytes. Body is mutually exclusive with InputLocation, the existing parameter that references an input object already uploaded to S3. Customers sending inline payloads no longer need an S3 bucket for the request data, IAM permissions for S3 uploads, or a cleanup strategy for stale objects. Skipping the upload step also reduces latency and lowers costs, and size and format errors are reported immediately when the request is made.
Before this update, the async inference workflow required customers to upload each payload to S3 and then invoke the endpoint with the S3 object URI. This two-step process was efficient for large payloads but added unnecessary complexity for small ones. With inline payload support, customers can send small inputs in the request body and keep using S3 uploads for larger payloads.
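The choice between the two input modes can be sketched in a few lines of Python. This is a minimal illustration, not the service's own logic: `build_async_request` is a hypothetical helper, the 128,000-byte limit and the `Body` parameter name are taken from the announcement, and `EndpointName`, `ContentType`, and `InputLocation` are the existing InvokeEndpointAsync parameters. The helper only builds the request arguments and does not call AWS.

```python
from typing import Optional

# Inline payload limit as stated in the announcement (bytes).
MAX_INLINE_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: bytes,
    s3_uri: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Build keyword arguments for an InvokeEndpointAsync call.

    Hypothetical helper: sends the payload inline via Body when it fits
    within the limit, otherwise falls back to an S3 URI via InputLocation.
    The two parameters are never set together because they are mutually
    exclusive.
    """
    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if len(payload) <= MAX_INLINE_BYTES:
        # Small payload: send it directly in the request body.
        request["Body"] = payload
    elif s3_uri is not None:
        # Large payload: the caller must already have uploaded it to S3.
        request["InputLocation"] = s3_uri
    else:
        raise ValueError(
            f"payload is {len(payload)} bytes, above the inline limit of "
            f"{MAX_INLINE_BYTES}; upload it to S3 and pass s3_uri"
        )
    return request
```

The resulting dictionary could then be passed to the SageMaker runtime client, for example `boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)`, assuming an SDK version that already exposes the `Body` parameter. Checking the size on the client side is optional, since the service is described as validating size and format itself, but it avoids a failed request for payloads that clearly need the S3 path.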
Source: awsml