Amazon SageMaker AI Async Inference Adds Inline Payload Support

Amazon SageMaker AI Async Inference now allows customers to send inference payloads directly in API requests, eliminating the need for S3 uploads for payloads up to 128,000 bytes.

Amazon SageMaker AI Async Inference now supports inline request payloads, so customers can send inference data directly in the API request body. This removes the need to upload input data to Amazon S3 before invoking the endpoint, which cuts a network round-trip and simplifies client-side code. The feature targets asynchronous inference workloads with small payloads.

The update adds a Body parameter to the InvokeEndpointAsync API, which accepts raw payload data up to 128,000 bytes. Body is mutually exclusive with InputLocation, the parameter that points to an input object in S3 and was previously the only way to supply input. For payloads sent inline, customers no longer need an S3 bucket for inputs, IAM permissions for S3 uploads, or a cleanup strategy for stale objects. According to the announcement, the change also reduces latency, lowers costs, and returns immediate validation feedback for size and format errors.
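The two constraints described above (exactly one of Body or InputLocation, and a 128,000-byte cap on Body) can be checked on the client before a request is sent. The following is a minimal sketch, not AWS code: `build_invoke_args` and `MAX_INLINE_BYTES` are hypothetical names, the limit is assumed to be inclusive, and the `ContentType` default is an illustrative choice.

```python
from typing import Optional

# Limit quoted in the announcement; assumed here to be inclusive.
MAX_INLINE_BYTES = 128_000


def build_invoke_args(
    endpoint_name: str,
    body: Optional[bytes] = None,
    input_location: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Build keyword arguments for InvokeEndpointAsync (hypothetical helper)."""
    # Body and InputLocation are mutually exclusive: exactly one must be set.
    if (body is None) == (input_location is None):
        raise ValueError("Provide exactly one of body or input_location")

    args = {"EndpointName": endpoint_name, "ContentType": content_type}
    if body is not None:
        # Reject oversized inline payloads locally instead of waiting for the API.
        if len(body) > MAX_INLINE_BYTES:
            raise ValueError(
                f"Inline payload is {len(body)} bytes; limit is {MAX_INLINE_BYTES}"
            )
        args["Body"] = body
    else:
        args["InputLocation"] = input_location
    return args
```

The returned dictionary could then be passed as `client.invoke_endpoint_async(**args)` on a boto3 `sagemaker-runtime` client, assuming an SDK version that already exposes the Body parameter described in the announcement.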

Before this update, the async inference workflow had two steps: upload the payload to S3, then invoke the endpoint with the S3 object URI. That process suits large payloads but adds unnecessary complexity for small ones. Customers can now send payloads of up to 128,000 bytes inline and keep using S3 uploads for anything larger.
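To make the contrast between the two workflows concrete, here is a rough sketch of a client that sends small payloads inline and falls back to upload-then-invoke for larger ones. It assumes `runtime` and `s3` behave like boto3 `sagemaker-runtime` and `s3` clients, and that the SDK in use accepts the new Body argument; the function name, the `async-inputs/` key prefix, and the bucket argument are hypothetical.

```python
import uuid

# Limit quoted in the announcement; assumed here to be inclusive.
MAX_INLINE_BYTES = 128_000


def invoke_async(runtime, s3, endpoint_name, payload: bytes, bucket: str,
                 content_type: str = "application/json"):
    """Route a payload to the inline path or the S3 path based on its size."""
    if len(payload) <= MAX_INLINE_BYTES:
        # One-step path: the payload travels in the request itself.
        return runtime.invoke_endpoint_async(
            EndpointName=endpoint_name,
            ContentType=content_type,
            Body=payload,
        )

    # Two-step path: upload to S3 first, then invoke with the object URI.
    key = f"async-inputs/{uuid.uuid4()}"  # hypothetical key layout
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    return runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        ContentType=content_type,
        InputLocation=f"s3://{bucket}/{key}",
    )
```

Passing the clients in as arguments keeps the routing logic testable with stand-in objects. Only the S3 path leaves an input object behind that needs cleanup later.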

Source: awsml

Key points

Amazon SageMaker AI Async Inference now supports inline request payloads up to 128,000 bytes.
The new Body parameter in the InvokeEndpointAsync API accepts raw payload data directly in the API request.
Inline payloads eliminate the need for S3 uploads, reducing latency and simplifying client-side code.
The Body parameter is mutually exclusive with the InputLocation parameter, which points to a payload already uploaded to S3.
Customers can now send small payloads directly, avoiding S3 bucket provisioning and IAM permissions.
The change reduces latency, lowers costs, and provides immediate validation feedback for size and format errors.
Inline payload support is available in 31 commercial AWS Regions.

Source: AWS Machine Learning

WRITTEN BY

Theo Almeida

AI Software & Developer Tools

Theo covers AI software, developer tools, frameworks, and the platforms builders use every day.