
[Feature]: Bedrock latency-optimized inference #7606

Open
marchellodev opened this issue Jan 7, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@marchellodev

The Feature

https://docs.aws.amazon.com/bedrock/latest/userguide/latency-optimized-inference.html

Motivation, pitch

This feature reduces inference latency for the following models:

  • Anthropic Claude 3.5 Haiku | us.anthropic.claude-3-5-haiku-20241022-v1:0 | US East (Ohio)
  • Meta Llama 3.1 70B Instruct | us.meta.llama3-1-70b-instruct-v1:0 | US East (Ohio)
  • Llama 3.1 405B Instruct
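Per the AWS documentation linked above, latency-optimized inference is requested through the Bedrock runtime Converse API's `performanceConfig` field. A minimal sketch of what opting in could look like, assuming that field and using the Claude 3.5 Haiku model ID and US East (Ohio) region (`us-east-2`) from the table above (the `build_converse_request` helper is illustrative, not part of any library):

```python
def build_converse_request(model_id: str, prompt: str, optimized: bool = False) -> dict:
    """Build keyword arguments for a bedrock-runtime converse() call."""
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }
    if optimized:
        # Opt-in flag; when omitted, Bedrock uses standard latency.
        request["performanceConfig"] = {"latency": "optimized"}
    return request


request = build_converse_request(
    "us.anthropic.claude-3-5-haiku-20241022-v1:0",
    "Hello",
    optimized=True,
)

# A boto3 client would then send it, e.g.:
#   client = boto3.client("bedrock-runtime", region_name="us-east-2")
#   client.converse(**request)
```

This framing also hints at the maintainer's question below: making `optimized=False` the default would keep the flag opt-in, while flipping the default would pass it to Bedrock for every request.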

Are you a ML Ops Team?

No

Twitter / LinkedIn details

No response

@marchellodev marchellodev added the enhancement New feature or request label Jan 7, 2025
@krrishdholakia
Contributor

Interesting - would you expect this to be passed to Bedrock by default, or as an opt-in? @marchellodev
