Tools · AWS ML Blog
Build real-time voice applications with Amazon SageMaker AI and vLLM
AWS describes how to build real-time speech-to-text applications using Amazon SageMaker AI and vLLM over a persistent streaming connection. The post contrasts this with request-response inference, which waits for the full audio upload to finish before transcription starts, adding latency for voice agents, live captioning, contact