Models · MarkTechPost · 24 May 2026

StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension

StepFun has released StepAudio 2.5 Realtime, an end-to-end real-time speech model with customizable persona features. It supports Chinese and English, is accessed through a WebSocket API, and ranked first across five benchmark dimensions in testing conducted in April.
