VibeVoice-Large

Create 90-minute-long podcasts with up to 4 distinct speakers.

VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.