Hacker Remix

Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf]

66 points by bilekas 3 days ago | 3 comments

mike978 19 hours ago

smusamashah 9 hours ago

The voices of Chinese origin, when generated as English samples, do sound like a Chinese person speaking English. It is very interesting.

vessenes 12 hours ago

This is really quite good at sounding like Donald, especially for the first half of the audio. I’ll probably play around with this for a bit; it’s not clear to me how much variation you can get in voice in latent space. Anyway, it looks to be a very high quality (at least) short-form TTS engine with open weights, so thanks, team!

fdafds 15 hours ago

[flagged]