Add support for vocal bursts and non-verbal output

Richard Marmorstein

April 18, 2025

Richard Marmorstein

Merged in a post:

Non-verbal sounds like laughing

Rakin Ishraq

Dia lets you add tags like (laughs) and (coughs) to cue non-verbal sounds. The TTS doesn't necessarily have to react to tags like this directly, but there should be a way to say something in a bubbly, mid-chuckle tone, for example. More simply, being able to laugh when the user makes a joke is, in my opinion, essential for immersive "empathic" voices. (I know Richard Marmorstein made a similar post, but it didn't include any details, so I wrote this one.)

June 20, 2025

Kirk

I have a Story Builder application that lets parents design customized stories for their children. The user can choose a "Read to Me" feature, which uses TTS by Hume AI. Children find it very entertaining when a story read aloud incorporates onomatopoeia or other vocal bursts. Including this kind of feature drives engagement and holds the listener's interest.

Richard Marmorstein

User @kirkrock in Discord reports this would be extremely useful for their TTS application that reads children's stories.

Rakin Ishraq

Being able to laugh when the user makes a joke is, in my opinion, essential for immersive "empathic" voices. As Francisco referenced, Dia has this capability, and it's certainly imperfect at times, but it'd be great if this were considered for Hume.

Francisco Castillo

Yeah, what Dia (https://fal.ai/models/fal-ai/dia-tts) does with its non-verbal sounds is awesome!
It can generate non-verbal sounds like (laughs), (coughs), (clears throat), (sighs), (gasps), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)
Source: https://github.com/nari-labs/dia/blob/main/README.md
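For anyone curious what tag handling could look like on the application side, here is a minimal sketch in Python. It only splits a script into spoken text and non-verbal cues using the tag list above. The names `NONVERBAL_TAGS` and `split_script` are made up for illustration and are not part of Dia's or Hume's API, and how a TTS engine would actually render each cue is left open.

```python
import re

# Tag names copied from the Dia README list quoted above.
NONVERBAL_TAGS = {
    "laughs", "coughs", "clears throat", "sighs", "gasps", "singing",
    "sings", "mumbles", "beep", "groans", "sniffs", "claps", "screams",
    "inhales", "exhales", "applause", "burps", "humming", "sneezes",
    "chuckle", "whistles",
}

# Matches any parenthesised run that contains no nested parentheses.
_TAG_RE = re.compile(r"\(([^()]+)\)")


def split_script(text):
    """Split text into ("speech", str) and ("sound", tag) segments.

    Hypothetical helper: parenthesised text that is not a known tag
    is kept as part of the surrounding speech.
    """
    segments = []
    buffer = ""
    pos = 0
    for match in _TAG_RE.finditer(text):
        name = match.group(1).strip().lower()
        if name in NONVERBAL_TAGS:
            # Flush the speech collected so far, then emit the cue.
            buffer += text[pos:match.start()]
            if buffer.strip():
                segments.append(("speech", buffer.strip()))
            segments.append(("sound", name))
            buffer = ""
        else:
            # Ordinary parenthetical: keep it in the spoken text.
            buffer += text[pos:match.end()]
        pos = match.end()
    buffer += text[pos:]
    if buffer.strip():
        segments.append(("speech", buffer.strip()))
    return segments
```

Calling `split_script("That was a good one! (laughs) Okay, moving on.")` yields a speech segment, a `laughs` cue, and a second speech segment, so an app could decide per cue whether to pass the tag through to the model or handle it some other way.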