Richard Marmorstein
Merged in a post:
Non-verbal sounds like laughing
Rakin Ishraq
Dia can add tags like (laughs) and (coughs) to cue non-verbal sounds. The TTS doesn't necessarily have to react to tags like this directly, but it should be able to say something in a bubbly, mid-chuckle tone, for example. More simply, being able to laugh when the user makes a joke is, in my opinion, essential for immersive "empathic" voices. (I know Richard Marmorstein made a similar post, but it didn't include any details, so I wrote this one.)
Kirk
I have a Story Builder application that lets parents design customized stories for their children. The user can choose a "Read to Me" feature, which uses TTS by Hume AI. Children find it very entertaining when a story read aloud incorporates onomatopoeia or other kinds of vocal bursts, so including this kind of sound does drive engagement and holds the listener's interest.
Richard Marmorstein
User @kirkrock in Discord reports this would be extremely useful for their TTS application that reads children's stories.
Rakin Ishraq
Being able to laugh when the user makes a joke is, in my opinion, essential for immersive "empathic" voices. As Francisco referenced, Dia has this capability, and it's certainly imperfect at times, but it'd be great if this were considered for Hume.
Francisco Castillo
Yeah, what Dia (https://fal.ai/models/fal-ai/dia-tts) does with its non-verbal sounds is awesome!
Generate non-verbal sounds like (laughs), (coughs), (clears throat), (sighs), (gasps), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)
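
To make the tag convention concrete, here is a minimal sketch of how an application might separate spoken text from inline cues before sending anything to a TTS engine. It is not part of Dia's or Hume's API: `NONVERBAL_TAGS` and `split_cues` are hypothetical names, and the tag set is simply copied from the list above on the assumption that it is complete.

```python
import re

# Tags copied from the list above; assumed (not confirmed) to be the full set.
NONVERBAL_TAGS = {
    "laughs", "coughs", "clears throat", "sighs", "gasps", "singing", "sings",
    "mumbles", "beep", "groans", "sniffs", "claps", "screams", "inhales",
    "exhales", "applause", "burps", "humming", "sneezes", "chuckle", "whistles",
}

# Matches any parenthesised span that contains no nested parentheses.
_TAG_RE = re.compile(r"\(([^()]+)\)")


def split_cues(text):
    """Split text into ("speech", str) and ("cue", tag) segments, in order.

    Hypothetical helper: parenthesised text that is not a known tag is
    treated as ordinary speech and left in place.
    """
    segments = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        tag = match.group(1).strip().lower()
        if tag not in NONVERBAL_TAGS:
            continue  # ordinary parenthetical: keep it in the speech
        speech = text[pos:match.start()].strip()
        if speech:
            segments.append(("speech", speech))
        segments.append(("cue", tag))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("speech", tail))
    return segments
```

With segments like these, an app such as a story reader could decide per cue whether to pass the tag through to the engine, swap in a recorded sound, or drop it.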