Word boundaries in Text to Speech transcriptions
complete
Jim Page
Please add word boundary timing information (in milliseconds) to assistant_message and/or audio_output. This is to support real-time video avatar animation. Nice-to-have: in addition to word boundary timings, supply Oculus visemes and their timings, as Azure TTS does.
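As an illustration of how word boundary timings could drive an avatar, here is a minimal Python sketch that finds the word being spoken at the current audio playback position. The WordBoundary record and active_word helper are hypothetical and not part of any Hume or Azure API. The sketch assumes timings arrive as non-overlapping (word, start_ms, end_ms) entries, sorted by start time and measured from the start of the audio clip.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WordBoundary:
    # Hypothetical shape: one spoken word with start/end offsets in
    # milliseconds, relative to the start of the audio clip.
    word: str
    start_ms: int
    end_ms: int


def active_word(boundaries: List[WordBoundary], playback_ms: int) -> Optional[WordBoundary]:
    """Return the word being spoken at playback_ms, or None during silence.

    Assumes boundaries are sorted by start_ms and do not overlap.
    """
    # Rebuilt on every call for simplicity; a real renderer would cache it.
    starts = [b.start_ms for b in boundaries]
    # Index of the last word whose start is <= playback_ms.
    i = bisect_right(starts, playback_ms) - 1
    if i >= 0 and playback_ms < boundaries[i].end_ms:
        return boundaries[i]
    # Before the first word, or in a gap between words.
    return None


if __name__ == "__main__":
    timings = [
        WordBoundary("Hello", 0, 420),
        WordBoundary("there", 500, 880),
    ]
    for t in (100, 450, 600, 900):
        w = active_word(timings, t)
        print(t, w.word if w else None)
```

An animation loop would call active_word once per frame with the audio clock's current position. The binary search keeps each lookup cheap even for long utterances, and returning None in gaps lets the avatar relax its mouth between words.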
Hume EVI2 voices with word boundary timing plus the existing prosody output would put Hume a LONG way ahead of the opposition in this space.
Rob Hughes
complete
Timestamps are now available at the phoneme and syllable level! https://dev.hume.ai/docs/text-to-speech-tts/timestamps