Support for timing / alignment data / SRT format
Hume Operations
This feature request is to provide timing data similar to what ElevenLabs provides:
* Character-level timing information: timestamps marking when the speech in the audio reaches each character of the input text.
* Alignments, i.e. "lipsync" data indicating what shape a speaker's mouth would take at each timestamp if they were speaking this audio.
* Possible support for formats like SRT?
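A minimal sketch of how character-level timing data like this could be consumed. It assumes the timing arrives as three parallel arrays (characters, start times, end times, in seconds), which is one common shape for such alignment data. The function and type names here are hypothetical, not part of any Hume API. The sketch groups characters into word-level timings, splitting on whitespace.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds
    end: float  # seconds


def characters_to_words(
    characters: list[str],
    starts: list[float],
    ends: list[float],
) -> list[WordTiming]:
    """Group character-level timings into word-level timings.

    Hypothetical helper: assumes one (character, start, end) entry per
    character of the synthesized text, with whitespace separating words.
    """
    if not (len(characters) == len(starts) == len(ends)):
        raise ValueError("characters, starts and ends must have the same length")

    words: list[WordTiming] = []
    buf: list[str] = []
    word_start = 0.0
    word_end = 0.0
    for ch, s, e in zip(characters, starts, ends):
        if ch.isspace():
            # Whitespace closes the current word, if one is in progress.
            if buf:
                words.append(WordTiming("".join(buf), word_start, word_end))
                buf = []
            continue
        if not buf:
            word_start = s  # first character of a new word
        buf.append(ch)
        word_end = e  # extend the word to this character's end time
    if buf:
        words.append(WordTiming("".join(buf), word_start, word_end))
    return words
```

For example, `characters_to_words(list("Hi there"), [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])` yields `Hi` spanning 0.0 to 0.2 s and `there` spanning 0.3 to 0.8 s. Word-level timings like these are a convenient intermediate step toward subtitle formats such as SRT.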
Richard Marmorstein
Merged in a post:
TTS timing and alignments
Richard Marmorstein
Richard Marmorstein
This is related to https://hume.canny.io/feature-requests/p/tts-timing-and-alignments, a broader feature request for the API to return timing data.
Richard Marmorstein
One thing to try: generate audio with Octave's Text-To-Speech, then pass that audio to a third-party Speech-To-Text API that returns alignment (timestamps) to get the timing data.
This isn't as good as a complete solution within Hume, since it adds cost and latency, but for now, if you really like an Octave voice and also need timing data, this is the way to have your cake and eat it too.
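If the third-party Speech-To-Text service returns word-level timestamps, turning them into an SRT file takes only a little code. The sketch below assumes the provider's response has already been mapped into a list of hypothetical `(word, start_seconds, end_seconds)` tuples; each provider shapes its output differently, so that mapping is left out. It groups consecutive words into numbered SRT cues using the `HH:MM:SS,mmm` timestamp format.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Build an SRT document from (word, start, end) tuples.

    Hypothetical helper: consecutive words are grouped into cues of at most
    max_words; each cue runs from its first word's start to its last word's end.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    blocks: list[str] = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        text = " ".join(word for word, _, _ in chunk)
        start, end = chunk[0][1], chunk[-1][2]
        index = i // max_words + 1  # SRT cue numbers start at 1
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `words_to_srt([("Hello", 0.0, 0.4), ("from", 0.5, 0.7), ("Octave", 0.8, 1.3)], max_words=2)` produces two cues: `Hello from` from `00:00:00,000` to `00:00:00,700`, and `Octave` from `00:00:00,800` to `00:00:01,300`. Grouping by word count is the simplest choice; a real pipeline might instead split cues on punctuation or pauses.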
Sean Ray
I am a power user of this ElevenLabs API method with roboedit.app. I would love to see Hume introduce it. I've done some work on normalizing these alignments in my backend.