Support streaming responses from CLMs
Richard Marmorstein
Currently, when you use a CLM, EVI doesn't start speaking until all the text for its turn has been generated. There is no way for the CLM to send part of its response, have EVI start speaking it, and then continue generating more.
This is a feature request to support "streaming" so EVI can start speaking sooner.
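To make the request concrete, here is a minimal sketch of a CLM endpoint that returns its reply in small streamed pieces instead of one complete block. The endpoint path, payload shape, and SSE framing are illustrative assumptions, not Hume's documented interface.

```python
# Hypothetical sketch of a CLM that streams its reply in small chunks.
# Endpoint path, payload shape, and SSE framing are assumptions for
# illustration only, not Hume's documented CLM interface.
import json

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat/completions")
async def chat_completions(request: Request):
    await request.json()  # incoming conversation payload (shape assumed)

    async def event_stream():
        # Pretend the model produces its answer a sentence at a time; the
        # hope is that EVI could start speaking after the first chunk or two.
        for chunk in [
            "Sure, here's an overview. ",
            "First, the key details. ",
            "Finally, a short summary.",
        ]:
            payload = {"choices": [{"delta": {"content": chunk}}]}
            yield f"data: {json.dumps(payload)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```

The point is that EVI would begin synthesis as soon as the first chunks arrive, rather than waiting for the final `[DONE]`.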
Vikhyat
My expectation was that a streaming response would reduce latency for my users, but that was not the case.
This is especially important for use cases where a user asks for detailed information about a topic or service. Making them wait until the CLM has fully generated its output before the audio response begins is not acceptable.
I would like the audio response to start as soon as a couple of chunks are streamed out by my model.
This would help showcase Hume's low-latency capability as well. Currently, even if the audio is generated in 700ms, the lack of streaming support makes it seem like there's a 2-second delay on Hume’s side.
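To make the behavior I'm asking for concrete, here is a rough sketch of what I'd expect on EVI's side: begin synthesis once the first couple of streamed chunks have arrived instead of waiting for the full reply. The function names below are placeholders, not real Hume APIs.

```python
# Rough sketch of the desired behavior: start speaking after the first
# couple of streamed chunks. start_tts/append_tts are placeholder stubs,
# not real Hume APIs.
import asyncio

async def start_tts(text: str) -> None:
    # Placeholder for kicking off audio synthesis with the first text.
    print(f"[tts start] {text!r}")

async def append_tts(text: str) -> None:
    # Placeholder for feeding additional text to an in-progress synthesis.
    print(f"[tts append] {text!r}")

async def speak_streaming_reply(chunk_stream, start_after_chunks: int = 2) -> None:
    """Begin audio once `start_after_chunks` chunks have arrived."""
    buffered, started = [], False
    async for chunk in chunk_stream:
        if not started:
            buffered.append(chunk)
            if len(buffered) >= start_after_chunks:
                await start_tts("".join(buffered))  # audio begins here, early
                buffered.clear()
                started = True
        else:
            await append_tts(chunk)  # keep feeding the synthesizer
    if buffered:  # short replies that never hit the threshold
        await start_tts("".join(buffered))

async def fake_clm_stream():
    # Stand-in for text deltas arriving from the CLM.
    for chunk in ["Sure, ", "here are ", "the details ", "you asked for."]:
        await asyncio.sleep(0.2)
        yield chunk

asyncio.run(speak_streaming_reply(fake_clm_stream()))
```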