Added
- Added google.generativeai model support, including vision. This new google service defaults to using gemini-1.5-flash-latest. Example in examples/foundational/12a-describe-video-gemini-flash.py.
- Added vision support to openai service. Example in examples/foundational/12a-describe-video-gemini-flash.py.
- Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
- Added VADParams so you can control the voice confidence level and other parameters.
- VADAnalyzer now uses an exponentially smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.
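
The smoothed-volume idea from the VADAnalyzer entry above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the class name SmoothedVolumeGate, its fields, and the threshold and smoothing values are all assumptions made for the example. It shows how a high-confidence but quiet voice (for example, someone talking nearby) can be kept from counting as speech.

```python
from dataclasses import dataclass, field


@dataclass
class SmoothedVolumeGate:
    """Hypothetical helper for illustration only; not part of the library."""

    smoothing_factor: float = 0.2  # assumed value: weight given to the newest frame
    min_confidence: float = 0.7    # assumed voice-confidence threshold
    min_volume: float = 0.6        # assumed threshold on the smoothed volume
    _prev_volume: float = field(default=0.0, init=False, repr=False)

    def update(self, confidence: float, volume: float) -> bool:
        """Feed one audio frame; return True if it should count as speech."""
        # Exponential smoothing: move the running volume a fraction of the
        # way toward the newest frame's volume.
        smoothed = self._prev_volume + self.smoothing_factor * (
            volume - self._prev_volume
        )
        self._prev_volume = smoothed
        # Require both a confident voice detection and enough sustained volume.
        return confidence >= self.min_confidence and smoothed >= self.min_volume


if __name__ == "__main__":
    gate = SmoothedVolumeGate()
    # A sustained loud voice passes the gate after a few frames.
    print([gate.update(confidence=0.9, volume=0.9) for _ in range(5)])

    nearby = SmoothedVolumeGate()
    # A confident but quiet voice never passes the volume threshold.
    print(any(nearby.update(confidence=0.95, volume=0.3) for _ in range(50)))
```

With these assumed values, the smoothed volume has to stay above the threshold for several frames before speech is reported, so a brief spike or a low-volume voice in the background does not trigger detection on its own.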
Fixed
- Fixed an issue where TTSService was not pushing TextFrames downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing StopTaskFrame to not actually exit the PipelineTask.