Azure/azure-sdk-for-python azure-ai-voicelive

1.2.0 (2026-05-22)

Features Added

Web Search & File Search: Added support for built-in web search and file search tools:
- New item types: ResponseWebSearchCallItem, ResponseFileSearchCallItem
- New server events for web/file search lifecycle (searching, in_progress, completed)
- New models: ActionFind, ActionOpenPage, ActionSearch, ActionSearchSource, FileSearchResult
- New enum values: ItemType.WEB_SEARCH_CALL, ItemType.FILE_SEARCH_CALL
- New SessionIncludeOption enum for controlling what data is included in session responses
MCP (Model Context Protocol) Support: Added comprehensive support for Model Context Protocol integration:
- MCPServer tool type for defining MCP server configurations with authorization, headers, and approval requirements
- MCPTool model for representing MCP tool definitions with input schemas and annotations
- MCPApprovalType enum for controlling approval workflows (never, always, or tool-specific)
- New item types for MCP approval and call workflows
- New server events for MCP tool listing, call lifecycle, and approval flows
Avatar Enhancements:
- Added AzureAvatarVoiceSyncVoice for avatar voice sync configuration
- Added ServerEventSessionAvatarSwitchToIdle and ServerEventSessionAvatarSwitchToSpeaking events
- Added ServerEventResponseVideoDelta for avatar video frame streaming
- Added ClientEventOutputAudioBufferClear and ServerEventOutputAudioBufferCleared for output buffer management
- Added AvatarConfigTypes enum with support for video-avatar and photo-avatar types
- Added AvatarOutputProtocol enum for avatar streaming protocols (webrtc, websocket)
- Added Scene model for controlling avatar zoom, position, rotation, and movement amplitude
- Added output_audit_audio field to AvatarConfig
OpenTelemetry Tracing Support: Added VoiceLiveInstrumentor for opt-in OpenTelemetry-based
tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
- Enable via AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true environment variable
- Content recording controlled by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
- Comprehensive session-level telemetry: session ID, audio format, first-token latency,
  turn count, interruption count, audio bytes sent/received, message size
- Response & function call ID tracking for end-to-end tracing
- Agent v2 telemetry with agent identity and configuration tracking
- MCP telemetry with tool call and approval flow tracking
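The opt-in switches above are plain environment variables, so they can be set before the application opens a connection. The following is a minimal sketch, not the SDK's own setup code: the helper name `enable_voicelive_tracing` is hypothetical, and the "true"/"false" values for the content-recording variable are an assumption based on common OpenTelemetry GenAI conventions. Activating VoiceLiveInstrumentor itself is not shown because these notes do not give its import path or method names.

```python
import os


def enable_voicelive_tracing(capture_content: bool = False) -> dict:
    """Set the opt-in tracing environment variables and return what was set.

    Hypothetical helper for illustration; only the variable names come from
    the release notes.
    """
    settings = {
        # Opt-in flag named in the release notes.
        "AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING": "true",
        # Assumption: "true"/"false" are the accepted values for this variable.
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT": (
            "true" if capture_content else "false"
        ),
    }
    os.environ.update(settings)
    return settings


# Keep message content out of traces unless it is explicitly needed.
settings = enable_voicelive_tracing(capture_content=False)
```

Leaving content recording off by default is a cautious choice, since recorded message content may contain user data.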
Agent Session Configuration: Added flattened connect() keyword arguments for configuring Azure AI Foundry agents
at connection time with agent_name, project_name, agent_version, conversation_id, and more
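One way to prepare these keyword arguments is to collect them in a dictionary and drop the ones that were not supplied. This is a sketch under stated assumptions: the helper `build_agent_connect_kwargs` and the sample values are hypothetical, only the four argument names come from these notes, and the connect() call is shown as a comment because its remaining parameters are not listed here.

```python
from typing import Optional


def build_agent_connect_kwargs(
    agent_name: str,
    project_name: str,
    agent_version: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> dict:
    """Collect the agent keyword arguments, omitting any left as None.

    Hypothetical helper for illustration, not part of the SDK.
    """
    candidates = {
        "agent_name": agent_name,
        "project_name": project_name,
        "agent_version": agent_version,
        "conversation_id": conversation_id,
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Sample values are placeholders.
agent_kwargs = build_agent_connect_kwargs("my-agent", "my-project", agent_version="1")

# Assumed call site (endpoint and credential are placeholders, not shown here):
# async with connect(endpoint=endpoint, credential=credential, **agent_kwargs) as conn:
#     ...
```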
Transcription Improvements:
- Added TranscriptionPhrase and TranscriptionWord models for detailed transcription data
- Added ServerEventResponseAudioTranscriptAnnotationAdded event
- Added gpt-4o-transcribe-diarize and mai-transcribe-1 transcription model support
Interim Response Configuration: Added StaticInterimResponseConfig and LlmInterimResponseConfig
for generating interim responses during latency or tool calls
Image Content Support: Added RequestImageContentPart for image inputs in conversations
Reasoning Effort Control: Added reasoning_effort field with ReasoningEffort enum
Response Metadata: Added metadata field to Response and ResponseCreateParams
Server Warning Events: Added ServerEventWarning for handling non-fatal warnings
Personal Voice Models: Added DragonHDOmniLatestNeural and MAI-Voice-1 model options
Enhanced OpenAI Voices: Added marin and cedar voices to OpenAIVoiceName enum
Enhanced Azure Personal Voice: Added custom_lexicon_url, prefer_locales, locale, style,
pitch, rate, and volume properties
Pre-generated Assistant Messages: Added pre_generated_assistant_message in ResponseCreateParams
Explicit Null Values: Enhanced RequestSession to properly serialize explicitly set None values

Breaking Changes

Removed Foundry Agent Tool classes (FoundryAgentTool, ResponseFoundryAgentCallItem, etc.);
use the flattened Azure AI Foundry keyword arguments of connect() instead
Audio Format Values: Changed OutputAudioFormat enum values to use underscore format
(pcm16_8000hz, pcm16_16000hz) instead of the previous hyphenated values.
This is a breaking change for code that compares, persists, or serializes the raw enum values.
Legacy hyphenated values continue to deserialize for backward compatibility.
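Code that persisted the raw enum values may want to normalize stored strings before comparing them with the new values. The following is a minimal sketch: the helper name is hypothetical, and the legacy spellings `pcm16-8000hz` and `pcm16-16000hz` are assumed from the description above rather than quoted from the SDK.

```python
def normalize_output_audio_format(raw: str) -> str:
    """Map an assumed legacy hyphenated value to the underscore form.

    Values that contain no hyphen are returned unchanged.
    """
    return raw.replace("-", "_")


# Stored values written by older code (assumed spellings).
stored = ["pcm16-8000hz", "pcm16-16000hz", "pcm16_16000hz"]
normalized = [normalize_output_audio_format(value) for value in stored]
print(normalized)  # ['pcm16_8000hz', 'pcm16_16000hz', 'pcm16_16000hz']
```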
Renamed AvatarConfig.type field to avatar_type to avoid conflict with Python's built-in type
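Callers that build AvatarConfig from stored keyword arguments can rename the key before constructing the object. This is a sketch only: `migrate_avatar_kwargs` is a hypothetical helper, the `character` key is an illustrative placeholder, and it assumes the configuration is passed as Python keyword arguments rather than as raw wire JSON.

```python
def migrate_avatar_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with the old "type" key renamed to "avatar_type".

    If "avatar_type" is already present, it wins and the old key is dropped.
    """
    migrated = dict(kwargs)
    if "type" in migrated:
        old_value = migrated.pop("type")
        migrated.setdefault("avatar_type", old_value)
    return migrated


# Keyword arguments written against the previous field name.
old_kwargs = {"type": "video-avatar", "character": "example-character"}
new_kwargs = migrate_avatar_kwargs(old_kwargs)
# new_kwargs can then be passed on, e.g. AvatarConfig(**new_kwargs)
```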

Other Changes

Updated default API version to 2026-04-10
