github Azure/azure-sdk-for-java com.azure+azure-ai-voicelive_1.0.0

1.0.0 (2026-06-01)

This is the first General Availability (GA) release of the Azure VoiceLive client library for Java.

Breaking Changes

  • Narrowed VoiceLiveAsyncClient session startup to three overloads:
    • startSession()
    • startSession(String, VoiceLiveRequestOptions)
    • startSession(AgentSessionConfig, VoiceLiveRequestOptions)
  • Renamed token-count accessors on token statistic models (JSON wire format unchanged):
    • CachedTokenDetails.getTextTokens() / getAudioTokens() / getImageTokens() → getTextTokenCount() / getAudioTokenCount() / getImageTokenCount()
    • InputTokenDetails.getCachedTokens() / getTextTokens() / getAudioTokens() / getImageTokens() → getCachedTokenCount() / getTextTokenCount() / getAudioTokenCount() / getImageTokenCount()
    • OutputTokenDetails.getTextTokens() / getAudioTokens() / getReasoningTokens() → getTextTokenCount() / getAudioTokenCount() / getReasoningTokenCount()
    • ResponseTokenStatistics.getTotalTokens() / getInputTokens() / getOutputTokens() → getTotalTokenCount() / getInputTokenCount() / getOutputTokenCount()
  • RequestImageContentPart URL accessor renamed and JSON field changed:
    • getUrl() / setUrl(String) → getImageUrl() / setImageUrl(String)
    • JSON property url → image_url
  • Renamed base event types for client↔server symmetry:
    • ClientEvent (base for outbound events) → SessionClientEvent
    • SessionUpdate (base for inbound events) → SessionServerEvent
    • VoiceLiveSessionAsyncClient.receiveEvents() now returns Flux<SessionServerEvent>
    • VoiceLiveSessionAsyncClient.sendEvent(...) now accepts SessionClientEvent
  • Renamed MCP-related model types to Pascal case (MCP* → Mcp*): McpApprovalType, McpServer, McpTool, McpApprovalResponseRequestItem, ResponseMcpApprovalRequestItem, ResponseMcpApprovalResponseItem, ResponseMcpCallItem, ResponseMcpListToolItem.
  • VoiceLiveSessionAsyncClient.truncateConversation(String, int, int) now accepts a java.time.Duration for the audio-end position instead of raw milliseconds. The two-argument overload (itemId, contentIndex) is preserved and defaults to Duration.ZERO.
  • Removed sendInputAudio(byte[]); use sendInputAudio(BinaryData) (wrap raw bytes with BinaryData.fromBytes(...)).
  • AgentSessionConfig.toQueryParameters() is no longer part of the public API; the conversion is handled internally by VoiceLiveAsyncClient.
  • VoiceLiveSessionOptions.setAnimation(...) renamed to setAnimationOptions(...).
  • AnimationOptions.setOutputs(...) / getOutputs() renamed to setOutputTypes(...) / getOutputTypes().
  • LogProbProperties.getLogprob() renamed to getLogProb().
  • SessionUpdateConversationItemInputAudioTranscriptionCompleted.getLogprobs() renamed to getLogProbs().
  • Removed preview service versions from VoiceLiveServiceVersion; only GA versions remain (V2025_10_01, V2026_04_10). The latest version is now V2026_04_10.
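
For the truncateConversation change listed above, callers that still hold the audio-end position as a millisecond count can convert it with java.time.Duration before passing it in. The following is a minimal sketch using only the JDK; the helper name toAudioEnd and the commented-out call shape are assumptions for illustration, not part of the library.

```java
import java.time.Duration;

public class AudioEndMigration {
    // Hypothetical helper (not part of the SDK): converts a legacy
    // millisecond offset into the Duration the GA API expects.
    static Duration toAudioEnd(long audioEndMs) {
        if (audioEndMs < 0) {
            throw new IllegalArgumentException("audioEndMs must be non-negative");
        }
        return Duration.ofMillis(audioEndMs);
    }

    public static void main(String[] args) {
        Duration audioEnd = toAudioEnd(1500);
        System.out.println(audioEnd);            // prints "PT1.5S" (ISO-8601 form)
        System.out.println(audioEnd.toMillis()); // prints "1500"

        // Assumed call shape, based on the release note above:
        // session.truncateConversation(itemId, contentIndex, audioEnd);
    }
}
```

A zero offset converts to Duration.ZERO, which matches the default the notes describe for the two-argument overload.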

Features Added

  • Avatar voice synchronization for video avatars:
    • New AzureVoiceType.AVATAR_VOICE_SYNC and AzureAvatarVoiceSyncVoice class
    • New server events ServerEventSessionAvatarSwitchToSpeaking / ServerEventSessionAvatarSwitchToIdle
    • New ServerEventResponseVideoDelta for streaming avatar video frames
    • New ClientEventOutputAudioBufferClear (output_audio_buffer.clear) and ServerEventOutputAudioBufferCleared (output_audio_buffer.cleared) for clearing the avatar output audio buffer
  • Web search and file search tool calls:
    • New ItemType.WEB_SEARCH_CALL, ItemType.FILE_SEARCH_CALL
    • New ResponseWebSearchCallItem (with ResponseWebSearchCallItemStatus) and ResponseFileSearchCallItem (with ResponseFileSearchCallItemStatus, plus FileSearchResult results)
    • New lifecycle server events: ServerEventResponseWebSearchCall{Searching,InProgress,Completed} and ServerEventResponseFileSearchCall{Searching,InProgress,Completed}
  • Transcription enhancements:
    • New transcription models on AudioInputTranscriptionOptionsModel: GPT_4O_TRANSCRIBE_DIARIZE, MAI_TRANSCRIBE_1
    • New TranscriptionPhrase and TranscriptionWord types with timing/confidence information
    • SessionUpdateConversationItemInputAudioTranscriptionCompleted now exposes getLogProbs() and getPhrases()
    • New ServerEventResponseAudioTranscriptAnnotationAdded event
  • Session include options and metadata:
    • New SessionIncludeOption expandable enum for opting into additional response payloads (e.g. logprobs, phrases, file-search results)
    • VoiceLiveSessionOptions and VoiceLiveSessionResponse now expose include (List<SessionIncludeOption>) and metadata (Map<String,String>, up to 16 entries)
  • Personal voice models: added PersonalVoiceModels.DRAGON_HDOMNI_LATEST_NEURAL and MAI_VOICE_1
  • Reasoning token usage: OutputTokenDetails.getReasoningTokenCount() exposes reasoning token counts
  • Interim response on response.create: ResponseCreateParams.setInterimResponse(BinaryData) lets callers attach interim response config to a single response request
  • Restored no-arg VoiceLiveAsyncClient.startSession() overload (uses the deployment's default model).
  • Significantly improved Javadoc for ServerVadTurnDetection, AzureCustomVoice, AzurePersonalVoice, AzureStandardVoice, AzureSemanticVadTurnDetection*, and other model types
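
The session metadata map described above is limited to 16 entries. The notes do not say whether the client or the service enforces that limit, so an application may want to check it before building session options. The sketch below uses only the JDK; MetadataCheck and requireValidMetadata are hypothetical names, not SDK types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataCheck {
    // Limit taken from the release notes ("up to 16 entries").
    static final int MAX_METADATA_ENTRIES = 16;

    // Hypothetical pre-flight check: returns the map unchanged if it is
    // within the documented limit, otherwise throws.
    static Map<String, String> requireValidMetadata(Map<String, String> metadata) {
        if (metadata.size() > MAX_METADATA_ENTRIES) {
            throw new IllegalArgumentException(
                "metadata has " + metadata.size() + " entries; at most "
                    + MAX_METADATA_ENTRIES + " are allowed");
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("tenant", "contoso");
        metadata.put("channel", "web");
        // Passes the check because the map holds 2 entries.
        System.out.println(requireValidMetadata(metadata).size());
    }
}
```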

Other Changes

  • Updated default service API version to 2026-04-10 (GA).
