alibaba/spring-ai-alibaba v1.1.2.2 on GitHub

AgentScope Integration

AgentScope Java is an agent-oriented programming framework for building LLM-powered applications.

AgentScope – AgentScopeAgent wraps AgentScope ReActAgent as a BaseAgent for use in graph workflows.

# Add the following dependency to orchestrate AgentScope using SAA Graph
<dependency>
  <groupId>com.alibaba.cloud.ai</groupId>
  <artifactId>spring-ai-alibaba-starter-agentscope</artifactId>
  <version>1.1.2.2</version>
</dependency>

Multi-agent Patterns

Subagent – Main orchestrator delegates tasks to specialized sub-agents (codebase explorer, web researcher, etc.) via Task/TaskOutput tools; supports both Markdown and API-defined sub-agents.
Supervisor – Central supervisor agent wraps calendar and email agents as tools (AgentTool), invokes them on demand, and synthesizes results.
Skills – Single agent uses read_skill to load skill content on demand; system prompt shows only skill descriptions for progressive disclosure and smaller context.
Routing
- Routing (simple) – LlmRoutingAgent classifies the user query, invokes specialist agents (GitHub, Notion, Slack) in parallel, then synthesizes a single answer.
- Routing (graph) – LlmRoutingAgent as a StateGraph node with preprocess/postprocess and an internal merge node for routing and result synthesis.
Handoffs
- Handoffs (single-agent) – One ReactAgent advances steps via state (e.g. current_step); a ModelInterceptor injects step-specific system prompt and tools per turn.
- Handoffs (multi-agent) – Sales and support agents as graph nodes; handoff tools update active_agent and conditional edges route between agents.
Workflow – Custom workflow examples: RAG (rewrite → retrieve → prepare → agent) and SQL agent (list_tables → get_schema → run_query) as graph-based flows.
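
To make the multi-agent handoff pattern listed above more concrete, here is a minimal plain-Java sketch of the idea: agent nodes share a state map, a handoff step rewrites active_agent, and a conditional edge routes to whichever agent is active. It does not use the Spring AI Alibaba Graph API. The class name, the `message`, `answer`, and `visited` keys, and the refund rule are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Plain-Java sketch of the handoff pattern; NOT the Spring AI Alibaba API. */
final class HandoffSketch {

    /** Hypothetical agent nodes: each reads the shared state and returns an updated copy. */
    static final Map<String, UnaryOperator<Map<String, Object>>> NODES = new HashMap<>();

    static {
        // Sales node: hands off to support when the message mentions a refund (made-up rule).
        NODES.put("sales", state -> {
            Map<String, Object> next = new HashMap<>(state);
            String msg = (String) state.get("message");
            if (msg.contains("refund")) {
                next.put("active_agent", "support"); // the handoff "tool" only updates state
            } else {
                next.put("answer", "sales: " + msg);
            }
            return next;
        });
        // Support node: always answers.
        NODES.put("support", state -> {
            Map<String, Object> next = new HashMap<>(state);
            next.put("answer", "support: " + state.get("message"));
            return next;
        });
    }

    /** Conditional edge: keep routing to active_agent until a node produces an answer. */
    static Map<String, Object> run(String startAgent, String message) {
        Map<String, Object> state = new HashMap<>();
        state.put("active_agent", startAgent);
        state.put("message", message);
        List<String> visited = new ArrayList<>();
        // The step bound guards against agents handing off to each other forever.
        for (int step = 0; step < 10 && !state.containsKey("answer"); step++) {
            String active = (String) state.get("active_agent");
            visited.add(active);
            state = NODES.get(active).apply(state);
        }
        state.put("visited", visited);
        return state;
    }

    public static void main(String[] args) {
        Map<String, Object> result = run("sales", "I want a refund");
        System.out.println(result.get("visited") + " -> " + result.get("answer"));
    }
}
```

In a real graph the nodes would be agents and the loop would be the graph runtime following conditional edges; the sketch only shows that routing is driven by state rather than by direct calls between agents.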

Multimodal & Voice Agent

Voice Agent – Sandwich architecture (STT → ReactAgent → TTS): WebSocket-based real-time voice with DashScope ASR and CosyVoice TTS, plus text input; agent uses sandwich-order tools and streams events (stt_chunk, agent_chunk, tts_chunk).
Multimodal – Vision (image in/out) and TTS: DashScope vision models for image understanding and ReactAgent with media input; image generation via tools (Wanx), TTS via DashScopeAudioSpeechModel; ToolMultimodalResult for structured tool responses (url/base64).
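
The sandwich architecture behind the Voice Agent can be sketched in plain Java as three stages run in order, each emitting one of the event types named above (stt_chunk, agent_chunk, tts_chunk). This is an illustration under assumptions, not the project's implementation: the stages are hypothetical stub functions standing in for DashScope ASR, the ReactAgent, and CosyVoice TTS, and each stage emits a single event instead of a stream of chunks.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java sketch of STT -> agent -> TTS; every stage is a hypothetical stub. */
final class SandwichPipelineSketch {

    /** One emitted event, tagged with an event type name from the release notes. */
    static final class Event {
        final String type;
        final String payload;

        Event(String type, String payload) {
            this.type = type;
            this.payload = payload;
        }
    }

    /** Runs one voice turn: audio in, ordered list of events out. */
    static List<Event> handleTurn(byte[] audio,
                                  Function<byte[], String> stt,
                                  Function<String, String> agent,
                                  Function<String, byte[]> tts) {
        List<Event> events = new ArrayList<>();

        String transcript = stt.apply(audio);            // speech -> text
        events.add(new Event("stt_chunk", transcript));

        String reply = agent.apply(transcript);          // text -> agent answer
        events.add(new Event("agent_chunk", reply));

        byte[] speech = tts.apply(reply);                // answer -> synthesized audio
        events.add(new Event("tts_chunk", speech.length + " bytes"));

        return events;
    }

    public static void main(String[] args) {
        // Stub stages: "audio" is just UTF-8 text so the example runs without any service.
        List<Event> events = handleTurn(
                "hello".getBytes(StandardCharsets.UTF_8),
                bytes -> new String(bytes, StandardCharsets.UTF_8),
                text -> "You said: " + text,
                text -> text.getBytes(StandardCharsets.UTF_8));
        for (Event e : events) {
            System.out.println(e.type + " -> " + e.payload);
        }
    }
}
```

Passing the stages in as functions keeps the agent in the middle unaware of audio, so text input could reuse the same agent and TTS stages by skipping the STT step.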