cua-agent v0.4.0
This update refactored the Agent SDK to make it easier to implement new features and support the release of new agent models/loops.
Changelog:
- Reworked agent loop, now all agent providers share a loop (Generate, Execute, Repeat), with the only difference between loops being the implementation of the Generate function
- Replaced LLM clients with LiteLLM, now all agent providers support any provider supported by LiteLLM
- Added 2 custom LiteLLM providers for local model inference on CUDA and MLX devices:
huggingface-local/,mlx/ - Reworked callback system to have hooks at every step of the lifecycle
- Converted logging, trajectory saving, image retention into callbacks
- Added new callbacks - PII Anonymization (still a W.I.P) & budget management
- Anthropic providers - Added support for explicit prompt caching
- OpenAI providers - Added support for zero data retention
- Added Agent CLI for quick testing:
python -m agent.cli <model name>
Breaking Changes
- Initialization:
ComputerAgent(v0.4.x) usesmodelas a string (e.g. "anthropic/claude-3-5-sonnet-20241022") instead ofLLMandAgentLoopobjects.toolsis a list (can include multiple computers and decorated functions).callbacksare now first-class for extensibility (image retention, budget, trajectory, logging, etc).
- No explicit
loopparameter:- Loop is inferred from the
modelstring (e.g.anthropic/,openai/,omniparser+,ui-tars).
- Loop is inferred from the
- No explicit
computerparameter:- Computers are added to
toolslist.
- Computers are added to
Install
# Before merge:
pip install --pre "cua-agent[all]==0.4.0b4"
# After merge:
pip install "cua-agent[all]"
# or install specific providers
pip install "cua-agent[openai]" # OpenAI computer-use-preview support
pip install "cua-agent[anthropic]" # Anthropic Claude support
pip install "cua-agent[omni]" # Omniparser + any LLM support
pip install "cua-agent[uitars]" # UI-TARS
pip install "cua-agent[uitars-mlx]" # UI-TARS + MLX support
pip install "cua-agent[uitars-hf]" # UI-TARS + Huggingface support
pip install "cua-agent[ui]" # Gradio UI supportSupported Models
Anthropic Claude (Computer Use API)
model="anthropic/claude-3-5-sonnet-20241022"
model="anthropic/claude-3-5-sonnet-20240620"
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-sonnet-4-20250514"OpenAI Computer Use Preview
model="openai/computer-use-preview"UI-TARS (Local or Huggingface Inference)
model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"Omniparser + Any LLM
model="omniparser+ollama_chat/mistral-small3.2"
model="omniparser+vertex_ai/gemini-pro"
model="omniparser+anthropic/claude-3-5-sonnet-20241022"
model="omniparser+openai/gpt-4o"