github trycua/cua agent-v0.4.0
cua-agent v0.4.0

4 months ago

This release refactors the Agent SDK to make it easier to implement new features and to support new agent models and loops.

Changelog:

  • Reworked the agent loop: all agent providers now share one loop (Generate, Execute, Repeat), and loops differ only in how they implement the Generate function
  • Replaced the LLM clients with LiteLLM, so all agent providers now support any provider that LiteLLM supports
  • Added two custom LiteLLM providers for local model inference on CUDA and MLX devices: huggingface-local/ and mlx/
  • Reworked the callback system to provide hooks at every step of the lifecycle
  • Converted logging, trajectory saving, and image retention into callbacks
  • Added new callbacks: PII anonymization (still a work in progress) and budget management
  • Anthropic providers - Added support for explicit prompt caching
  • OpenAI providers - Added support for zero data retention
  • Added Agent CLI for quick testing: python -m agent.cli <model name>
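
The shared loop and the callback hooks described in the changelog can be pictured with a small, self-contained sketch. This is not the cua-agent implementation: Turn, Callback, BudgetCallback, run_loop, and the hook names are hypothetical, chosen only to illustrate how a single Generate, Execute, Repeat loop can accept provider-specific Generate functions and lifecycle callbacks such as budget management.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Turn:
    """Output of one Generate call (hypothetical shape)."""
    actions: List[str] = field(default_factory=list)  # empty list means "done"
    cost: float = 0.0                                 # made-up cost of the call


class Callback:
    """Hypothetical base class with no-op lifecycle hooks."""

    def on_run_start(self) -> None:
        pass

    def should_continue(self) -> bool:
        return True

    def on_turn(self, turn: Turn) -> None:
        pass

    def on_run_end(self) -> None:
        pass


class BudgetCallback(Callback):
    """Stops the run once the accumulated cost reaches max_budget."""

    def __init__(self, max_budget: float) -> None:
        self.max_budget = max_budget
        self.spent = 0.0

    def on_turn(self, turn: Turn) -> None:
        self.spent += turn.cost

    def should_continue(self) -> bool:
        return self.spent < self.max_budget


def run_loop(
    generate: Callable[[List[str]], Turn],
    execute: Callable[[str], str],
    callbacks: Sequence[Callback] = (),
    max_turns: int = 50,
) -> List[str]:
    """Shared loop; only `generate` would differ between providers."""
    history: List[str] = []
    for cb in callbacks:
        cb.on_run_start()
    for _ in range(max_turns):
        if not all(cb.should_continue() for cb in callbacks):
            break
        turn = generate(history)              # Generate
        for cb in callbacks:
            cb.on_turn(turn)
        if not turn.actions:                  # nothing left to do
            break
        for action in turn.actions:           # Execute
            history.append(execute(action))
    for cb in callbacks:                      # Repeat ends here
        cb.on_run_end()
    return history


def scripted_generate(history: List[str]) -> Turn:
    """Stand-in for a model call: one click per turn, done after three."""
    if len(history) >= 3:
        return Turn(actions=[], cost=1.0)
    return Turn(actions=[f"click_{len(history)}"], cost=1.0)


def fake_execute(action: str) -> str:
    """Stand-in for running an action on a computer."""
    return f"done:{action}"
```

Calling run_loop(scripted_generate, fake_execute) runs three actions and stops when Generate returns no actions; passing callbacks=[BudgetCallback(max_budget=2.0)] stops the same run after two turns. In this sketch, swapping providers means passing a different generate function while the loop and the callbacks stay unchanged.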

Breaking Changes

  • Initialization:
    • ComputerAgent (v0.4.x) takes model as a string (e.g. "anthropic/claude-3-5-sonnet-20241022") instead of LLM and AgentLoop objects.
    • tools is a list that can include multiple computers and decorated functions.
    • callbacks are now first-class for extensibility (image retention, budget, trajectory, logging, etc.).
  • No explicit loop parameter:
    • The loop is inferred from the model string (e.g. anthropic/, openai/, omniparser+, ui-tars).
  • No explicit computer parameter:
    • Computers are added to the tools list.
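
To make "inferred from the model string" concrete, here is a minimal sketch of prefix-based dispatch. The matching rules, the ParsedModel type, and the infer_loop function are assumptions made for illustration; the actual rules used by cua-agent may differ.

```python
from typing import NamedTuple


class ParsedModel(NamedTuple):
    loop: str        # which Generate implementation to use (hypothetical names)
    llm_model: str   # model string passed on to the LLM client


def infer_loop(model: str) -> ParsedModel:
    """Guess the agent loop from a model string (illustrative rules only)."""
    prefix = "omniparser+"
    if model.startswith(prefix):
        # Composed form: the part after "+" names the underlying LLM.
        return ParsedModel("omniparser", model[len(prefix):])
    if "ui-tars" in model.lower():
        return ParsedModel("uitars", model)
    if model.startswith("anthropic/"):
        return ParsedModel("anthropic", model)
    if model.startswith("openai/"):
        return ParsedModel("openai", model)
    raise ValueError(f"cannot infer an agent loop from {model!r}")
```

With these assumed rules, "omniparser+openai/gpt-4o" selects the omniparser loop with "openai/gpt-4o" as the underlying LLM, and both "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B" and "ollama_chat/0000/ui-tars-1.5-7b" select the UI-TARS loop. An unrecognized string raises ValueError rather than falling back silently to a default loop.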

Install

# Before merge:
pip install --pre "cua-agent[all]==0.4.0b4"

# After merge:
pip install "cua-agent[all]"

# or install specific providers
pip install "cua-agent[openai]"        # OpenAI computer-use-preview support
pip install "cua-agent[anthropic]"     # Anthropic Claude support
pip install "cua-agent[omni]"          # Omniparser + any LLM support
pip install "cua-agent[uitars]"        # UI-TARS
pip install "cua-agent[uitars-mlx]"    # UI-TARS + MLX support
pip install "cua-agent[uitars-hf]"     # UI-TARS + Hugging Face support
pip install "cua-agent[ui]"            # Gradio UI support

Supported Models

Anthropic Claude (Computer Use API)

model="anthropic/claude-3-5-sonnet-20241022"
model="anthropic/claude-3-5-sonnet-20240620"
model="anthropic/claude-opus-4-20250514"
model="anthropic/claude-sonnet-4-20250514"

OpenAI Computer Use Preview

model="openai/computer-use-preview"

UI-TARS (Local or Hugging Face Inference)

model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
model="ollama_chat/0000/ui-tars-1.5-7b"

Omniparser + Any LLM

model="omniparser+ollama_chat/mistral-small3.2"
model="omniparser+vertex_ai/gemini-pro"
model="omniparser+anthropic/claude-3-5-sonnet-20241022"
model="omniparser+openai/gpt-4o"
