Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.
It ships primarily as a CLI and a Web UI. It aims to provide a workflow that is closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.
Just released: Agent TARS Beta - check out our announcement blog post!
agent-tars-new-flight.mp4
| Booking Hotel | Generate Chart with extra MCP Servers |
|---|---|
| agent-tars-book-hotel.mp4 | mcp-chart.mp4 |
| Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me | Instruction: Draw me a chart of Hangzhou's weather for one month |
For more use cases, please check out #842.
Core Features
- One-Click Out-of-the-box CLI - Supports both headful (Web UI) and headless (server) execution.
- Hybrid Browser Agent - Control browsers using GUI Agent, DOM, or a hybrid strategy.
- Event Stream - Protocol-driven Event Stream drives Context Engineering and Agent UI.
- MCP Integration - The kernel is built on MCP and also supports mounting MCP Servers to connect to real-world tools.
Quick Start
# Launch with `npx`.
npx @agent-tars/cli@latest
# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g
# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
Visit the comprehensive Quick Start guide for detailed setup instructions.
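The Node.js >= 22 requirement noted for the global install can be checked before running `npm install`. The following is a minimal sketch, not part of the Agent TARS CLI: `node_version_ok` is a hypothetical helper name introduced here for illustration, and it assumes `node --version` prints a string of the form `v22.3.0`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not shipped with Agent TARS): succeeds when the
# given version string, e.g. "v22.3.0", has a major version of 22 or higher.
node_version_ok() {
  local version="${1#v}"        # drop the leading "v"
  local major="${version%%.*}"  # keep everything before the first dot
  # A non-numeric or empty major makes the test fail, which is what we want.
  [ "$major" -ge 22 ] 2>/dev/null
}

# Report whether the locally installed Node.js satisfies the requirement.
# Nothing is installed here; the script only prints guidance.
if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node.js is new enough for: npm install @agent-tars/cli@latest -g"
else
  echo "Node.js >= 22 not found; install or upgrade it first." >&2
fi
```

Only the major version is compared, since the stated requirement is a lower bound on the major release.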
Resources
- Blog Post
- Release Announcement on Twitter
- Official Twitter
- Discord Community
- Feishu Community Group
- Quick Start
- CLI Documentation
- Web UI Guide
- Workspace Documentation
- MCP Documentation
What's Changed
See Full CHANGELOG