🎉 First Stable Release
PageAgent is now ready for production use. The API is stable, and any breaking changes will follow semantic versioning.
Features
Core
- PageAgent - Main entry class with built-in UI Panel
- PageAgentCore - Headless agent class for custom UI or programmatic use
- DOM Analysis - Text-based DOM extraction with aggressive dehydration
- LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
- Tool System - Built-in tools for click, input, scroll, select, and more
- Custom Tools - Extend agent capabilities with your own tools (experimental)
- Lifecycle Hooks - Hook into agent execution (experimental)
- Instructions System - System-level and page-level instructions to guide agent behavior
- Data Masking - Transform page content before sending to LLM
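As a rough illustration of the Data Masking feature, the sketch below writes a transform that matches the `transformPageContent` signature listed under Configuration. The function name `maskSensitive` and both regular expressions are hypothetical and are not part of PageAgent. Passing the config object to the `PageAgent` constructor is also an assumption.

```typescript
// Signature taken from PageAgentConfig.transformPageContent in these notes.
type TransformPageContent = (content: string) => string | Promise<string>

// Hypothetical masking rules (not part of PageAgent); tune them to your data.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g
const LONG_DIGIT_PATTERN = /\b\d{11,19}\b/g // e.g. phone or card numbers

// Replaces e-mail addresses and long digit runs with fixed placeholders.
function maskSensitive(content: string): string {
  return content
    .replace(EMAIL_PATTERN, '[EMAIL]')
    .replace(LONG_DIGIT_PATTERN, '[NUMBER]')
}

// The synchronous function is assignable to the documented signature.
const transformPageContent: TransformPageContent = maskSensitive

// Assumed usage (constructor shape is not confirmed by these notes):
// new PageAgent({ baseURL, apiKey, model, transformPageContent })
```

With these patterns, `maskSensitive('Contact alice@example.com or 13800138000')` returns `'Contact [EMAIL] or [NUMBER]'`. Placeholder-based masking keeps the page structure readable for the LLM while hiding the values themselves.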
Page Controller
- Element Interactions - Click, input text, select options, scroll
- Visual Mask - Blocks user interaction during automation
- DOM Tree Extraction - Efficient page structure extraction for LLM consumption
UI
- Interactive Panel - Real-time task progress and agent thinking display
- Ask User Tool - Agent can ask users for clarification
- i18n Support - English and Chinese localization
Configuration
interface PageAgentConfig {
  // LLM Configuration (required)
  baseURL: string
  apiKey: string
  model: string
  temperature?: number
  maxRetries?: number
  customFetch?: typeof fetch

  // Agent Configuration
  language?: 'en-US' | 'zh-CN'
  maxSteps?: number // default: 20
  customTools?: Record<string, PageAgentTool> // experimental
  instructions?: InstructionsConfig
  transformPageContent?: (content: string) => string | Promise<string>

  // Page Controller Configuration
  enableMask?: boolean // default: true
  viewportExpansion?: number
  interactiveBlacklist?: Element[]
  interactiveWhitelist?: Element[]
}
Packages
| Package | Description |
|---|---|
| page-agent | Main entry with UI Panel |
| @page-agent/core | Core agent logic without UI |
| @page-agent/llms | LLM client with retry logic |
| @page-agent/page-controller | DOM operations and visual feedback |
| @page-agent/ui | Panel and i18n |
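To make the documented configuration defaults concrete, here is a minimal sketch that fills in `maxSteps` and `enableMask` when they are omitted. It redeclares a subset of `PageAgentConfig` so it runs on its own. The type `SketchConfig`, the helper `withDocumentedDefaults`, and the placeholder model name are hypothetical. The library presumably applies these defaults itself, so the helper only illustrates the resulting values.

```typescript
// Subset of PageAgentConfig from the Configuration section, redeclared here
// so the sketch is self-contained (hypothetical type name).
interface SketchConfig {
  baseURL: string
  apiKey: string
  model: string
  language?: 'en-US' | 'zh-CN'
  maxSteps?: number // documented default: 20
  enableMask?: boolean // documented default: true
}

// Hypothetical helper: makes the two documented defaults explicit.
function withDocumentedDefaults(config: SketchConfig) {
  return {
    ...config,
    maxSteps: config.maxSteps ?? 20,
    enableMask: config.enableMask ?? true,
  }
}

const resolved = withDocumentedDefaults({
  baseURL: 'https://api.openai.com/v1', // any OpenAI-compatible endpoint
  apiKey: 'YOUR_API_KEY', // placeholder
  model: 'your-model-name', // placeholder
  language: 'en-US',
})

// Assumed usage (import path and constructor shape are not confirmed here):
// import { PageAgent } from 'page-agent'
// const agent = new PageAgent(resolved)
```

Using `??` rather than object spread for the defaults means an explicit `enableMask: false` or `maxSteps: 5` is preserved, while an explicit `undefined` still falls back to the default.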
Known Limitations
- Single-page application only (cannot navigate across pages)
- No visual recognition (relies on DOM structure)
- Limited interaction support (no hover, drag-drop, canvas operations)
- See Limitations for details
Acknowledgments
This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).