🎉 First Stable Release

PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.

Features

Core

PageAgent - Main entry class with built-in UI Panel
PageAgentCore - Headless agent class for custom UI or programmatic use
DOM Analysis - Text-based DOM extraction with high-intensity dehydration
LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
Tool System - Built-in tools for click, input, scroll, select, and more
Custom Tools - Extend agent capabilities with your own tools (experimental)
Lifecycle Hooks - Hook into agent execution (experimental)
Instructions System - System-level and page-level instructions to guide agent behavior
Data Masking - Transform page content before sending to LLM

Page Controller

Element Interactions - Click, input text, select options, scroll
Visual Mask - Blocks user interaction during automation
DOM Tree Extraction - Efficient page structure extraction for LLM consumption

UI

Interactive Panel - Real-time task progress and agent thinking display
Ask User Tool - Agent can ask users for clarification
i18n Support - English and Chinese localization

Configuration

interface PageAgentConfig {
    // LLM Configuration (required)
    baseURL: string
    apiKey: string
    model: string
    temperature?: number
    maxRetries?: number
    customFetch?: typeof fetch

    // Agent Configuration
    language?: 'en-US' | 'zh-CN'
    maxSteps?: number // default: 20
    customTools?: Record<string, PageAgentTool> // experimental
    instructions?: InstructionsConfig
    transformPageContent?: (content: string) => string | Promise<string>

    // Page Controller Configuration
    enableMask?: boolean // default: true
    viewportExpansion?: number
    interactiveBlacklist?: Element[]
    interactiveWhitelist?: Element[]
}

Packages

Package	Description
`page-agent`	Main entry with UI Panel
`@page-agent/core`	Core agent logic without UI
`@page-agent/llms`	LLM client with retry logic
`@page-agent/page-controller`	DOM operations and visual feedback
`@page-agent/ui`	Panel and i18n

Known Limitations

Single-page application only (cannot navigate across pages)
No visual recognition (relies on DOM structure)
Limited interaction support (no hover, drag-drop, canvas operations)
See Limitations for details

Acknowledgments

This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).

alibaba/page-agent v1.0.0 🌟 V1 on GitHub