github craft-ai-agents/craft-agents-oss v0.10.0


v0.10.0 — Remote browser_tool bridging, per-workspace browser tab isolation, and #824 basic-auth fix

Features

  • Remote browser_tool bridged into the user's local Electron browser — Agents running in a remote workspace (headless server, Docker, WebUI) can now drive the user's local desktop BrowserPaneManager end-to-end. A new client:browser:invoke WS capability is advertised during the handshake and lets the server invoke methods on the client; errors travel as plain Error objects with .code preserved in both directions. This mirrors how shell.openExternal already works for OPEN_URL. The transport gains hasClientCapability / findClientsWithCapability for routing. Electron gets a new __browser:invoke IPC dispatcher with per-method owner-key authorization, no reuse of manual windows for remote callers, a session-scoped listInstances, and Buffer → Uint8Array conversion for screenshots sent across the wire. server-core gets a RemoteBrowserPaneManager (a session-bound IBPM implementation) and SessionManager.getBrowserPaneManagerForSession, with capability-aware host-client fallback and per-session pin cleanup on disconnect. uploadFile is blocked over the bridge, and evaluate is gated by a local allowRemoteEvaluate setting. The Pi runtime gains friendly error mappings for BROWSER_NO_CAPABLE_CLIENT, CAPABILITY_UNAVAILABLE, CLIENT_DISCONNECTED, CLIENT_REQUEST_TIMEOUT, BROWSER_INSTANCE_NOT_OWNED, BROWSER_REMOTE_UPLOAD_NOT_SUPPORTED, and BROWSER_REMOTE_EVALUATE_BLOCKED, and now mirrors Claude's getBrowserToolEnabled gate, so Pi no longer advertises browser_tool when the toggle is off. 27 new tests cover wire packaging, per-method authz, capability introspection, host-client fallback, screenshot round-trip, error-code preservation, and the Pi error-mapping contract. (1d926c33)

  • Browser tabs isolated per workspace — BrowserPaneManager is process-global, and STATE_CHANGED used to broadcast { to: 'all' }, so a chat in workspace A saw browser tabs and status banners owned by sessions in workspace B. Every BrowserInstance (and the BrowserInstanceInfo DTO) now carries a nullable workspaceId; STATE_CHANGED routes to { to: 'workspace', workspaceId } when it is set (falling back to { to: 'all' } for unbound manual windows); the browserPane.LIST handler filters by ctx.workspaceId; and the renderer reads a new browserInstancesForWorkspaceAtomFamily keyed by activeWorkspaceId. Windows still run in parallel as real BrowserWindows; this is a UI visibility filter, not a sandbox. REMOVED / INTERACTED stay broadcast-to-all (their payloads carry only the id, so they are a harmless no-op in workspaces that never saw the entry). workspaceId ships as optional on the DTO, so old renderers tolerate missing values (undefined is treated as null and passes the filter, matching the previous behavior). 17 new tests cover the atom filter, BPM stamping, and broadcast/LIST routing. (af817192)
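The routing part of the remote browser_tool bridge (capability introspection, host-client fallback, and per-session pin cleanup on disconnect) can be sketched as follows. This is a minimal illustration, not the server-core code: the ConnectedClient shape, the ClientRouter class, resolveBrowserClient, and the signatures given to hasClientCapability / findClientsWithCapability are all assumptions, as is the preference order (existing pin, then host client, then any capable client). Only the capability string client:browser:invoke and the BROWSER_NO_CAPABLE_CLIENT code come from the notes above.

```typescript
// Hypothetical sketch of capability-aware client routing; not the actual
// server-core implementation. Only the capability string and the error
// code are taken from the release notes.
const BROWSER_CAPABILITY = "client:browser:invoke";

interface ConnectedClient {
  id: string;
  capabilities: Set<string>; // assumed: what the client advertised at handshake
}

class CodedError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = "CodedError";
    this.code = code;
  }
}

class ClientRouter {
  private clients = new Map<string, ConnectedClient>();
  // sessionId -> clientId currently serving that session's browser calls
  private pins = new Map<string, string>();

  connect(client: ConnectedClient): void {
    this.clients.set(client.id, client);
  }

  // Drop the client and every session pin that pointed at it.
  disconnect(clientId: string): void {
    this.clients.delete(clientId);
    this.pins.forEach((pinned, sessionId) => {
      if (pinned === clientId) this.pins.delete(sessionId);
    });
  }

  hasClientCapability(clientId: string, capability: string): boolean {
    return this.clients.get(clientId)?.capabilities.has(capability) ?? false;
  }

  findClientsWithCapability(capability: string): ConnectedClient[] {
    return Array.from(this.clients.values()).filter((c) => c.capabilities.has(capability));
  }

  // Assumed order: existing pin, then the host client, then any capable client.
  resolveBrowserClient(sessionId: string, hostClientId?: string): string {
    const pinned = this.pins.get(sessionId);
    if (pinned !== undefined && this.hasClientCapability(pinned, BROWSER_CAPABILITY)) {
      return pinned;
    }
    let chosen: string | undefined;
    if (hostClientId !== undefined && this.hasClientCapability(hostClientId, BROWSER_CAPABILITY)) {
      chosen = hostClientId;
    } else {
      chosen = this.findClientsWithCapability(BROWSER_CAPABILITY)[0]?.id;
    }
    if (chosen === undefined) {
      throw new CodedError("No connected client can drive a browser", "BROWSER_NO_CAPABLE_CLIENT");
    }
    this.pins.set(sessionId, chosen);
    return chosen;
  }
}
```

Pinning the chosen client per session keeps a session's browser calls on one desktop until that client disconnects; the next call then re-resolves from scratch.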
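The per-method restrictions on the Electron side of the bridge (owner-key authorization, uploadFile blocked, evaluate gated by allowRemoteEvaluate) could look roughly like the guard below. It is a hypothetical sketch: the InstanceRecord and BridgeSettings shapes, the authorizeRemoteCall name, and the method names in INSTANCE_METHODS are illustrative. The error codes and the allowRemoteEvaluate setting name are taken from the notes.

```typescript
// Hypothetical guard for browser calls arriving over the remote bridge.
// Method names, the ownerKey field and the settings shape are illustrative;
// the error codes and allowRemoteEvaluate come from the release notes.
class CodedError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = "CodedError";
    this.code = code;
  }
}

interface InstanceRecord {
  id: string;
  ownerKey: string | null; // assumed: key of the remote session that owns the window
}

interface BridgeSettings {
  allowRemoteEvaluate: boolean; // local, user-controlled setting
}

// Hypothetical methods that act on an existing instance and need an owner check.
const INSTANCE_METHODS = new Set(["navigate", "screenshot", "evaluate", "close"]);

function authorizeRemoteCall(
  method: string,
  callerKey: string,
  instance: InstanceRecord | undefined,
  settings: BridgeSettings,
): void {
  if (method === "uploadFile") {
    // Blocked outright over the bridge.
    throw new CodedError("uploadFile is not supported over the remote bridge",
      "BROWSER_REMOTE_UPLOAD_NOT_SUPPORTED");
  }
  if (method === "evaluate" && !settings.allowRemoteEvaluate) {
    throw new CodedError("Remote evaluate is disabled on this machine",
      "BROWSER_REMOTE_EVALUATE_BLOCKED");
  }
  if (INSTANCE_METHODS.has(method)) {
    // Only the session that owns the window may act on it.
    if (instance === undefined || instance.ownerKey !== callerKey) {
      throw new CodedError("Browser instance is not owned by the caller",
        "BROWSER_INSTANCE_NOT_OWNED");
    }
  }
}
```

Checking the blanket bans before ownership means a blocked method fails with the same code whether or not the caller happens to own the window.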
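Preserving .code across the wire and then mapping it to a readable message, as the bridge and the Pi runtime are described as doing, might be sketched like this. The WireError envelope, the helper names, and every message string are assumptions; only the error codes come from the notes.

```typescript
// Hypothetical sketch: carry an Error's .code across a JSON transport, then
// map the code to a user-facing message. Envelope shape and wording are
// assumptions; the codes are the ones listed in the release notes.
interface WireError {
  message: string;
  code?: string;
}

function toWireError(err: unknown): WireError {
  if (err instanceof Error) {
    const code = (err as Error & { code?: unknown }).code;
    return typeof code === "string" ? { message: err.message, code } : { message: err.message };
  }
  return { message: String(err) };
}

function fromWireError(wire: WireError): Error & { code?: string } {
  // A plain Error (no custom subclass) with .code re-attached.
  const err: Error & { code?: string } = new Error(wire.message);
  if (wire.code !== undefined) err.code = wire.code;
  return err;
}

const FRIENDLY_MESSAGES: Record<string, string> = {
  BROWSER_NO_CAPABLE_CLIENT: "No connected desktop app can open a browser for this workspace.",
  CAPABILITY_UNAVAILABLE: "The connected client does not support browser control.",
  CLIENT_DISCONNECTED: "The desktop app disconnected before the browser action finished.",
  CLIENT_REQUEST_TIMEOUT: "The desktop app did not answer the browser request in time.",
  BROWSER_INSTANCE_NOT_OWNED: "That browser window belongs to another session.",
  BROWSER_REMOTE_UPLOAD_NOT_SUPPORTED: "File upload is not available for remote workspaces.",
  BROWSER_REMOTE_EVALUATE_BLOCKED: "Running scripts in the page is disabled for remote sessions here.",
};

// Fall back to the raw message for unknown or missing codes.
function friendlyBrowserError(err: Error & { code?: string }): string {
  return (err.code && FRIENDLY_MESSAGES[err.code]) || err.message;
}
```

Flattening to a plain object first matters because JSON.stringify(new Error("boom")) yields "{}": an Error's message is an own but non-enumerable property.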

Improvements

  • markdown-preview block documented in the rich-output reference — The markdown-preview block (shipped in v0.9.6) is now covered in apps/online-docs/go-further/rich-output.mdx alongside the existing html-preview / pdf-preview / image-preview entries, so users discovering the block in-chat can find usage examples and the src / items field reference in the docs. (2d9693b1, 70c2955f)

  • Browser-bridge wiring is observable from server logs — Three sessionLog.info lines in SessionManager now confirm whether setRpcServer ran at bootstrap and whether the browser-pane-forwarding block executed at agent init. Makes it possible to diagnose remote workspaces that still hit "Browser window controls are not available" from server logs alone, without attaching a debugger. (ad26e61d)

Bug Fixes

  • Renderer accepts both local AND remote workspace ids when filtering tabs — When connected to a remote workspace, a renderer has two relevant workspace ids: activeWorkspaceId (the LOCAL Craft Agents window's identity, used for locally-opened manual tabs) and activeWorkspace.remoteServer.remoteWorkspaceId (the REMOTE server's id, used by the remote agent when it stamps tabs through the WS bridge). The first iteration of the workspace-isolation filter only matched activeWorkspaceId, so tabs stamped with the remote id (which every browser opened by a remote agent carries) were filtered out: the TopBar tab strip and the toolbar status badge became invisible for remote browsers, and a remote browser that was opened and then hidden could no longer be reached. Replaced the atomFamily with a plain helper, filterInstancesForWorkspace(local, remote): a tab matches if either id matches (or if its workspaceId is null/undefined, for back-compat). (bf8429fa)

  • STATE_CHANGED broadcasts to all clients again; the visibility filter lives in the renderer — The Phase-4 server-side workspace filter (on both STATE_CHANGED routing and the LIST handler) was wrong for remote-mirror workspaces: a renderer's transport-level workspaceId is the LOCAL window's identity, while remote-bridged browser tabs carry the REMOTE server's workspace id. The two never match, so STATE_CHANGED targeted at the remote id got dropped by the WS routing layer (no local renderer reports itself as being in the remote workspace) and LIST returned empty for the same reason. Workspace isolation now lives entirely in the renderer (filterInstancesForWorkspace), which knows both ids via activeWorkspace.remoteServer.remoteWorkspaceId. The handler reverts to { to: 'all' } broadcasts and a full LIST response. Privacy is unchanged — every locally-connected renderer belongs to the same user — and remote tabs finally show up in the TopBar of the workspace that owns them. (f831bb42)

  • No window reuse on the remote-bridge lifecycle path (closes the cross-workspace hijack) — The capability dispatcher correctly set allowReuseManual=false for createForSession but routed getOrCreateForSession and focusBoundForSession through the public helpers, which default to allowReuseManual=true. The remote agent's browser_tool open maps to focusBoundForSession, so it could adopt an unbound window left behind by a local session, which is exactly the cross-workspace hijack the workspaceId filter was meant to block. The workspaceId filter still helps, but it is best-effort: windows created before workspace stamping (or via paths that never set workspaceId) have workspaceId=null and remain universally adoptable. Belt-and-braces fix: every remote lifecycle call now passes allowReuseManual=false, so remote sessions always create fresh windows unless they already own one. (ce3340a1)

  • Unbound-window reuse scoped to the owning workspace (no more "tab moved from workspace A to B") — When a session's turn ends, unbindAllForSession() clears boundSessionId and flips ownerType to 'manual' so the next turn of the same session can re-bind the window; the workspaceId stamped at creation is preserved. findReusableUnboundInstance() was matching ANY unbound 'manual' window regardless of workspace, so a session in workspace B would happily pick up the leftover window from workspace A — bindSession() would then overwrite workspaceId to B, effectively "moving" the window from A to B and making workspace A's tab strip lose the entry while B's gained it. Reuse is now allowed only when the candidate's workspaceId is null (truly user-opened manual window — adoptable by anyone) or matches the caller's workspaceId. Same-workspace next-turn reuse still works. (ceb24603)

  • TopBar-opened manual windows inherit the workspace they were opened in — The browserPane.CREATE handler created manual windows with workspaceId defaulting to null, which the workspace-isolation filter intentionally treats as "visible to all workspaces" — so a TopBar-opened tab leaked into every workspace's tab strip. The renderer that fires CREATE always has ctx.workspaceId set, so the handler now passes it through to createInstance / createForSession. CLI / agent-harness callers with no workspace context (ctx.workspaceId === null) still get the broadcast-to-all behavior as a safe fallback. (7dfcaeac)

  • BrowserInstance projected to a plain snapshot before IPC return — The local BrowserPaneManager.getInstance(id) returns the live BrowserInstance, which embeds Electron native references (window: BrowserWindow, pageView: BrowserView, ...). When the __browser:invoke dispatcher returned that object over IPC, Electron's structured-clone serializer threw "An object could not be cloned", and the remote agent's logging-side getInstanceAsync call failed. Added toSnapshot(instance), which emits only the IBPM-declared fields (ownerType, ownerSessionId, isVisible, title, currentUrl), and routed the dispatcher's getInstance branch through it. (8e2534b5)

  • source_test base64-encodes basic-auth credentials — testApiConnectionWithAuth was interpolating the raw vault value into the Authorization header, so basic-auth sources sent Basic {"username":"...","password":"..."} and got a 401 from every provider. The vault stores source_basic credentials as JSON (written by source_credential_prompt / the WebUI); the runtime path (buildHeaders in api-tools.ts) already parses and base64-encodes them, but the validator path had been left out. It now parses the token as JSON when it contains username and password, and falls back to pass-through behavior for legacy / hand-edited base64 entries and any non-JSON string, mirroring buildHeaders(). Three regression tests cover the JSON form, the legacy already-encoded form, and a non-JSON garbage token. Fixes #824. (96dd7c0d)
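The renderer-side rule from bf8429fa can be expressed as a small pure function. The notes give the helper as filterInstancesForWorkspace(local, remote); this sketch assumes it also receives the instance list as its first argument, and the BrowserInstanceInfo subset shown is illustrative.

```typescript
// Sketch of the renderer-side visibility rule. The argument order and the
// BrowserInstanceInfo subset below are assumptions.
interface BrowserInstanceInfo {
  id: string;
  title: string;
  workspaceId?: string | null; // optional on the DTO for back-compat
}

function filterInstancesForWorkspace(
  instances: BrowserInstanceInfo[],
  localWorkspaceId: string | null,
  remoteWorkspaceId: string | null | undefined,
): BrowserInstanceInfo[] {
  return instances.filter((inst) => {
    // Unstamped tabs (null or missing) stay visible everywhere, as before 0.10.0.
    if (inst.workspaceId == null) return true;
    // Otherwise the tab must belong to the local window or the remote server.
    return inst.workspaceId === localWorkspaceId || inst.workspaceId === remoteWorkspaceId;
  });
}
```

For a window whose local id is "ws-local" and whose remote server id is "srv-7", tabs stamped "ws-local" or "srv-7" are kept, unstamped tabs are kept, and a tab stamped with any other workspace id is hidden.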
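The two window-reuse fixes (ce3340a1 and ceb24603) combine into one predicate. This is a hypothetical sketch: the field names follow the notes (ownerType, boundSessionId, workspaceId, allowReuseManual), but the function signature, the ReuseOptions object, and the "session" owner type are assumptions, and the lookup of windows a session already owns is assumed to happen before this function is called.

```typescript
// Hypothetical sketch of the reuse rules from ce3340a1 and ceb24603.
// Field names mirror the release notes; the signature is an assumption.
type OwnerType = "session" | "manual"; // "session" is an assumed value

interface BrowserInstance {
  id: string;
  ownerType: OwnerType;
  boundSessionId: string | null;
  workspaceId: string | null; // preserved across unbindAllForSession()
}

interface ReuseOptions {
  callerWorkspaceId: string | null;
  allowReuseManual: boolean; // remote lifecycle calls always pass false
}

function findReusableUnboundInstance(
  instances: BrowserInstance[],
  opts: ReuseOptions,
): BrowserInstance | undefined {
  // Remote callers never adopt someone else's window; they get a fresh one.
  if (!opts.allowReuseManual) return undefined;
  return instances.find(
    (inst) =>
      inst.ownerType === "manual" &&
      inst.boundSessionId === null &&
      // null: genuinely user-opened, adoptable by anyone; otherwise same workspace only.
      (inst.workspaceId === null || inst.workspaceId === opts.callerWorkspaceId),
  );
}
```

With this predicate a leftover window from workspace A can only be re-bound by a session in workspace A, so bindSession() never has to overwrite its workspaceId.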
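The toSnapshot fix amounts to copying a fixed list of plain fields instead of handing the live object to IPC. In this sketch the five snapshot fields are the ones the notes list; typing window and pageView as unknown stand-ins for Electron's BrowserWindow and BrowserView is an assumption made so the example runs without Electron.

```typescript
// Sketch of projecting a live instance to a structured-clone-safe snapshot.
// The five snapshot fields come from the release notes; the live-object
// shape is an assumption.
interface BrowserInstanceSnapshot {
  ownerType: string;
  ownerSessionId: string | null;
  isVisible: boolean;
  title: string;
  currentUrl: string;
}

interface LiveBrowserInstance extends BrowserInstanceSnapshot {
  window: unknown;   // stand-in for Electron's BrowserWindow
  pageView: unknown; // stand-in for Electron's BrowserView
}

function toSnapshot(instance: LiveBrowserInstance): BrowserInstanceSnapshot {
  // Copy only plain, serializable fields; never spread the live object.
  return {
    ownerType: instance.ownerType,
    ownerSessionId: instance.ownerSessionId,
    isVisible: instance.isVisible,
    title: instance.title,
    currentUrl: instance.currentUrl,
  };
}
```

Building the result field by field, rather than spreading the instance and deleting keys, means a native handle added to BrowserInstance later cannot leak into the IPC payload by accident.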
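The #824 fix can be sketched as a header builder with a JSON branch and a pass-through fallback. The basicAuthHeader name and the requirement that username and password both be strings are assumptions modeled on the description of buildHeaders(); the example uses Node's Buffer for UTF-8-safe base64.

```typescript
// Sketch of the basic-auth header logic described for #824. The function
// name and the exact shape check are assumptions modeled on buildHeaders().
function basicAuthHeader(token: string): string {
  try {
    const parsed: unknown = JSON.parse(token);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      typeof (parsed as { username?: unknown }).username === "string" &&
      typeof (parsed as { password?: unknown }).password === "string"
    ) {
      const { username, password } = parsed as { username: string; password: string };
      // RFC 7617: base64("username:password")
      return `Basic ${Buffer.from(`${username}:${password}`, "utf8").toString("base64")}`;
    }
  } catch {
    // Not JSON: fall through to pass-through below.
  }
  // Legacy pre-encoded base64, or any other string, is sent unchanged.
  return `Basic ${token}`;
}
```

For example, basicAuthHeader('{"username":"Aladdin","password":"open sesame"}') returns "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", while passing the already-encoded "QWxhZGRpbjpvcGVuIHNlc2FtZQ==" yields the same header without double-encoding.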

Breaking Changes

  • None. The workspaceId field on the BrowserInstanceInfo DTO is optional, so older renderers and older agents tolerate missing values (undefined is treated as null and passes the visibility filter, matching pre-0.10.0 behavior).
