ArchiveBox/ArchiveBox v0.9.31-rc on GitHub

Warning

This is a pre-release for the upcoming ArchiveBox v0.9.x line. The latest stable release is still v0.7.4. Please test on a backup first and report regressions before using this for a production collection.

v0.9.x is a large architectural upgrade from v0.8.x: plugin execution has moved out into a standalone plugin ecosystem, archiving runs through the new one-shot abx-dl CLI, and the old pluggy path has been replaced by an event-driven, append-only-log architecture designed to support future browser-extension capture and p2p sync.

⬇️ RC Instructions: use the :dev Docker image with the repo docker-compose.yml

git clone https://github.com/ArchiveBox/ArchiveBox && cd ArchiveBox
ARCHIVEBOX_IMAGE=archivebox/archivebox:dev docker compose up -d

Highlights

🔌 New plugin system split out into abx-plugins
The extractor/plugin catalog now lives in github.com/ArchiveBox/abx-plugins, with per-plugin config, dependencies, hooks, docs, and install metadata.
⚡ New one-shot CLI powered by abx-dl
ArchiveBox now builds on the standalone downloader at github.com/ArchiveBox/abx-dl, so plugin-based archiving can run inside ArchiveBox or independently as a focused CLI.
🧱 No more pluggy; new event-driven runtime
v0.9.x replaces the old pluggy-style in-process plugin system with an event-driven, append-only-log flow. This gives us cleaner resumability, auditability, and a path toward future browser extension capture, distributed workers, and p2p sync.
🗃️ Safer data layout and migrations
The new layout keeps user-owned snapshot/crawl data under data/archive/users/{username}/..., creates crawl records for migrated snapshots, and preserves legacy ArchiveResult data instead of rediscovering everything from slow filesystem scans.
🖥️ Improved browser isolation and output serving
Chrome-based extractors now use crawl-scoped browser sessions, clean up cloned profiles after each crawl, and serve snapshot outputs through web.archivebox.io / snap-*.archivebox.io style isolation for safer replay.
🧭 Tons of UI, CLI, REST API, and database improvements
Faster snapshot/admin list views, better live progress visibility, improved ArchiveResult detail pages, cleaner REST API paths, and more durable Process/Crawl/Snapshot database state.
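
To illustrate why an append-only log helps with resumability and auditability, here is a minimal Python sketch. It is not ArchiveBox's actual implementation: the record fields (`type`, `id`, `status`, `plugin`), the `events.jsonl` filename, and the helper names are all hypothetical, chosen only to show the general pattern of appending one JSON line per event and replaying the file to recover the latest state.

```python
import json
import tempfile
from pathlib import Path


def append_event(log_path: Path, record: dict) -> None:
    """Append one JSON record as a single line; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def replay(log_path: Path) -> dict:
    """Rebuild the latest known state per (type, id) by reading the log in order."""
    state = {}
    with log_path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            key = (record["type"], record["id"])
            # Later records override earlier fields for the same object.
            state[key] = {**state.get(key, {}), **record}
    return state


def pending(state: dict) -> list:
    """Ids whose latest status is not 'succeeded', i.e. candidates to resume."""
    return sorted(
        rid for (_, rid), rec in state.items() if rec.get("status") != "succeeded"
    )


def demo() -> list:
    # Hypothetical records; the real schema may differ.
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "events.jsonl"
        append_event(log, {"type": "Snapshot", "id": "s1", "status": "queued"})
        append_event(log, {"type": "ArchiveResult", "id": "r1", "plugin": "title", "status": "started"})
        append_event(log, {"type": "ArchiveResult", "id": "r1", "status": "succeeded"})
        return pending(replay(log))


if __name__ == "__main__":
    print(demo())
```

Because nothing is overwritten, the file doubles as an audit trail, and an interrupted run can be resumed by replaying it and picking up whatever is still unfinished.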

What's Changed

🔌 Extractor plugins moved into abx-plugins, with generated plugin docs at archivebox.github.io/abx-plugins.
⚡ Archiving execution now goes through abx-dl, which can also be used directly:
```
uvx abx-dl --plugins=title,screenshot,singlefile 'https://example.com'
```
🧾 Runtime execution now writes structured append-only JSONL records for Crawls, Processes, Snapshots, and ArchiveResults.
🔄 Migrations now preserve legacy Snapshot, Tag, ArchiveResult, and filesystem output data across 0.7.x / 0.8.x → 0.9.x upgrades.
📁 Heavy crawl/snapshot data stays under data/archive/users/... for Docker volume compatibility.
🌐 Added subdomain-aware replay for public web, admin, API, and isolated snapshot hosts.
🔒 Added stricter public/admin/API separation and safer default demo deployment options, including disabling public add flows by default.
🧩 Browser extractors now share crawl-scoped Chrome sessions and centralize profile cleanup / lock handling through chrome_utils.js.
📸 SingleFile, screenshot, DOM, readability, media, git, gallery, forum, and other extractors now run as plugin hooks with explicit outputs.
🛠️ Improved archivebox add, archivebox run, archivebox update, archivebox version, archivebox status, and dependency install flows.
🚀 Docker image now includes the new unified runtime stack, with Sonic search managed inside the main container.
🧪 Validated against large migrated collections, including the public demo dataset, with resumable migration behavior and preserved DB/file outputs.
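
If you want to script the one-shot abx-dl invocation listed above, a small Python wrapper might look like the sketch below. It uses only the `uvx abx-dl --plugins=... URL` form shown in these notes. The function names are hypothetical, the empty-plugin check is this sketch's own restriction rather than documented abx-dl behavior, and no other flags or output formats are assumed.

```python
import shlex
import subprocess


def build_abx_dl_command(url: str, plugins: list[str]) -> list[str]:
    """Build the argv for the one-shot form shown in the release notes."""
    if not plugins:
        # Restriction of this sketch only; not documented abx-dl behavior.
        raise ValueError("at least one plugin name is required")
    return ["uvx", "abx-dl", f"--plugins={','.join(plugins)}", url]


def run_abx_dl(url: str, plugins: list[str]) -> int:
    """Run the command without a shell and return its exit code."""
    cmd = build_abx_dl_command(url, plugins)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    raise SystemExit(
        run_abx_dl("https://example.com", ["title", "screenshot", "singlefile"])
    )
```

Passing the arguments as a list instead of a shell string means URLs containing `&`, `?`, or quotes do not need manual escaping.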

Helpful related projects and resources:

SingleFile for self-contained HTML captures
yt-dlp for media extraction
gallery-dl for gallery/media sites
Django Ninja for the REST API
Sonic for lightweight search indexing

Full Changelog: v0.8.5-rc...v0.9.31-rc

ArchiveBox/ArchiveBox v0.9.31-rc: New abx-dl runtime, split-out plugin ecosystem, resumable migrations, and isolated snapshot replay