ArchiveBox/ArchiveBox v0.9.31-rc (GitHub release)
v0.9.31-rc: New abx-dl runtime, split-out plugin ecosystem, resumable migrations, and isolated snapshot replay

Pre-release

Warning

This is a pre-release for the upcoming ArchiveBox v0.9.x line. The latest stable release is still v0.7.4. Please test on a backup first and report regressions before using this for a production collection.

v0.9.x is a large architectural upgrade from v0.8.x. Plugin execution has moved into a standalone plugin ecosystem, and archiving now runs through the new one-shot abx-dl CLI. The old pluggy-based plugin system has been replaced by an event-driven, append-only-log architecture designed to support future browser-extension capture and p2p sync.

⬇️ RC instructions: use the :dev Docker image with the docker-compose.yml from the repo:
git clone https://github.com/ArchiveBox/ArchiveBox && cd ArchiveBox
ARCHIVEBOX_IMAGE=archivebox/archivebox:dev docker compose up -d

Highlights

  • 🔌 New plugin system split out into abx-plugins
    The extractor/plugin catalog now lives in github.com/ArchiveBox/abx-plugins, with per-plugin config, dependencies, hooks, docs, and install metadata.

  • New one-shot CLI powered by abx-dl
    ArchiveBox now builds on the standalone downloader at github.com/ArchiveBox/abx-dl, so plugin-based archiving can run inside ArchiveBox or independently as a focused CLI.

  • 🧱 No more pluggy; new event-driven runtime
    v0.9.x replaces the old pluggy-style in-process plugin system with an event-driven, append-only-log flow. This gives us cleaner resumability, auditability, and a path toward future browser extension capture, distributed workers, and p2p sync.

  • 🗃️ Safer data layout and migrations
    The new layout keeps user-owned snapshot/crawl data under data/archive/users/{username}/..., creates crawl records for migrated snapshots, and preserves legacy ArchiveResult data instead of rediscovering everything from slow filesystem scans.

  • 🖥️ Improved browser isolation and output serving
    Chrome-based extractors now use crawl-scoped browser sessions and clean up cloned profiles after each crawl. Snapshot outputs are now served from isolated hosts (web.archivebox.io / snap-*.archivebox.io style) for safer replay.

  • 🧭 Tons of UI, CLI, REST API, and database improvements
    Faster snapshot/admin list views, better live progress visibility, improved ArchiveResult detail pages, cleaner REST API paths, and more durable Process/Crawl/Snapshot database state.
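The snap-*.archivebox.io style isolation described above depends on serving each snapshot from its own origin. The sketch below shows a minimal version of host-based routing. It assumes a single base domain and the `snap-<id>` label pattern from the example hosts. The function name, its return values, and the treatment of the label suffix as a snapshot ID are all hypothetical. This is not ArchiveBox's actual routing code.

```python
from typing import Optional, Tuple

# Assumption: the deployment's base domain. The release notes only show
# example hosts like web.archivebox.io and snap-*.archivebox.io.
BASE_DOMAIN = "archivebox.io"


def classify_host(host: str, base: str = BASE_DOMAIN) -> Tuple[str, Optional[str]]:
    """Hypothetical router: map a Host header to (role, snapshot_id)."""
    host = host.split(":", 1)[0].lower()  # drop any port and normalize case
    suffix = "." + base
    if not host.endswith(suffix):
        return ("unknown", None)  # not one of this deployment's hosts
    label = host[: -len(suffix)]
    if "." in label:
        return ("unknown", None)  # only one subdomain level is recognized
    if label == "web":
        return ("web", None)
    if label.startswith("snap-") and len(label) > len("snap-"):
        # Each snapshot gets its own origin, so the browser's same-origin
        # policy keeps archived page scripts away from the DOM and storage
        # of the web/admin origins (session cookies must also be host-only).
        return ("snapshot", label[len("snap-"):])
    return ("other", None)  # e.g. admin/API hosts, handled elsewhere


if __name__ == "__main__":
    for h in ("web.archivebox.io", "snap-abc123.archivebox.io:8000", "example.com"):
        print(h, "->", classify_host(h))
```

Giving each snapshot a separate origin is a common way to sandbox untrusted archived pages. If snapshots were served from paths on a single host instead, their scripts would share an origin with the admin UI.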

What's Changed

  • 🔌 Extractor plugins moved into abx-plugins, with generated plugin docs at archivebox.github.io/abx-plugins.
  • ⚡ Archiving execution now goes through abx-dl, which can also be used directly:
    uvx abx-dl --plugins=title,screenshot,singlefile 'https://example.com'
  • 🧾 Runtime execution now writes structured append-only JSONL records for Crawls, Processes, Snapshots, and ArchiveResults.
  • 🔄 Migrations now preserve legacy Snapshot, Tag, ArchiveResult, and filesystem output data across 0.7.x / 0.8.x → 0.9.x upgrades.
  • 📁 Heavy crawl/snapshot data stays under data/archive/users/... for Docker volume compatibility.
  • 🌐 Added subdomain-aware replay for public web, admin, API, and isolated snapshot hosts.
  • 🔒 Added stricter separation between the public, admin, and API interfaces, plus safer defaults for demo deployments (public add flows are now disabled by default).
  • 🧩 Browser extractors now share crawl-scoped Chrome sessions and centralize profile cleanup / lock handling through chrome_utils.js.
  • 📸 SingleFile, screenshot, DOM, readability, media, git, gallery, forum, and other extractors now run as plugin hooks with explicit outputs.
  • 🛠️ Improved archivebox add, archivebox run, archivebox update, archivebox version, archivebox status, and dependency install flows.
  • 🚀 Docker image now includes the new unified runtime stack, with Sonic search managed inside the main container.
  • 🧪 Validated against large migrated collections, including the public demo dataset, with resumable migration behavior and preserved DB/file outputs.
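The append-only JSONL records and resumable behavior described above follow a general pattern. State changes are only ever appended to a log, and the current state is rebuilt by replaying that log. An interrupted run can then work out which items are still unfinished.

The sketch below illustrates that pattern in Python. The record shape (`type`, `id`, `status`) and the status names are hypothetical. This is not the actual schema or code used by ArchiveBox or abx-dl.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical record shape: {"type": ..., "id": ..., "status": ...}.
# This is not ArchiveBox's actual JSONL schema.
TERMINAL = {"succeeded", "failed"}


def append_event(log_path: Path, record: dict) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def replay(log_path: Path) -> Dict[Tuple[str, str], dict]:
    """Rebuild current state by replaying the log; the latest record per (type, id) wins."""
    state: Dict[Tuple[str, str], dict] = {}
    if not log_path.exists():
        return state
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                state[(rec["type"], rec["id"])] = rec
    return state


def pending(state: Dict[Tuple[str, str], dict], record_type: str) -> List[str]:
    """IDs of a given record type that have not reached a terminal status yet."""
    return sorted(
        rid
        for (rtype, rid), rec in state.items()
        if rtype == record_type and rec.get("status") not in TERMINAL
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "events.jsonl"
        append_event(log, {"type": "Snapshot", "id": "s1", "status": "queued"})
        append_event(log, {"type": "Snapshot", "id": "s2", "status": "queued"})
        append_event(log, {"type": "Snapshot", "id": "s1", "status": "succeeded"})
        # After a restart, replaying the log shows what is left to do.
        print(pending(replay(log), "Snapshot"))  # ['s2']
```

Records are never edited in place, so the log also works as an audit trail. In the example, the first `s1` record still shows that the snapshot was queued before it succeeded.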


Full Changelog: v0.8.5-rc...v0.9.31-rc
