Release Notes

This release introduces pre-trained dictionary compression: train a dictionary from representative samples and ship dramatically smaller archives for small-block and many-small-file workloads. It also adds a shared dictionary Huffman table, a static context API for zero-allocation, caller-managed workspaces, and a custom reader interface for seekable archives. The container format advances to v6 and the shared library SOVERSION bumps from 3 to 4.

⚠️ Breaking change. This is an ABI and format break. The format version is now v6 (the legacy NUM block type is removed), so v0.12.0 cannot read archives written by v0.11.0 and earlier, and older decoders reject v6 archives. Re-link against libzxc.so.4.
Migrating v5 → v6 archives is a one-time transcode. Decompress with your old (v5) build and recompress with the new one:
zxc-v5 -dc old.zxc | zxc-v6 -z -c > new.zxc
Keep a v5 build around until all your data at rest is converted, since it is the only thing that can read v5 archives. The full guide in docs/MIGRATION.md covers bulk migration, dictionaries, and verification.

New Features

Pre-Trained Dictionary Compression

For workloads compressed in small blocks (4 KB–128 KB), a pre-trained dictionary prefills the LZ77 sliding window at the start of every block, so even the earliest bytes have representative history to match against. This helps any time the block size is small enough that early bytes would otherwise lack history: a single small payload, or a large payload split into many small blocks.
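To make the effect concrete, here is a minimal sketch of why a prefilled window helps. It is not libzxc code: `longest_match` and its brute-force scan are illustrative assumptions, and a real encoder would use hash tables instead. The history buffer is laid out as dictionary bytes followed by the block, so a match for the very first byte of the block can point back into the dictionary.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only, not the libzxc API. `buf` holds the dictionary
 * followed by the block, `buf_len` bytes in total. Returns the longest
 * match for position `pos` against any earlier position and stores its
 * backward distance in *offset (0 if nothing matches). */
static size_t longest_match(const unsigned char *buf, size_t buf_len,
                            size_t pos, size_t *offset)
{
    size_t best = 0;
    assert(buf != NULL && offset != NULL && pos <= buf_len);
    *offset = 0;
    for (size_t cand = 0; cand < pos; cand++) {
        size_t len = 0;
        /* A match may run past `pos` (overlap), as in any LZ77 coder. */
        while (pos + len < buf_len && buf[cand + len] == buf[pos + len])
            len++;
        if (len > best) {
            best = len;
            *offset = pos - cand;
        }
    }
    return best;
}
```

With a 9-byte dictionary `GET /api/` in front of the block `GET /api/users`, the search at the first block byte finds a 9-byte match at distance 9. Without the dictionary the same position has no history at all and must be emitted as literals.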

Train from a corpus, serialize to a self-describing .zxd, and compress/decompress against it. Dictionaries are capped at 64 KB (ZXC_DICT_SIZE_MAX).
New public API in zxc_dict.h: zxc_train_dict / zxc_train_dict_huf (train), zxc_dict_save / zxc_dict_save_bound (serialize), zxc_dict_train (one-shot train + serialize), zxc_dict_load / zxc_dict_huf (load), and zxc_dict_id / zxc_dict_get_id (identity).
Each archive records its 32-bit dict_id; supplying the wrong dictionary (or none) fails cleanly with a dictionary-required / mismatch error instead of silently corrupting output. (#261)

Shared Dictionary Huffman Table

The .zxd carries a 128-byte packed literal Huffman code-lengths table trained on the same corpus. Small blocks no longer each pay to embed their own literal table; the decoder reuses the dictionary's shared table, which is where most of the small-block ratio gain comes from. The dict_id covers both the content and the table, so a single id pins the complete decode state. (#275)
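The 128-byte size is consistent with 256 literal symbols at 4 bits per code length, though these notes do not spell out the layout. The sketch below assumes exactly that (two lengths per byte, even symbol in the low nibble, lengths 0 to 15). The function names are hypothetical, and the authoritative layout is whatever the FORMAT document specifies.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SYMBOLS  256
#define PACKED_BYTES (NUM_SYMBOLS / 2)   /* 128 bytes */

/* Assumed layout: two 4-bit code lengths per byte, even symbol low. */
static void pack_code_lengths(const uint8_t lengths[NUM_SYMBOLS],
                              uint8_t packed[PACKED_BYTES])
{
    for (int i = 0; i < PACKED_BYTES; i++) {
        /* A 4-bit field can only hold lengths 0..15. */
        assert(lengths[2 * i] <= 15 && lengths[2 * i + 1] <= 15);
        packed[i] = (uint8_t)(lengths[2 * i] | (lengths[2 * i + 1] << 4));
    }
}

static void unpack_code_lengths(const uint8_t packed[PACKED_BYTES],
                                uint8_t lengths[NUM_SYMBOLS])
{
    for (int i = 0; i < PACKED_BYTES; i++) {
        lengths[2 * i]     = (uint8_t)(packed[i] & 0x0F);
        lengths[2 * i + 1] = (uint8_t)(packed[i] >> 4);
    }
}
```

Whatever the exact encoding, the point is the same: the table is paid for once in the dictionary instead of once per block.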

Static Context API (Caller-Managed Workspaces)

New zero-allocation path for embedded and tightly-budgeted integrators: query the required workspace size up front, then hand libzxc a buffer you own.

zxc_static_cctx_workspace_size / zxc_static_dctx_workspace_size report the required buffer size, and zxc_init_static_cctx / zxc_init_static_dctx initialise compression/decompression contexts in caller-provided memory, with no malloc on the hot path. (#242)
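The general shape of a caller-managed workspace can be sketched as follows. Everything here (`demo_ctx`, `demo_workspace_size`, `demo_init_static`, the table size) is a hypothetical stand-in that shows the query-then-init pattern; the real libzxc signatures are in its headers and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context; libzxc's real contexts are not shown here. */
typedef struct {
    uint32_t *hash_table;    /* carved out of the caller's buffer */
    size_t    hash_entries;
} demo_ctx;

#define DEMO_HASH_ENTRIES 1024u

/* Step 1: report how many bytes the caller must provide. */
static size_t demo_workspace_size(void)
{
    return sizeof(demo_ctx) + DEMO_HASH_ENTRIES * sizeof(uint32_t);
}

/* Step 2: build the context inside caller-owned memory. Returns NULL if
 * the buffer is too small or misaligned. Never calls malloc. */
static demo_ctx *demo_init_static(void *workspace, size_t size)
{
    if (workspace == NULL || size < demo_workspace_size())
        return NULL;
    if ((uintptr_t)workspace % _Alignof(demo_ctx) != 0)
        return NULL;

    demo_ctx *ctx = (demo_ctx *)workspace;
    ctx->hash_table =
        (uint32_t *)((unsigned char *)workspace + sizeof(demo_ctx));
    /* sizeof(demo_ctx) is a multiple of its alignment, so the table
     * that follows it is suitably aligned for uint32_t. */
    assert((uintptr_t)ctx->hash_table % _Alignof(uint32_t) == 0);
    ctx->hash_entries = DEMO_HASH_ENTRIES;
    memset(ctx->hash_table, 0, DEMO_HASH_ENTRIES * sizeof(uint32_t));
    return ctx;
}
```

A caller would typically size a static or stack buffer from the reported value once, then reuse the initialised context for every call, which keeps memory use fully predictable on embedded targets.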

Custom Reader Interface for Seekable Archives

zxc_seekable_open_reader lets seekable random-access decompression run over an arbitrary user-supplied reader callback (network, memory map, custom VFS), not just FILE*. zxc_seekable_set_dict wires dictionary support into the seekable path. (#240)
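As an illustration of what a reader callback has to provide, the sketch below defines a positional-read interface and backs it with an in-memory buffer. The `demo_reader` struct and its `read_at` signature are assumptions made for this example; the actual callback type accepted by zxc_seekable_open_reader is defined by the library and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader interface, not the libzxc definition. */
typedef struct {
    void *opaque;
    /* Read up to `len` bytes at absolute `offset`; return bytes read. */
    size_t (*read_at)(void *opaque, uint64_t offset, void *dst, size_t len);
    uint64_t size;           /* total archive size in bytes */
} demo_reader;

typedef struct {
    const unsigned char *data;
    size_t len;
} mem_source;

static size_t mem_read_at(void *opaque, uint64_t offset, void *dst, size_t len)
{
    const mem_source *src = (const mem_source *)opaque;
    assert(src != NULL && dst != NULL);
    if (offset >= src->len)
        return 0;                        /* nothing available past the end */
    size_t avail = src->len - (size_t)offset;
    if (len > avail)
        len = avail;                     /* short read at the tail */
    memcpy(dst, src->data + offset, len);
    return len;
}

static demo_reader mem_reader(mem_source *src)
{
    demo_reader r = { src, mem_read_at, src->len };
    return r;
}
```

Because reads are positional and stateless, the same shape maps naturally onto an HTTP range request, a memory-mapped file, or a virtual file system, with only `mem_read_at` swapped out.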

Dictionary API in Every Wrapper

Full dictionary training, serialization, and dictionary-backed compress/decompress are exposed idiomatically across Go, Rust, Python, Node.js, and WASM, backed by an overhauled, hardened dictionary fuzzer. (#269, #270)
The seekable random-access decompression API is now exposed across Go, Rust, Node.js, and WASM via the umbrella header, and added to the Python wrapper. (#224, #250)

CLI: Dictionaries and the `unzxc` Alias

--train trains a dictionary from the input files (output path via -o, defaulting to dictionary_<dict_id>.zxd). It replaces the former --train-dict PATH option.
-D, --dict FILE compresses or decompresses against a .zxd; -l, --list reports an archive's Dict ID and inspects .zxd files.
unzxc is a new decompression alias: installed as a symlink to zxc, it defaults to decompress mode (equivalent to zxc -d). (#272)
Native wildcard expansion on Windows: the CLI expands glob patterns itself on Windows (where the shell doesn't), so zxc *.log behaves as on POSIX. (#284)

Performance & Memory

Measured versus v0.11.0:

~10 % faster compression at level 6, from the repeat-offset seed feeding the optimal parser (see below).
~3 % faster decompression at levels 1 and 2.

Repeat-offset seed at L6: the optimal parser seeds match finding with the current repeat offset, accelerating LZ77 search on repetitive data at maximum density. (#257)
Bucket-sort Huffman leaf ordering: replaces comparison sorting of literal frequencies with a bucket sort during table construction. (#244)
Dedicated SSE2 SIMD path for x86-64: a sub-AVX2 tier so older / baseline x86-64 CPUs get vectorised decode and match finding instead of falling back to scalar. (#259)
Short-offset LZ run decoding: faster run/overlap copies for small match offsets, improving decode on highly repetitive data (neutral on Silesia). (#276)
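For readers unfamiliar with why short offsets need special handling, here is a minimal sketch of an LZ77 match copy. It is a plain reference version written for this note, not the optimised libzxc path (which these notes do not describe in detail), and `lz_copy_match` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy an LZ77 match of `len` bytes starting `offset` bytes behind
 * out[pos]. When offset < len the source and destination overlap, and
 * the copy must replay bytes it has just written, so neither memcpy
 * nor memmove gives the right result. */
static void lz_copy_match(unsigned char *out, size_t pos,
                          size_t offset, size_t len)
{
    assert(out != NULL && offset >= 1 && offset <= pos);
    unsigned char *dst = out + pos;
    const unsigned char *src = dst - offset;

    if (offset >= len) {             /* no overlap: one bulk copy */
        memcpy(dst, src, len);
    } else if (offset == 1) {        /* run of a single repeated byte */
        memset(dst, src[0], len);
    } else {                         /* short offset: replicate pattern */
        for (size_t i = 0; i < len; i++)
            dst[i] = src[i];
    }
}
```

Starting from `ab`, a match with offset 2 and length 6 expands to `abababab`. The byte-by-byte fallback in the last branch is the case a decoder has the most to gain from optimising, since highly repetitive data produces many such matches.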

Bug Fixes & Robustness

Harden decompressor buffer bounds checks against malformed input. (#229)
Decompressor output-buffer tail padding adjusted to match the fast decoder's overwrite contract. (#249)
Empty-data compression/decompression handled correctly end-to-end across the wrappers. (#265)
Enhanced format validation with stricter rejection of malformed frames. (#271)
Snyk findings addressed and general robustness improvements. (#273)
Gate AVX2/AVX-512 detection on OS-enabled vector state: feature detection now also checks OSXSAVE/XCR0, so the AVX2/AVX-512 decode paths are used only when the OS has enabled the YMM/ZMM state, preventing illegal-instruction faults on misconfigured systems. (#283)
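The OS-state gate in the last item follows the standard x86 detection sequence, sketched below. The bit positions are architectural (CPUID leaf 1 ECX and XCR0), but the function names are hypothetical and this is not the libzxc source. The helpers take the raw register values as arguments so the logic can be exercised on any host; obtaining them (CPUID and XGETBV with ECX = 0) is compiler specific and omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=1):ECX bit 27: the OS has enabled XSAVE/XGETBV. */
#define CPUID1_ECX_OSXSAVE (1u << 27)

/* XCR0 state-component bits. */
#define XCR0_SSE       (1ull << 1)   /* XMM registers */
#define XCR0_AVX       (1ull << 2)   /* upper halves of YMM */
#define XCR0_OPMASK    (1ull << 5)   /* k0-k7 mask registers */
#define XCR0_ZMM_HI256 (1ull << 6)   /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM  (1ull << 7)   /* ZMM16-31 */

static_assert((XCR0_SSE | XCR0_AVX) == 0x6, "YMM state mask");
static_assert((XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM) == 0xE0,
              "ZMM state mask");

/* XCR0 is only meaningful (and XGETBV only legal) when OSXSAVE is set. */
static bool os_enables_ymm(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE | XCR0_AVX;
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;
    return (xcr0 & need) == need;
}

/* AVX-512 needs the YMM state plus all three ZMM-related components. */
static bool os_enables_zmm(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM;
    return os_enables_ymm(cpuid1_ecx, xcr0) && (xcr0 & need) == need;
}
```

These checks are in addition to the CPU feature bits themselves: a dispatcher would select a vector path only when both the instruction-set bit and the matching OS state check pass.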

Development & CI

Decoder conformance test suite plus a make target to run it, locking in cross-version decode behaviour. (#246)
Golden-file format-stability tests to catch unintended on-disk format drift. (#256)
Native Meson build system support alongside CMake. (#245)
Automated ABI stability check workflow diffs every change against a committed libabigail baseline. (#222)
Multi-compiler matrix extended with GCC 15 and 16. (#274)
Format advanced to v6 with the NUM block type removed (#264); the Sans-IO API and frame primitives are now internalised (#225). Codecov action updated to v7 (#267).
Consolidated ClusterFuzzLite fuzzer builds and runs into a single workflow, simplifying continuous fuzzing CI. (#287)

Documentation

README: dictionary guide (when small-block workloads benefit and why), specific decode-performance claims, and a format conformance & stability overview.
FORMAT / man page / EXAMPLES: document the .zxd dictionary format, --train, -D/--dict, and the unzxc alias.
Doxygen output now includes the README and brief per-symbol descriptions.

Acknowledgements

Special thanks to @jeanga for the help, support, and testing that went into the new pre-trained dictionary mode.

Changelog

api: Implements shared dictionary Huffman table (#275)
api: Enhances format validation (#271)
api: Overhauls dictionary fuzzer for robust testing (#270)
api: Adds comprehensive dictionary API to all wrappers (#269)
api: Introduces pre-trained dictionary compression (#261)
api: bump node-addon-api from 8.7.0 to 8.8.0 in /wrappers/nodejs (#253)
api: Adds conformance test suite and improves empty frame handling (#246)
api: Harden decompressor buffer bounds checks (#229)
api: Introduces static context API for caller-managed workspaces (#242)
api: Adds custom reader interface for seekable archives (#240)
api: wrappers: Adds seekable random-access decompression API (#224)
api: Internalizes Sans-IO API and frame primitives (#225)
perf: Optimize LZ run decoding for short offsets (#276)
perf: Add make target to run decoder conformance suite
perf: Improve Huffman leaf sorting with bucket sort (#244)
cli: enable native wildcard expansion for CLI on Windows (#284)
cli: Adds unzxc alias and renames dictionary training option (#272)
cli: Remove Snyk policy ignore
cli: Addresses Snyk scan findings and improves robustness (#273)
cli: Adds native Meson build system support (#245)
build: bump tar from 7.5.13 to 7.5.16 in /wrappers/nodejs (#286)
build: bump vite from 8.0.14 to 8.0.16 in /wrappers/nodejs (#285)
build: Add GCC 15 and 16 to CI multi-compiler matrix (#274)
build: Update LZbench branch for benchmark workflow
build: Enables empty data compression/decompression (#265)
build: Update package descriptions
build: Introduce dedicated SSE2 SIMD optimization path for x86-64 (#259)
build: tests: Enforces golden file format stability (#256)
build: bump vitest from 4.1.6 to 4.1.7 in /wrappers/nodejs (#254)
build: Restrict SBOM generation to tag pushes
build: Add qemu cpu targeting for simd dispatch coverage (#238)
build: Bump cibuildwheel from 3.3.1 to 3.4.1 in /wrappers/python (#233)
build: Bump vitest from 4.1.5 to 4.1.6 in /wrappers/nodejs (#234)
build: Use upstream LZbench for benchmarks
build: Add automated ABI stability check workflow (#222)
build: Generates SBOM for GitHub releases (#226)
build: Automate CHANGELOG generation for releases (#220)
build: Standardizes release artifact structure and naming (#223)
build: Pins Python wrapper build dependencies (#221)
build: Pin Python wrapper dependencies with hashes (#219)
build: Strengthens CI/CD security and reliability (#218)
build: Updates Node.js and Rust wrapper CI & scope heavy workflows to source paths (#217)
portability: Abstracts memory and sorting for portability (#228)
doc: Enhance API documentation with full Doxygen comments (#280)
doc: Removes NUM block type and advances format to v6 (#264)
doc: Enhance README with specific decode performance claims
doc: Include README and add brief descriptions to Doxygen output
doc: Add format conformance and stability overview to README
doc: Add OpenSSF Scorecard badge to README
misc: Gate AVX2/AVX512 detection on OS-enabled YMM/ZMM state (#283)
misc: Support empty shared Huffman tables (#278) (#279)
misc: Update Codecov action to v7 (#267)
misc: Update Ubuntu version badge to 26.10
misc: Rename SBOM template to lowercase
misc: Define LZ77 search limit as symbolic constant (#262)
misc: bump oss-fuzz-base/base-builder in /.clusterfuzzlite (#260)
misc: Accelerate LZ77 match finding with repeat offset seed at level 6 (#257)
misc: bump oss-fuzz-base/base-builder in /.clusterfuzzlite (#255)
misc: Add seekable random-access decompression to Python (#250)
misc: bump github/codeql-action from 4.35.5 to 4.36.0 (#252)
misc: Pin Meson and Ninja dependencies with hashes (#204) (#251)
misc: Adjust decompressor output buffer tail padding (#249)
misc: exclude conformance directory from Snyk Code scanning
misc: Move Doxyfile to docs directory (#243)
misc: Bump actions/attest-build-provenance from 3.0.0 to 4.1.0 (#232)
misc: Bump codecov/codecov-action from 6.0.0 to 6.0.1 (#231)
misc: Update ClusterFuzzLite base image and enable Dependabot (#230)

Full Changelog: v0.11.0...v0.12.0

hellobertrand/zxc: ZXC v0.12.0 on GitHub