Release Notes
This release introduces pre-trained dictionary compression: train a dictionary from representative samples and ship dramatically smaller archives for small-block / many-small-file workloads. It also adds a shared dictionary Huffman table, a static context API for zero-allocation caller-managed workspaces, and a custom reader interface for seekable archives. The container format advances to v6 and the shared library SOVERSION bumps from 3 to 4.
⚠️ Breaking change. This is an ABI and format break. The format version is now v6 (the legacy NUM block type is removed), so v0.12.0 cannot read archives written by v0.11.0 and earlier, and older decoders reject v6 archives. Re-link against libzxc.so.4.
Migrating v5 → v6 archives is a one-time transcode. Decompress with your old (v5) build and recompress with the new one:
    zxc-v5 -dc old.zxc | zxc-v6 -z -c > new.zxc
Keep a v5 build around until your data at rest is converted (it is the only thing that can read v5). Full guide (bulk migration, dictionaries, verification): docs/MIGRATION.md.
New Features
Pre-Trained Dictionary Compression
For workloads compressed in small blocks (4 KB–128 KB), a pre-trained dictionary prefills the LZ77 sliding window at the start of every block, so even the earliest bytes have representative history to match against. This helps any time the block size is small enough that early bytes would otherwise lack history: a single small payload, or a large payload split into many small blocks.
- Train from a corpus, serialize to a self-describing .zxd, and compress/decompress against it. Dictionaries are capped at 64 KB (ZXC_DICT_SIZE_MAX).
- New public API in zxc_dict.h: zxc_train_dict / zxc_train_dict_huf (train), zxc_dict_save / zxc_dict_save_bound (serialize), zxc_dict_train (one-shot train + serialize), zxc_dict_load / zxc_dict_huf (load), and zxc_dict_id / zxc_dict_get_id (identity).
- Each archive records its 32-bit dict_id; supplying the wrong dictionary (or none) fails cleanly with a dictionary-required / mismatch error instead of silently corrupting output. (#261)
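To make the "prefilled window" idea concrete, here is a minimal sketch of why the first bytes of a block benefit from a dictionary. It is a brute-force toy, not libzxc's match finder, and the function names are hypothetical: it only measures how long a match the very first byte of a block can find when the dictionary is treated as history that precedes the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Longest prefix of src[0..src_len) that occurs anywhere inside
 * hist[0..hist_len). Brute force, for illustration only. */
static size_t longest_match_in_history(const uint8_t *hist, size_t hist_len,
                                       const uint8_t *src, size_t src_len)
{
    size_t best = 0;
    for (size_t start = 0; start < hist_len; start++) {
        size_t len = 0;
        while (start + len < hist_len && len < src_len &&
               hist[start + len] == src[len]) {
            len++;
        }
        if (len > best) best = len;
    }
    return best;
}

/* Match length available to the first byte of a block.
 * Without a dictionary the window is empty, so the answer is always 0.
 * With one, the dictionary bytes act as history preceding the block. */
size_t first_byte_match_len(const uint8_t *dict, size_t dict_len,
                            const uint8_t *block, size_t block_len)
{
    assert(block != NULL || block_len == 0);
    if (dict == NULL || dict_len == 0) return 0;
    return longest_match_in_history(dict, dict_len, block, block_len);
}
```

With a dictionary containing "GET /index.html HTTP/1.1" and a block starting "GET /about.html", the first byte already finds a 5-byte match ("GET /"); with no dictionary it finds none. A real encoder would use a hash-based search rather than this quadratic scan.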
Shared Dictionary Huffman Table
The .zxd carries a 128-byte packed literal Huffman code-lengths table trained on the same corpus. Small blocks no longer each pay to embed their own literal table — the decoder reuses the dictionary's shared table, which is where most of the small-block ratio gain comes from. The dict_id covers both the content and the table, so a single id pins the complete decode state. (#275)
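One way a 256-symbol code-lengths table can fit in 128 bytes is to store each length in 4 bits, two symbols per byte. The sketch below illustrates that packing. It is an assumption made for illustration: these notes do not specify the actual .zxd layout, nibble order, or maximum code length, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LITERALS      256                 /* one entry per byte value */
#define PACKED_TABLE_SIZE (NUM_LITERALS / 2)  /* 128 bytes at 4 bits each */

/* Pack 256 code lengths (each 0..15) two per byte.
 * Assumed order: even symbol in the low nibble, odd symbol in the high. */
void pack_code_lengths(const uint8_t lengths[NUM_LITERALS],
                       uint8_t packed[PACKED_TABLE_SIZE])
{
    for (size_t i = 0; i < PACKED_TABLE_SIZE; i++) {
        uint8_t lo = lengths[2 * i];
        uint8_t hi = lengths[2 * i + 1];
        assert(lo <= 15 && hi <= 15);   /* must fit in a nibble */
        packed[i] = (uint8_t)(lo | (hi << 4));
    }
}

/* Inverse of pack_code_lengths. */
void unpack_code_lengths(const uint8_t packed[PACKED_TABLE_SIZE],
                         uint8_t lengths[NUM_LITERALS])
{
    for (size_t i = 0; i < PACKED_TABLE_SIZE; i++) {
        lengths[2 * i]     = (uint8_t)(packed[i] & 0x0F);
        lengths[2 * i + 1] = (uint8_t)(packed[i] >> 4);
    }
}
```

Because code lengths alone determine a canonical Huffman code, a table like this is enough for a decoder to rebuild the full code without any per-block table.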
Static Context API (Caller-Managed Workspaces)
New zero-allocation path for embedded and tightly-budgeted integrators: query the required workspace size up front, then hand libzxc a buffer you own.
zxc_static_cctx_workspace_size / zxc_static_dctx_workspace_size and zxc_init_static_cctx / zxc_init_static_dctx initialise compression/decompression contexts in caller-provided memory: no malloc on the hot path. (#242)
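The query-then-initialise pattern can be sketched as follows. The demo_* names, the context layout, and the 4096-byte scratch size are all hypothetical; the real libzxc signatures are not given in these notes and may differ. The point is only the shape: ask for a size, provide a buffer you own, and get back a context that lives entirely inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context; NOT libzxc's real layout. */
typedef struct {
    uint8_t *scratch;       /* points into the caller's buffer */
    size_t   scratch_size;
} demo_ctx;

#define DEMO_SCRATCH_BYTES 4096u   /* arbitrary size for the sketch */

/* Step 1: report how many bytes the caller must provide. */
size_t demo_static_ctx_workspace_size(void)
{
    return sizeof(demo_ctx) + DEMO_SCRATCH_BYTES;
}

/* Step 2: place the context at the start of the caller's buffer.
 * Returns NULL if the buffer is missing, too small, or misaligned.
 * No heap allocation happens here or afterwards. (_Alignof is C11.) */
demo_ctx *demo_init_static_ctx(void *workspace, size_t workspace_size)
{
    if (workspace == NULL) return NULL;
    if (workspace_size < demo_static_ctx_workspace_size()) return NULL;
    if ((uintptr_t)workspace % _Alignof(demo_ctx) != 0) return NULL;

    demo_ctx *ctx = (demo_ctx *)workspace;
    ctx->scratch = (uint8_t *)workspace + sizeof(demo_ctx);
    ctx->scratch_size = workspace_size - sizeof(demo_ctx);
    assert(ctx->scratch_size >= DEMO_SCRATCH_BYTES);
    return ctx;
}
```

A caller on an embedded target would typically back this with a static or stack array sized at build time, so an undersized buffer is rejected at initialisation rather than failing mid-stream.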
Custom Reader Interface for Seekable Archives
zxc_seekable_open_reader lets seekable random-access decompression run over an arbitrary user-supplied reader callback (network, memory map, custom VFS), not just FILE*. zxc_seekable_set_dict wires dictionary support into the seekable path. (#240)
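As an illustration of what a user-supplied reader might look like, here is a sketch of a positional-read callback backed by an in-memory buffer. The demo_reader struct and its read_at signature are assumptions for this example; the actual callback type accepted by zxc_seekable_open_reader is not shown in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader interface; the real libzxc types may differ. */
typedef struct {
    void    *opaque;   /* user state handed back to the callback */
    uint64_t size;     /* total archive size in bytes */
    /* Read up to len bytes at absolute offset off.
     * Returns the number of bytes read, or -1 on error. */
    int64_t (*read_at)(void *opaque, uint64_t off, void *dst, size_t len);
} demo_reader;

/* Example backend: an archive that already lives in memory. */
typedef struct {
    const uint8_t *data;
    size_t         len;
} demo_mem_source;

static int64_t demo_mem_read_at(void *opaque, uint64_t off, void *dst, size_t len)
{
    const demo_mem_source *src = (const demo_mem_source *)opaque;
    if (off > src->len) return -1;            /* offset past the end */
    size_t avail = src->len - (size_t)off;
    size_t n = len < avail ? len : avail;     /* short read at the tail */
    memcpy(dst, src->data + off, n);
    return (int64_t)n;
}

/* Build a reader over an in-memory source. */
demo_reader demo_mem_reader(demo_mem_source *src)
{
    assert(src != NULL);
    demo_reader r;
    r.opaque  = src;
    r.size    = src->len;
    r.read_at = demo_mem_read_at;
    return r;
}
```

A network or VFS backend would keep the same shape and swap only the callback body, which is what lets random-access decompression stay independent of FILE*.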
Dictionary API in Every Wrapper
Full dictionary training, serialization, and dictionary-backed compress/decompress exposed idiomatically across Go, Rust, Python, Node.js, and WASM, backed by an overhauled, hardened dictionary fuzzer. (#269, #270)
The seekable random-access decompression API is now exposed across Go, Rust, Node.js, and WASM via the umbrella header, and added to the Python wrapper. (#224, #250)
CLI: Dictionaries and the unzxc Alias
- --train trains a dictionary from the input files (output path via -o, defaulting to dictionary_<dict_id>.zxd). This renames the former --train-dict PATH.
- -D, --dict FILE compresses or decompresses against a .zxd; -l, --list reports an archive's Dict ID and inspects .zxd files.
- unzxc is a new decompression alias: installed as a symlink to zxc, it defaults to decompress mode (equivalent to zxc -d). (#272)
- Native wildcard expansion on Windows: the CLI expands glob patterns itself on Windows (where the shell doesn't), so zxc *.log behaves as on POSIX. (#284)
Performance & Memory
Measured versus v0.11.0:
- ~10 % faster compression at level 6, from the repeat-offset seed feeding the optimal parser (see below).
- ~3 % faster decompression at levels 1 and 2.
- Repeat-offset seed at L6: the optimal parser seeds match finding with the current repeat offset, accelerating LZ77 search on repetitive data at maximum density. (#257)
- Bucket-sort Huffman leaf ordering: replaces comparison sorting of literal frequencies with a bucket sort during table construction. (#244)
- Dedicated SSE2 SIMD path for x86-64: a sub-AVX2 tier so older / baseline x86-64 CPUs get vectorised decode and match finding instead of falling back to scalar. (#259)
- Short-offset LZ run decoding: faster run/overlap copies for small match offsets, improving decode on highly repetitive data (neutral on Silesia). (#276)
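For readers unfamiliar with why short offsets need special handling in an LZ decoder, the sketch below shows the general problem: when the match offset is smaller than the match length, source and destination overlap, so a plain memcpy is not valid. This is a generic illustration with a hypothetical function name, not the optimised routine from #276.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append len bytes to out[0..pos) by copying from `offset` bytes back.
 * The caller guarantees 1 <= offset <= pos and room for pos + len bytes.
 * Returns the new output position. */
size_t copy_match(uint8_t *out, size_t pos, size_t offset, size_t len)
{
    assert(offset >= 1 && offset <= pos);
    const uint8_t *src = out + pos - offset;
    uint8_t *dst = out + pos;

    if (offset == 1) {
        memset(dst, src[0], len);            /* run of a single byte */
    } else if (offset >= len) {
        memcpy(dst, src, len);               /* regions do not overlap */
    } else {
        for (size_t i = 0; i < len; i++)     /* overlap: replicate the pattern */
            dst[i] = src[i];
    }
    return pos + len;
}
```

Starting from "ab", a match with offset 2 and length 6 yields "abababab". Production decoders usually replace the byte loop with wider copies tuned per offset, which is the kind of path the bullet above describes.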
Bug Fixes & Robustness
- Harden decompressor buffer bounds checks against malformed input. (#229)
- Decompressor output-buffer tail padding adjusted to match the fast decoder's overwrite contract. (#249)
- Empty-data compression/decompression handled correctly end-to-end across the wrappers. (#265)
- Enhanced format validation with stricter rejection of malformed frames. (#271)
- Snyk findings addressed and general robustness improvements. (#273)
- Gate AVX2/AVX-512 detection on OS-enabled vector state: feature detection now also checks OSXSAVE/XCR0, so the AVX2/AVX-512 decode paths are used only when the OS has enabled the YMM/ZMM state, preventing illegal-instruction faults on misconfigured systems. (#283)
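The OSXSAVE/XCR0 gate follows the standard x86 detection sequence, sketched below for GCC/Clang. It is not libzxc's detection code, and the function names are hypothetical. The bit positions are the architectural ones: CPUID.1:ECX bit 27 (OSXSAVE) and bit 28 (AVX), CPUID.7.0:EBX bit 5 (AVX2), XCR0 bits 1 and 2 for XMM/YMM state, and XCR0 bits 5 to 7 for the AVX-512 opmask and ZMM state.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define DEMO_HAVE_X86_CPUID 1
#else
#define DEMO_HAVE_X86_CPUID 0
#endif

/* XCR0 bits 1 (SSE/XMM) and 2 (AVX/YMM) must both be set for AVX2. */
int xcr0_enables_ymm(uint64_t xcr0) { return (xcr0 & 0x06u) == 0x06u; }

/* AVX-512 additionally needs bits 5..7 (opmask, ZMM_Hi256, Hi16_ZMM). */
int xcr0_enables_zmm(uint64_t xcr0) { return (xcr0 & 0xE6u) == 0xE6u; }

#if DEMO_HAVE_X86_CPUID
/* Read XCR0. Only legal once CPUID has reported OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Returns 1 only if the CPU reports AVX2 AND the OS has enabled YMM state. */
int demo_avx2_usable(void)
{
#if DEMO_HAVE_X86_CPUID
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d)) return 0;
    if (!(c & (1u << 27))) return 0;              /* OSXSAVE: XGETBV usable */
    if (!(c & (1u << 28))) return 0;              /* AVX */
    if (!xcr0_enables_ymm(read_xcr0())) return 0; /* OS saves YMM state */
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d)) return 0;
    return (b & (1u << 5)) != 0;                  /* AVX2 */
#else
    return 0;   /* non-x86 or unsupported compiler: never take the AVX2 path */
#endif
}
```

Checking the CPUID feature bit alone is not enough: a hypervisor or kernel can leave YMM/ZMM state disabled, in which case executing an AVX instruction faults even though the CPU advertises the feature.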
Development & CI
- Decoder conformance test suite plus a make target to run it, locking in cross-version decode behaviour. (#246)
- Golden-file format-stability tests to catch unintended on-disk format drift. (#256)
- Native Meson build system support alongside CMake. (#245)
- Automated ABI stability check workflow diffs every change against a committed libabigail baseline. (#222)
- Multi-compiler matrix extended with GCC 15 and 16. (#274)
- Format advanced to v6 with the NUM block type removed (#264); the Sans-IO API and frame primitives are now internalised (#225). Codecov action updated to v7 (#267).
- Consolidated ClusterFuzzLite fuzzer builds and runs into a single workflow, simplifying continuous fuzzing CI. (#287)
Documentation
- README: dictionary guide (when small-block workloads benefit and why), specific decode-performance claims, and a format conformance & stability overview.
- FORMAT / man page / EXAMPLES: document the .zxd dictionary format, --train, -D/--dict, and the unzxc alias.
- Doxygen output now includes the README and brief per-symbol descriptions.
Acknowledgements
Special thanks to @jeanga for the help, support, and testing that went into the new pre-trained dictionary mode.
Changelog
- api: Implements shared dictionary Huffman table (#275)
- api: Enhances format validation (#271)
- api: Overhauls dictionary fuzzer for robust testing (#270)
- api: Adds comprehensive dictionary API to all wrappers (#269)
- api: Introduces pre-trained dictionary compression (#261)
- api: bump node-addon-api from 8.7.0 to 8.8.0 in /wrappers/nodejs (#253)
- api: Adds conformance test suite and improves empty frame handling (#246)
- api: Harden decompressor buffer bounds checks (#229)
- api: Introduces static context API for caller-managed workspaces (#242)
- api: Adds custom reader interface for seekable archives (#240)
- api: wrappers: Adds seekable random-access decompression API (#224)
- api: Internalizes Sans-IO API and frame primitives (#225)
- perf: Optimize LZ run decoding for short offsets (#276)
- perf: Add make target to run decoder conformance suite
- perf: Improve Huffman leaf sorting with bucket sort (#244)
- cli: enable native wildcard expansion for CLI on Windows (#284)
- cli: Adds unzxc alias and renames dictionary training option (#272)
- cli: Remove Snyk policy ignore
- cli: Addresses Snyk scan findings and improves robustness (#273)
- cli: Adds native Meson build system support (#245)
- build: bump tar from 7.5.13 to 7.5.16 in /wrappers/nodejs (#286)
- build: bump vite from 8.0.14 to 8.0.16 in /wrappers/nodejs (#285)
- build: Add GCC 15 and 16 to CI multi-compiler matrix (#274)
- build: Update LZbench branch for benchmark workflow
- build: Enables empty data compression/decompression (#265)
- build: Update package descriptions
- build: Introduce dedicated SSE2 SIMD optimization path for x86-64 (#259)
- build: tests: Enforces golden file format stability (#256)
- build: bump vitest from 4.1.6 to 4.1.7 in /wrappers/nodejs (#254)
- build: Restrict SBOM generation to tag pushes
- build: Add qemu cpu targeting for simd dispatch coverage (#238)
- build: Bump cibuildwheel from 3.3.1 to 3.4.1 in /wrappers/python (#233)
- build: Bump vitest from 4.1.5 to 4.1.6 in /wrappers/nodejs (#234)
- build: Use upstream LZbench for benchmarks
- build: Add automated ABI stability check workflow (#222)
- build: Generates SBOM for GitHub releases (#226)
- build: Automate CHANGELOG generation for releases (#220)
- build: Standardizes release artifact structure and naming (#223)
- build: Pins Python wrapper build dependencies (#221)
- build: Pin Python wrapper dependencies with hashes (#219)
- build: Strengthens CI/CD security and reliability (#218)
- build: Updates Node.js and Rust wrapper CI & scope heavy workflows to source paths (#217)
- portability: Abstracts memory and sorting for portability (#228)
- doc: Enhance API documentation with full Doxygen comments (#280)
- doc: Removes NUM block type and advances format to v6 (#264)
- doc: Enhance README with specific decode performance claims
- doc: Include README and add brief descriptions to Doxygen output
- doc: Add format conformance and stability overview to README
- doc: Add OpenSSF Scorecard badge to README
- misc: Gate AVX2/AVX512 detection on OS-enabled YMM/ZMM state (#283)
- misc: Support empty shared Huffman tables (#278) (#279)
- misc: Update Codecov action to v7 (#267)
- misc: Update Ubuntu version badge to 26.10
- misc: Rename SBOM template to lowercase
- misc: Define LZ77 search limit as symbolic constant (#262)
- misc: bump oss-fuzz-base/base-builder in /.clusterfuzzlite (#260)
- misc: Accelerate LZ77 match finding with repeat offset seed at level 6 (#257)
- misc: bump oss-fuzz-base/base-builder in /.clusterfuzzlite (#255)
- misc: Add seekable random-access decompression to Python (#250)
- misc: bump github/codeql-action from 4.35.5 to 4.36.0 (#252)
- misc: Pin Meson and Ninja dependencies with hashes (#204) (#251)
- misc: Adjust decompressor output buffer tail padding (#249)
- misc: exclude conformance directory from Snyk Code scanning
- misc: Move Doxyfile to docs directory (#243)
- misc: Bump actions/attest-build-provenance from 3.0.0 to 4.1.0 (#232)
- misc: Bump codecov/codecov-action from 6.0.0 to 6.0.1 (#231)
- misc: Update ClusterFuzzLite base image and enable Dependabot (#230)
Full Changelog: v0.11.0...v0.12.0