Release Notes
This release introduces Level 6 (ZXC_LEVEL_DENSITY) with Huffman-coded literals and an optimal LZ77 parser, a push-based streaming API for non-blocking integrations, and idiomatic streaming I/O adapters across every official wrapper. It raises the default block size from 256 KB to 512 KB for better ratio and decode throughput.
No ABI breaks: SOVERSION stays at 3.
New Features
Level 6: ZXC_LEVEL_DENSITY
New maximum compression preset combining:
- Canonical Huffman literal encoding (8-bit length-limited). The decoder uses a 2048-entry multi-symbol lookup table over an 11-bit window, where each entry decodes 1 or 2 symbols, with a 4-way interleaved bitstream. Selected only when Huffman saves >= 3 % over RAW/RLE.
- Optimal LZ77 parser based on dynamic programming over per-position cost estimates. Scratch is allocated lazily per cctx and reused across blocks.
The Huffman codec is integrated into the function-multi-versioning system (default / NEON / AVX2 / AVX-512). Format spec updated with enc_lit=2. (#208, #214)
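As a rough illustration of what "dynamic programming over per-position cost estimates" means, the sketch below runs a forward DP in which each position can either emit one literal or take any available match length. It is not the ZXC parser: the function name, the flat cost model (one fixed cost per literal, one fixed cost per match), and the precomputed match_len array are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_N 64u  /* illustrative cap so the cost array fits on the stack */

/* Forward DP: cost[i] is the cheapest known cost (arbitrary units) of encoding
 * the first i bytes. match_len[i] is the longest match available at position i
 * (0 when there is none). Hypothetical helper, not part of the ZXC API. */
static uint32_t demo_optimal_parse_cost(const uint8_t *match_len, size_t n,
                                        uint32_t lit_cost, uint32_t match_cost,
                                        size_t min_match)
{
    uint32_t cost[DEMO_MAX_N + 1];
    assert(n <= DEMO_MAX_N && min_match >= 1);

    cost[0] = 0;
    for (size_t i = 1; i <= n; i++)
        cost[i] = UINT32_MAX;

    for (size_t i = 0; i < n; i++) {
        /* Option 1: emit a single literal and advance by one byte. */
        if (cost[i] + lit_cost < cost[i + 1])
            cost[i + 1] = cost[i] + lit_cost;

        /* Option 2: take a match of any length from min_match up to the
         * longest one found here; every length has the same flat cost. */
        for (size_t len = min_match; len <= match_len[i] && i + len <= n; len++) {
            if (cost[i] + match_cost < cost[i + len])
                cost[i + len] = cost[i] + match_cost;
        }
    }
    return cost[n];
}
```

With a literal cost of 9, a match cost of 24, and a 5-byte match available at position 2 of a 10-byte input, the cheapest path is two literals, the match, then three literals. A real parser would also record the chosen step at each position so the sequence can be recovered by walking back from the end.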
Push-Based Streaming API (zxc_pstream.h)
Caller-driven, single-threaded streaming API for environments where blocking on FILE* is not possible: async event loops, callback-driven libraries, network protocols. Uses explicit zxc_inbuf_t / zxc_outbuf_t descriptors, allowing input/output in arbitrary chunks. Output is bit-identical to the Buffer and Stream APIs. Bindings provided in Go, Rust, Python, Node.js, and WASM, with a dedicated fuzzer. (#204)
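The general shape of a caller-driven loop with explicit input/output descriptors can be sketched as follows. The descriptor layouts and function names here are hypothetical (the real zxc_inbuf_t / zxc_outbuf_t fields and entry points are defined in zxc_pstream.h and may differ), and the "codec" step simply copies bytes so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptors: a buffer, its size, and a cursor the codec advances. */
typedef struct { const uint8_t *src; size_t size; size_t pos; } demo_inbuf_t;
typedef struct { uint8_t *dst; size_t size; size_t pos; } demo_outbuf_t;

/* Stand-in for one codec step: moves as many bytes as both buffers allow and
 * advances both cursors. A real codec would transform the data here. */
static void demo_push(demo_inbuf_t *in, demo_outbuf_t *out)
{
    size_t avail_in = in->size - in->pos;
    size_t avail_out = out->size - out->pos;
    size_t n = avail_in < avail_out ? avail_in : avail_out;
    memcpy(out->dst + out->pos, in->src + in->pos, n);
    in->pos += n;
    out->pos += n;
}

/* Caller-driven loop: feeds src in in_chunk-sized pieces and drains through a
 * small staging buffer of out_chunk bytes into sink. Returns bytes written. */
static size_t demo_pump(const uint8_t *src, size_t n, size_t in_chunk,
                        size_t out_chunk, uint8_t *sink)
{
    uint8_t stage[16];
    size_t written = 0;
    assert(in_chunk > 0 && out_chunk > 0 && out_chunk <= sizeof stage);

    for (size_t off = 0; off < n; off += in_chunk) {
        size_t len = (n - off < in_chunk) ? n - off : in_chunk;
        demo_inbuf_t in = { src + off, len, 0 };
        while (in.pos < in.size) {            /* input chunk not yet consumed */
            demo_outbuf_t out = { stage, out_chunk, 0 };
            demo_push(&in, &out);
            memcpy(sink + written, stage, out.pos);  /* caller drains output */
            written += out.pos;
        }
    }
    return written;
}
```

The point of the pattern is that the caller owns both loops: it decides when more input is available and when output is drained, so nothing in the codec step ever blocks.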
Streaming I/O Adapters in Every Wrapper
Idiomatic streaming adapters across all wrappers, plus a magic-word detection utility for content-type sniffing.
- Go: io.Reader/io.Writer/io.Closer. (#209)
- Python: ZxcReader/ZxcWriter (io.RawIOBase). (#210)
- WHATWG TransformStream: for fetch().body.pipeThrough(), Deno, Bun, Node.js. (#211)
- Node.js stream.Transform. (#212)
- Rust std::io: Encoder/Decoder. (#213)
Block API Safety Helpers
Introduces zxc_decompress_block_bound (minimum destination capacity required by the fast decoder) and zxc_decompress_block_safe (strict-sized variant for page-aligned or exactly-sized buffers where tail padding is impossible). (#198)
Compression Context Memory Estimation
zxc_estimate_cctx_size lets integrators with tight memory budgets compute the full context footprint before allocation. Also exposed in the Rust and Go wrappers. (#198)
Automatic Block Size in Block API
zxc_compress_block now automatically rounds the effective block size up to the next power of two that fits the source when no explicit block_size is provided. (#198)
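The rounding rule can be sketched as below. This is a minimal illustration with a hypothetical helper name; any minimum or maximum block size that the library clamps to is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= n. Hypothetical helper, not the ZXC API. */
static size_t demo_round_up_pow2(size_t n)
{
    size_t p = 1;
    /* Reject 0 and values whose next power of two would not fit in size_t. */
    assert(n > 0 && n <= SIZE_MAX / 2 + 1);
    while (p < n)
        p <<= 1;
    return p;
}
```

For example, a 300 000-byte source would map to a 524 288-byte (512 KB) block under this rule, and a source that is already an exact power of two keeps its size.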
Benchmarks vs v0.10.0
Silesia corpus, lzbench 2.2.1, single-threaded, -march=native.
Compression Ratio
Smaller output at every level:
- -1: 61.56 % → 61.50 % (−0.10 %)
- -2: 54.00 % → 53.61 % (−0.72 %)
- -3: 46.35 % → 45.79 % (−1.21 %)
- -4: 43.14 % → 42.65 % (−1.14 %)
- -5: 40.67 % → 40.27 % (−0.98 %)
- -6 (new): 36.28 %, ~10.8 % smaller than v0.10.0's L5 output (40.67 %) and ~9.9 % smaller than the new L5 (40.27 %)
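The figures in parentheses are relative changes in output size, not percentage-point differences. A minimal sketch of that arithmetic, using a hypothetical helper name:

```c
#include <assert.h>

/* Relative change in percent between two compression ratios, each expressed as
 * compressed size in % of the original. Negative means smaller output.
 * Hypothetical helper for illustration only. */
static double demo_rel_change_pct(double old_ratio, double new_ratio)
{
    assert(old_ratio > 0.0);
    return (new_ratio - old_ratio) / old_ratio * 100.0;
}
```

For level 3, going from 46.35 % to 45.79 % is a drop of 0.56 percentage points, which is about 1.21 % of the previous output size.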
Decompression Throughput
- Apple M2 (ARM64): L1 12 195 → 12 530 MB/s (+2.7 %), L2 +3.1 %, L3–L5 +0.6 to +1.4 %.
- Google Axion (ARM64): L1 8 924 → 9 067 MB/s (+1.6 %), L2 +0.8 %, L3–L5 within ±0.3 %.
- AMD EPYC 9B45 (x86_64): L1 10 803 → 10 844 MB/s (+0.4 %), L2 +0.6 %, L4–L5 −1.1 to −1.3 %.
- AMD EPYC 7763 (x86_64): L1 6 921 → 7 077 MB/s (+2.3 %), L2 +2.1 %, L5 +1.7 %.
Compression Throughput
Fast levels accelerate sharply on Axion / x86 thanks to the tag-first hash filter; mid levels trade a few % for the ratio gains:
- L1: M2 −3.1 %, Axion +7.2 %, EPYC 9B45 +9.1 %, EPYC 7763 +6.5 %.
- L2: M2 −2.3 %, Axion +12.0 %, EPYC 9B45 +16.1 %, EPYC 7763 +11.1 %.
- L3–L5: −1 to −5 % across all platforms, the cost of the tighter ratio target and DP-based decisions feeding the parser.
- L6 (new): ~9–12 MB/s, roughly 10× slower than L5 by design: compression CPU is paid once to ship a smaller artifact to millions of decoders.
Performance & Memory
- Tag-first hash filter for fast levels: at ZXC_LEVEL_FAST and below, an early hash-tag check short-circuits the position lookup on tag mismatch, with speculative prefetch of the next entries. (#215)
- NEON nibble masks via SHRN: 128-bit byte comparison masks consolidated into 64-bit nibble masks. (#208)
- Vectorized optimal-parser DP updates: AVX-512 / AVX2 / NEON for constant-cost match-length updates. (#208)
- Bulk-flush Huffman bit writer + cache-line-aligned tables. (#208)
- Chain table as a 64 KB ring buffer: entries beyond the 64 KB uint16_t delta reach were never read back. Saves over 3.8 MB per cctx at the 2 MB block size with no change in ratio or throughput. (#198)
- Aliased sequence/offset/token buffers: GLO and GHI paths are mutually exclusive per block; the three pointers now share a single region. (#198)
- Tighter max-sequence bound: chunk_size / ZXC_LZ_MIN_MATCH_LEN. (#198)
- Unified fast/safe decoders via a compile-time constant, eliminating duplicated logic with zero-cost rollback in the fast path. (#198)
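The tag-first idea from the first item in this list can be sketched generically: keep a small tag per hash slot and compare it before touching the position array, so most mismatches never load a candidate position. The table layout, sizes, and names below are assumptions for illustration, and the speculative prefetch mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_BITS 12u
#define DEMO_TABLE_SIZE (1u << DEMO_TABLE_BITS)
#define DEMO_NO_POS     UINT32_MAX   /* marks a slot that was never written */

/* Hypothetical layout: tags live in their own compact array. */
typedef struct {
    uint8_t  tag[DEMO_TABLE_SIZE];   /* checked first, one byte per slot */
    uint32_t pos[DEMO_TABLE_SIZE];   /* only read after a tag match */
} demo_table_t;

static void demo_table_init(demo_table_t *t)
{
    for (size_t i = 0; i < DEMO_TABLE_SIZE; i++) {
        t->tag[i] = 0;
        t->pos[i] = DEMO_NO_POS;
    }
}

/* Returns 1 and sets *cand when the slot's tag matches and holds a position;
 * returns 0 otherwise. The slot is always updated with the current position. */
static int demo_probe(demo_table_t *t, uint32_t hash, uint32_t cur_pos,
                      uint32_t *cand)
{
    uint32_t idx = hash & (DEMO_TABLE_SIZE - 1u);  /* low bits pick the slot */
    uint8_t  tag = (uint8_t)(hash >> 24);          /* high bits form the tag */
    int hit = 0;

    if (t->tag[idx] == tag) {          /* tag mismatch skips the pos[] load */
        uint32_t p = t->pos[idx];
        if (p != DEMO_NO_POS) {
            *cand = p;
            hit = 1;
        }
    }
    t->tag[idx] = tag;
    t->pos[idx] = cur_pos;
    return hit;
}
```

A tag match is only a hint: the caller still has to compare the actual bytes at the candidate position, since different inputs can share both slot and tag.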
Bug Fixes & Robustness
- TSan data race in stream engine shutdown: the async writer read job->result_sz outside the lock while the main thread wrote the shutdown sentinel under it. Fixed by snapshotting result_sz and in_sz under the lock in the writer. (#201)
- Varint truncation for chunk > 2 MB: zxc_write_varint was capped at 3 bytes, silently truncating literal runs or match lengths above 2 MB. Extended to 5 bytes to match the decoder. (#198)
- Block API auto-resize: no longer rejects inputs larger than the explicit block_size; rounds up to the next power of two. (#198)
- Block API re-init safety: guards against zero-capacity destinations to prevent underflow. (#198)
- Harden decompression logic against integer overflows on 32-bit platforms. (#202)
- Pstream input validation: comprehensive bounds and NULL-pointer checks; rejects malformed buffer descriptors; footer validation distinguishes corruption from checksum failures. (#204)
- Wrapper decompression flush: Python, Node.js, and WASM dstream loops now drain internal buffers after user input is exhausted. Output buffer growth checks max-capacity overflow. (#204)
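To see why a 3-byte cap lines up with the 2 MB limit in the varint fix above, the sketch below assumes a LEB128-style encoding with 7 payload bits per byte. That assumption is consistent with the quoted limit (3 × 7 = 21 bits, so values up to 2^21 − 1) but the actual zxc_write_varint layout is not confirmed here, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v as a little-endian base-128 varint using at most max_bytes bytes.
 * If v does not fit, the high bits are silently dropped, which mirrors the
 * truncation described above. Returns the number of bytes written. */
static size_t demo_write_varint(uint8_t *dst, uint32_t v, size_t max_bytes)
{
    size_t n = 0;
    assert(max_bytes >= 1 && max_bytes <= 5);
    while (v >= 0x80 && n + 1 < max_bytes) {
        dst[n++] = (uint8_t)(v | 0x80);   /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    dst[n++] = (uint8_t)(v & 0x7F);       /* final byte; may truncate v */
    return n;
}

/* Decodes n bytes produced by demo_write_varint. */
static uint32_t demo_read_varint(const uint8_t *src, size_t n)
{
    uint32_t v = 0;
    assert(n >= 1 && n <= 5);
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)(src[i] & 0x7F) << (7 * i);
    return v;
}
```

Under this encoding a 3 MB length (3 145 728) written with a 3-byte cap decodes as 1 048 576, while a 5-byte cap covers the full 32-bit range and round-trips it correctly.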
Development & CI
- Seekable API fuzzer (fuzz_seekable.c) exercising range reads, metadata getters, multi-threaded decompression, and raw SEK block parsing. All fuzzers now use persistent static buffers and a 4 MB input cap. (#196)
- Pstream API fuzzer targeting the state machine, chunk boundaries, and parser logic of the push-based streaming API. (#204)
- Windows ARM64 Go CI: windows-11-arm runner with llvm-mingw toolchain (aarch64-w64-mingw32-gcc). (#195)
- Monolithic test suite split: per-API files, individual CTest entries (96 instead of one aggregate). (#200)
Documentation
- Block API safety helpers: documents zxc_decompress_block_bound, zxc_decompress_block_safe, and zxc_estimate_cctx_size, explaining the tail-pad requirement of the fast decoder. (#198)
- Format spec: enc_lit=2 Huffman literal section.
- API: documents the push-based streaming API and the new ZXC_LEVEL_DENSITY preset.
- Nim bindings: community-maintained zxc-nim added to the README's examples and language support table, thanks to @georgelemon.
- Free Pascal bindings: community-maintained Free-Pascal-port-of-ZXC-compressor-decompressor added to the README, thanks to @Xelitan.
- TurboBench verification: documents TurboBench as an industry-standard benchmark suite that has officially merged zxc into its master branch.
Changes
Full Changelog: v0.10.0...v0.11.0