github hellobertrand/zxc v0.11.0
ZXC v0.11.0

Release Notes

This release introduces Level 6 (ZXC_LEVEL_DENSITY) with Huffman-coded literals and an optimal LZ77 parser, a push-based streaming API for non-blocking integrations, and idiomatic streaming I/O adapters across every official wrapper. It raises the default block size from 256 KB to 512 KB for better ratio and decode throughput.
No ABI breaks: SOVERSION stays at 3.

New Features

Level 6: ZXC_LEVEL_DENSITY

New maximum compression preset combining:

  • Canonical Huffman literal encoding (8-bit length-limited). The decoder uses a 2048-entry multi-symbol lookup table over an 11-bit window; each entry decodes 1 or 2 symbols, and the bitstream is 4-way interleaved. Huffman is selected only when it saves >= 3 % over RAW/RLE.
  • Optimal LZ77 parser based on dynamic programming over per-position cost estimates. Scratch is allocated lazily per cctx and reused across blocks.
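To illustrate the optimal-parser idea, here is a minimal sketch of a forward dynamic program over per-position costs. It is not ZXC's implementation: the function name, the cost units (bits), and the toy match candidates are all assumptions made for the example. It shows only why a DP can beat a greedy choice.

```python
def optimal_parse(n, literal_cost, matches):
    """Toy optimal LZ77 parse (illustrative, not ZXC's code).

    n            -- number of input positions
    literal_cost -- literal_cost[i] is the estimated cost of a literal at i
    matches      -- dict: position -> list of (match_length, estimated_cost)
    Returns (total_cost, steps) where steps are (position, kind, length).
    """
    INF = float("inf")
    cost = [INF] * (n + 1)   # cost[i] = cheapest known way to encode bytes [0, i)
    back = [None] * (n + 1)  # back[i] = (kind, length) of the step that reached i
    cost[0] = 0
    for i in range(n):
        # Option 1: emit one literal.
        c = cost[i] + literal_cost[i]
        if c < cost[i + 1]:
            cost[i + 1], back[i + 1] = c, ("lit", 1)
        # Option 2: emit any candidate match starting here.
        for length, match_cost in matches.get(i, ()):
            j = i + length
            if j <= n and cost[i] + match_cost < cost[j]:
                cost[j], back[j] = cost[i] + match_cost, ("match", length)
    # Walk the back-pointers from the end to recover the chosen steps.
    steps, i = [], n
    while i > 0:
        kind, length = back[i]
        steps.append((i - length, kind, length))
        i -= length
    steps.reverse()
    return cost[n], steps


# A greedy parser would take the first match (position 1, length 4) and pay
# 8 + 20 + 3 * 8 = 52. The DP waits one byte for the longer match instead.
total, steps = optimal_parse(
    n=8,
    literal_cost=[8] * 8,
    matches={1: [(4, 20)], 2: [(6, 24)]},
)
```

In this toy input the DP pays two literals plus the longer match, which is cheaper than taking the earlier, shorter match.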

The Huffman codec is integrated into the function-multi-versioning system (default / NEON / AVX2 / AVX-512). Format spec updated with enc_lit=2. (#208, #214)
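For readers unfamiliar with canonical Huffman codes, the sketch below assigns codes from code lengths alone, following the well-known procedure from the DEFLATE specification (RFC 1951). It is a generic illustration: ZXC's actual bit order, table layout, and length-limiting step are not described in these notes, and the function name is hypothetical.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from code lengths (RFC 1951 procedure).

    lengths -- dict: symbol -> code length in bits (0 means unused)
    Returns dict: symbol -> (code, length).
    """
    max_len = max(lengths.values())
    # Count how many symbols use each code length.
    count = [0] * (max_len + 1)
    for length in lengths.values():
        if length:
            count[length] += 1
    # Compute the first code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out consecutive codes to symbols in sorted order.
    codes = {}
    for symbol in sorted(lengths):
        length = lengths[symbol]
        if length:
            codes[symbol] = (next_code[length], length)
            next_code[length] += 1
    return codes


# The example alphabet from RFC 1951, section 3.2.2.
codes = canonical_codes(
    {"A": 3, "B": 3, "C": 3, "D": 3, "E": 3, "F": 2, "G": 4, "H": 4}
)
```

Because the codes are fully determined by the lengths, an encoder only needs to transmit the lengths, and a decoder can rebuild its lookup table from them.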

Push-Based Streaming API (zxc_pstream.h)

Caller-driven, single-threaded streaming API for environments where blocking on FILE* is not possible: async event loops, callback-driven libraries, network protocols. Uses explicit zxc_inbuf_t / zxc_outbuf_t descriptors, allowing input/output in arbitrary chunks. Output is bit-identical to the Buffer and Stream APIs. Bindings provided in Go, Rust, Python, Node.js, and WASM, with a dedicated fuzzer. (#204)
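The caller-driven pattern can be sketched with Python's standard zlib module as a stand-in. The zxc binding names and the zxc_inbuf_t / zxc_outbuf_t field layout are not shown in these notes, so nothing below is the zxc API; it only illustrates pushing input in arbitrary chunks and collecting output whenever the codec produces some.

```python
import zlib


def push_compress(chunks):
    """Push input chunk by chunk and yield output as soon as it is available.

    zlib is used purely as a stand-in for a push-based codec.
    """
    encoder = zlib.compressobj()
    for chunk in chunks:
        out = encoder.compress(chunk)  # may return b"" while data is buffered
        if out:
            yield out
    yield encoder.flush()  # drain whatever the codec still holds


data = b"hello zxc " * 1000
# Deliberately awkward chunk size: the caller decides the boundaries.
pieces = [data[i:i + 777] for i in range(0, len(data), 777)]
streamed = b"".join(push_compress(pieces))
restored = zlib.decompress(streamed)
```

The caller never blocks on a file handle: each call returns immediately with whatever output is ready, which is what makes the pattern fit event loops and network protocols.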

Streaming I/O Adapters in Every Wrapper

Idiomatic streaming adapters across all wrappers, plus a magic-word detection utility for content-type sniffing.

  • Go: io.Reader / io.Writer / io.Closer. (#209)
  • Python: ZxcReader / ZxcWriter (io.RawIOBase). (#210)
  • WHATWG TransformStream: for fetch().body.pipeThrough(), Deno, Bun, Node.js. (#211)
  • Node.js stream.Transform. (#212)
  • Rust std::io: Encoder / Decoder. (#213)
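As an illustration of what an io.RawIOBase adapter looks like on the Python side, here is a minimal sketch that wraps a push-style decompressor behind a readable stream. It uses zlib as a stand-in codec and a hypothetical class name; the real ZxcReader constructor and options are not documented in these notes.

```python
import io
import zlib


class InflateReader(io.RawIOBase):
    """Readable raw stream over compressed input (zlib as a stand-in codec)."""

    def __init__(self, raw, chunk_size=4096):
        super().__init__()
        self._raw = raw
        self._chunk_size = chunk_size
        self._decoder = zlib.decompressobj()
        self._pending = b""
        self._eof = False

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull compressed input until some output is available or input ends.
        while not self._pending and not self._eof:
            chunk = self._raw.read(self._chunk_size)
            if chunk:
                self._pending = self._decoder.decompress(chunk)
            else:
                # Input exhausted: drain the decoder's internal buffers.
                self._pending = self._decoder.flush()
                self._eof = True
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n  # 0 signals end of stream


payload = b"abc" * 5000
reader = InflateReader(io.BytesIO(zlib.compress(payload)))
head = reader.read(10)
rest = reader.read()
```

Implementing only readinto is enough: RawIOBase derives read and readall from it, so the adapter composes with io.BufferedReader and the rest of the io stack.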

Block API Safety Helpers

Introduces zxc_decompress_block_bound (minimum destination capacity required by the fast decoder) and zxc_decompress_block_safe (strict-sized variant for page-aligned or exactly-sized buffers where tail padding is impossible). (#198)

Compression Context Memory Estimation

zxc_estimate_cctx_size lets integrators with tight memory budgets compute the full context footprint before allocation. It is also exposed in the Rust and Go wrappers. (#198)

Automatic Block Size in Block API

zxc_compress_block now automatically rounds the effective block size up to the next power of two that fits the source when no explicit block_size is provided. (#198)
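The rounding rule can be sketched as follows. This is an assumption-laden illustration: the helper name is hypothetical, and any minimum or maximum block size that zxc_compress_block clamps to is not stated in these notes and is therefore left out.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()


# A 300 000-byte source would get a 512 KB effective block size.
effective = next_power_of_two(300_000)
```

An input that is already an exact power of two is left unchanged, so a 262 144-byte source maps to a 256 KB block.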

Benchmarks vs v0.10.0

Silesia corpus, lzbench 2.2.1, single-threaded, -march=native.

Compression Ratio

Smaller output at every level:

  • -1: 61.56 % → 61.50 % (−0.10 %)
  • -2: 54.00 % → 53.61 % (−0.72 %)
  • -3: 46.35 % → 45.79 % (−1.21 %)
  • -4: 43.14 % → 42.65 % (−1.14 %)
  • -5: 40.67 % → 40.27 % (−0.98 %)
  • -6 (new): 36.28 %, ~9.9 % smaller than L5 in this release (~10.8 % smaller than L5 in v0.10.0)
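The parenthesised figures are relative changes in output size, not percentage-point differences. A short check using only the ratios listed above (the helper name is ours):

```python
def relative_reduction(old_pct, new_pct):
    """Relative size reduction, in percent, between two compression ratios."""
    return (old_pct - new_pct) / old_pct * 100.0


level_1 = round(relative_reduction(61.56, 61.50), 2)
level_3 = round(relative_reduction(46.35, 45.79), 2)
# Level 6 compared against both Level 5 baselines published above.
l6_vs_new_l5 = round(relative_reduction(40.27, 36.28), 1)
l6_vs_old_l5 = round(relative_reduction(40.67, 36.28), 1)
```

Level 6 comes out about 9.9 % smaller than Level 5 in v0.11.0 and about 10.8 % smaller than Level 5 in v0.10.0.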

Decompression Throughput

  • Apple M2 (ARM64): L1 12 195 → 12 530 MB/s (+2.7 %), L2 +3.1 %, L3–L5 +0.6 to +1.4 %.
  • Google Axion (ARM64): L1 8 924 → 9 067 MB/s (+1.6 %), L2 +0.8 %, L3–L5 within ±0.3 %.
  • AMD EPYC 9B45 (x86_64): L1 10 803 → 10 844 MB/s (+0.4 %), L2 +0.6 %, L4–L5 −1.1 to −1.3 %.
  • AMD EPYC 7763 (x86_64): L1 6 921 → 7 077 MB/s (+2.3 %), L2 +2.1 %, L5 +1.7 %.

Compression Throughput

Fast levels accelerate sharply on Axion / x86 thanks to the tag-first hash filter; mid levels trade a few percent for the ratio gains:

  • L1: M2 −3.1 %, Axion +7.2 %, EPYC 9B45 +9.1 %, EPYC 7763 +6.5 %.
  • L2: M2 −2.3 %, Axion +12.0 %, EPYC 9B45 +16.1 %, EPYC 7763 +11.1 %.
  • L3–L5: −1 to −5 % across all platforms, the cost of the tighter ratio target and the DP-based decisions feeding the parser.
  • L6 (new): ~9–12 MB/s, roughly 10× slower than L5 by design: compression CPU is paid once to ship a smaller artifact to millions of decoders.

Performance & Memory

  • Tag-first hash filter for fast levels: at ZXC_LEVEL_FAST and below, an early hash-tag check short-circuits the position lookup on tag mismatch, with speculative prefetch of the next entries. (#215)
  • NEON nibble masks via SHRN: 128-bit byte comparison masks consolidated into 64-bit nibble masks. (#208)
  • Vectorized optimal-parser DP updates: AVX-512 / AVX2 / NEON for constant-cost match-length updates. (#208)
  • Bulk-flush Huffman bit writer + cache-line-aligned tables. (#208)
  • Chain table as a 64 KB ring buffer: entries beyond the 64 KB uint16_t delta reach were never read back. Saves over 3.8 MB per cctx at the 2 MB block size with no change in ratio or throughput. (#198)
  • Aliased sequence/offset/token buffers: GLO and GHI paths are mutually exclusive per block; the three pointers now share a single region. (#198)
  • Tighter max-sequence bound: chunk_size / ZXC_LZ_MIN_MATCH_LEN. (#198)
  • Unified fast/safe decoders via a compile-time constant, eliminating duplicated logic with zero-cost rollback in the fast path. (#198)
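The chain-table saving can be reproduced with simple arithmetic. The sketch assumes one uint16_t entry per block position before the change and a ring of 65 536 entries afterwards; the notes give the 2 MB block size and the uint16_t delta, but the exact entry count of the ring is our assumption.

```python
BLOCK_SIZE = 2 * 1024 * 1024   # 2 MB block, from the notes
ENTRY_BYTES = 2                # uint16_t delta per entry, from the notes
RING_ENTRIES = 1 << 16         # assumed: one slot per reachable delta

full_table_bytes = BLOCK_SIZE * ENTRY_BYTES    # one entry per position
ring_table_bytes = RING_ENTRIES * ENTRY_BYTES  # fixed-size ring
saved_mib = (full_table_bytes - ring_table_bytes) / (1024 * 1024)


def ring_slot(position):
    """Map a block position to its ring slot (power-of-two size, so mask)."""
    return position & (RING_ENTRIES - 1)
```

Under these assumptions the table shrinks from 4 MiB to 128 KiB, a saving of 3.875 MiB, which is consistent with the "over 3.8 MB" figure. Positions more than 65 535 bytes apart share a slot, which is harmless because a uint16_t delta could never reach them anyway.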

Bug Fixes & Robustness

  • TSan data race in stream engine shutdown: the async writer read job->result_sz outside the lock while the main thread wrote the shutdown sentinel under it. Fixed by snapshotting result_sz and in_sz under the lock in the writer. (#201)
  • Varint truncation for chunks > 2 MB: zxc_write_varint was capped at 3 bytes, silently truncating literal runs or match lengths above 2 MB. Extended to 5 bytes to match the decoder. (#198)
  • Block API auto-resize: no longer rejects inputs larger than the explicit block_size; rounds up to the next power of two. (#198)
  • Block API re-init safety: guards against zero-capacity destinations to prevent underflow. (#198)
  • Harden decompression logic against integer overflows on 32-bit platforms. (#202)
  • Pstream input validation: comprehensive bounds and NULL-pointer checks; rejects malformed buffer descriptors; footer validation distinguishes corruption from checksum failures. (#204)
  • Wrapper decompression flush: Python, Node.js, and WASM dstream loops now drain internal buffers after user input is exhausted. Output buffer growth now checks for max-capacity overflow. (#204)
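The varint truncation above is easy to reproduce with a generic 7-bits-per-byte varint. This is a sketch under that assumption: the exact layout used by zxc_write_varint is not given in these notes, but a 3-byte cap holding 21 bits matches the 2 MB limit described. The function names are hypothetical.

```python
def encode_varint(value, max_bytes=5):
    """Little-endian base-128 varint; bits beyond max_bytes are dropped.

    The silent drop mimics the capped writer described in the fix.
    """
    out = bytearray()
    for i in range(max_bytes):
        byte = value & 0x7F
        value >>= 7
        last = value == 0 or i == max_bytes - 1
        out.append(byte if last else byte | 0x80)  # high bit = "more follows"
        if last:
            break
    return bytes(out)


def decode_varint(data):
    """Decode a little-endian base-128 varint from the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return value


length = 3_000_000  # a run longer than 2 MB
capped = decode_varint(encode_varint(length, max_bytes=3))
fixed = decode_varint(encode_varint(length, max_bytes=5))
```

With a 3-byte cap only the low 21 bits survive, so any value of 2^21 or more decodes to a different, smaller number without any error being raised. Five bytes carry 35 bits, enough for any 32-bit length.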

Development & CI

  • Seekable API fuzzer (fuzz_seekable.c) exercising range reads, metadata getters, multi-threaded decompression, and raw SEK block parsing. All fuzzers now use persistent static buffers and a 4 MB input cap. (#196)
  • Pstream API fuzzer targeting the state machine, chunk boundaries, and parser logic of the push-based streaming API. (#204)
  • Windows ARM64 Go CI: windows-11-arm runner with llvm-mingw toolchain (aarch64-w64-mingw32-gcc). (#195)
  • Monolithic test suite split: per-API files, individual CTest entries (96 instead of one aggregate). (#200)

Documentation

  • Block API safety helpers: documents zxc_decompress_block_bound, zxc_decompress_block_safe, and zxc_estimate_cctx_size, explaining the tail-pad requirement of the fast decoder. (#198)
  • Format spec: enc_lit=2 Huffman literal section.
  • API: documents the push-based streaming API and the new ZXC_LEVEL_DENSITY preset.
  • Nim bindings: community-maintained zxc-nim added to the README's examples and language support table, thanks to @georgelemon.
  • Free Pascal bindings: community-maintained Free-Pascal-port-of-ZXC-compressor-decompressor added to the README, thanks to @Xelitan.
  • TurboBench verification: documents TurboBench as an industry-standard benchmark suite that has officially merged zxc into its master branch.

Changes

Full Changelog: v0.10.0...v0.11.0
