ZXC v0.10.0 (GitHub: hellobertrand/zxc)

Release Notes

This release introduces seekable archives for random-access decompression, WebAssembly support for browser and server-side JavaScript, a block-level API for filesystem integrations, a runtime library information API, and an official Go wrapper. It improves stream engine robustness with thread-safety fixes. It also includes several breaking changes: the default file extension, CLI checksum behavior, and the pkg-config module name. Performance work covers both compression and decompression (split hash table, semi-branchless varint decoding, unified literal copy fast path, BMI/LZCNT extensions, parameterized lazy matching, SIMD register optimization), alongside internal code quality improvements. SOVERSION is incremented to 3.

Breaking Changes

  • Default file extension changed to .zxc: Compressed files now use the .zxc extension instead of the previous .xc suffix. File detection, recursive processing, and documentation have been updated accordingly. (#143)
  • pkg-config module renamed to libzxc: This follows the naming convention used by libzstd, liblz4, liblzma, and libbrotli. Consumers must update their build files. In CMake, for example, use pkg_check_modules(LIBZXC IMPORTED_TARGET libzxc). (#144)
  • Checksums enabled by default: The CLI now enables checksums for all operation modes except --bench, where they remain disabled to avoid measuring checksum overhead. (#142)
  • SOVERSION incremented to 3: The shared library SOVERSION is bumped from 2 to 3 due to ABI-breaking changes (new seekable API, block-level API, runtime info API). Consumers linking against the shared library (libzxc.so.3 / libzxc.3.dylib) must rebuild. Static library users are unaffected.

New Features

Seekable Archives

Introduces seekable ZXC archives for random-access decompression. When seekable = 1 is set in zxc_compress_opts_t, the compressor appends a seek table block (ZXC_BLOCK_SEK) containing per-block compressed sizes, enabling O(1) block lookup and byte-range decompression without scanning the full stream.

New APIs: zxc_seekable_open, zxc_seekable_open_file, zxc_seekable_read, zxc_seekable_close, zxc_seekable_get_num_blocks, zxc_seekable_get_decompressed_size. (#188)
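The following minimal C sketch shows why per-block compressed sizes are enough for O(1) lookup. It assumes that every block except possibly the last decompresses to the same fixed size. The seek_table_t type, its functions, and the 16-byte first-block offset used in the test are hypothetical. They do not reflect the actual ZXC_BLOCK_SEK layout or the zxc_seekable_* signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory seek table; NOT the on-disk ZXC_BLOCK_SEK layout. */
typedef struct {
    size_t    num_blocks;
    uint64_t  block_size;   /* assumed fixed decompressed size per block (> 0) */
    uint64_t *comp_offset;  /* num_blocks + 1 cumulative archive offsets */
} seek_table_t;

/* Turn per-block compressed sizes into cumulative offsets: O(n), done once. */
static int seek_table_init(seek_table_t *t, const uint32_t *comp_sizes,
                           size_t num_blocks, uint64_t block_size,
                           uint64_t first_block_offset) {
    if (block_size == 0) return -1;
    t->comp_offset = malloc((num_blocks + 1) * sizeof *t->comp_offset);
    if (t->comp_offset == NULL) return -1;
    t->num_blocks = num_blocks;
    t->block_size = block_size;
    t->comp_offset[0] = first_block_offset;
    for (size_t i = 0; i < num_blocks; i++)
        t->comp_offset[i + 1] = t->comp_offset[i] + comp_sizes[i];
    return 0;
}

/* O(1): map a decompressed offset to its block index and to the block's
 * compressed byte range [*start, *end) within the archive. */
static int seek_table_locate(const seek_table_t *t, uint64_t offset,
                             size_t *block, uint64_t *start, uint64_t *end) {
    uint64_t idx = offset / t->block_size;
    if (idx >= t->num_blocks) return -1;
    *block = (size_t)idx;
    *start = t->comp_offset[idx];
    *end   = t->comp_offset[idx + 1];
    return 0;
}

static void seek_table_free(seek_table_t *t) {
    free(t->comp_offset);
    t->comp_offset = NULL;
}
```

The prefix sums are computed once when the table is opened. After that, each lookup costs one division and two array reads. A real reader would also bound the offset by the total decompressed size, because the last block may be shorter than block_size.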

WebAssembly (WASM) Support

Adds an Emscripten-based WebAssembly build target, enabling compression and decompression from JavaScript in browsers and on the server. Includes a JavaScript API wrapper for buffer-based operations, Node.js roundtrip tests, and a GitHub Actions workflow for automated WASM builds on push and release events. (#189)

Block-Level API

Introduces zxc_compress_block, zxc_decompress_block, and zxc_compress_block_bound for single-block compression and decompression without file headers, footers, or EOF blocks. This API is designed for filesystem integrations (DwarFS, EROFS, SquashFS) where the caller manages its own block indexing. (#148)

Runtime Library Information API

Introduces zxc_min_level, zxc_max_level, zxc_default_level, and zxc_version_string to query the supported compression level range and library version string at runtime. This allows callers, such as filesystem integrations, to discover metadata and capabilities without relying on compile-time constants alone. (#163)
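One typical use of the runtime range is to validate a user-supplied level instead of hard-coding bounds. In this sketch the bounds are plain parameters. A caller would presumably fill them from zxc_min_level() and zxc_max_level(), but their exact return types are an assumption and should be checked in the zxc headers. clamp_level is a hypothetical helper.

```c
/* Clamp a requested level into [min_level, max_level]. In real code the bounds
 * would come from zxc_min_level() / zxc_max_level() (assumed here to return
 * int); they are parameters so this sketch compiles without the zxc headers. */
static int clamp_level(int requested, int min_level, int max_level) {
    if (requested < min_level) return min_level;
    if (requested > max_level) return max_level;
    return requested;
}
```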

Go Wrapper

Introduces the official Go package for zxc, located in wrappers/go. Built on CGo, it exposes two idiomatic APIs:

  • Buffer API: for in-memory compression and decompression.
  • Streaming API: for large files, backed by the multi-threaded stream engine with separate reader, worker, and writer threads.

Typed sentinel errors map every C error code to a named Go error value. Cross-platform CI (ubuntu-latest, ubuntu-24.04-arm, macos-latest, windows-latest) runs both tests and benchmarks on every release via the Test Go Package workflow. (#149)

Scalar-Only Build Option

Introduces the ZXC_DISABLE_SIMD CMake option to bypass all hand-written SIMD intrinsics and inline assembly, forcing the library into a scalar-only execution path. Compiler auto-vectorization remains unaffected. This is useful for baseline performance benchmarking, portability testing, and security auditing of the scalar code paths. (#174)

Performance

  • Split hash table architecture: Separates the LZ77 hash table into a position array (uint32_t, 128 KB) and an 8-bit tag array (uint8_t, 32 KB) with 15-bit addressing (32 768 buckets, up from 16 384). The compact tag array fits in L1 cache and rejects mismatches before loading positions, while the doubled bucket count shortens chain walks. (#179)
  • Semi-branchless varint decoding: Replaces conditional branching in the variable-byte integer decoder with look-up tables and bitwise operations, reducing branch mispredictions in the decompression hot path. Also standardizes bit-width constants to use CHAR_BIT from <limits.h>. (#186)
  • Unified 32-byte literal copy: Replaces conditional 16-byte and 32-byte literal copies with a single 32-byte copy when buffer padding allows, reducing branching in the compression hot path. (#181)
  • Varint encoding simplification: Removes unreachable 4-byte and 5-byte varint encoding paths (bounded by the 2 MB block size limit), replaces hardcoded integer limits with bit-shift constants, and simplifies lazy evaluation naming for clarity. (#183)
  • Decoupled LZ77 lazy evaluation: Improves the parallelism of lazy evaluation checks within zxc_lz77_find_best_match by evaluating ip+1 and ip+2 positions independently before a single decision, enhancing match finding efficiency. (#177)
  • Refined numeric data detection: The zxc_probe_is_numeric function now employs multi-region sampling, analyzing both the start and middle of a block to more accurately determine if data is suitable for numeric compression. (#177)
  • Optimized decompression loop: The safe loop for sequence decoding in zxc_decode_block_ghi is now unrolled 4x, so it decodes four sequences per iteration with fewer loop-control branches, improving decompression speed. (#177)
  • SIMD register optimization: Eliminates GPR round-trips in SIMD batch loops across all platforms. On ARM64, uses vdupq_laneq_u32 for direct NEON lane broadcast. On AVX2, uses _mm256_permute2x128_si256 + _mm256_shuffle_epi32. On AVX-512, uses _mm512_shuffle_i32x4 + _mm512_shuffle_epi32. This keeps the running sum in vector registers throughout the decode batch loop, reducing instruction latency. (#164)
  • ARM64 encoder vectorization: Leverages vminvq_u8 and vmaxvq_u8 vector reduction intrinsics for 16-byte block uniformity checks during RLE and literal scanning, avoiding premature extraction to general-purpose registers. Also removes an unnecessary prefetch in the LZ77 match-finding loop. (#164)
  • Encoder micro-optimization: Extracts offset_bits and offset_mask from the context structure into local constants in zxc_encode_block_glo and zxc_encode_block_ghi, reducing repeated member access in performance-critical paths. (#141)
  • BMI/LZCNT extensions: Optimizes zxc_log2_u32 with platform-specific intrinsics (_BitScanReverse, __builtin_clz) and enables BMI1/BMI2/LZCNT instruction set flags for AVX2 and AVX512 build variants. Fixes potential undefined behavior in 64-bit shift calculations for offset mask and epoch values. (#159)
  • Parameterized lazy matching threshold: Replaces the hardcoded match length limit (128) for lazy evaluation with a configurable parameter in zxc_lz77_params_t, allowing each compression level to independently tune the trade-off between search depth and encoding speed. (#158)
  • I/O buffer alignment: Replaces hardcoded buffer constants in the CLI with the ZXC_IO_BUFFER_SIZE macro (1 MB), ensuring consistent I/O buffer sizing during file processing. (#162)
  • Decompression macro refactor: Introduces DECODE_COPY_LITERALS and DECODE_COPY_MATCH sub-macros to deduplicate the literal and match copying logic across SAFE and FAST decoding paths in both glo and ghi block decoders. (#150)
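The split hash table from the list above can be sketched as follows. This is a minimal, single-slot illustration. The multiplicative hash constant, the choice of tag bits, and all names are assumptions. The chain array that ZXC walks is also omitted. The sketch only shows the 15-bit index, the separate 8-bit tag array, and the tag-first rejection.

```c
#include <stdint.h>
#include <string.h>

#define HT_BITS    15
#define HT_BUCKETS (1u << HT_BITS)             /* 32 768 buckets */

typedef struct {
    uint32_t pos[HT_BUCKETS];  /* 32 768 * 4 B = 128 KB */
    uint8_t  tag[HT_BUCKETS];  /* 32 768 * 1 B =  32 KB */
} split_ht_t;

static uint32_t read32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }

/* Assumed multiplicative hash (Knuth's constant), not ZXC's actual hash. */
static uint32_t hash32(uint32_t seq) { return seq * 2654435761u; }

/* Record position ip, keyed by the 4 bytes at buf + ip. */
static void ht_insert(split_ht_t *ht, const uint8_t *buf, uint32_t ip) {
    uint32_t h   = hash32(read32(buf + ip));
    uint32_t idx = h >> (32 - HT_BITS);        /* top 15 bits select the bucket */
    ht->tag[idx] = (uint8_t)(h >> 8);          /* disjoint bits form the tag */
    ht->pos[idx] = ip;
}

/* Return a verified candidate position, or UINT32_MAX if there is none. The
 * small tag array is checked first, so most mismatches never touch pos[]. */
static uint32_t ht_lookup(const split_ht_t *ht, const uint8_t *buf, uint32_t ip) {
    uint32_t h   = hash32(read32(buf + ip));
    uint32_t idx = h >> (32 - HT_BITS);
    if (ht->tag[idx] != (uint8_t)(h >> 8)) return UINT32_MAX;
    uint32_t cand = ht->pos[idx];
    /* A matching tag can still be a false positive: compare the real bytes. */
    return read32(buf + cand) == read32(buf + ip) ? cand : UINT32_MAX;
}
```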
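Table-driven varint decoding, as described in the semi-branchless varint bullet, can be illustrated with a hypothetical prefix-varint layout. In this layout the low bits of the first byte encode the length. This is not ZXC's wire format. It only shows two ideas: a length lookup table plus a padded 4-byte load replaces a per-byte branch loop, and capping values below 2^21 removes the need for 4- and 5-byte cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (NOT ZXC's format), tag in the low bits of byte 0:
 *   xxxxxxx0            -> 1 byte,  7 payload bits
 *   xxxxxx01 + 1 byte   -> 2 bytes, 14 payload bits
 *   xxxxx011 + 2 bytes  -> 3 bytes, 21 payload bits (values < 2^21) */
static const uint8_t  LEN_LUT[8]  = {1, 2, 1, 3, 1, 2, 1, 0}; /* 0 = invalid */
static const uint32_t MASK_LUT[4] = {0, 0xFFu, 0xFFFFu, 0xFFFFFFu};

/* Returns bytes written, or 0 if v does not fit in 21 bits. */
static size_t varint_encode(uint32_t v, uint8_t *out) {
    uint32_t w; size_t n;
    if (v < (1u << 7))       { w = v << 1;        n = 1; }
    else if (v < (1u << 14)) { w = (v << 2) | 1u; n = 2; }
    else if (v < (1u << 21)) { w = (v << 3) | 3u; n = 3; }
    else return 0;
    for (size_t i = 0; i < n; i++) out[i] = (uint8_t)(w >> (8 * i));
    return n;
}

/* One table lookup, one 4-byte little-endian load, a mask and a shift; the
 * decode has no data-dependent branch. Requires 4 readable bytes at p
 * (buffer padding). Returns bytes consumed; 0 signals an invalid tag. */
static size_t varint_decode(const uint8_t *p, uint32_t *v) {
    uint32_t w = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    uint32_t len = LEN_LUT[p[0] & 7u];
    *v = (w & MASK_LUT[len]) >> len;  /* tag width equals the byte length */
    return len;
}
```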
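For the BMI/LZCNT bullet, the sketch below shows a portable floor-log2 built on the intrinsics it names. It also shows one common way to avoid undefined behavior when a 64-bit mask would require a shift by 64. Both function names are hypothetical, and this is not the zxc_log2_u32 source.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* floor(log2(x)); returning 0 for x == 0 is a convention of this sketch. */
static uint32_t log2_u32(uint32_t x) {
    if (x == 0) return 0;             /* __builtin_clz(0) would be undefined */
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse(&idx, x);         /* index of the highest set bit */
    return (uint32_t)idx;
#elif defined(__GNUC__) || defined(__clang__)
    return 31u - (uint32_t)__builtin_clz(x);
#else
    uint32_t r = 0;                   /* portable fallback */
    while (x >>= 1) r++;
    return r;
#endif
}

/* Mask of the low `bits` bits; shifting a 64-bit value by 64 is undefined,
 * so that case is handled explicitly. */
static uint64_t low_mask_u64(unsigned bits) {
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```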

Bug Fixes & Robustness

  • Improved bit reader robustness: Corrects edge-case handling in the bit reader's zxc_br_ensure function, ensuring proper behavior when working with 64-bit masks and shifts.
  • Stream engine synchronization: Destroys mutexes and condition variables after worker threads have joined to prevent resource leaks. (#146)
  • Thread creation failure handling: Properly destroys synchronization primitives and joins started threads when thread creation or context initialization fails, preventing resource leaks and deadlocks. (#146)
  • Worker thread data races: Job status and result size updates in worker threads are now performed while holding the context lock. The writer thread correctly locks the context and signals the reader thread upon I/O errors. (#146)
  • Integer overflow in allocation: Adds a check for integer overflow during the job memory block allocation size calculation; limits maximum threads to 512. (#146)
  • CLI output path buffer: Increases the out_path buffer to 4096 bytes, matching the input/resolved path buffers to prevent truncation with long file paths. (#146)
  • Block API state safety: Resets the initialized flag when freeing an internal context during re-initialization, preventing inconsistent state or double-free if subsequent allocation fails. (#148)
  • Go wrapper Windows fix: Fixes file handle conversion on Windows where os.File.Fd() returns a Win32 HANDLE instead of a CRT file descriptor, using _open_osfhandle to bridge correctly for C file operations. (#152)
  • CLI error handling: Reports an error when the output filename cannot be automatically determined from the input path, displays help when an unknown CLI option is encountered, and prevents a division-by-zero when calculating compression ratios for empty outputs. (#170)
  • Stream engine deadlock prevention: Condition variable wait loops in the async writer and main stream engine now check the I/O error flag, preventing threads from hanging when an I/O failure occurs elsewhere. The stream engine also initializes the full allocated memory block (input and output buffers) before use. (#170)
  • Decompression overflow hardening: Updates all boundary checks in the unrolled sequence decoding loops of zxc_decode_block_glo and zxc_decode_block_ghi to account for ZXC_PAD_SIZE, ensuring that fast multi-byte and unaligned writes cannot exceed the allocated buffer. The dispatch layer now requires 2 * ZXC_PAD_SIZE headroom before entering the fast path and caps the destination capacity to chunk_size + ZXC_PAD_SIZE, guaranteeing that a full unrolled block of four sequences can safely complete without mid-iteration boundary checks. (#178)
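The integer-overflow fix in the stream engine follows a standard pattern: check each addition and multiplication against SIZE_MAX before allocating. Below is a minimal sketch. It assumes one job slot per thread, each holding an input and an output buffer. The function and macro names are hypothetical; only the 512-thread cap comes from these notes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_THREADS 512  /* thread cap stated in the release notes */

/* Allocate n_threads job slots of (in_size + out_size) bytes each, refusing
 * any size calculation that would wrap around SIZE_MAX. */
static void *alloc_job_memory(size_t n_threads, size_t in_size,
                              size_t out_size, size_t *total) {
    if (n_threads == 0 || n_threads > MAX_THREADS) return NULL;
    if (in_size > SIZE_MAX - out_size) return NULL;        /* addition wraps */
    size_t per_job = in_size + out_size;
    if (per_job != 0 && n_threads > SIZE_MAX / per_job)    /* product wraps */
        return NULL;
    *total = n_threads * per_job;
    /* calloc zero-fills, so the whole block is initialized before use. */
    return calloc(1, *total);
}
```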

Development & CI

  • Windows ARM64 release build: Adds a native Windows ARM64 build target (windows-11-arm runner) to the release workflow, producing zxc-windows-arm64.zip with NEON optimizations. Adds explicit -A architecture flags to the CMake configure step for both Windows x64 and ARM64 targets. (#194)
  • Profile-Guided Optimization (PGO): Centralizes PGO support into a CMake helper macro, extends instrumentation coverage to FMV (function multi-versioning) variants, and documents the multi-step PGO build workflow in the README. (#185)
  • clang-format integration: New .clang-format configuration, format / format-check CMake targets, and a top-level Makefile for common development tasks. A CI job in the quality workflow enforces formatting. (#147)
  • Unified CI build: Workflows refactored to use CMake directly for all platforms, reducing duplication and ensuring CI matches the local environment. (#147)
  • macOS 26 and latest compilers: Pins all CI runners to macos-26 (ARM64 and Intel), configures GCC 14 on Linux and Xcode 26.4 on macOS across build, multi-architecture, and all language wrapper workflows. (#169)
  • Fuzzing coverage report: New CI job that replays existing fuzzing corpora with Clang source-based coverage instrumentation, merges profile data, and uploads an HTML report to identify untested code paths. (#160)
  • Expanded security analysis: CodeQL workflow now covers Go and Node.js wrappers in addition to C/C++. (#151, #152)
  • Benchmark suite updates: Uses a pre-packaged Silesia tarball for deterministic corpus input, pins GCC 14 for Linux and Xcode 26.4 for macOS, adds lzav and zlib to the comparison set, and reduces the codec list for PR benchmarks to speed up CI feedback. (#166)
  • CI runner optimization: Migrates lightweight jobs to ubuntu-slim runners and removes redundant package installations. (#166)
  • Dependabot expansion: Adds monthly dependency update checks for pip, cargo, npm, and gomod ecosystems within the wrapper subdirectories. (#166)
  • Dependency updates: Bumps codecov/codecov-action from v5 to v6 (#167), updates rand crate from 0.9 to 0.10 in the Rust wrapper (#168).
  • Internal encoding cleanup: Replaces magic numbers with named constants for literal encoding, removes unused reserved encoding types (BITPACK, FSE), and fixes legacy header compatibility for larger block sizes. (#175)
  • Code coverage improvements: Excludes hard-to-test error paths (allocation failures, thread creation failures, progress callbacks) from coverage metrics with LCOV markers.
  • Tests: New test cases for mid-block truncation and output write failures in both single- and multi-threaded stream engine modes.

Documentation

  • Format specification: Added normative sections defining versioning policy, compatibility rules, and mandatory error handling requirements for conforming decoders.
  • Whitepaper: Added compression ratio benchmark tables for zxc levels 1–5 vs. competitors across standard corpora. Updated benchmark methodology to reference silesia.tar from the Silesia compression corpus.
  • API: Updated API documentation with new methods zxc_compress_block, zxc_decompress_block, zxc_compress_block_bound, zxc_min_level, zxc_max_level, zxc_default_level, zxc_version_string, and the seekable API (zxc_seekable_open, zxc_seekable_read, etc.). Enhanced Doxygen comments for zxc_lz77_params_t parameters.
  • README: Updated security badges (Code Security workflow, Snyk). Updated benchmark methodology to reference silesia.tar. Added PGO build workflow documentation.
  • Doxygen: Fixed source path resolution and added missing @param description for chunk_size in zxc_sans_io.h.

Changes

Full Changelog: v0.9.1...v0.10.0