GNR v2, ARM64 Optimizations & Hardened Safety

This release introduces the v2 generation of the internal GNR (General Block) decoder, bringing performance improvements through branchless logic and SIMD vectorization. It also includes a comprehensive security hardening pass, adding rigorous bounds checking and validation to all decoding paths.

Highlights

GNR v2 Decoder Engine

The core decoding loop has been rewritten to maximize instruction-level parallelism:

Branchless Design: Implemented branchless wild copies and match checks to minimize pipeline flushes.
SIMD Acceleration: Added native NEON (ARM64) and AVX2/SSSE3 (x86) implementations for overlapping copy routines.
Hybrid Decoding Strategy: Decompression runs in two phases. For the first 64 KB of output, every match offset is bounds-checked. After that, an optimized unchecked path takes over, because a 16-bit offset can reach back at most 65,535 bytes and is therefore guaranteed to land inside already-decoded data. This removes a branch per sequence for 75% of each chunk.
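The two-phase idea can be sketched in C as follows. This is a minimal illustration under stated assumptions, not ZXC's actual decoder: the `seq_t` record, `decode_sequences`, and the byte-at-a-time `copy_match` are all hypothetical. The point is that the `offset > pos` check exists only in the first loop. Once at least 65,536 bytes have been written, no 16-bit offset can reach before the start of the buffer, so the second loop drops that check and keeps only the output-capacity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sequence: lit_len literal bytes, then a match of
 * match_len bytes copied from `offset` bytes back in the output. */
typedef struct {
    const uint8_t *literals;
    size_t lit_len;
    uint16_t offset;   /* 16-bit: reaches back at most 65535 bytes */
    size_t match_len;
} seq_t;

/* Byte-at-a-time copy; correct even when source and destination overlap. */
static void copy_match(uint8_t *dst, size_t pos, size_t offset, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[pos + i] = dst[pos - offset + i];
}

/* Returns the number of bytes written, or (size_t)-1 on malformed input. */
static size_t decode_sequences(const seq_t *seqs, size_t n,
                               uint8_t *dst, size_t cap) {
    size_t pos = 0, i = 0;

    /* Phase 1: under 64 KiB written, so an offset may point before dst[0]. */
    for (; i < n && pos < 65536; i++) {
        const seq_t *s = &seqs[i];
        if (s->lit_len > cap - pos) return (size_t)-1;
        memcpy(dst + pos, s->literals, s->lit_len);
        pos += s->lit_len;
        if (s->offset > pos) return (size_t)-1;   /* the per-sequence branch */
        if (s->match_len > cap - pos) return (size_t)-1;
        copy_match(dst, pos, s->offset, s->match_len);
        pos += s->match_len;
    }

    /* Phase 2: pos >= 65536 > UINT16_MAX, so every offset is in range. */
    for (; i < n; i++) {
        const seq_t *s = &seqs[i];
        if (s->lit_len > cap - pos) return (size_t)-1;
        memcpy(dst + pos, s->literals, s->lit_len);
        pos += s->lit_len;
        if (s->match_len > cap - pos) return (size_t)-1;
        copy_match(dst, pos, s->offset, s->match_len);
        pos += s->match_len;
    }
    assert(pos <= cap);
    return pos;
}
```

A real fast path would also replace `copy_match` with wide, possibly overshooting copies; the sketch keeps the copy simple so the branch structure stays visible.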

Fuzzing-Driven Safety Hardening

Following extensive fuzzing, multiple layers of protection have been added to prevent malformed streams from causing crashes:

Offset & Size Validation: Added rigorous checks for out-of-bounds reads during variable byte decoding and numeric referencing.
Overflow Protection: Implemented detection for integer overflows in VByte reads and destination buffer writes.
Infinite Loop Prevention: Added size limits to variable byte decoding sequences.
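A bounded VByte reader along these lines combines the three protections listed above. The encoding is an assumption (LEB128-style: 7 payload bits per byte, lowest group first, high bit as the continuation flag), since these notes do not specify ZXC's exact layout, and `vbyte_read_u32` is a hypothetical name. Every read is preceded by an end-of-buffer check, the loop is capped at five bytes so a run of continuation bits cannot spin forever, and the fifth byte is rejected if it carries bits that would overflow 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: 7 payload bits per byte, least-significant group first,
 * high bit set on every byte except the last (LEB128-style). */
#define VBYTE_MAX_BYTES 5  /* ceil(32 / 7): caps the loop for a uint32_t */

/* Decodes one value from [*pp, end). On success stores it in *out,
 * advances *pp past it and returns 0; returns -1 on truncated input,
 * an over-long encoding, or a value that does not fit in 32 bits. */
static int vbyte_read_u32(const uint8_t **pp, const uint8_t *end, uint32_t *out) {
    assert(pp != NULL && *pp != NULL && out != NULL);
    const uint8_t *p = *pp;
    uint32_t value = 0;

    for (int i = 0; i < VBYTE_MAX_BYTES; i++) {
        if (p >= end)
            return -1;                          /* bounds check before every read */
        uint8_t b = *p++;
        /* The 5th byte holds bits 28..31: only its low 4 bits are usable,
         * and it must not set the continuation bit. */
        if (i == VBYTE_MAX_BYTES - 1 && (b & 0xF0) != 0)
            return -1;                          /* overflow or over-long */
        value |= (uint32_t)(b & 0x7F) << (7 * i);
        if ((b & 0x80) == 0) {                  /* last byte of this value */
            *out = value;
            *pp = p;
            return 0;
        }
    }
    return -1;  /* not reached: the 5th byte either ends the value or fails above */
}
```

On failure the input pointer is left untouched, so the caller can report the error position.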

Encoder Optimizations

Fibonacci Hashing: Switched to Fibonacci (multiplicative) hashing for faster, better-distributed hash table indexing.
Speculative Prefetching: Added memory prefetching for hash table entries to reduce cache miss latency.
Branchless Match Finder: Refactored the encoder's match checker to use bitwise masks instead of conditional branches.
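The hashing and branchless-check ideas can be illustrated with a small C sketch. `HASH_BITS` is a hypothetical stand-in for `ZXC_LZ_HASH_BITS` (its real value is not given here), the multiplier 2654435761 is the constant close to 2^32 divided by the golden ratio that LZ4 also uses, and `match_mask` and `select_u32` are illustrative helpers rather than ZXC functions. Fibonacci hashing multiplies and keeps the top bits with a shift, which mixes the input far better than masking off its low bits. The match check turns a 4-byte comparison into an all-ones or all-zeros mask so a candidate can be chosen without a conditional jump. Speculative prefetching is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 14  /* hypothetical stand-in for ZXC_LZ_HASH_BITS */
static_assert(HASH_BITS > 0 && HASH_BITS < 32, "shift amount must be in range");

/* Unaligned-safe 4-byte load. */
static inline uint32_t read32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Fibonacci hashing: multiply by 2654435761 (close to 2^32 / golden ratio)
 * and keep the top HASH_BITS bits with a shift, instead of masking the
 * low bits of the raw input. */
static inline uint32_t fib_hash(uint32_t x) {
    return (uint32_t)(x * 2654435761u) >> (32 - HASH_BITS);
}

/* 0xFFFFFFFF if the 4 bytes at a and b are equal, else 0. The comparison
 * yields 0 or 1 as data, which is negated into a mask without a jump. */
static inline uint32_t match_mask(const uint8_t *a, const uint8_t *b) {
    return 0u - (uint32_t)(read32(a) == read32(b));
}

/* Branchless select: if_set where mask is all ones, if_clear where it is zero. */
static inline uint32_t select_u32(uint32_t mask, uint32_t if_set, uint32_t if_clear) {
    return (if_set & mask) | (if_clear & ~mask);
}
```

In a match finder, `select_u32(match_mask(cur, cand), cand_pos, best_pos)` updates the best candidate whether or not the bytes match, so the loop body has no data-dependent branch.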

Special Thanks

Thanks to @tansy for rewriting the CLI client, including adding short options and standardizing the compression level flags (-1 to -5).

Add short options to help, version by @tansy in #15
Use -1..-5 as compression levels by @tansy in #16

Performance

Level -1: Added a new compression level, -1.
GNR v2: Replaced the legacy decoder with v2, which uses single-sequence processing and streamlined loops.
SIMD: Added 32-byte copy routines and NEON shuffle optimizations for small offsets (2-15 bytes).
Prefetching: Implemented speculative prefetching for hash chain entries.
Hashing: Replaced masking with shifts derived from ZXC_LZ_HASH_BITS (Fibonacci variant).
Memory: Applied the RESTRICT qualifier to pointers on critical hot paths, telling the compiler they do not alias so it can optimize more aggressively.
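The small-offset shuffle can be shown portably by emulating the 16-byte table lookup in scalar code. This is one common way to build such a routine, not necessarily ZXC's: `tbl16` stands in for NEON `vqtbl1q_u8` (or SSSE3 `_mm_shuffle_epi8`), and `copy_small_offset` is a hypothetical name. A match whose offset is shorter than the vector width just repeats a short pattern, so one shuffle with indices `i % offset` fills a whole vector with it, and a second fixed shuffle with indices `(16 + i) % offset` advances that vector by 16 bytes for the next store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a 16-byte table lookup (vqtbl1q_u8 on NEON,
 * _mm_shuffle_epi8 on SSSE3): out[i] = v[idx[i]]. */
static void tbl16(const uint8_t v[16], const uint8_t idx[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = v[idx[i]];
}

/* Writes an overlapping match of `len` bytes at dst whose source starts
 * `offset` bytes earlier (2 <= offset <= 15), 16 bytes per step. */
static void copy_small_offset(uint8_t *dst, size_t offset, size_t len) {
    assert(offset >= 2 && offset <= 15);
    uint8_t src[16] = {0}, spread[16], rotate[16], vec[16], next[16];

    memcpy(src, dst - offset, offset);              /* the repeating pattern */
    for (int i = 0; i < 16; i++) {
        spread[i] = (uint8_t)(i % offset);          /* pattern repeated across 16 lanes */
        rotate[i] = (uint8_t)((16 + i) % offset);   /* advance the pattern by 16 bytes */
    }

    tbl16(src, spread, vec);
    while (len >= 16) {
        memcpy(dst, vec, 16);
        dst += 16;
        len -= 16;
        tbl16(vec, rotate, next);
        memcpy(vec, next, 16);
    }
    memcpy(dst, vec, len);  /* tail; a real decoder might overshoot into slack instead */
}
```

Because both index tables depend only on `offset`, a production version would typically precompute them for offsets 2 to 15.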

Safety & Integrity

Validation: Added destination bounds checks before writing literals in generic number decoding.
VByte: Added strict bounds checking and overflow detection to variable byte decoding.
Stream: Validated stream sizes against sequence counts for early error detection.
Sanitization: Fixed potential out-of-bounds reads in the fast path by falling back to the safe path when remaining data is small.
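The fast-path fallback comes down to checking, before each copy, that enough slack remains for the fast path's over-reads and over-writes. In this hypothetical `copy_literals`, the fast path copies whole 16-byte blocks (the block size is an assumption), so it may touch up to 15 bytes past the requested length. When either buffer has less room than the rounded-up length, the sketch falls back to an exact `memcpy`. The return values (1 for fast, 0 for safe, -1 for overrun) exist only to make the choice visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WILD_BLOCK 16  /* hypothetical fast-path block size */

/* Copies n literal bytes from src to dst.
 * Returns 1 if the fast path ran, 0 if the safe path ran, -1 if n does not fit. */
static int copy_literals(uint8_t *dst, const uint8_t *dst_end,
                         const uint8_t *src, const uint8_t *src_end, size_t n) {
    assert(dst != NULL && src != NULL);
    size_t dst_room = (size_t)(dst_end - dst);
    size_t src_room = (size_t)(src_end - src);
    if (n > dst_room || n > src_room)
        return -1;                                /* exact bounds check */

    /* Round n up to whole blocks: this is how far the fast path reaches. */
    size_t padded = (n + WILD_BLOCK - 1) / WILD_BLOCK * WILD_BLOCK;
    if (padded <= dst_room && padded <= src_room) {
        for (size_t i = 0; i < padded; i += WILD_BLOCK)
            memcpy(dst + i, src + i, WILD_BLOCK); /* fixed-size copies, may overshoot n */
        return 1;
    }
    memcpy(dst, src, n);                          /* near a buffer end: copy exactly n */
    return 0;
}
```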

Internals & Refactoring

Cleanup: Removed dead code for the original v1 GNR decoder.
Formatting: Renamed variables for clarity and updated internal documentation.
CI: Made fuzzer build scripts dynamic and updated benchmark workflows.

Full Changelog: v0.2.0...v0.3.0

ZXC v0.3.0 release on GitHub (hellobertrand/zxc)