Release candidate 1
This is a feature release that introduces several optimizations and improvements.
The biggest addition is the Chorba CRC32 code, a major improvement to CRC32 calculation speed on pre-CLMUL CPUs. For now, we have 3 variants of Chorba: Generic, SSE2 and SSE4.1.
We have also removed our detection and usage of the various aligned alloc functions. Because we need to support an application-provided alloc function, we have to check and fix buffer alignments anyway, so we now just use malloc() if none is provided.
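To illustrate the idea of fixing up alignment on top of a plain allocator, here is a minimal sketch. It is not the zlib-ng implementation: the names aligned_alloc_sketch and aligned_free_sketch are hypothetical, and the layout (stashing the original pointer just before the aligned block) is only one common way to do this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers for illustration; not zlib-ng's actual functions. */

/* Return `size` bytes aligned to `align` (a power of two), obtained from
 * plain malloc(). The pointer returned by malloc() is stored in the bytes
 * just before the aligned block so it can be recovered when freeing. */
static void *aligned_alloc_sketch(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Worst-case padding plus room for the stashed original pointer. */
    size_t extra = align - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL; /* request would overflow size_t */

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip past the stash slot, then round up to the next aligned address. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* memcpy avoids any assumption about the alignment of the stash slot. */
    memcpy((unsigned char *)aligned - sizeof(void *), &raw, sizeof(void *));
    return (void *)aligned;
}

/* Free a block obtained from aligned_alloc_sketch(). */
static void aligned_free_sketch(void *ptr) {
    if (ptr == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof(void *));
    free(raw);
}
```

The same fix-up works regardless of which allocator produced the raw pointer, which is why a separate aligned allocation path is not strictly needed once an application-provided allocator has to be supported.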
The gzopen-related init code has been rewritten to clean up and unify the gzread and gzwrite behavior. Several malloc calls were removed: the number of places in the gz* code that call malloc is down from 7 to 4, and using gzopen now results in only 2-3 calls to malloc in total.
The reason for releasing 2.3.x instead of another 2.2 release is the introduction of Chorba CRC32, rewritten gzopen init code, the increased CMake version requirement, and the removal of NMake project files. There should not be any API/ABI changes (other than on the previously failing platforms fixed by #1980).
2.3.0-rc1
Important fixes/changes
- Remove NMake build projects #1899
- Increase minimum supported CMake version from 3.5.1 to 3.12 #1973
- Don't build C-fallback functions that never get used on x86_64 #1984
- Fix type mismatch on platforms where int32_t and uint32_t use long instead of int #1980
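As background for the int32_t/uint32_t entry above: on some toolchains uint32_t is a typedef for unsigned long rather than unsigned int, so code that mixes the fixed-width names with plain int types can fail to compile or warn even though both are 32 bits wide. The sketch below shows the general pattern of staying within the fixed-width types. It is an illustration only, not the change made in #1980, and the function name format_u32 is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper for illustration; not part of zlib-ng. */

/* Format a uint32_t portably. PRIu32 expands to the correct conversion
 * specifier whether uint32_t is unsigned int or unsigned long, whereas a
 * hard-coded "%u" is only correct in the former case. */
static int format_u32(char *buf, size_t len, uint32_t value) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu32, value);
}

/* The same applies to pointers: where uint32_t is unsigned long,
 * `unsigned int *` and `uint32_t *` are incompatible pointer types, so a
 * declaration using one and a definition using the other do not match. */
```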
Optimizations
- Implement Chorba CRC32 #1837
- Fix a big endian bug on the 32k and larger specializations of chorba #1891
- Explicit SSE2 vectorization of Chorba CRC method #1872
- SSE4.1 optimized chorba #1893
- Fix function declaration for chorba_small_nondestructive_sse2 #1907
- Fix 32bit large chorba #1912
- [WebAssembly] Fix stack overflow in crc32_chorba_118960_nondestructive. #1915
- Minor optimization of insert_string #1951
- Optimize compress_block() and build_tree() #1954
- Inline bi_reverse #1955
- Inline read_buf and flush_pending #1952
- Inline the CHUNKSIZE function #1974
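For readers unfamiliar with the small helpers being inlined above, here is a sketch of what a bit-reversal helper such as bi_reverse does, modeled on the loop used in classic zlib. The zlib-ng version may be implemented differently; the name bi_reverse_sketch and the asserted length range are assumptions made for this example.

```c
#include <assert.h>

/* Reverse the low `len` bits of `code`. Marking such a tiny function
 * `static inline` lets the compiler fold it into its callers instead of
 * emitting a call. Illustrative only; not the zlib-ng source. */
static inline unsigned bi_reverse_sketch(unsigned code, int len) {
    unsigned res = 0;
    assert(len >= 1 && len <= 15); /* assumed range of Huffman code lengths */
    do {
        res |= code & 1; /* take the lowest remaining bit of code */
        code >>= 1;
        res <<= 1;       /* make room for the next bit */
    } while (--len > 0);
    return res >> 1;     /* undo the final extra shift */
}
```

For example, reversing the 3-bit code 110 gives 011.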
Arch-Specific improvements/optimizations
- x86
- RiscV
- ARM
  - Provide an inline asm fallback for the ARMv8 intrinsics #1697
  - ARM Neon: Fold a copy into the adler32 function for UPDATEWINDOW #1870
  - Remove volatile keyword from ARM inline assembler #1908
  - Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms #1942
  - Synchronise ARMv8 and Loongarch CRC32 implementations #1969
- Loongarch64
  - Port SSE/AVX optimization to Loongarch64 LSX/LASX Vector Intrinsics #1925
Buildsystem
- Fix -Wunused-command-line-argument warnings on Mac OS X #1967
- Fix -Wstrict-prototypes warnings #1968
- Initial support for nVidia toolchain #1993
- CMake: Make test options dependent on ZLIB_ENABLE_TESTS #1933
- CMake: Allow C17 for newer CMake versions #1958
- CMake: Rename targets to avoid clashes when used as a subproject #1970
- CMake: Rename target files to avoid overwrite of PACKAGE_VERSION #1988
- Configure: Added --installnamedir #1867
- Configure: Add support for RISC-V ZBC extension #1917
Tests/Benchmarks
- Bench: Add benchmark for insert_string. #1956
- Tests: Fix type mismatch with Windows GCC. #1965
- Tests: Fix cast and truncation warnings. #1978 #1979
CI
- CI: Minor fix for s390x CI runner version selection #1886
- CI: Fix broken actions-runner #1929
- CI: Update MacOS toolchain. #1962
- CI: Install Windows 11 SDK 10.0.22621 for 32-bit ARM. #1964
- CI: Use MacOS 14 for GCC UBSAN. #1963
- CI: Update macOS CI images #1971
- CI: Update s390x actions runner #1981
Misc
- Clean up crc32_braid. #1873
- fix the url of the s390x actions worker patch #1882
- port: Use memalign or _aligned_malloc only, when available, fallback to malloc. #1863
- port: Use __cpuid only, when available. #1887
- Use 'block-list' and 'allow-list' terms #1976
- Verify pointers during functable init #1983