This PR refactors the entire codebase, splitting the single-header implementation into multiple headers. It also brings faster kernels for:
- Sorting of string sequences and pointer-sized integers 🔀
- Levenshtein edit distances for fuzzy matching of UTF-8 or DNA 🧬
- Needleman-Wunsch and Smith-Waterman scoring, see affine-gaps ☣️
- Multi-pattern sketching and fingerprinting on GPUs 🔍
- AES-based portable general-purpose hashing functions #️⃣
And more community contributions:
- Detecting CPU capabilities 👏 @GoWind - #196
- Windows cross-compilation 👏 @ashbob999 - #169
- CMake refactor 👏 @friendlyanon - #85
- Charset initialization 👏 @alexbarev - #200
- Benchmarking sorting algorithms 👏 @ashbob999 - #209
- Allocator initialization and unknown pragmas on MSVC 👏 @GerHobbelt - #231
- Big-endian SWAR substring-search backends 👏 @SammyVimes - #75
- NodeJS and GoLang bindings groundwork 👏 @MarkReedZ - #151
- New C++23 APIs 👏 @PleaseJustDont - #225
- Repeated help with Rust 👏 @mikayelgr and @grouville - #215
Huge thanks to our partners at Nebius for their continued support and the endless stream of GPU installations for the most demanding computational workloads in AI and beyond!
Why Split the Files? Matching SimSIMD Design
Sadly, most modern software development tooling is subpar. VS Code is just as slow and unresponsive as the older Atom and other web-based editors, while LSP implementations for C++ are equally slow and completely mess up code highlighting in files over 5,000 lines of code (LOC). So, I've unbundled the single-header solution into multiple headers, similar to SimSIMD.
Also, similar to SimSIMD, CPU feature detection has been reworked to separate the serial, Haswell, Skylake, Ice Lake, NEON, and SVE implementations.
Faster Sequence Alignment & Scoring on GPUs
Biology is set to be one of the driving forces of the 21st century, and biological DNA/RNA/protein data is already one of the fastest-growing data modalities, outpacing Moore's Law. Yet most BioInformatics software today is flawed and pretty slow. Last year, I helped several BioTech and Pharma companies scale up their data-processing capacity with specialized optimizations for various niche use cases. I also wanted to include baseline kernels for the most crucial algorithms in StringZilla, covering:
- Levenshtein distance computation for binary & UTF-8 data for general-purpose fuzzy string matching on CPUs and GPUs,
- Needleman-Wunsch and Smith-Waterman global and local sequence alignment with both linear and affine Gotoh gap extensions, so common in BioInformatics, on both CPUs and GPUs.
These kernels are hardly state-of-the-art at this point, but should provide a good baseline, ensuring correctness and equivalent outputs across different CPU & GPU brands.
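As a point of reference for the scoring schemes listed above, here is a minimal Python sketch of Gotoh-style affine-gap scoring. It is not StringZilla's kernel or API: the function name `affine_alignment_score` and all of its parameters are hypothetical, and the sketch assumes `gap_open` is charged for the first character of a gap and `gap_extend` for each following one. With `local=False` it computes a Needleman-Wunsch global score, with `local=True` a Smith-Waterman local score.

```python
NEG_INF = float("-inf")


def affine_alignment_score(a, b, match=1, mismatch=-1,
                           gap_open=-2, gap_extend=-1, local=False):
    """Score-only alignment with affine gaps, keeping two rows in memory."""
    n, m = len(a), len(b)
    # Row 0: all zeros for local alignment, one growing gap for global.
    if local:
        h_prev = [0] * (m + 1)
    else:
        h_prev = [0] + [gap_open + (j - 1) * gap_extend for j in range(1, m + 1)]
    f_prev = [NEG_INF] * (m + 1)  # best scores ending in a vertical gap
    best = 0
    for i in range(1, n + 1):
        first = 0 if local else gap_open + (i - 1) * gap_extend
        h_cur = [first] + [0] * m
        f_cur = [NEG_INF] * (m + 1)
        e = NEG_INF  # best score ending in a horizontal gap in this row
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            diagonal = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, e, f_cur[j])
            if local:
                score = max(score, 0)  # Smith-Waterman never goes below zero
                best = max(best, score)
            h_cur[j] = score
        h_prev, f_prev = h_cur, f_cur
    return best if local else h_prev[m]
```

Setting `match=0` and all three penalties to `-1` makes the global score equal to the negated Levenshtein distance, which is a convenient way to cross-check the two families of kernels against each other.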
Faster Sorting
Our old algorithm didn't perform any memory allocations and tried to fit too much into the provided buffers. The new breaking change in the API allows passing a memory allocator, making the implementation more flexible. It now works fine on both 32-bit and 64-bit systems.
The new serial algorithm is often 5 times faster than the std::sort function in the C++ Standard Template Library for a vector of strings. It's also typically 10 times faster than the qsort_r function in the GNU C library. There are even quicker versions available for Ice Lake CPUs with AVX-512 and Arm CPUs with SVE.
Faster Hashing Algorithms
Our old algorithm was a variation of the Karp-Rabin hash and was designed more for rolling hashing workloads. Sadly, such hashing schemes don't pass SMHasher and similar hash-testing suites, and a better solution was needed. For years, I have been contemplating designing a general-purpose hash function based on AES instructions, which have been implemented in hardware for several CPU generations now. As discussed with @jandrewrogers, and as can be seen in his AquaHash project, those instructions provide an almost unique amount of mixing logic per CPU cycle of latency.
Many popular hash libraries, such as AHash in the Rust ecosystem, cleverly combine AES instructions with 8-bit shuffles and 64-bit additions. However, they rarely harness the full power of the CPU due to the constraints of Rust tooling and the complexity of using masked x86 AVX-512 and predicated Arm SVE2 instructions. StringZilla does that and ticks a few more boxes:
- Outputs 64-bit hashes and passes the SMHasher `--extratests`.
- Is fast for both short (velocity) and long strings (throughput).
- Supports incremental (streaming) hashing, when the data arrives in chunks.
- Supports custom seeds for hashes, with the seed affecting every bit of the output.
- Provides dynamic-dispatch for different architectures to simplify deployment.
- Documents its logic and guarantees the same output across different platforms.
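The streaming and seeding properties from the list above can be illustrated with a toy hasher. This is only a sketch of the buffering idea, not the AES-based StringZilla function: the class `StreamingHasher`, the helper `_mix`, and the mixing constants are all made up for illustration and have none of the quality guarantees discussed here.

```python
MASK64 = (1 << 64) - 1


def _mix(state: int, word: int) -> int:
    # Toy mixing step (multiply by an odd constant, then xor-shift).
    state = ((state ^ word) * 0x100000001B3) & MASK64
    return state ^ (state >> 29)


class StreamingHasher:
    """Consumes input in 8-byte blocks and buffers any incomplete tail."""

    BLOCK = 8

    def __init__(self, seed: int = 0):
        self.state = _mix(seed & MASK64, 0x9E3779B97F4A7C15)
        self.tail = b""
        self.length = 0

    def update(self, chunk: bytes) -> "StreamingHasher":
        buffer = self.tail + chunk
        full = len(buffer) - len(buffer) % self.BLOCK
        for i in range(0, full, self.BLOCK):
            word = int.from_bytes(buffer[i:i + self.BLOCK], "little")
            self.state = _mix(self.state, word)
        self.tail = buffer[full:]  # carried over to the next update
        self.length += len(chunk)
        return self

    def digest(self) -> int:
        # Finalize on a copy of the state, so more data can still be appended.
        state = _mix(self.state, int.from_bytes(self.tail, "little"))
        return _mix(state, self.length)
```

Because partial blocks are carried over between `update` calls, the digest depends only on the concatenated bytes and the seed, not on how the input was chunked.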
Implementing this logic, which provides both fast and high-quality hashes, often capable of computing four hashes simultaneously, made these kernels handy not only for hashing itself, but also for higher-level operations like database-style hash joins and set intersections, as well as advanced sequence alignment algorithms for bioinformatics.
Fingerprinting, Sketching, and Min-Hashing... using 52-bit Floats?!
Despite deprecating Rabin-Karp rolling hashes for general-purpose workloads, it's hard to argue with their usability in "fingerprinting" or "sketching" tasks, where a fixed-size feature vector is needed to compare contents of variable-length strings. The features should be as different as possible, covering various substring lengths, so polynomial rolling hashes fit nicely!
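To make the fingerprinting idea concrete, here is a minimal Min-Hash sketch built from polynomial rolling hashes over several window widths. It is an illustration under assumptions, not the StringZilla implementation: the function names, the window widths, the multipliers, and the modulus are all picked arbitrarily for the example.

```python
def rolling_hashes(data: bytes, width: int, multiplier: int, modulus: int):
    """Yield the polynomial hash of every `width`-byte window of `data`."""
    if len(data) < width:
        return
    top = pow(multiplier, width - 1, modulus)  # weight of the outgoing byte
    h = 0
    for byte in data[:width]:
        h = (h * multiplier + byte) % modulus
    yield h
    for i in range(width, len(data)):
        h = (h - data[i - width] * top) % modulus   # drop the oldest byte
        h = (h * multiplier + data[i]) % modulus    # append the newest byte
        yield h


def fingerprint(data: bytes, widths=(3, 4, 5, 7),
                multipliers=(257, 263, 269, 271), modulus=2147483647):
    """One Min-Hash feature per (width, multiplier) pair."""
    return [min(rolling_hashes(data, w, m, modulus), default=modulus)
            for w, m in zip(widths, multipliers)]


def similarity(fp_a, fp_b) -> float:
    """Fraction of matching features between two fingerprints."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)
```

Each feature keeps only the smallest hash seen for its window width, so two strings that share many substrings tend to agree on many features regardless of their lengths.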
That said, implementing modulo-arithmetic over 64-bit integers is extremely expensive on Intel CPUs:
- `VPMULLQ (ZMM, ZMM, ZMM)` for `_mm512_mullo_epi64`:
  - on Intel Ice Lake: 15 cycles on port 0.
  - on AMD Zen4: 3 cycles on ports 0 or 1.
- `VPMULLD (ZMM, ZMM, ZMM)` for `_mm512_mullo_epi32`:
  - on Intel Ice Lake: 10 cycles on port 0.
  - on AMD Zen4: 3 cycles on ports 0 or 1.
- `VPMULLW (ZMM, ZMM, ZMM)` for `_mm512_mullo_epi16`:
  - on Intel Ice Lake: 5 cycles on port 0.
  - on AMD Zen4: 3 cycles on ports 0 or 1.
- `VPMADD52LUQ (ZMM, ZMM, ZMM)` for `_mm512_madd52lo_epu64` for 52-bit multiplication:
  - on Intel Ice Lake: 4 cycles on port 0.
  - on AMD Zen4: 4 cycles on ports 0 or 1.
It's even pricier on Nvidia GPUs, but as you can see above, we may have a way out! x86 has a cheap instruction for 52-bit integer multiplication & addition. Moreover, 52 is precisely the number of bits we can safely use to exactly store an integer inside a double-precision floating-point number. That opens the door to a really weird implementation of rolling hash functions across CPUs and GPUs, built on 64-bit floats and Barrett reductions for modular arithmetic, to avoid division!
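The float-based trick can be sketched in a few lines of Python, whose `float` is an IEEE 754 double. This is a simplified illustration, not the actual kernel: `barrett_mod` here is a hypothetical helper, and the modulus (just below 2**24) and multiplier (below 2**17) are assumptions chosen so that every intermediate product stays far below 2**53 and is therefore exact.

```python
import math


def barrett_mod(x: float, modulus: float, inverse: float) -> float:
    """Reduce a non-negative integer-valued double without dividing."""
    quotient = float(math.floor(x * inverse))  # may be off by one
    remainder = x - quotient * modulus         # exact below 2**53
    if remainder < 0.0:
        remainder += modulus
    if remainder >= modulus:
        remainder -= modulus
    return remainder


def rolling_hashes_f64(data: bytes, width: int,
                       multiplier: float = 65599.0,
                       modulus: float = 16777213.0):
    """Polynomial rolling hashes of all windows, computed only in doubles."""
    inverse = 1.0 / modulus  # precomputed once, replaces every division
    top = 1.0                # multiplier ** (width - 1) modulo modulus
    for _ in range(width - 1):
        top = barrett_mod(top * multiplier, modulus, inverse)
    h = 0.0
    hashes = []
    for i, byte in enumerate(data):
        if i >= width:
            # Remove the contribution of the byte leaving the window.
            h -= barrett_mod(data[i - width] * top, modulus, inverse)
            if h < 0.0:
                h += modulus
        h = barrett_mod(h * multiplier + byte, modulus, inverse)
        if i >= width - 1:
            hashes.append(h)
    return hashes
```

The rounding error of `x * inverse` can shift the quotient by at most one under these bounds, which is why a single conditional correction in each direction is enough to land in the `[0, modulus)` range.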
Next Steps
- Ditch using constant memory in CUDA. The original design was to load the substitution tables for NW and SW into shared memory, but a simpler design with constant memory was eventually accepted. That proved to be a mistake. The performance numbers are way lower than expected.
- Use Distributed Shared Memory for 10x larger NW and SW inputs. That would require additional variants of `linear_score_on_each_cuda_warp_` and `affine_score_on_each_cuda_warp_`, which scale to 16 blocks of (228 - 1) KB shared memory on each, totaling 3.5 MB of shared memory that can be used for alignments. With `u32` scores and 3 DP diagonals (in case of non-affine gaps) that should significantly accelerate the alignment of ~300K-long strings. But keep in mind that `cluster.sync()` is still very expensive - 1300 cycles - only 40% less than 2200 cycles for `grid.sync()`.
- Add asynchronous host callbacks for updating the addressable memory region in NW and SW.
- Potentially bring back `stringzilla_bare` builds on MSVC, if needed.
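The Distributed Shared Memory estimate above can be re-derived with a few lines of arithmetic. The helper below is purely illustrative; its name and parameters are hypothetical, and it assumes that each of the 3 diagonals stores one 4-byte `u32` score per cell.

```python
def distributed_shared_memory_budget(blocks_per_cluster: int = 16,
                                     shared_kb_per_block: int = 228 - 1,
                                     bytes_per_score: int = 4,
                                     diagonals: int = 3):
    """Return (total bytes of shared memory, cells that fit per diagonal)."""
    total_bytes = blocks_per_cluster * shared_kb_per_block * 1024
    cells_per_diagonal = total_bytes // (bytes_per_score * diagonals)
    return total_bytes, cells_per_diagonal


total_bytes, cells_per_diagonal = distributed_shared_memory_budget()
# Relative saving of a cluster-wide sync over a grid-wide one, in cycles.
sync_saving = 1 - 1300 / 2200
print(total_bytes, cells_per_diagonal, round(sync_saving, 2))
```

Under these assumptions the budget is 3,719,168 bytes, about 3.5 MB, which fits diagonals of roughly 310 thousand cells and matches the ~300K string length quoted above.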
Major
- Break: Output error messages via C API (fc94890)
- Break: New wording for incremental hashers (66898ad)
- Break: Rust namespaces layout (b3338bb)
- Break: Refactor `Strs` ops (009b975)
- Break: Rename again (fb56b60)
- Break: Drop fingerprinting bench (3e04157)
- Break: `sz::edit_distance` -> Levenshtein (d44beb4)
- Break: C++ `lookup` and `fill_random` (1ce830b)
- Break: `charset/generate` -> `byteset/fill_random` (2ce2b49)
- Break: New calling convention in `similarity.h` (095bc2d)
- Break: Return error-codes in sort functions (944804e)
- Break: `look_up_transform` to `lookup` API (e0055d5)
- Break: `checkum` to `bytesum`, new hash, and PRNG (71f1f4b)
- Break: Pointer-sized N-gram Sorting (0c38bff)
- Break: `sz_sort` now takes allocators (ec81663)
- Break: Deprecate old fingerprinting (38014ee)
- Break: Replace `char_set` constructor with literals (2c49eae)
Minor
- Add: `Str.count_byteset` for Python (802d699)
- Add: GoLang official support (234b758)
- Add: `HashMap` traits for Rust (e555cc3)
- Add: `try_resize_and_overwrite` (e00a98b)
- Add: Hashers for Swift (be363c9)
- Add: Big-endian SWAR backends (607dd14)
- Add: Zero-copy JS wrapper for buffers (4bf2dd6)
- Add: `sz.fill_random` for Python (3565f9b)
- Add: Allow resetting the dispatch table (68172a0)
- Add: Ephemeral GPU executors if no device is passed (6431900)
- Add: Capabilities getters for SW (86c89b7)
- Add: StringZillas for Rust draft (48a1120)
- Add: Fingerprinting benchmarks (471649e)
- Add: On GPU fingerprints in Python (6f26629)
- Add: `basic_rolling_hashers` CUDA port (a25e3f2)
- Add: Unrolled fingerprinting backends (9a04c95)
- Add: NW and SW scoring classes (958c9e1)
- Add: SW scoring (3e8a3a6)
- Add: NW scoring (079a8c1)
- Add: Capability-constrained Py constructors (d0dfa0e)
- Add: `LevenshteinDistancesUTF8` in Py (4cd6c7d)
- Add: Wrap `DeviceScope` for Python (30b8fd3)
- Add: Make `Strs` from lists, tuples, generators (05a5434)
- Add: StringZillas Python tests (978d13d)
- Add: `Strs` layout conversion tests (4e5df62)
- Add: Exportable `_sz_py_api` capsules (9d5935b)
- Add: Levenshtein kernels in shared lib (42a815f)
- Add: `Strs.from_arrow` conversion (0441b32)
- Add: Draft fingerprinting C binding (3f5e004)
- Add: Fingerprinting baselines (f997dad)
- Add: `SZ_NOINLINE` (4e6e1af)
- Add: Draft fingerprinting on GPUs (11af644)
- Add: Haswell rolling fingerprints (25d3ee6)
- Add: Draft exploration of `float` fingerprints (3955eee)
- Add: `lock_guard` to avoid STL (e6cdb93)
- Add: Parallel fingerprinting (4ccac06)
- Add: `to_span`, `to_view` helpers (4fb9283)
- Add: Min-Hashing `basic_rolling_hashers` (63447d3)
- Add: Fingerprinting benchmarks (00c68d5)
- Add: 64-bit `double` fingerprinting (9ed6006)
- Add: Test rolling hashes (a065aa0)
- Add: Fingerprinting drafts are back :) (9ab4a2b)
- Add: Draft parallel library backend (94147ed)
- Add: Long haystack CUDA kernels for find-many (51c9171)
- Add: `find_many` minimal counter & benchmarks (3c615d9)
- Add: `bench_find_many.cpp` (2d594e4)
- Add: CUDA Aho Corasick placeholder (d087afa)
- Add: Parallel Ice Lake variants (6aaf16c)
- Add: NW and SW for Hopper (e53c8d5)
- Add: Hopper Levenshtein kernels (cb7c48d)
- Add: Affine gaps Levenshtein on Kepler (f09bbf9)
- Add: Ice Lake Affine Levenshtein kernels (51afad5)
- Add: Affine Levenshtein variants on GPU (3e1df93)
- Add: `sz_similarity_gaps_t` enums (5b410b8)
- Add: Concepts & new multithreading scheme (6be0da5)
- Add: Draft StringCuZilla C API (1975351)
- Add: Draft executors for StringCuZilla (e1a37de)
- Add: C++20 concepts (f7b091c)
- Add: Unmasked Ice Lake Levenshtein (2b4fa09)
- Add: Affine Levenshtein-Gotoh variants (25ab3b6)
- Add: Draft global scoring in CUDA (07196d0)
- Add: Affine gap extensions baseline (945613c)
- Add: Draft device-wide similarity kernels (871d7bc)
- Add: Levenshtein on Kepler (938bf6f)
- Add: Parallel multi-needle search with OpenMP (c8dc314)
- Add: Draft parallel substring search (cf53b9b)
- Add: Overflow-risk error codes (783cfa8)
- Add: Random access ranges (0a26059)
- Add: `safe_vector`, `safe_array` (ac6218a)
- Add: Immutable iterators for Arrow tapes (b70096b)
- Add: `indexed_container_iterator` (a5ca2ac)
- Add: Multi-pattern exact substring search (a83cdb5)
- Add: NW benchmarks on GPU (1879aeb)
- Add: Multi-flag `sz_caps` (f76fa40)
- Add: Repeat similarity benchmarks on CPU (f64fc56)
- Add: `gpu_specs` and `cuda_status_t` (c36d2b8)
- Add: Ice Lake similarity kernels (3c8d181)
- Add: Fetching Nvidia GPU specs (d0bdd17)
- Add: New batch similarity benchmarks (aca3301)
- Add: Warp-shuffle optimizations (4a62715)
- Add: Mem consumption CUDA tests (53d3e0d)
- Add: New NW and SW GPU kernels (ca0fece)
- Add: Blosum62 and NUC.4.4 matrices (8a99373)
- Add: Separate parallel & serial tests (3cb8dd0)
- Add: Parallel SW & NW scoring in OpenMP (85b9675)
- Add: Thrust-like `constant_iterator` (6751e79)
- Add: `arrow_strings_tape::try_append` (6d7f221)
- Add: CUDA scoring benchmarks (7ac0fdd)
- Add: Baseline NW and SW alignment with ~O(NM) space (044c7cc)
- Add: Horizontal scoring in OpenMP (02a2481)
- Add: Local alignment adaptation (109d8de)
- Add: Parallel GPU kernels (2861a85)
- Add: OpenMP `score_diagonally` (427d5b5)
- Add: `lookup` transform in Rust (3090472)
- Add: OpenMP C++ draft (6f8cdb9)
- Add: Stateful hashing in Rust (763538e)
- Add: Expose `find_byteset` in Rust (667ea91)
- Add: Set intersections in Rust (06784fc)
- Add: Sorting in Rust (e467649)
- Add: New memory benchmarks (dfd6ddf)
- Add: Draft `sz_find_sve` (d7ede5d)
- Add: Faster `find_byte` with SVE (56a49c8)
- Add: New string similarity benchmarks (c12e6c4)
- Add: Short string hashing in SVE2 (a007c7c)
- Add: SVE & SVE2 bytesum (d6f87d7)
- Add: SVE2 macros (9fbdc9a)
- Add: Intersection benchmarks (e860af0)
- Add: SVE backend for sorting (148b615)
- Add: All new benchmarking suite (4744406)
- Add: Comparisons in SVE (c31020d)
- Add: Arm NEON hashing (4b3847d)
- Add: Missing SVE placeholder definition (63f0368)
- Add: C++ `argsort`, `intersect` (5ea0698)
- Add: `status_t` for errors in C++ (63daa5f)
- Add: Feature-extraction placeholder (de62723)
- Add: Intersections on Ice Lake (ea5dc76)
- Add: Serial JOINs (c7b841e)
- Add: Dispatched version API (9a32744)
- Add: Fetching dynamic library version in C (3538e97)
- Add: PRNG for Haswell & serial backend (6659aa0)
- Add: Streaming hashing on Ice Lake & Skylake X (2607d45)
- Add: Streaming hash benchmarks (8ac3a23)
- Add: Hashing on Haswell & Skylake-X (3c345bc)
- Add: Missing `sz_sequence_t` helpers (dc7c109)
- Add: Sorting placeholders & dispatch (cc98389)
- Add: `sz_sequence_argsort_ice` (69d4ecb)
- Add: AES-based hash placeholders (cb18c78)
- Add: Smaller Sorting Networks (cd6859a)
- Add: String sorting tests for different lengths (c670ccd)
- Add: Separate Skylake-X & Ice Lake checksums (554f50d)
- Add: New Levenshtein distance kernels (43471aa)
- Add: Missing Rust interfaces (1765f33)
Patch
- Docs: Pre-release stats update (c08c6c3)
- Make: Upgrade `setuptools` in CI (492ecc0)
- Improve: Allow `RuntimeError` for engine calls (f9cbb00)
- Docs: Section titles (755f583)
- Improve: PyTest invalid input arguments (f5e46d5)
- Improve: All new similarity-scoring benchmarks (3713e72)
- Make: Option to disable sanitizers for masked IO (f880621)
- Make: Default to Python 3.12 for better `itertools` (f2704d7)
- Fix: Naming scorers in Python like in Rust (d0bc604)
- Fix: Avoid SWAR on big-endian (230e354)
- Improve: More big-endian SWAR tests (7a4b78f)
- Fix: Drop old similarity APIs in benchmarks (230fc13)
- Make: Packaging for Py & Go (4c819d6)
- Improve: Byte-set counting PyTests (bd692d2)
- Docs: What to know about CUDA (15349c6)
- Improve: `Str_like_*` naming convention (5fdd8ee)
- Fix: Merge artifacts & lifetime annotations (c71e67e)
- Docs: Wording & AI dashes (4a06f3f)
- Make: Packaging for NPM (0d95f13)
- Make: Drop long-deprecated `.releaserc` (8b5af6e)
- Make: Enable SIMD in NodeJS builds (c8f6a49)
- Improve: Polish parallel string test names (61cc860)
- Make: Reuse SIMD compilation flags in `build.rs` (d129f95)
- Make: Missing AES definitions for lib builds (e951c4d)
- Docs: Hashing sections for each SDK (cb4fe1b)
- Improve: Expose `.capabilities` to JS (c557a62)
- Improve: Test incremental hashers (7c6d37d)
- Fix: Self-move construction of `basic_string` (5fb0e74)
- Fix: Strict aliasing violation (a5a2421)
- Fix: `__cpp_lib_string_resize_and_overwrite` test guards (cb2d6a8)
- Docs: How to use parallel algorithms (ecad156)
- Fix: Stricter following of `SZ_AVOID_STL` (c3040b7)
- Docs: JS Quick Start (c923ed2)
- Make: Publish to NPM (b2cedc4)
- Make: Drop CodeQL noise (7c9caac)
- Docs: Describe dynamic dispatch & linking (e6f2c02)
- Improve: NodeJS groundwork & corner-case tests (#151) (00d75f5)
- Fix: `_MSC_VER` to `__GNUC__` conditions (83492ac)
- Fix: Unknown pragmas in MSVC (#231) (78bbc11)
- Make: Parallel algorithms CI/CD for PyPI (9b8c466)
- Make: Ignore formatting blames (b69206f)
- Make: JSON & YAML uniform formatting (320bddd)
- Make: Explicit CodeQL coverage in CI (e9fb38d)
- Fix: Match new C-level `DeviceScope` behavior (f217b86)
- Fix: Sorting on big-endian `s390x` (8d2d9c8)
- Fix: Ruff-statically suggested issues (edaff0e)
- Improve: Disable `E722` import exception warning (3aedfb2)
- Fix: Check for immutable Py buffers (e8c437e)
- Improve: Test PRNG in Py and boundary Strs sizes (3d76e8f)
- Fix: `np.random` v1 vs v2 compatibility also in `szs.` (5f8ac03)
- Fix: Prevent PyTest from parsing invalid UTF-8 (30c8935)
- Fix: Don't repeat seed-ed fuzzy tests (efceb0b)
- Fix: `np.random` v1 vs v2 compatibility (aea096b)
- Make: Forwarding `SZ_IS_QEMU_` (d433db0)
- Fix: Minor logical inconsistencies & unused vars (679f7b9)
- Improve: Disable SVE in QEMU runs (072ee2e)
- Fix: Avoid `sys.getrefcount` tests on PyPy (80fa90e)
- Improve: Check rich comparisons before sorting (a21c44d)
- Improve: Session-scope fixture for PyTest env logs (3d84812)
- Improve: Fuzz PyTests and log environment (6545a04)
- Make: Respect env-vars for `-arch` (6a450f6)
- Make: Upgrade GitHub actions (dec93d0)
- Make: Avoid universal builds defaults for `pip install .` (c462f42)
- Improve: Guard compiler pragmas (42ad14d)
- Make: Bump FU to avoid missing `+wfxt` target (337257b)
- Make: Detect CPU AES support on Arm (ae74d44)
- Make: Reinstall pre-packaged CMake on macOS-14 (24967c7)
- Fix: Fall-back CPU alloc for fingerprints (b582d5d)
- Make: Drop macOS Universal builds (e5cfb08)
- Fix: Sorting difference on 32/64 bit machines (0e17a3d)
- Make: Respect `MACOSX_DEPLOYMENT_TARGET` (95a95fe)
- Make: Avoid `fail-fast` for Python pre-release wheels (a7c3f04)
- Fix: Win32 compilation issues (f792366)
- Improve: Test against Affine Gaps (65d323e)
- Improve: Avoid many unified memory re-allocs (d0db2d4)
- Fix: Intersect scopes HW capabilities (e4245b7)
- Make: Drop Python 3.7, require 3.8+ (c65bf5e)
- Make: Verbose PyTest logging in CI (c273d52)
- Fix: OS-feature-gate AVX checks (cd81b4a)
- Fix: Linking to 64-bit symbols (7da4dac)
- Make: Log HW caps in CI before tests (2c02d5b)
- Fix: Type-casting on MSVC (a347dcd)
- Make: Irrelevant links & comments (d6221e6)
- Make: Target Hopper `90a` in Py & C (c317280)
- Make: Outdated 64-bit detection envs (72d0f4b)
- Docs: Wording typos (cf1623c)
- Make: Missing `affine-gaps` dep (fa52363)
- Docs: No Alpine flow on release CI (f214169)
- Fix: Argument order (10dff6c)
- Make: Can't read `SWIFT_VERSION` for `container` (a4015f1)
- Improve: Differentiate `capabilities_mode` in PyTest (2bee1e4)
- Fix: Dispatch serial code for `bytes_per_cell <= 2` (48b4406)
- Improve: New error codes for CPU/GPU interop (6e597ca)
- Fix: Avoid UB assigning i8x256x256 matrix (7ac3b83)
- Make: Bump FU due to sign conversions warnings (cbd160a)
- Fix: Detect missing GPUs at runtime (0bc0f8a)
- Improve: Formatting Swift (3d6669c)
- Improve: Test against `affine-gaps` (48a538c)
- Improve: Reduce sign-casting issues (a340131)
- Fix: Check CUDA in `szs_capabilities` (42f043c)
- Improve: Build & test reproducibility (446e14e)
- Make: Override `/std:c++` for MSVC (930b1f0)
- Make: Add CUDA to GitHub CI (2a4552c)
- Make: Workaround CI issues (46410c6)
- Make: No `bare` builds on Windows & macOS (b42a340)
- Fix: MSVC compilation issues (5f60fe4)
- Improve: Simplify setting thread-counts (20bbe22)
- Make: Install `sz` before `szs` in CI (9033649)
- Make: Skip x86 intrinsics in `universal` builds (ce6cab2)
- Fix: Inferring Ice Lake similarity kernels (a65dc99)
- Fix: Disambiguate `szs_` symbols (1fd5db8)
- Make: Override VS Code compiler choice on `osx` (d06d4e4)
- Make: Avoid SVE builds on macOS (abb970b)
- Fix: Wrong `SZ_DYNAMIC_DISPATCH` check (acdab3f)
- Make: Preinstall `wheel` in CI (62f8c98)
- Make: Bump tapes to 4.0 (72fd6be)
- Fix: Avoid forced inlining for HW flags (4cef520)
- Fix: Feature-checking STL (b1418b5)
- Fix: Missing `allocator_traits` include (44b8d72)
- Fix: Workaround for `static_cast` (b6a4cc6)
- Fix: Avoid unaligned XMM loads (0f840bb)
- Fix: `static_cast` to standard for MSVC (43c953f)
- Make: Avoid `uv` in GitHub CI (89ed74d)
- Fix: Unused variable in `group_by` (3761107)
- Fix: Unused `qsort` on MacOS (cd263ae)
- Improve: Unaligned loads in serial hashes (5e9d488)
- Make: Parallel backends CI (352b48d)
- Make: Referencing old tests (7c4fe32)
- Make: Caps introspection flags on Arm (d6c7cf3)
- Fix: Converting to string views (6b00ab9)
- Fix: Unused symbols (e825ed8)
- Fix: Fetching engines `::capability_k` (ddc640b)
- Fix: uninitialized intersection `count` (ef3ca96)
- Make: Disable NUMA by default (b27552d)
- Fix: Guard SVE checks for cross-compilation (dac3941)
- Make: Install Git on Alpine (1ff0c15)
- Make: Log Alpine version (7ea327e)
- Fix: Type-casting seed on Clang (4598f42)
- Make: Bump Fork Union to 2.2.2 (e674413)
- Fix: Deprecate `levenshteinDistance` in Swift (70c4add)
- Fix: Passing StringZillas doctests (529fb76)
- Fix: Allow NULL allocator args (b0c33bd)
- Make: NVCC flags for Rust (6b38ea2)
- Fix: Report requesting 1 CPU core (4ef0464)
- Improve: Passing StringZillas.rs tests (34bb89a)
- Improve: Use StringTape for GPU backends (87a0767)
- Docs: StringZillas C API (34f4137)
- Fix: Memalloc initialization on MSVC (#230) (a6e0a77)
- Docs: Drop OpenMP and old name (7f45118)
- Improve: Infer `capabilities` from `DeviceScope` (a807eba)
- Improve: Introspect `sz_device_scope_t` (d9100a3)
- Make: Consistent `-O2` optimization (5fcde22)
- Improve: Drop unused `info1` (f4d4a76)
- Fix: Rendering byte strings in Python (cf73d79)
- Fix: Forward errors from `sz_rune_parse` (474dec4)
- Improve: PyTest different MinHash dimensions (b5060a8)
- Fix: Expose `value_type` for CUDA fingerprinter (7724886)
- Fix: Fingerprinting memory management (f8dea13)
- Fix: `to_span` compilation (e7fdd98)
- Improve: Wrap high-dim fingerprints (bbf30d2)
- Fix: Handling empty strings in arrays (517757c)
- Improve: More readable PyTest (a6d3ed2)
- Fix: Checking for Ice Lake caps (4f7649a)
- Improve: Simpler & slower Py args parsing (36745f4)
- Improve: Propagate error message to Py (711bd63)
- Improve: Comparing 2 mem-allocators (7870cd3)
- Fix: Skip missing `affine_levenshtein_utf8_ice_t` (1e112c6)
- Make: Custom `CudaBuildExtension` for Python (4abf63c)
- Make: Option to disable CUDA builds (84492e5)
- Fix: Refer to `prong_t` in executor concepts (57d4ec8)
- Fix: MSVC & Clang compilation errors (31fcdb3)
- Make: Avoid OpenMP in builds (183ea96)
- Fix: Announce `LevenshteinDistancesUTF8Type` (cb81c00)
- Improve: Cache hardware capabilities (b9a1109)
- Improve: Export capabilities as a tuple (0223b62)
- Improve: Printing CUDA caps (94d8c36)
- Fix: Match Apache Arrow layout (38c87b4)
- Improve: Type-casting `seed`s in `Strs.sample` (5ac53c3)
- Improve: Constructing `Strs` from PyArrow (7aebef4)
- Fix: Track ownership of `Strs` offsets (65380ca)
- Improve: Expose `sz_capabilities` in non-dynamic builds (6206eb4)
- Fix: Avoid depending SZS -> SZ (c411e37)
- Docs: Mark programming languages correctly (d1fc68c)
- Make: Bump C++ & CUDA to 20 for libs (320f3da)
- Fix: `rebind_alloc` in C++20 (492d726)
- Fix: Tautological compare check (b924d90)
- Improve: Random-access similarity outputs (c5e778d)
- Make: Building parallel Python packages (0f44c25)
- Fix: Clang build warnings (7741272)
- Fix: Using braces for Clang builds (325cedb)
- Make: Pull submodules in CI (5bb90b3)
- Fix: Avoid `_mm256_cvtepi64_epi32` on Haswell (014002e)
- Make: Separate parallel library sources (ff019b1)
- Make: Forward `march` flags through NVCC (496ae84)
- Make: Move CUDA lib into header (d4a66c5)
- Make: FMA flag for Haswell (2e1daa4)
- Fix: Compilation of all C targets (a2b228c)
- Make: Compiling StringZillas shared libs (6ac80e8)
- Improve: `gpu_specs_fetch` & GPU args order (3d7b491)
- Docs: Sync description one-liner (ab9b617)
- Improve: Draft parallel fingerprinting API (e49f570)
- Improve: Runtime variable window widths (031d067)
- Make: Rename `lib.rs` (5d82454)
- Docs: Reuse operators state (e5e2702)
- Improve: Naming multi-input processors (a3c3510)
- Fix: Scramble results between fingerprint benchmarks (fbf7203)
- Make: Format Python to 120 columns (640b7c4)
- Improve: Naming internal symbols (3b48c93)
- Improve: Unroll CUDA fingerprints (37f3d80)
- Docs: Refresh Python benchmarking suite (49cf4ea)
- Fix: Weird compiler bug related to `cuda_status_t` (0a0955e)
- Fix: Fingerprinting in CUDA (52b1d73)
- Fix: Estimating hash counts in fingerprints (058af71)
- Improve: Unroll & parallelize fingerprinting (cda36fd)
- Fix: Inferring the prong type of executors (531b1e9)
- Improve: Align thread-pool within stack-frame (b1077a4)
- Improve: Wording inconsistencies (afaf11b)
- Make: Launchers for Parallel C++ benchmarks (9f3beac)
- Fix: Fingerprinting via Skylake extensions (08c1e86)
- Fix: Passing fingerprinting builds (44a058d)
- Improve: Include hash counts in fingerprints (0d9ba5b)
- Improve: Consistent kernel naming without underscore prefixes (05725b2)
- Improve: Expose floating-point SIMD states (4a57789)
- Fix: Consistent `barrett_mod` in C++ & Python (1fc1cab)
- Docs: Using `uv` for tests (f86dfaa)
- Fix: Choosing co-primes with `std::gcd` (a1b3001)
- Improve: Using fast calling convention for CPython (97ab23c)
- Docs: Show higher recall with better hashes (255d443)
- Improve: Ensure `seed` affects hashes (78d39f9)
- Improve: Separate StringZillas Python code (883a3cd)
- Fix: Fingerprinting compilation (19f92c4)
- Improve: Explore Min-Hashes (46dd7d0)
- Improve: Test fingerprint equivalence (e4aa3f7)
- Fix: `is_same_type` usage over `std::is_same` (0ab5710)
- Improve: Ignore previous UB commit in blame (7fc7323)
- Fix: Avoid UB with underscore prefixes (74e3b6f)
- Fix: `sz_bitcast` strict aliasing (80b97de)
- Fix: Avoid `std::swap` in device code (c0aea26)
- Fix: C++17 compatibility issues (7ea685d)
- Fix: Guard C++20 concepts use (60763b3)
- Fix: Backport `std::remove_cvref` to C++17 (ffee12b)
- Improve: Move `safe_vector` (78d8c96)
- Fix: Limit `constexpr` use in C++11 (866e2f2)
- Fix: Minor build issues (46a6d63)
- Improve: Extend dummy executors API (498d72a)
- Fix: Wrong Fork Union class name (5def3af)
- Improve: Compile-time-known `span` extents (8355b6e)
- Improve: Move `arrays_equality` (e96d26f)
- Improve: Merge fingerprinting drafts (cf6077e)
- Make: Deprecate Find Many kernels (a6f799f)
- Improve: Extend `find_many` tests (c80ce60)
- Improve: Upgrade Fork Union (fb5f429)
- Docs: Similar wording in "Explore Levenshtein" (766e250)
- Fix: Correct namespaces for scripts (aafcbbd)
- Fix: Replace `+g` with `+m,r` like GB (40bd3ed)
- Fix: Wrong boundary conditions for `count_many_parallel` (0b22dd9)
- Improve: Switch to "StringParaZilla" naming (0f3f928)
- Improve: Cleaner haystack splitting (3e93b75)
- Improve: Prioritize find-many tests (7a433a9)
- Improve: `SZ_DYNAMIC` attributes (2f0334a)
- Fix: Forwarding dataset `nothrow`-copyable views in tests (170b61b)
- Improve: Use CUDA Atomics to aggregate globally (cf7d9a5)
- Make: Missing `bench_search` target (1133cce)
- Fix: `find_many.cuh` compilation issues (72affc2)
- Fix: AC dictionary `try_assign` with different alloc (4f9d2db)
- Fix: Warning around immutable span conversion (d88842a)
- Docs: Target names (503e470)
- Fix: Match C++ class names (966dd09)
- Docs: Sections & links (f5a94b1)
- Make: Bump `fork_union` (3f5de03)
- Fix: Revert to separate `find_many` algos for different length (0acbeeb)
- Improve: `__reduce_max_sync` in SW on Hopper (c45017d)
- Fix: `unified_alloc` propagation (13e0201)
- Docs: Move segmenting & features drafts (3972cb3)
- Fix: Compiler over-optimizing `bench_find_many` (16f5fa4)
- Fix: Prefix length in parallel counting of short needles (69c7b6a)
- Fix: Including the entire haystack into match (3ee1676)
- Fix: Buffer overflow due to wrong thread count (887c16b)
- Improve: Skip vocabulary duplicates (5302a4d)
- Improve: New multi-pattern search APIs (20c9135)
- Fix: Missing `std::generate` include (853a0fa)
- Improve: Multi-byte characters support (85c5bf8)
- Improve: Propagate allocators in `safe_vector` (0c442d1)
- Improve: Custom validators for nullary benchmarks (0cf994a)
- Improve: Benchmark early exit (aaa6927)
- Make: Rename `bench_search` -> `bench_find` (6f65624)
- Docs: Aho-Corasick CUDA design (496e55a)
- Improve: New caching primitives (df1f7ef)
- Fix: Checking `STRINGWARS_STRESS` env-var (ade74f6)
- Improve: Support executors in multi-pattern search (35fad3d)
- Improve: Divergent branches on `i16` SW on Hopper (817b15c)
- Fix: OOB impact on SW scoring (cd9ff1e)
- Fix: NW & SW on Hopper (4134e44)
- Improve: More noticeable signaling in tests (ece416d)
- Improve: Scheduling speculative kernels (8a6a185)
- Improve: Run multiple warps per block (3b2f263)
- Fix: Comparing Affine benchmark results (64d8f4d)
- Improve: Fuzzy test Ice Lake kernels (d603f7d)
- Improve: Shrink Affine Ice Lake kernels (79e7a2f)
- Fix: Affine top row initialization (b9e4160)
- Fix: Levenshtein w. Affine costs on GPU for zero-length ins (076e58a)
- Improve: Test weird affine gaps (3bbebc8)
- Improve: Match `fork_union` API in executors (1d6f58d)
- Improve: Use `fork_union` pools (81463d4)
- Docs: Inconsistent naming (8b62f2c)
- Make: Add `fork_union` dependency (bd3d341)
- Improve: Duplicate `bytesum` assignment (bfcd10e)
- Make: Bump CUDA version (c709621)
- Fix: Ice Lake calls for empty inputs (f59ba5e)
- Fix: Correcting blends on Kepler (563a73d)
- Improve: Scheduling Levenshtein in CUDA (77b1087)
- Improve: Avoid `views::group_by` with callback (6a61e1b)
- Improve: `bytes_per_cell_t` enum (c6e907a)
- Fix: Inconsistent timing in `bench_unary` (60b99ab)
- Improve: Bounded methods not supported (23a7f58)
- Improve: Use `requires` clause (22c691e)
- Improve: Naming "executor" interfaces (7e778d9)
- Improve: Generalize Levenshtein in CUDA (29da524)
- Fix: Inclusion guard macro names (8e79fb7)
- Docs: More datasets on HuggingFace (0b33ac3)
- Improve: Aligned ZMM diagonal stores (10b5405)
- Improve: Branchless K-mask calculation (3998c1f)
- Improve: Measure gap magnitude (b4eb6a4)
- Fix: Avoid horizontal walker overflow (9b2b4e5)
- Fix: All Gotoh baselines (84397ae)
- Fix: Initializing affine DP matrices (640853f)
- Improve: Differentiate unary & uniform costs (8c15447)
- Fix: Horizontal Affine Walkers (e5d85f3)
- Improve: Alloc type-size check in `safe_vector` (9c5a56c)
- Improve: Fetch warp-size dynamically (83dd5fb)
- Improve: Warp-shuffle reductions in SW (914e98d)
- Fix: Over-estimating number of overlapping matches (b85d3ca)
- Improve: Faster multi-needle tests (0895d28)
- Improve: Splitting jobs in baseline multi-search (d47d849)
- Fix: Slicing corner-cases in OpenMP (7bcc803)
- Improve: Use `std::execution` for baseline tests (f228264)
- Improve: Parallel baseline for substring search (6ebc7b0)
- Fix: `bytes_per_core_optimal` estimate (83bc966)
- Improve: Pointer-constructible spans (a0fd136)
- Fix: Passing multi-needle tests (faad971)
- Fix: Calculating `find_many_match_t` properties (d0ebee8)
- Fix: Indexing needle IDs (42fa08d)
- Fix: Use smaller types in BFS queue (0c6dd00)
- Fix: Aho Corasick construction (e27f86f)
- Improve: Consistent shuffling behavior in benchmarks (d4d55fa)
- Fix: `sz_copy_skylake` tail handling on large input (#222) (6da5e1e)
- Fix: Propagate substitutions to benchmarks (eabe605)
- Fix: Underutilized 99% of the H100 (7a9a243)
- Fix: Using OpenMP directives (41e1a6e)
- Fix: Forward GPU specs in CUDA tests (9d86d4c)
- Improve: Generalize memory requirements estimates (7bd92c5)
- Fix: Missing STL includes (f7d365a)
- Improve: Use CUDA constant memory (cebd180)
- Improve: Forward-declare substituters (0a04960)
- Improve: Show signed integers in SIMD types (382c05d)
- Fix: `i16` Ice Lake NW/SW alignment (8cc9794)
- Improve: Catch & log exceptions (cb739e2)
- Improve: Better UTF8 tests for similarity (b04e934)
- Make: Parallel test launchers (1ece547)
- Fix: Build issues (9db0287)
- Fix: Included filename (ee38a22)
- Fix: Uniform costs for UTF-32 runes (6edcf7f)
- Improve: New template SFINAE in `similarity.hpp` (10279f9)
- Fix: Initializing horizontal aligner (91df0ce)
- Improve: Use tagged C `enum`s (35ba76c)
- Improve: Allow custom validators in benchmarks (bd2a21d)
- Fix: Compiling `constant_iterator` in CUDA (bc59ee3)
- Fix: `std::allocator::rebind` deprecated (4c87404)
- Fix: Avoid `std::iterator` dependency (b3db596)
- Make: Move similarity benchmarks (90bfcb6)
- Fix: Report error codes in tests (05f17a6)
- Make: Separate Parallel C++ and CUDA tests (b175103)
- Make: Rename test files (2eeab83)
- Fix: Passing CUDA similarity tests (4c75d81)
- Fix: NW/SW test correspondence (fea39ed)
- Improve: Differentiate min/max-imizers (c96c5ed)
- Improve: Annotate throwing exceptions (2ef667c)
- Fix: Missing `sz_i16_t` definition (238c86d)
- Fix: Avoid similarity scoring references (e4b1bbd)
- Improve: Shorter type aliases (3ef1d26)
- Make: Revert to C++ for core tests (0c6ff1f)
- Docs: StringCuZilla design choices (f42aa85)
- Make: Separate StringCuZilla (c5fd4bc)
- Improve: Move capabilities to `types.h` (1c1582f)
- Make: Compile with OpenMP (1671b0f)
- Improve: Allow datasets in VRAM (b1c9a74)
- Fix: Shuffle datasets with over 4B tokens (247c6ec)
- Fix: Overflow `mean_token_length` calculation (efcadd1)
- Make: NVCC kernel debugging symbols (0c0ff42)
- Fix: Arrow-like string array (fa4b0f4)
- Fix: Overwriting alignment scores (668a386)
- Fix: Synchronizing CUDA kernel launch (3b85a00)
- Fix: Shared memory requirements (27ad8f5)
- Make: NVCC can't handle `fsanitize` (ea7647f)
- Make: Draft CUDA compilation (1b96ef4)
- Fix: Hardening `malloc(0)` behavior (7524882)
- Improve: Share C++ macros and typedefs (e82d045)
- Fix: Accounting for different gap costs (e4e517f)
- Fix: NVCC warning for negative size field (a6c0fa2)
- Fix: Track capacity in fixed buffer alloc (64b40a9)
- Fix: Sign-cast warning in `_mm256_set1_epi64x` (aac2e8f)
- Fix: Overriding `SZ_DEBUG` macro (bca734a)
- Fix: Calling unused helper struct unit tests (d369170)
- Improve: Cleaner API for OpenMP (bc311b3)
- Fix: Shifting Levenshtein diagonals in OpenMP (6b5ef98)
- Fix: Unaligned loads/stores of hash state (37863c9)
- Improve: Expose rune-parsing headers (2727a87)
- Fix: Diagonals depth (ef53f75)
- Docs: Showcase indexing diagonals (b55c696)
- Fix: `sz::lookup` examples in Rs (778d4f0)
- Fix: Compiling SVE on MacOS (42270c8)
- Improve: Inline cheap calls (28282d2)
- Make: List `scripts/` deps for `uv` (1907d2b)
- Docs: Formatting (dd57536)
- Docs: `uv` instructions (9460fd4)
- Fix: Unaligned `sz_hash_state_t` stores (7e65a1e)
- Improve: Align inner hash-states (1b3cdd5)
- Improve: Use `GiB` over `GB` (811fc59)
- Improve: Construct `Byteset::from_bytes` (34660f2)
- Fix: Remove missing Ice-Lake benchs (0c08564)
- Fix: Compiling Py bindings (7e08180)
- Make: CMake formatting (44485fb)
- Docs: Formatting and references (928fd79)
- Fix: No return (4cb096b)
- Docs: Listing bench details (343b858)
- Make: Patch passing `SZ_USE_SVE` definitions (45b15b0)
- Fix: Expanding feature-detecting macros (75ef77e)
- Improve: Bold benchmark names in CLI (2ab635e)
- Improve: `fill_random` checksums in benchmarks (0bba772)
- Improve: Simpler SVE find nested loop (a1604b7)
- Improve: Unrolled serial hashing (77e482e)
- Fix: `find_sve` mask update on long needles (a493ab8)
- Improve: More simple substring search tests (68449fd)
- Fix: `bench_sequence` CMake target (905749c)
- Fix: Drop double negation in logging (3d77ec6)
- Improve: Use `SQINCP` in SVE for increments (efda23b)
- Fix: Dispatching SVE kernels (06ea5f7)
- Docs: SVE2 intersects TODO (fafb8b0)
- Improve: `do_not_optimize` token-level results (d1a3779)
- Make: Rename `bench_sort` (991a78b)
- Improve: Naming benchmark names (0074ad7)
- Improve: Better sorting benchmarks (7d534fb)
- Fix: Computing improvement percent (a5de795)
- Improve: Faster equality checks on NEON/SVE (20f35c7)
- Improve: New token-level benchmarks (12e1edd)
- Docs: Describe trivial types (af686dd)
- Fix: Naming byteset signature (9676cdb)
- Improve: Naming "vtable" entries (b9794e5)
- Make: Upgrade to C++20 for benchmarks (aa7f275)
- Improve: New-style "container" benchmarks (3f1c723)
- Fix: Reverse order `std::search` offsets (244e605)
- Docs: Ignore formatting CMake (366816e)
- Make: Formatting CMakeLists.txt (467b4b8)
- Fix: Extra comma in `printf` (298d214)
- Docs: Outdated function naming & spelling (92b9a56)
- Improve: Token benchmarks (3b1897e)
- Improve: Logging in container benchmarks (4d955d3)
- Fix: No intersect for Skylake (48d70ea)
- Fix: Revert to XMM on Haswell (4bec1e5)
- Fix: Composing STL collections (f9da4ed)
- Fix: `std::string::data` is mutable only since C++17 (ff23c3d)
- Improve: Discard state in streaming hash (828263f)
- Improve: Discarding buffer in streaming hashes (c4f7a0e)
- Improve: Separate PRNG backends in benchmarks (af54e93)
- Fix: Guard Skylake benchmarks (2965502)
- Fix: Unused `_sz_capabilities` symbols (8bb90e5)
- Fix: `sz_intersect` signature (f712de3)
- Make: Don't build `stringzilla_bare` on MacOS (a7b35ba)
- Fix: Variable in C++14 `constexpr` (feb415f)
- Fix: Unused Levenshtein tests (90540d3)
- Fix: `find_1byte` signature compatibility (d19e8b8)
- Improve: Fix minor inconsistencies (f656577)
- Docs: Exploring perfect Unicode hashing (197cd87)
- Improve: Test set intersections (1d95601)
- Fix: Randomization benchmarks (8dc4a2c)
- Docs: Formatting (5c02c4e)
- Docs: Details on the Unicode range (b6e4406)
- Docs: Ignore C++ docstring updates blame (407dd2d)
- Docs: New formatting in C++ (0d982a4)
- Fix: Passing `sz_sequence_t::handle` (75fabf1)
- Improve: Remove redundant comments from sz_hash_state functions in Rust (9fe25df)
- Improve: Expose sz_lookup (471b002)
- Improve: Expose sz_hash_state_init, sz_hash_state_stream, and sz_hash_state_fold to Rust (1757e4e)
- Improve: Exposed sz_move, sz_fill, and sz_copy for Rust (b2085cc)
- Improve: Inline most common Rust APIs (a30b5b7)
- Make: `cibuildwheel` env variables (fbf256a)
- Make: Decremental Rust builds (8877c82)
- Fix: Detecting caps in dynamic builds (d52bf63)
- Fix: `fill_random` test condition (8b396c8)
- Fix: Compilation of all bindings (2caefac)
- Make: Drop unused `build.sh` (2bbafa1)
- Improve: Testing hash functions (80688bb)
- Fix: Passing new hashing tests (268af53)
- Improve: `copy/move` on Haswell with interleaving (69dfa10)
- Docs: Announce JOINs (2225488)
- Improve: Ordering includes (6e71536)
- Improve: Vectorize `sz_equal_haswell` (7aad4bb)
- Docs: Explaining `compare.h` operations (d7bab8d)
- Improve: Clean `memory.h` header (7698392)
- Improve: Use default allocator, when not provided (5a12c00)
- Docs: Disable sorting includes (8bc161f)
- Fix: Ice Lake partitioning logic (1da0e2b)
- Improve: Expose Insertion-sort helpers (a38867f)
- Fix: Merge-step bug in stable sort (db61d93)
- Improve: Introduce typed `_sz_swap` macro (dcf6c65)
- Improve: Rename `sz_sort` to `sz_qsort` (6191cc6)
- Fix: `sz_sort_serial` passes tests (8bad799)
- Fix: `uniform_int_distribution` lower bound (bdee111)
- Fix: `sz_sort_serial` passes for same length inputs (0fda5a5)
- Improve: Drop hybrid sort code (50d8291)
- Fix: Underflow in serial sorting (5970fa4)
- Make: Recommend pretty-printing GDB symbols (a818f97)
- Fix: `uniform_int_distribution` upper bound (17f28a3)
- Fix: In C++11 `constexpr` constructor must be empty (13bace2)
- Fix: Sorting benchmarks for new API (66f2ac9)
- Improve: Separate fingerprinting benchmarks (187e0bd)
- Make: Renamed temp-git-split-file -> scripts/bench_token.cpp (031bedf)
- Make: Renamed scripts/bench_token.cpp -> temp-git-split-file (07d2239)
- Make: Renamed scripts/bench_token.cpp -> scripts/bench_fingerprint.cpp (a0318eb)
- Docs: Signatures and typos (982dd4d)
- Improve: Wrap `std::accumulate` for checksums (bce107a)
- Improve: Validate checksums in benchmark (abe8d07)
- Fix: Tail sum order in `checksum_haswell` (b20d7cd)
- Fix: Infer allocators `value_type` (5bbd971)
- Fix: Tail handling in `sz_checksum_haswell` (84cb4c8)
- Fix: Loops in AVX-512 checksums (4044855)
- Fix: Loop in `sz_checksum_haswell` (509b58b)
- Improve: Relax many `constexpr`s from C++20 to C++14 (0a3e363)
- Make: Move drafts (1de3166)
- Make: Renamed temp-git-split-file -> include/stringzilla/hash.h (5a36cb7)
- Make: Renamed include/stringzilla/hash.h -> temp-git-split-file (7052266)
- Make: Renamed include/stringzilla/hash.h -> include/stringzilla/fingerprint.h (0ef7cf1)
- Docs: Spelling `usnigned` (d18a159)
- Improve: hybrid bench sort performance (9880f26)
- Fix: hybrid bench sorts loading initial string bytes incorrectly (455508f)
- Fix: stable sort bench tests failing (821d19e)
- Fix: Minor dispatch issues (d20e589)
- Improve: Faster `levenshtein_baseline` (d9557d3)
- Fix: BMI flags for `BZHI` (fa47deb)
- Fix: Masks back to using `BZHI` (bd7054e)
- Make: Library namespaced aliases (f3811d7)
- Fix: `sz_u512_vec_t` members visibility (2007d49)
- Fix: Bounded Levenshtein returns (749b0d8)
- Fix: Skylake dispatch (48e0913)
- Fix: Linking `stderr` (084d653)
- Docs: Formatting docstring (c99daf3)
- Fix: Initializing `basic_charset` (864ee03)
- Fix: Correct `basic_charset` operator (#203) (e20d207)
- Improve: Ignore 40 commits in blame (064829a)
- Fix: Overriding LibC in 32-bit Windows (645539b)
- Improve: C++ version macros naming (19c2ae9)
- Make: Detect Apple Universal builds (6d61c21)
- Make: Rename `stringzillite` to `stringzilla_bare` (364e2ca)
- Fix: Symbols names & visibility (406bf0f)
- Fix: Haswell compilation flag (00f27f6)
- Fix: Filter `compare.h` file (6512f1d)
- Make: Split ./include/stringzilla/find.h to ./include/stringzilla/compare.h (fc9e5d6)
- Make: Split ./include/stringzilla/find.h to ./include/stringzilla/compare.h (49e8d9d)
- Make: Split ./include/stringzilla/find.h to ./include/stringzilla/compare.h (fc408fa)
- Fix: Partially filter `stringzilla.h` file (41e5917)
- Fix: Minor macro mismatches (5f7ca59)
- Fix: Filter `types.h` file (b835051)
- Fix: Filter `sort.h` file (1ba7982)
- Fix: Filter `small_string.h` file (5b55e19)
- Make: Separate builds for Skylake & Ice (4a1f03c)
- Improve: Platform-specific equality checks (8b44d6a)
- Fix: Filter `hash.h` file (be4c63d)
- Fix: Filter `similarity.h` file (8b401bd)
- Fix: Filter `memory.h` file (295d49a)
- Fix: Filter `find.h` file (2a1fcd1)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/memory.h (2f76521)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/memory.h (45e57ee)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/memory.h (66778d6)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/sort.h (c357c3e)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/sort.h (cbfe5c7)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/sort.h (085d2d3)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/small_string.h (3464cb4)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/small_string.h (89c4681)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/small_string.h (3f9c248)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/similarity.h (e23c35f)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/similarity.h (10d829e)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/similarity.h (d74e5dc)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/hash.h (1f60e6d)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/hash.h (08d0a20)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/hash.h (9e9f256)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/find.h (974ed78)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/find.h (14ba3bf)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/find.h (9e577be)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/types.h (8cb0742)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/types.h (22e3d1e)
- Make: Split ./include/stringzilla/stringzilla.h to ./include/stringzilla/types.h (ecb3775)
- Fix: Wrong env. variable names (d0678f8)
- Make: Inline ASM for detecting CPU features on ARM (0ee549a)
- Fix: Default Levenshtein upper bound (62ca6a0)
- Improve: Levenshtein functions for unicode (d3b423a)
- Docs: Levenshtein tutorial in Jupyter (715ad10)
- Fix: `sz_look_up_transform_avx512` declaration (585f7d5)
- Improve: `#pragma region` dashes (fe4449b)