github ashvardanian/StringZilla v2.0.0
v2: 5x swifter CPython bindings and first NodeJS bindings

2 years ago

Python

So why would anyone replace the easy-to-use PyBind11 with almost 2,000 lines of pure CPython bindings?! Of course, to lower the latency! PyBind11 wraps every C++ object with a smart pointer, puts a hash table next to it, and addresses function pointers with std::string key lookups 🤯

Let's see where that gets us when benchmarking on the "Leipzig1M" dataset. The bandwidth-oriented functions are just as fast as before:

  • Hashing the dataset: 77 ms for 🐍 vs 16 ms for 🦖 ~ 4.5x faster
  • Counting the number of "the": 151 ms for 🐍 vs 45 ms for 🦖 ~ 3.3x faster
  • Split all whitespace-delimited words: 782 ms for 🐍 vs 338 ms for 🦖 ~ 2.3x faster
  • Split around every "the": 240 ms for 🐍 vs 48 ms for 🦖 ~ 5x faster
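
For anyone who wants to reproduce the 🐍 side of these numbers, here is a minimal timing sketch using only the built-in `str`. The helper names `best_ms` and `bandwidth_baseline` and the synthetic corpus are assumptions made for illustration; the figures above were measured on the actual Leipzig1M file. Hashing is left out on purpose: CPython caches the hash of a `str` after the first call, so timing it repeatedly on the same object would mostly measure the cache.

```python
import timeit


def best_ms(fn, repeats=5):
    """Return the best wall-clock time of fn() over `repeats` runs, in milliseconds."""
    return min(timeit.repeat(fn, number=1, repeat=repeats)) * 1e3


def bandwidth_baseline(text):
    """Time the built-in str versions of the bandwidth-oriented operations."""
    return {
        'count "the"': best_ms(lambda: text.count("the")),
        "split on whitespace": best_ms(lambda: text.split()),
        'split around "the"': best_ms(lambda: text.split("the")),
    }


if __name__ == "__main__":
    # Hypothetical stand-in corpus, not the Leipzig1M dataset.
    sample = "the quick brown fox jumps over the lazy dog\n" * 100_000
    for name, ms in bandwidth_baseline(sample).items():
        print(f"{name}: {ms:.2f} ms")
```

Taking the best of several runs, rather than the mean, reduces noise from other processes; the same harness should accept any object that exposes `count` and `split`, which is how a drop-in replacement could be compared against the baseline.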

What about the latency-oriented ones?

  • Find the first whitespace: 1 µs for 🐍 vs 3 µs for 🦖 ~ 3x slower, where previously it took 15 µs and was 15x slower
  • Partition around the first whitespace: 73 ms for 🐍 vs 33 µs for 🦖 ~ 2212x faster 🥳
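
The two latency figures mix units (milliseconds against microseconds), so the ratios are easy to misread. The sketch below only redoes the arithmetic from the numbers quoted above; the `speedup` helper and the unit table are hypothetical names introduced for this illustration.

```python
# Conversion factors to seconds; "us" stands for microseconds.
UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def speedup(baseline, baseline_unit, candidate, candidate_unit):
    """Return how many times faster the candidate is than the baseline.

    Values below 1.0 mean the candidate is slower.
    """
    baseline_s = baseline * UNIT_TO_SECONDS[baseline_unit]
    candidate_s = candidate * UNIT_TO_SECONDS[candidate_unit]
    return baseline_s / candidate_s


if __name__ == "__main__":
    # Partition: 73 ms vs 33 us, which gives the roughly 2212x quoted above.
    print(f"partition: {speedup(73, 'ms', 33, 'us'):.0f}x")
    # Find: 1 us vs 3 us gives a ratio of about 0.33, i.e. 3x slower.
    print(f"find: {speedup(1, 'us', 3, 'us'):.2f}x")
```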

JavaScript

In an effort to bring faster string operations to JavaScript, we have started the NodeJS binding together with @nairihar. It's just a skeleton and has poor performance for now, but you can use it as a starting point to help us implement a faster Str class for JavaScript 🤗
