Even faster search for common search patterns, such as sets of words, using additional SIMD AVX and NEON/AArch64 intrinsics.
The results of the original ugrep benchmarks listed in the README still hold, and actually improve slightly when searching sets of 1000 words across different word-length categories:
Benchmarking on a MacBook M1 Pro 16GB LPDDR5
Searching for sets of words in a 100MB benchmark file. Each data point in the log-log graph is the elapsed time averaged over 100 searches with random word sets of the given size. Ugrep easily beats ripgrep and GNU grep for sets of up to 256 words. For larger sets, ugrep and ripgrep are closer, with a slight advantage for ripgrep. Ripgrep uses letter frequency search heuristics, which work very well on English text corpora such as this large benchmark file (but may be less efficient on other input):
All benchmark scripts are available upon request.
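The actual benchmark scripts are not reproduced here. As a rough illustration only, the methodology described above (100 searches per data point, each with a fresh random word set, averaged) could be sketched in Python as follows. Everything in this sketch is an assumption rather than the real script: the helper names, the doubling of set sizes, and the use of a pattern file passed with `-f`.

```python
import random
import statistics
import subprocess
import tempfile
import time


def random_word_set(vocabulary, size, rng):
    """Pick `size` distinct words at random from the vocabulary."""
    return rng.sample(vocabulary, size)


def time_search(command, words, corpus_path):
    """Time one search (hypothetical helper): the word set is written to a
    temporary pattern file and passed to the search tool with -f."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt") as patterns:
        patterns.write("\n".join(words) + "\n")
        patterns.flush()
        start = time.perf_counter()
        # check=False: grep-like tools exit with status 1 when nothing matches.
        subprocess.run(command + ["-f", patterns.name, corpus_path],
                       stdout=subprocess.DEVNULL, check=False)
        return time.perf_counter() - start


def average_elapsed(run_once, vocabulary, size, trials=100, seed=0):
    """Mean elapsed time over `trials` searches, each with a new random set."""
    rng = random.Random(seed)
    return statistics.mean(
        run_once(random_word_set(vocabulary, size, rng)) for _ in range(trials)
    )


def set_sizes(largest):
    """Assumed progression of word-set sizes: 1, 2, 4, ... up to `largest`."""
    size = 1
    while size <= largest:
        yield size
        size *= 2
```

One data point might then be produced with `average_elapsed(lambda ws: time_search(["ugrep", "-c"], ws, "corpus.txt"), vocabulary, 256)`, where `corpus.txt` and `vocabulary` stand in for the benchmark file and its word list. Passing the timing step as a callable keeps the averaging logic independent of the tool being measured, so the same loop could be pointed at ripgrep or GNU grep.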
Benchmarking on a MacBook Pro 2.9 GHz Intel Core i7 16 GB 2133 MHz LPDDR3
Searching for sets of words in a 100MB benchmark file. Each data point in the log-log graph is the elapsed time averaged over 100 searches with random word sets of the given size. Ugrep and ripgrep perform similarly. Ripgrep produces wrong results when searching for 32 words, a bug we discovered weeks ago and attribute to ripgrep's use of letter frequency search heuristics, which work very well on English corpora such as this large benchmark file but can be buggy. To be fair, these faulty data points were removed from this graph:
All benchmark scripts are available upon request.
What's next?
Stay tuned for more additions (and even faster searching) coming soon!