[3.0.0] - 2025-02-13
Highlights:
- sample: Five new sampling methods! In addition to reservoir & indexed - added bernoulli, systematic, stratified, weighted & cluster sampling. And they're all memory efficient so you should be able to sample arbitrarily large datasets!
- stats: Added "sortiness" [-1 (Descending) to 1 (Ascending)] & "uniqueness_ratio" [0 (many repeated values) to 1 (All unique values)] stats (more info). The qsv-stats engine was also optimized to squeeze out more performance, with stats now 2.6x faster while using less memory despite the addition of these new stats.
- diff: is now a "smart" command. It uses the stats cache to short-circuit diffs if files are identical per their fingerprint hashes, and to validate that the diff key column is all unique.
- The stats cache has been refactored, improving performance for "smart" commands:
  - frequency is not only 3.3x faster, it also uses far less memory as it no longer needs to maintain hashmaps for columns with all unique values.
  - tojsonl is 2.25x faster
  - schema is 1.4x faster
- luau got a major performance boost with the v0.660 engine upgrade, taking advantage of several compiler optimizations. luau is now up to 3.1x faster!
- validate had a major performance regression - going from 3.295 seconds in v2.1.0 to 13.159 seconds in v2.2.1 in the benchmarks. 4x slower! With the jsonschema 0.29 crate update, validate now clocks in at 3.022 seconds!
- template also got a big boost and is now 2.9x faster with the minijinja 2.7 crate update.
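To make the two new stats more concrete, here is a minimal sketch of how scores with the stated ranges could be computed. The formulas are assumptions chosen only to match the ranges given above ([-1, 1] for sortiness, [0, 1] for uniqueness_ratio). They are not taken from the qsv-stats source, and the function names `sortiness` and `uniqueness_ratio` are illustrative Python helpers, not qsv APIs.

```python
from typing import Hashable, Sequence


def sortiness(values: Sequence) -> float:
    """Assumed definition: (ascending pairs - descending pairs) / (n - 1).

    Yields 1.0 for strictly ascending data and -1.0 for strictly
    descending data. Equal neighbours count as neither, so ties pull
    the score toward 0. The real qsv-stats formula may differ.
    """
    if len(values) < 2:
        return 0.0
    ascending = descending = 0
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            ascending += 1
        elif cur < prev:
            descending += 1
    return (ascending - descending) / (len(values) - 1)


def uniqueness_ratio(values: Sequence[Hashable]) -> float:
    """Assumed definition: distinct values divided by total values."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


if __name__ == "__main__":
    print(sortiness([1, 2, 3, 4]))                  # strictly ascending
    print(sortiness([4, 3, 2, 1]))                  # strictly descending
    print(uniqueness_ratio(["a", "b", "c"]))        # all unique
    print(uniqueness_ratio(["a", "a", "a", "a"]))   # heavily repeated
```

Under these assumed definitions, a column whose uniqueness_ratio is exactly 1 has all unique values, which is the property the "smart" commands above are described as checking through the stats cache.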
Added
- joinp: additional joinp asof join sort and match options #2486
- stats: add "sortiness" statistic #2499
- stats: add uniqueness_ratio #2521
- stats & frequency: add --vis-whitespace option. Fulfills #2501 #2503
- sample: add more sampling methods (in addition to indexed and reservoir - added bernoulli, systematic, stratified, weighted & cluster sampling) and made them all memory efficient so we can sample arbitrarily large datasets: #2507 & #2511
- diff: make diff a "smart" command. Fulfills #2493 and #2509 #2518
- benchmarks: added new benchmarks for sample for new sampling methods d758c54
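For readers unfamiliar with why these sampling methods can be memory efficient, the following is a rough sketch of three of them (bernoulli, systematic and reservoir) written as single-pass, streaming routines. These are the generic textbook techniques, not qsv's implementation. The function names, parameters and the default seed are all hypothetical.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], p: float, seed: int = 42) -> Iterator[T]:
    """Keep each row independently with probability p (O(1) memory)."""
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < p:
            yield row


def systematic_sample(rows: Iterable[T], interval: int, seed: int = 42) -> Iterator[T]:
    """Keep every `interval`-th row, starting at a random offset (O(1) memory)."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    start = random.Random(seed).randrange(interval)
    for i, row in enumerate(rows):
        if i % interval == start:
            yield row


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Algorithm R: uniform sample of k rows using O(k) memory."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    data = range(1_000)
    print(len(list(bernoulli_sample(data, 0.1))))
    print(list(systematic_sample(data, 250)))
    print(reservoir_sample(data, 5))
```

Each routine reads the input once and never holds more than the sample itself in memory, which is the property that lets this style of sampling scale to inputs larger than RAM.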
Changed
- luau: bump from 0.653 to 0.660 and optimize for performance 4402df6 de429b4 07ff8b8 3211f5c
- stats: compute string len stats only for string columns #2495
- contrib(completions): update qsv completions for qsv 2.2.1 by @rzmk in #2494
- deps: bump polars to latest upstream after its py-1.22.0 release
- deps: backported csv-core 0.1.12 fix to our qsv-optimized csv-core fork dathere/rust-csv@5d0916e
- build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in #2488
- build(deps): bump bytes from 1.9.0 to 1.10.0 by @dependabot in #2497
- build(deps): bump data-encoding from 2.7.0 to 2.8.0 by @dependabot in #2512
- build(deps): bump geosuggest-core from 0.6.5 to 0.6.6 by @dependabot in #2520
- build(deps): bump geosuggest-utils from 0.6.5 to 0.6.6 by @dependabot in #2519
- build(deps): bump jsonschema from 0.28.3 to 0.29.0 by @dependabot in #2510
- build(deps): bump minijinja from 2.6.0 to 2.7.0 by @dependabot in #2489
- build(deps): bump mlua from 0.10.2 to 0.10.3 by @dependabot in #2485
- build(deps): bump qsv-stats from 0.27.0 to 0.28.0 by @dependabot in #2496
- build(deps): bump qsv-stats from 0.28.0 to 0.29.0 by @dependabot in #2498
- build(deps): bump qsv-stats from 0.29.0 to 0.30.0 by @dependabot in #2505
- chore: Bump rand to 0.9 #2504
- build(deps): bump simple-home-dir from 0.4.6 to 0.4.7 by @dependabot in #2515
- build(deps): bump uuid from 1.12.1 to 1.13.1 by @dependabot in #2500
- bumped numerous indirect dependencies to latest versions
- applied select clippy lint suggestions
- bumped MSRV to latest Rust stable - v1.84.1
Fixed
- docs: QSV_AUTOINDEX => QSV_AUTOINDEX_SIZE typo. Fixes #2479 #2484
- fix: search & searchset off by 1 when using --flag option. Fixes #2508 #2513
Full Changelog: 2.2.1...3.0.0