github dathere/qsv 3.0.0

[3.0.0] - 2025-02-13

Highlights:

  • sample: Five new sampling methods! In addition to reservoir & indexed - added bernoulli, systematic, stratified, weighted & cluster sampling. And they're all memory efficient so you should be able to sample arbitrarily large datasets!
  • stats: Added "sortiness" [-1 (descending) to 1 (ascending)] & "uniqueness_ratio" [0 (many repeated values) to 1 (all unique values)] stats.
    The qsv-stats engine was also optimized to squeeze out more performance, with stats now 2.6x faster while using less memory despite the addition of these new stats.
  • diff: is now a "smart" command, so that it uses the stats cache to short-circuit diffs if files are identical per their fingerprint hashes, and to validate that the diff key column is all unique.
  • The stats cache has been refactored, improving performance for "smart" commands:
    • frequency is not only 3.3x faster, it uses far less memory as it now doesn't need to maintain hashmaps for columns with all unique values.
    • tojsonl is 2.25x faster
    • schema is 1.4x faster
  • luau got a major performance boost with the v0.660 engine upgrade, taking advantage of several compiler optimizations. luau is now up to 3.1x faster!
  • validate had a major performance regression - its benchmark time rose from 3.295 seconds in v2.1.0 to 13.159 seconds in v2.2.1. 4x slower! With the jsonschema 0.29 crate update, validate now clocks in at 3.022 seconds!
  • template also got a big boost and is now 2.9x faster with the minijinja 2.7 crate update.
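
To make the two new stats ranges concrete, here is a minimal sketch of how such measures could be computed. It is not the qsv-stats implementation: the function names are hypothetical, and the sortiness formula (ascending adjacent pairs minus descending adjacent pairs, divided by the number of adjacent pairs) is an assumption chosen only because it yields the documented -1 to 1 range.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;

/// Hypothetical helper: distinct values divided by total values.
/// Returns 1.0 when every value is unique; approaches 0 as values repeat.
fn uniqueness_ratio(values: &[&str]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let distinct: HashSet<&str> = values.iter().copied().collect();
    distinct.len() as f64 / values.len() as f64
}

/// Hypothetical helper, ASSUMED definition (not taken from qsv-stats):
/// (ascending adjacent pairs - descending adjacent pairs) / (n - 1).
/// Ties count as neither ascending nor descending.
fn sortiness<T: PartialOrd>(values: &[T]) -> f64 {
    if values.len() < 2 {
        return 0.0;
    }
    let mut score: i64 = 0;
    for pair in values.windows(2) {
        match pair[0].partial_cmp(&pair[1]) {
            Some(Ordering::Less) => score += 1,
            Some(Ordering::Greater) => score -= 1,
            _ => {}
        }
    }
    score as f64 / (values.len() - 1) as f64
}
```

Under these assumptions, a strictly ascending column scores 1, a strictly descending column scores -1, and a column that rises and falls equally often scores 0.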

Added

  • joinp: additional joinp asof join sort and match options #2486
  • stats: add "sortiness" statistic #2499
  • stats: add "uniqueness_ratio" statistic #2521
  • stats & frequency: add --vis-whitespace option. Fulfills #2501 #2503
  • sample: add more sampling methods (in addition to indexed and reservoir - added bernoulli, systematic, stratified, weighted & cluster sampling) and made them all memory efficient so we can sample arbitrarily large datasets: #2507 & #2511
  • diff: make diff a "smart" command. Fulfills #2493 and #2509 #2518
  • benchmarks: added benchmarks for the new sample sampling methods d758c54
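
As an illustration of why a streaming sampler can handle arbitrarily large inputs, here is a minimal sketch of systematic sampling over an iterator. It is not qsv's code: `systematic_sample` and its parameters are hypothetical names, and the starting offset is passed in explicitly (in practice it is usually drawn at random from `0..interval`) to keep the example deterministic and free of external crates.

```rust
/// Hypothetical helper: systematic sampling over any record stream.
/// Skips `start` records, then keeps every `interval`-th record.
/// Records are consumed lazily, so memory use does not grow with input size.
fn systematic_sample<I, T>(records: I, interval: usize, start: usize) -> impl Iterator<Item = T>
where
    I: IntoIterator<Item = T>,
{
    assert!(interval > 0, "interval must be at least 1");
    records.into_iter().skip(start).step_by(interval)
}
```

For example, sampling the stream 0..10 with an interval of 3 and a start of 1 keeps records 1, 4 and 7. Because the sampler only holds the current record, the same function works whether the stream has ten rows or ten billion.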

Changed

  • luau: bump from 0.653 to 0.660 and optimize for performance 4402df6 de429b4 07ff8b8 3211f5c
  • stats: compute string len stats only for string columns #2495
  • contrib(completions): update qsv completions for qsv 2.2.1 by @rzmk in #2494
  • deps: bump polars to latest upstream after its py-1.22.0 release
  • deps: backported csv-core 0.1.12 fix to our qsv-optimized csv-core fork dathere/rust-csv@5d0916e
  • build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in #2488
  • build(deps): bump bytes from 1.9.0 to 1.10.0 by @dependabot in #2497
  • build(deps): bump data-encoding from 2.7.0 to 2.8.0 by @dependabot in #2512
  • build(deps): bump geosuggest-core from 0.6.5 to 0.6.6 by @dependabot in #2520
  • build(deps): bump geosuggest-utils from 0.6.5 to 0.6.6 by @dependabot in #2519
  • build(deps): bump jsonschema from 0.28.3 to 0.29.0 by @dependabot in #2510
  • build(deps): bump minijinja from 2.6.0 to 2.7.0 by @dependabot in #2489
  • build(deps): bump mlua from 0.10.2 to 0.10.3 by @dependabot in #2485
  • build(deps): bump qsv-stats from 0.27.0 to 0.28.0 by @dependabot in #2496
  • build(deps): bump qsv-stats from 0.28.0 to 0.29.0 by @dependabot in #2498
  • build(deps): bump qsv-stats from 0.29.0 to 0.30.0 by @dependabot in #2505
  • chore: Bump rand to 0.9 #2504
  • build(deps): bump simple-home-dir from 0.4.6 to 0.4.7 by @dependabot in #2515
  • build(deps): bump uuid from 1.12.1 to 1.13.1 by @dependabot in #2500
  • bumped numerous indirect dependencies to latest versions
  • applied select clippy lint suggestions
  • bumped MSRV to latest Rust stable - v1.84.1

Fixed

  • docs: QSV_AUTOINDEX => QSV_AUTOINDEX_SIZE typo. Fixes #2479 #2484
  • fix: search & searchset off by 1 when using --flag option. Fixes #2508 #2513

Full Changelog: 2.2.1...3.0.0
