github dathere/qsv 2.2.0

[2.2.0] - 2025-01-26

Highlights:

  • stats - the ❤️ of qsv, got a little tune-up:
    • It got a tad faster now that we only compute string length stats for string types. Previously, we were also computing length for numbers, thinking it would be useful for storage sizing purposes (as everything is stored as a string in CSV). But as performance is goal number 1, we're no longer doing so. Besides, this sizing info can be derived using other stats.
    • Fixed the problem with the stats cache being deleted/ignored even when not necessary.
      This bug snuck in while implementing the --cache-threshold cache suppression option. With stats getting its cache mojo back - expect near-instant cache-backed response not only for stats but also other "automagical" smart commands 🪄.
  • diff - @janriemer squashed some bugs without sacrificing diff's ludicrous speed! 😉
  • validate: added dynamicEnum custom JSON Schema keyword column specifier support.
    You can now specify which column to validate against (by name or by 0-based column index), instead of always using the first column. This works for local & remote lookup files using the http/s://, ckan:// and dathere:// URL schemes.
  • extdedup now actually uses a proper on-disk hash table backed by a memory-mapped file.
    Previously, it was only deduping in-memory as the odht crate was not properly wired to a memory-mapped file 🤦 (I took the name of the odht crate literally and thought it was handling it 🤷). Thanks for the detailed bug report @Svenskunganka!
  • JSON query parsing overhaul.
    The fetch, fetchpost & json commands now use the latest jaq engine, making for faster performance, especially now that we're precompiling and caching the jaq filter.
  • Polars engine upgraded. 🐻‍❄️
    By two versions! py-polars 1.20.0 and 1.21.0 - giving the sqlp, joinp, pivotp & count commands a little boost. 🚀
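
To illustrate the dynamicEnum column specifier described above, here is a minimal Python sketch of the idea only: picking the lookup column by name or by 0-based index, defaulting to the first column. It is not qsv's implementation and does not show the JSON Schema keyword syntax; `load_allowed_values` and the sample lookup data are hypothetical names made up for this example.

```python
import csv
import io

def load_allowed_values(lookup_csv_text, column=0):
    """Collect the set of values found in one column of a lookup CSV.

    `column` is either a header name (str) or a 0-based index (int).
    Hypothetical helper for illustration; not part of qsv.
    """
    reader = csv.reader(io.StringIO(lookup_csv_text))
    header = next(reader)
    # Resolve a column name to its position; an int is used as-is.
    idx = header.index(column) if isinstance(column, str) else column
    return {row[idx] for row in reader if len(row) > idx}

# Made-up lookup file with a header row.
LOOKUP = "id,fruit\n1,apple\n2,banana\n"

by_name = load_allowed_values(LOOKUP, "fruit")   # select by column name
by_index = load_allowed_values(LOOKUP, 1)        # select by 0-based index
default_first = load_allowed_values(LOOKUP)      # old behavior: first column
```

A value would then count as valid if it is a member of the returned set.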

NOTE: qsv v2.2.0 is not available on crates.io, as crates.io does not allow enabling unreleased features and we are still awaiting a new version of Polars. As soon as Polars 0.46.0 is published, a new qsv patch release will be published to crates.io.
This means that installation option 3 using cargo install will be limited to 1.0.0 - the last qsv version available on crates.io. All other installation and update options to install/update qsv 2.2.0 still work.


Added

  • diff: add --delimiter "convenience" option. Fulfills #2447 #2464
  • slice: add stdin and snappy compressed file support ab34a62
  • validate: add dynamicEnum column specifier support. Fulfills #2470 #2472

Changed

  • fetch, fetchpost & json: jaq dependency upgrade - from jaq-interpret & jaq-parse to jaq-core/jaq-json/jaq-std #2458
  • fetch & fetchpost: cache compiled jaq filter #2467
  • joinp: adjust asofby test to reflect Polars py-1.20.0 behavior 853a266
  • stats: compute string length stats for string type only #2471
  • sqlp: wordsmith fastpath explanation 4e3f853
  • refactor: standardize -q and -Q shortcut options. Fulfills #2466 #2468
  • deps: bump polars to 0.45.1 at py-polars-1.20.0 tag #2448
  • deps: bump polars to 0.45.1 at py-polars-1.21.0 tag 4525d00
  • deps: Bump csv-diff to 0.1.1 by @janriemer in #2456
  • deps: Bump csvlens to latest upstream 27a723e
  • deps: use latest strum upstream 2ca1b0d
  • build(deps): bump base62 from 2.2.0 to 2.2.1 by @dependabot in #2440
  • build(deps): bump chrono-tz from 0.10.0 to 0.10.1 by @dependabot in #2449
  • build(deps): bump data-encoding from 2.6.0 to 2.7.0 by @dependabot in #2444
  • build(deps): bump indexmap from 2.7.0 to 2.7.1 by @dependabot in #2461
  • build(deps): bump jsonschema from 0.28.1 to 0.28.2 by @dependabot in #2469
  • build(deps): bump jsonschema from 0.28.2 to 0.28.3 by @dependabot in #2473
  • build(deps): bump log from 0.4.22 to 0.4.25 by @dependabot in #2439
  • build(deps): bump semver from 1.0.24 to 1.0.25 by @dependabot in #2459
  • build(deps): bump serde_json from 1.0.135 to 1.0.136 by @dependabot in #2455
  • build(deps): bump serde_json from 1.0.136 to 1.0.137 by @dependabot in #2460
  • build(deps): bump simple-home-dir from 0.4.5 to 0.4.6 by @dependabot in #2445
  • build(deps): bump uuid from 1.11.1 to 1.12.0 by @dependabot in #2441
  • build(deps): bump uuid from 1.12.0 to 1.12.1 by @dependabot in #2465
  • tests: enabled Windows CI caching for faster CI tests
  • bumped numerous indirect dependencies to latest versions
  • applied select clippy lint suggestions
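
The "cache compiled jaq filter" change listed above for fetch & fetchpost follows a compile-once, reuse-many pattern. The sketch below shows that pattern in Python, with a regular expression standing in for a jaq filter (an assumption made purely for illustration; qsv itself is written in Rust and uses the jaq crates). `compiled_filter`, `apply_filter` and the sample records are hypothetical.

```python
from functools import lru_cache
import re

compile_calls = 0  # counts how often we actually compile

@lru_cache(maxsize=None)
def compiled_filter(expr):
    """Compile `expr` once; later calls with the same text hit the cache."""
    global compile_calls
    compile_calls += 1
    return re.compile(expr)

def apply_filter(expr, record):
    """Apply the (cached) compiled filter to a single response record."""
    match = compiled_filter(expr).search(record)
    return match.group(1) if match else None

# Made-up responses, one per fetched row.
responses = ['{"name": "a"}', '{"name": "b"}', '{"id": 3}']
expr = r'"name": "(\w+)"'
names = [apply_filter(expr, r) for r in responses]
```

Even though the filter is applied once per record, it is compiled only on the first call, which is where the saving comes from when the same filter runs against many responses.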

Fixed

  • count: Sometimes, polars count returns zero even if there are rows. Fixed by doing a regular csv reader count when polars count returns zero abcd365
  • diff: Fix name to index conversion by @janriemer. Fixes #2443 #2457
  • extdedup: refactor/fix to actually have on-disk hash table backed by a mem-mapped file. Fixes #2462 #2475
  • stats: fix stats caching as it was inadvertently deleting the stats cache even when not necessary 96e6d28
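
The count fix above is a fallback pattern: trust the fast path unless it reports zero, then double-check with a plain CSV reader. A minimal Python sketch of that logic follows, assuming a header row; `count_with_fallback` and `broken_fast_count` are hypothetical stand-ins, not qsv or Polars functions.

```python
import csv
import io

def count_with_fallback(csv_text, fast_count):
    """Return fast_count(csv_text), re-counting with csv.reader if it is 0."""
    n = fast_count(csv_text)
    if n == 0:
        # A zero may be wrong, so verify with a regular reader.
        reader = csv.reader(io.StringIO(csv_text))
        next(reader, None)  # skip the header row, if present
        n = sum(1 for _ in reader)
    return n

def broken_fast_count(_text):
    """Stand-in for a fast counter that wrongly reports zero rows."""
    return 0

data = "a,b\n1,2\n3,4\n"
rows = count_with_fallback(data, broken_fast_count)
empty = count_with_fallback("a,b\n", broken_fast_count)
```

The slower reader only runs in the zero case, so the fast path keeps its speed for every non-empty result.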

Removed

  • foreach: refactored to remove unmaintained local-encoding dependency #2454
  • remove polars feature from qsvdp binary variant. We'll use py-polars from DP+ directly.

Full Changelog: 2.1.0...2.2.0