github dathere/qsv 9.1.0


[9.1.0] - 2025-11-03


FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:

  • frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
  • describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
  • qsv-stats - the engine that powers both the stats and frequency commands - has been further optimized in the 0.40.0 release to compile summary statistics as fast as possible, even for very large files. It is often one to two orders of magnitude (10 to 100x) faster than typical Python-based tools.
  • Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
  • the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second! [1]

These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.

The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.

It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.


Added

  • frequency: add --pretty-json option c67fd06
  • frequency: add --rank-strategy option #3075
  • frequency: add --null-text option #3082

Changed

  • describegpt: explicitly use frequency's dense rank strategy dc3f270
  • describegpt: allow --prompt to be loaded from a text file b11a10c
  • describegpt: use much faster BLAKE3 hash for cache key
  • frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
  • lens: bumped csvlens from 0.13.0 to 0.14.0
  • lens: automatically set to monochrome mode when using --find option 8539869
  • luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
  • stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
  • table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
  • tests: change default Python to 3.13
  • docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
  • docs: documented Limited Extended Input Support (🗃️)
  • use latest qsv-tuned csv crate with performance optimizations
  • build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
  • build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
  • deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
  • build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
  • build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
  • applied several clippy lint suggestions
  • bumped several indirect dependencies
  • align nightly to 2025-10-24, the same nightly as Polars
  • bumped MSRV to Rust 1.91
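
The frequency change above switches the default rank strategy from min ("1224") to dense ("1223"). As a rough illustration of the difference, here is a minimal Python sketch; it is not qsv's implementation, and the function name rank_counts is hypothetical. It assumes counts are ranked in descending order and that tied counts share a rank.

```python
def rank_counts(counts, strategy="dense"):
    """Rank counts in descending order; tied counts share a rank.

    Illustrative only: "min" and "dense" mirror the strategy names in the
    release notes, but this is not how qsv itself is implemented.
    """
    ordered = sorted(counts, reverse=True)
    ranks = []
    for position, count in enumerate(ordered, start=1):
        if ranks and count == ordered[position - 2]:
            # Tie with the previous count: reuse its rank.
            ranks.append(ranks[-1])
        elif strategy == "min":
            # "1224": after a tie, the next rank skips ahead to the position.
            ranks.append(position)
        elif strategy == "dense":
            # "1223": after a tie, the next rank is simply the previous + 1.
            ranks.append(ranks[-1] + 1 if ranks else 1)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return list(zip(ordered, ranks))


counts = [50, 30, 30, 10]
print(rank_counts(counts, "min"))    # ranks 1, 2, 2, 4
print(rank_counts(counts, "dense"))  # ranks 1, 2, 2, 3
```

With dense ranking there are no gaps in the rank sequence, so the highest rank equals the number of distinct counts.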

Fixed

  • describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
  • frequency: fix --select option always returning <ALL_UNIQUE> #3082
  • fixed some publishing workflows
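
The describegpt fix above adds SQL escaping to close an injection vector. The release notes do not show how qsv does this. As a rough illustration of the general idea only, the Python sketch below doubles embedded single quotes, which is the standard SQL way to escape a string literal. The function name quote_sql_literal is hypothetical and is not part of qsv.

```python
def quote_sql_literal(value: str) -> str:
    """Wrap a value as a SQL string literal, doubling embedded single quotes.

    Hypothetical helper for illustration; not taken from qsv's source.
    """
    return "'" + value.replace("'", "''") + "'"


# Without escaping, the quote in this input would terminate the literal
# early and let the remainder be parsed as SQL.
user_input = "x' OR '1'='1"
query = f"SELECT * FROM data WHERE name = {quote_sql_literal(user_input)}"
print(query)
```

Where the query engine supports bound parameters, those are generally preferable to building SQL strings by hand.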

Removed

  • Removed SHA256 and replaced it with the much faster, parallelizable BLAKE3 hash #3072 and #3080
  • publish: removed maximize-build-space step in workflows as it was not working as advertised
  • tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults

Full Changelog: 8.1.1...9.1.0

Footnotes

  1. see validate_no_schema benchmark
