[9.1.0] - 2025-11-03
FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:
- `frequency` received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
- `describegpt` now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
- qsv-stats - the engine that powers both the `stats` and `frequency` commands - has been further optimized with the 0.40.0 release to compile summary statistics as fast as possible, even for very large files, often one to two orders of magnitude (10 to 100x) faster than typical Python-based tools.
- Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
- the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second!
These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.
The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.
It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.
Added
- `frequency`: add `--pretty-json` option c67fd06
- `frequency`: add `--rank-strategy` option #3075
- `frequency`: add `--null-text` option #3082
Changed
- `describegpt`: explicitly use `frequency`'s dense rank strategy dc3f270
- `describegpt`: allow `--prompt` to be loaded from a text file b11a10c
- `describegpt`: use much faster BLAKE3 hash for cache key
- `frequency`: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
- `lens`: bumped csvlens from 0.13.0 to 0.14.0
- `lens`: automatically set to monochrome mode when using `--find` option 8539869
- `luau`: bumped embedded Luau from 0.694 to 0.697 3e68e29
- `stats`: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
- `table`: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
- tests: change default Python to 3.13
- docs: documented that Extended Input Support (🗄️) does `.zip` auto-decompression
- docs: documented Limited Extended Input Support (🗃️)
- use latest qsv-tuned csv crate with performance optimizations
- build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
- build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
- deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
- build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
- build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
- applied several clippy lint suggestions
- bumped several indirect dependencies
- align nightly to 2025-10-24, the same nightly as Polars
- bumped MSRV to Rust 1.91
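
To illustrate the `frequency` default rank-strategy change listed above, here is a minimal Python sketch of the difference between min ("1224") and dense ("1223") ranking. It is not qsv's implementation; the function names `min_rank` and `dense_rank` and the sample counts are hypothetical, and the input is assumed to be already sorted in descending order.

```python
def min_rank(counts):
    """Min ("1224") ranking: ties share the lowest rank, and a gap follows the tie."""
    ranks = []
    for i, count in enumerate(counts):
        if i > 0 and count == counts[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)  # position-based rank leaves a gap after ties
    return ranks


def dense_rank(counts):
    """Dense ("1223") ranking: ties share a rank, and the next rank has no gap."""
    ranks = []
    for i, count in enumerate(counts):
        if i == 0:
            ranks.append(1)
        elif count == counts[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(ranks[-1] + 1)  # next distinct value gets the next rank
    return ranks


# Hypothetical frequency counts, sorted descending, with one tie.
counts = [50, 30, 30, 10]
print(min_rank(counts))
print(dense_rank(counts))
```

With dense ranking, the highest rank equals the number of distinct count values, which can make ranks easier to compare across columns.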
Fixed
- `describegpt`: add SQL escaping to eliminate SQL injection attack vector; add `.csv` extension to `--sql-output` when Polars SQL query runs successfully ad52a35
- `frequency`: fix `--select` option always returning `<ALL_UNIQUE>` #3082
- fixed some publishing workflows
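
As a rough illustration of the kind of SQL escaping mentioned in the `describegpt` fix above, the sketch below doubles single quotes before embedding a value in a SQL string literal. This is a generic technique shown in Python, not the code qsv uses; the helper name `escape_sql_literal` and the `data` table are hypothetical.

```python
def escape_sql_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal.

    Embedded single quotes are doubled, which is the standard SQL way to
    keep a quote inside a literal from terminating it early.
    """
    return "'" + value.replace("'", "''") + "'"


# Hypothetical untrusted input that tries to break out of the literal.
user_input = "x'; DROP TABLE data; --"
query = f"SELECT * FROM data WHERE name = {escape_sql_literal(user_input)}"
print(query)
```

Where the query engine supports parameterized queries, those are generally preferable to manual escaping.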
Removed
- Removed SHA256 and replaced it with the much faster, parallelizable BLAKE3 hash #3072 and #3080
- publish: removed `maximize-build-space` step in workflows as it was not working as advertised
- tests: removed `target-cpu=native` RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults
Full Changelog: 8.1.1...9.1.0