qsv v2.0.0 is here! 🎉
It took 193 releases to get to v1.0.0, and we're already at v2.0.0 a month later!?!
Yes! We wanted a running start for 2025, and qsv 2.0.0 marks qsv's biggest release yet!
- It fully enables the "Data Resource Upload First (DRUF)" workflow, allowing Datapusher+ to infer "automagical metadata" from the data itself. It exposes two Domain Specific Language (DSL) options - Luau and MiniJinja - to enable powerful data transformation and validation capabilities. This allows data stewards to upload data first, then use qsv's DSL capabilities inside DP+ to automatically generate rich metadata - including data dictionaries, field descriptions, data quality rules, and data validation schemas. This "automagical metadata" approach dramatically reduces the friction in compiling high-quality, high-resolution metadata (using the DCAT-US 3.0 specification as a reference) that would otherwise be a manual, laborious, and error-prone process.
  Under the hood, the `fetchpost`, `template`, `stats`, `validate` and `luau` commands now have the necessary scaffolding to fully support this workflow inside Datapusher+ and ckanext-scheming.
- It adds a new "smart" `pivotp` command, powered by Polars, to enable fast pivot operations on large datasets. It's "smart" because it uses the stats cache to automatically suggest an aggregation based on a column's data type and summary statistics. You can now pivot your data in seconds by simply specifying the columns to pivot on, while blowing past Excel's PivotTable limitations.
- `stats` now computes geometric mean and harmonic mean and adds string length stats, all while getting a performance boost.
- `join` and `joinp` got a lot of love in this release, with several new options:
  - `joinp`: non-equi join support! 🎉💯🥳
    See the "Lightning Fast and Space Efficient Inequality Joins" paper and the Polars non-equi join tracking issue.
  - `join` & `joinp`: `--right-anti` and `--right-semi` joins
  - `joinp`: `--ignore-leading-zeros` option for join keys
  - `joinp`: `--maintain-order` option to maintain the order of either the left or right dataset in the output
  - `joinp`: expanded `--cache-schema` options to make `joinp` smarter/faster by leveraging the stats cache
  - `join`: `--keys-output` option to write successfully joined keys to a separate output file.
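For readers new to these join flavors, here is a minimal Rust sketch of their semantics on plain string and integer keys. It is not qsv's or Polars' implementation: `normalize_key`, `right_semi`, `right_anti` and `non_equi_lt` are hypothetical helpers, the leading-zero rule (strip zeros but keep a lone "0") is an assumption, and the non-equi join uses a naive nested loop, whereas the "Lightning Fast and Space Efficient Inequality Joins" paper describes sort-based algorithms that avoid the O(n*m) cost.

```rust
use std::collections::HashSet;

/// Hypothetical key normalizer approximating `--ignore-leading-zeros`:
/// strips leading '0's but keeps a single "0" for all-zero keys (assumed rule).
fn normalize_key(key: &str, ignore_leading_zeros: bool) -> String {
    if !ignore_leading_zeros {
        return key.to_string();
    }
    let trimmed = key.trim_start_matches('0');
    if trimmed.is_empty() && !key.is_empty() {
        "0".to_string()
    } else {
        trimmed.to_string()
    }
}

/// Collects the (normalized) left-side keys into a set for O(1) lookups.
fn key_set(left: &[&str], ignore_leading_zeros: bool) -> HashSet<String> {
    left.iter().map(|k| normalize_key(k, ignore_leading_zeros)).collect()
}

/// Right semi join: right-side keys that DO have a match on the left.
fn right_semi<'a>(left: &[&str], right: &[&'a str], izl: bool) -> Vec<&'a str> {
    let keys = key_set(left, izl);
    right.iter().copied().filter(|k| keys.contains(&normalize_key(k, izl))).collect()
}

/// Right anti join: right-side keys that do NOT have a match on the left.
fn right_anti<'a>(left: &[&str], right: &[&'a str], izl: bool) -> Vec<&'a str> {
    let keys = key_set(left, izl);
    right.iter().copied().filter(|k| !keys.contains(&normalize_key(k, izl))).collect()
}

/// Naive non-equi join on `left < right`: returns matching (left_idx, right_idx)
/// pairs. O(n*m); real engines use sort-based algorithms instead.
fn non_equi_lt(left: &[i64], right: &[i64]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if l < r {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

fn main() {
    let left = ["007", "42"];
    let right = ["7", "42", "99"];
    println!("{:?}", right_semi(&left, &right, true)); // ["7", "42"]
    println!("{:?}", right_anti(&left, &right, true)); // ["99"]
    println!("{:?}", non_equi_lt(&[1, 5], &[3, 6])); // [(0, 0), (0, 1), (1, 1)]
}
```

Conceptually, semi and anti joins filter one side's rows by the existence (or absence) of a match on the other side without pulling in the other side's columns; a non-equi join replaces key equality with a predicate such as `<`.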
This release lays the groundwork for the `outliers` "smart" command to quickly identify outliers using stats/frequency info.
It also sets the stage for an initial implementation of our "Data Concierge". As it constantly grooms the Data Catalog, the Data Concierge will leverage all the high-quality, high-res metadata we automagically compile with DRUF to enable Metadata Gardening Agents that proactively link seemingly unrelated data and glean insights, effectively turning the Data Catalog into a FAIR Data Factory.
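These notes don't say which detection method `outliers` will use. As a hedged illustration only, here is Tukey's fences, a common rule built from summary statistics such as quartiles and the interquartile range (IQR). The function names, the 1.5 multiplier, and the linear-interpolation quartile method are assumptions, not qsv's implementation.

```rust
/// Quantile via linear interpolation between closest ranks on sorted data.
/// One of several common definitions; qsv's quartile method may differ.
fn quantile(sorted: &[f64], p: f64) -> f64 {
    let pos = p * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Returns values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
/// k = 1.5 (inner fences) is the conventional choice, assumed here.
fn tukey_outliers(values: &[f64], k: f64) -> Vec<f64> {
    if values.is_empty() {
        return Vec::new();
    }
    let mut sorted = values.to_vec();
    // total_cmp gives a total order over f64, so the sort never panics.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (lower, upper) = (q1 - k * iqr, q3 + k * iqr);
    values.iter().copied().filter(|&v| v < lower || v > upper).collect()
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0];
    println!("{:?}", tukey_outliers(&data, 1.5)); // [100.0]
}
```

Because the fences only need Q1 and Q3, a tool that already caches quartiles could in principle flag outliers without re-sorting the data.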
Added
- `fetchpost`: add `--globals-json` option #2357
- `fixlengths`: add `--remove-empty` option; refactored for performance. Fulfills #2391. #2411
- `join`: add `--keys-output` option. Fulfills #2407. #2408
- `join`: add `--right-anti` and `--right-semi` options. Fulfills #2379. #2380
- `joinp`: add non-equi join support! 🎉💯🥳 #2409
- `joinp`: add `--ignore-leading-zeros` option. Fulfills #2398. #2400
- `joinp`: add `--maintain-order` option #2338
- `joinp`: add `--right-anti` and `--right-semi` options. Fulfills #2377. #2378
- `luau`: add additional helper functions. Fulfills #1782. #2362
- `luau`: add `qsv_writejson` helper #2375
- `pivotp`: new Polars-powered command. Fulfills #799. #2364
- `pivotp`: "smart" `pivotp`. #2367
- `stats`: add geometric mean and harmonic mean. Fulfills #2227. #2342
- `stats`: add string length stats to set the stage for the upcoming `outliers` "smart" command to quickly identify outliers using stats/frequency info #2390
- `template`: add `--globals-json` option #2356
- `tojsonl`: add `--quiet` option. Fulfills #2335. #2336
- `validate`: add `--validate-schema` option to check if the JSON Schema itself is valid #2393
- `contrib(completions)`: add `joinp --ignore-case` and `slice --invert` by @rzmk in #2322
- `contrib(completions)`: add `--quiet` to `tojsonl` by @rzmk in #2337
- `ci`: add qsv_glibc_2.31-headless to action by @rzmk in #2330
- Add license to MSI installer by @rzmk in #2321
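The new `stats` measures follow textbook definitions: the geometric mean is exp((1/n)·Σ ln xᵢ) and the harmonic mean is n / Σ(1/xᵢ). The sketch below computes them, plus min/max/mean string length, in plain Rust. Assumptions: both means are only defined here for strictly positive inputs (how qsv treats zeros, negatives, and empty fields isn't stated in these notes), and string length is counted in Unicode scalar values via `chars()`, whereas qsv may count bytes. All function names are hypothetical.

```rust
/// Geometric mean: exp(mean(ln x)). Working in log space avoids overflow
/// from multiplying many values. Returns None unless all inputs are > 0.
fn geometric_mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() || xs.iter().any(|&x| x <= 0.0) {
        return None;
    }
    let log_sum: f64 = xs.iter().map(|x| x.ln()).sum();
    Some((log_sum / xs.len() as f64).exp())
}

/// Harmonic mean: n / sum(1/x). Returns None unless all inputs are > 0.
fn harmonic_mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() || xs.iter().any(|&x| x <= 0.0) {
        return None;
    }
    let recip_sum: f64 = xs.iter().map(|x| 1.0 / x).sum();
    Some(xs.len() as f64 / recip_sum)
}

/// (min, max, mean) string length, counted in chars (Unicode scalar values).
/// Returns None for an empty input slice.
fn string_length_stats(values: &[&str]) -> Option<(usize, usize, f64)> {
    let lens: Vec<usize> = values.iter().map(|s| s.chars().count()).collect();
    let min = *lens.iter().min()?;
    let max = *lens.iter().max()?;
    let mean = lens.iter().sum::<usize>() as f64 / lens.len() as f64;
    Some((min, max, mean))
}

fn main() {
    println!("{:?}", geometric_mean(&[1.0, 4.0, 16.0]));
    println!("{:?}", harmonic_mean(&[1.0, 4.0, 4.0])); // Some(2.0)
    // "h\u{e9}llo" is 5 chars but 6 bytes in UTF-8.
    println!("{:?}", string_length_stats(&["a", "abc", "h\u{e9}llo"])); // Some((1, 5, 3.0))
}
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so reporting all three gives a quick read on skew.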
Changed
- `lens`: optimized csvlens library usage, dropping clap dependency #2403
- `pivotp`: an even smarter `pivotp` #2368
- `stats`: performance boost 51349ba
- Update deb package by @tino097 in #2226
- `ci`: attempt using files-folder instead of files by @rzmk in #2320
- Setting `QSV_FREEMEMORY_HEADROOM_PCT` to 0 disables memory availability check #2353
- build(deps): bump actix-governor from 0.7.0 to 0.8.0 by @dependabot in #2351
- build(deps): bump bytemuck from 1.20.0 to 1.21.0 by @dependabot in #2361
- build(deps): bump chrono from 0.4.38 to 0.4.39 by @dependabot in #2345
- build(deps): bump crossbeam-channel from 0.5.13 to 0.5.14 by @dependabot in #2354
- build(deps): bump flexi_logger from 0.29.6 to 0.29.7 by @dependabot in #2348
- build(deps): bump governor from 0.7.0 to 0.8.0 by @dependabot in #2347
- build(deps): bump itertools from 0.13.0 to 0.14.0 by @dependabot in #2413
- build(deps): bump jsonschema from 0.26.1 to 0.26.2 by @dependabot in #2355
- build(deps): bump jsonschema from 0.26.2 to 0.27.0 by @dependabot in #2371
- build(deps): bump jsonschema from 0.27.1 to 0.28.0 by @dependabot in #2389
- build(deps): bump jsonschema from 0.28.0 to 0.28.1 by @dependabot in #2396
- bump polars from 0.44.2 to 0.45 #2340
- build(deps): bump polars from 0.45.0 to 0.45.1 by @dependabot in #2344
- bump pyo3 from 0.22 to 0.23 now that Polars supports it #2352
- build(deps): bump redis from 0.27.5 to 0.27.6 by @dependabot in #2331
- build(deps): bump reqwest from 0.12.9 to 0.12.11 by @dependabot in #2385
- build(deps): bump reqwest from 0.12.11 to 0.12.12 by @dependabot in #2395
- build(deps): bump rfd from 0.15.1 to 0.15.2 by @dependabot in #2404
- build(deps): bump serde from 1.0.215 to 1.0.216 by @dependabot in #2349
- build(deps): bump serde from 1.0.216 to 1.0.217 by @dependabot in #2384
- build(deps): bump serde_json from 1.0.133 to 1.0.134 by @dependabot in #2365
- build(deps): bump sysinfo from 0.32.1 to 0.33.0 by @dependabot in #2334
- build(deps): bump sysinfo from 0.33.0 to 0.33.1 by @dependabot in #2383
- deps: bump tabwriter to 1.4.1 bbcbeba
- build(deps): bump tokio from 1.41.1 to 1.42.0 by @dependabot in #2333
- build(deps): bump xxhash-rust from 0.8.12 to 0.8.13 by @dependabot in #2359
- build(deps): bump xxhash-rust from 0.8.13 to 0.8.14 by @dependabot in #2372
- build(deps): bump xxhash-rust from 0.8.14 to 0.8.15 by @dependabot in #2392
- apply several clippy suggestions
- bumped numerous indirect dependencies to latest versions
- bumped Rust nightly from 2024-11-28 to 2024-12-19 (same version used by Polars)
Fixed
- `joinp`: refactor `--cache-schema` option. Resolves #2369. #2370
- `extsort`: underflow in CSV mode. Resolves #2391. #2412
- instantiate logger properly 9c0c1a7
- fix `util::get_stats_records()` to no longer infer boolean in `StatsMode::PolarsSchema`. Resolves #2369. cebb664
Full Changelog: 1.0.0...2.0.0