Highlights:
- ⭐ New template command for rendering templates with CSV data.
  This should allow users to generate very complex documents (form letters, JSON/XML files, etc.) with the powerful MiniJinja template engine (Example template).
- ⭐ New lookup module for fetching reference data from remote and local files.
  In addition to the typical http/https schemes for remote files, qsv adds two additional schemes, CKAN:// and datHere://, which fetch lookup data from a CKAN site or from datHere-maintained reference data, respectively. The lookup module also has simple file-based caching to minimize repeated fetching of typically static reference data (default cache age: 600 seconds).
  The lookup module is now used by the luau (for its qsv_register_lookup helper) and validate (for its dynamicEnum custom JSON Schema keyword) commands. More commands will take advantage of this module over time (e.g. apply, geocode, template, sqlp, etc.) to do extended lookups (e.g. looking up Census information given spatiotemporal data, like the demographic info of a Census tract).
- ✨ Enhanced fetchpost with MiniJinja templating for payload construction.
  Previously, fetchpost was limited to posting url-encoded HTML Form data with content type application/x-www-form-urlencoded. Now, with the new --payload-tpl and --content-type options, users can post request bodies rendered with MiniJinja and specify other content types (typically application/json, text/plain, multipart/form-data) as well.
- ✨ Improved Polars integration with automatic schema detection
  The joinp and sqlp commands now use qsv's stats cache to automatically determine column data types, rather than having Polars scan a sample of rows. This provides two key benefits:
  - Faster execution by skipping Polars' schema inference step
  - GUARANTEED data type inferencing since the stats cache analyzes the entire dataset, not just a sample
- 🏃 fast-float2 crate for faster float parsing
  Casting string/bytes to float is now much faster (2 to 8x faster than Rust's standard library) with fast-float2.
- 💪 Major dependency updates including Polars 0.44.2, Luau 0.650, mlua 0.10.0 and jsonschema 0.26.1
  These core crates underpin qsv's advanced commands. Using the latest versions of these crates allows qsv to stay true to its goal of being the fastest and most comprehensive data-wrangling toolkit.
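
To make the schema-detection highlight above more concrete, here is a minimal Python sketch of why inferring a column type from a sample can disagree with inferring it from the whole column. It is an illustration only and not how qsv or Polars implement inference: the infer_type helper, the type names, and the sample column are all hypothetical.

```python
def infer_type(values):
    """Return the narrowest of "Integer", "Float", "String" that fits every value.

    Hypothetical helper for illustration; not qsv's or Polars' actual logic.
    """
    inferred = "Integer"
    for v in values:
        try:
            int(v)
        except ValueError:
            try:
                float(v)
                inferred = "Float"  # widen: at least one value needs a float
            except ValueError:
                return "String"  # one non-numeric value forces a string column
    return inferred


# Made-up column whose first rows look like integers.
column = ["1", "2", "3", "4.5", "n/a"]

sample_guess = infer_type(column[:3])  # inspects only a 3-row sample
full_scan = infer_type(column)         # inspects every row

print(sample_guess, full_scan)
```

The sampled guess and the full scan differ here because the values that widen the type sit past the sample window, which is the failure mode that whole-dataset statistics avoid.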
Added
- added lookup module - enabling fetching and caching of reference data from remote and local files #2262
- fetchpost: add --payload-tpl <file> and --content-type options to construct payload using MiniJinja with the appropriate content-type #2268 5921498
- joinp: derive polars schema from stats cache 86fe22e
- sqlp: derive polars schema from stats cache #2256
- template: new command to render MiniJinja templates with CSV data #2267
- validate: add dynamicEnum lookup support #2265
- contrib(completions): add template command and update fetchpost by @rzmk in #2269
- add fast-float2 dependency for faster bytes to float conversion 7590e4e 3ca30aa
- added more benchmarks for new/updated commands f8a1d4f cd7e480
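
As a rough illustration of the age-based file caching that the lookup module entry describes, the Python sketch below refreshes a cached copy only when it is missing or older than a maximum age. This is a sketch under stated assumptions, not qsv's implementation: cached_copy, its parameters, and the file names are hypothetical, and a local file copy stands in for a remote fetch. Only the 600-second default comes from the release notes.

```python
import os
import shutil
import tempfile
import time

DEFAULT_CACHE_AGE_SECS = 600  # default cache age quoted in the release notes


def cached_copy(source_path, cache_dir, max_age_secs=DEFAULT_CACHE_AGE_SECS, now=None):
    """Return (cache_path, was_refreshed). Hypothetical helper, not qsv's API."""
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, os.path.basename(source_path))
    now = time.time() if now is None else now
    # The cached file is fresh if it exists and is younger than max_age_secs.
    fresh = (
        os.path.exists(cache_path)
        and now - os.path.getmtime(cache_path) < max_age_secs
    )
    if not fresh:
        shutil.copyfile(source_path, cache_path)  # stand-in for a remote fetch
    return cache_path, not fresh


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "lookup.csv")
    with open(src, "w") as f:
        f.write("code,name\nNY,New York\n")
    cache_dir = os.path.join(tmp, "cache")

    _, first = cached_copy(src, cache_dir)    # no cache yet, so it is populated
    _, second = cached_copy(src, cache_dir)   # cache is fresh, so it is reused
    # Pretend 601 seconds have passed: the cache is stale and gets refreshed.
    _, third = cached_copy(src, cache_dir, now=time.time() + 601)

print(first, second, third)
```

Passing the current time in as a parameter keeps the expiry logic easy to exercise without waiting for the cache to age.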
Changed
- luau: adapt to mlua 0.10 API changes 268cb45
- luau: refactored stage management 31ef58a
- luau: now uses the lookup module 2f4be34
- stats: minor perf refactoring 6cdd6ea
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #2243
- build(deps): bump azure/trusted-signing-action from 0.4.0 to 0.5.0 by @dependabot in #2239
- build(deps): bump bytes from 1.7.2 to 1.8.0 by @dependabot in #2231
- build(deps): bump cached from 0.53.1 to 0.54.0 by @dependabot in #2272
- build(deps): bump flexi_logger from 0.29.3 to 0.29.4 by @dependabot in #2229
- build(deps): bump flexi_logger from 0.29.4 to 0.29.5 by @dependabot in #2261
- build(deps): bump flexi_logger from 0.29.5 to 0.29.6 by @dependabot in #2266
- build(deps): bump hashbrown from 0.15.0 to 0.15.1 by @dependabot in #2270
- build(deps): bump jsonschema from 0.24.0 to 0.24.1 by @dependabot in #2234
- build(deps): bump jsonschema from 0.24.1 to 0.24.2 by @dependabot in #2238
- build(deps): bump jsonschema from 0.24.2 to 0.24.3 by @dependabot in #2240
- build(deps): bump jsonschema from 0.25.0 to 0.25.1 by @dependabot in #2244
- build(deps): bump jsonschema from 0.26.0 to 0.26.1 by @dependabot in #2260
- build(deps): bump regex from 1.11.0 to 1.11.1 by @dependabot in #2242
- build(deps): bump reqwest from 0.12.8 to 0.12.9 by @dependabot in #2258
- build(deps): bump serde from 1.0.210 to 1.0.211 by @dependabot in #2232
- build(deps): bump serde from 1.0.211 to 1.0.213 by @dependabot in #2236
- build(deps): bump serde from 1.0.213 to 1.0.214 by @dependabot in #2259
- build(deps): bump simd-json from 0.14.1 to 0.14.2 by @dependabot in #2235
- build(deps): bump tokio from 1.40.0 to 1.41.0 by @dependabot in #2237
- deps: updated our fork of the csv crate with more perf optimizations eae7d76
- deps: use calamine upstream with unreleased fixes 4cc7f37
- deps: use our csvlens fork until PR removing unneeded arboard features is merged bb32322
- deps: bump jsonschema from 0.25 to 0.26 #2251
- deps: bump embedded Luau from 0.640 to 0.650 8c54b87 aca30b0
- deps: bump mlua from 0.9 to 0.10 #2249
- deps: bump Polars from 0.43.1 at py-1.11.0 tag to latest 0.44.2 upstream #2255 0e40a44
- apply select clippy lint suggestions
- updated indirect dependencies
- aligned Rust nightly to Polars nightly - 2024-10-28 - 245bcb5
Fixed
Removed
- removed need to set RAYON_NUM_THREADS env var and just call the Rayon API directly aa6ef89
- removed unneeded create_dir_all_threadsafe helper now that std::fs::create_dir_all is threadsafe d0af83b
Full Changelog: 0.137.0...0.138.0