github jqnatividad/qsv 0.138.0

17 hours ago

Highlights:

  • ⭐ New template command for rendering templates with CSV data.
    This should allow users to generate very complex documents (Form letters, JSON/XML files, etc.) with the powerful MiniJinja template engine (Example template).

  • ⭐ New lookup module for fetching reference data from remote and local files.
    In addition to the typical http/https schemes for remote files, qsv adds two schemes, CKAN:// and datHere://, which fetch lookup data from a CKAN site or from datHere-maintained reference data, respectively. The lookup module also has simple file-based caching to minimize repeated fetching of typically static reference data (default cache age: 600 seconds).
    The lookup module is now used by the luau (for its qsv_register_lookup helper) and validate (for its dynamicEnum custom JSON Schema keyword) commands. More commands will take advantage of this module over time (e.g. apply, geocode, template, sqlp, etc.) to do extended lookups (e.g. looking up Census information, such as the demographics of a Census tract, from spatiotemporal data).

  • ✨ Enhanced fetchpost with MiniJinja templating for payload construction.
    Previously, fetchpost was limited to posting url-encoded HTML Form data with content type application/x-www-form-urlencoded. Now with the new --payload-tpl and --content-type options, users can post request bodies rendered with MiniJinja and specify other content types (typically application/json, text/plain, multipart/form-data) as well.

  • ✨ Improved Polars integration with automatic schema detection
    The joinp and sqlp commands now use qsv's stats cache to automatically determine column data types, rather than having Polars scan a sample of rows. This provides two key benefits:

    1. Faster execution by skipping Polars' schema inference step
    2. GUARANTEED data type inferencing since the stats cache analyzes the entire dataset, not just a sample

  • 🏃 fast-float2 crate for faster float parsing
    Casting string/bytes to float is now much faster (2 to 8x faster than Rust's standard library) with fast-float2.

  • 💪 Major dependency updates including Polars 0.44.2, Luau 0.650, mlua 0.10.0 and jsonschema 0.26.1
    These core crates underpin qsv's advanced commands. Using the latest versions of these crates allows qsv to stay true to its goal of being the fastest and most comprehensive data-wrangling toolkit.
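
The lookup module highlight above mentions simple file-based caching with a default cache age of 600 seconds. As a rough illustration of that idea only (not qsv's actual Rust implementation), the Python sketch below reuses a cached file while it is younger than the cache age and refetches otherwise. The names `is_fresh`, `cached_lookup`, and `fake_fetch`, and the `.csv` cache file naming, are assumptions made for this example.

```python
import os
import tempfile
import time

CACHE_AGE_SECS = 600  # default cache age quoted in the release notes


def is_fresh(path, max_age_secs=CACHE_AGE_SECS, now=None):
    """Return True if `path` exists and was modified within `max_age_secs`."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) <= max_age_secs


def cached_lookup(key, cache_dir, fetch, max_age_secs=CACHE_AGE_SECS):
    """Return lookup data for `key`, calling `fetch(key)` only on a cache miss.

    `fetch` is a hypothetical callable standing in for a remote download.
    """
    path = os.path.join(cache_dir, key + ".csv")
    if is_fresh(path, max_age_secs):
        with open(path, encoding="utf-8") as f:
            return f.read()
    data = fetch(key)
    with open(path, "w", encoding="utf-8") as f:
        f.write(data)
    return data


calls = []  # records how often the (fake) remote source is hit


def fake_fetch(key):
    """Hypothetical fetcher; a real one would download the reference file."""
    calls.append(key)
    return "code,name\nUS,United States\n"


cache_dir = tempfile.mkdtemp()
first = cached_lookup("countries", cache_dir, fake_fetch)
second = cached_lookup("countries", cache_dir, fake_fetch)  # served from cache
print(len(calls))  # 1
```

Keying freshness on the file's modification time keeps the cache stateless: no index file is needed, and deleting the cached file forces a refetch.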

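The Polars highlight above contrasts inferring column types from a sample of rows with deriving them from statistics computed over the entire dataset. The toy Python sketch below shows why the two can disagree. It is not how qsv's stats cache or Polars work internally; `infer_type`, `infer_schema`, and the type labels are assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Return 'Integer', 'Float', or 'String' for a list of strings."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Float"
    return "String"


def infer_schema(csv_text, sample_rows=None):
    """Infer a type per column, from the first `sample_rows` rows or all rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if sample_rows is not None:
        rows = rows[:sample_rows]
    return {col: infer_type([r[col] for r in rows]) for col in rows[0]}


data = "id,amount\n1,10\n2,20\n3,30.5\n4,N/A\n"

sampled = infer_schema(data, sample_rows=2)  # only sees "10" and "20"
full = infer_schema(data)                    # also sees "30.5" and "N/A"

print(sampled)  # {'id': 'Integer', 'amount': 'Integer'}
print(full)     # {'id': 'Integer', 'amount': 'String'}
```

A schema built from the two-row sample would fail or coerce values once the later rows are read, which is the failure mode a full-dataset scan avoids.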

Added

  • added lookup module - enabling fetching and caching of reference data from remote and local files #2262
  • fetchpost: add --payload-tpl <file> and --content-type options to construct payload using MiniJinja with the appropriate content-type #2268 5921498
  • joinp: derive polars schema from stats cache 86fe22e
  • sqlp: derive polars schema from stats cache #2256
  • template: new command to render MiniJinja templates with CSV data #2267
  • validate: add dynamicEnum lookup support #2265
  • contrib(completions): add template command and update fetchpost by @rzmk in #2269
  • add fast-float2 dependency for faster bytes to float conversion 7590e4e 3ca30aa
  • added more benchmarks for new/updated commands f8a1d4f cd7e480
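
The fetchpost entry above adds --payload-tpl and --content-type so that request bodies are no longer limited to URL-encoded form data. The Python sketch below contrasts the two body styles for a single CSV row. It is only a conceptual illustration: `string.Template` is a standard-library stand-in for MiniJinja, so the `$name` placeholder syntax here is not MiniJinja syntax, and the sample row and payload shape are invented for this example.

```python
import csv
import io
import json
from string import Template
from urllib.parse import urlencode

# One hypothetical CSV row to post.
rows = list(csv.DictReader(io.StringIO("id,city\n7,New York\n")))
row = rows[0]

# Earlier behaviour described in the notes: URL-encoded HTML form data.
form_type = "application/x-www-form-urlencoded"
form_body = urlencode(row)

# Templated payload: the body is rendered from a template, then sent with
# an explicit content type. json.dumps quotes and escapes the string field.
json_type = "application/json"
payload_tpl = Template('{"record": {"id": $id, "city": $city}}')
json_body = payload_tpl.substitute(
    id=int(row["id"]),
    city=json.dumps(row["city"]),
)

print(form_type, form_body)  # application/x-www-form-urlencoded id=7&city=New+York
print(json_type, json_body)  # application/json {"record": {"id": 7, "city": "New York"}}
```

Rendering the body from a template lets the payload carry nesting and typed values (a numeric id here) that flat form encoding cannot express.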

Changed

  • luau: adapt to mlua 0.10 API changes 268cb45
  • luau: refactored stage management 31ef58a
  • luau: now uses the lookup module 2f4be34
  • stats: minor perf refactoring 6cdd6ea
  • build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #2243
  • build(deps): bump azure/trusted-signing-action from 0.4.0 to 0.5.0 by @dependabot in #2239
  • build(deps): bump bytes from 1.7.2 to 1.8.0 by @dependabot in #2231
  • build(deps): bump cached from 0.53.1 to 0.54.0 by @dependabot in #2272
  • build(deps): bump flexi_logger from 0.29.3 to 0.29.4 by @dependabot in #2229
  • build(deps): bump flexi_logger from 0.29.4 to 0.29.5 by @dependabot in #2261
  • build(deps): bump flexi_logger from 0.29.5 to 0.29.6 by @dependabot in #2266
  • build(deps): bump hashbrown from 0.15.0 to 0.15.1 by @dependabot in #2270
  • build(deps): bump jsonschema from 0.24.0 to 0.24.1 by @dependabot in #2234
  • build(deps): bump jsonschema from 0.24.1 to 0.24.2 by @dependabot in #2238
  • build(deps): bump jsonschema from 0.24.2 to 0.24.3 by @dependabot in #2240
  • build(deps): bump jsonschema from 0.25.0 to 0.25.1 by @dependabot in #2244
  • build(deps): bump jsonschema from 0.26.0 to 0.26.1 by @dependabot in #2260
  • build(deps): bump regex from 1.11.0 to 1.11.1 by @dependabot in #2242
  • build(deps): bump reqwest from 0.12.8 to 0.12.9 by @dependabot in #2258
  • build(deps): bump serde from 1.0.210 to 1.0.211 by @dependabot in #2232
  • build(deps): bump serde from 1.0.211 to 1.0.213 by @dependabot in #2236
  • build(deps): bump serde from 1.0.213 to 1.0.214 by @dependabot in #2259
  • build(deps): bump simd-json from 0.14.1 to 0.14.2 by @dependabot in #2235
  • build(deps): bump tokio from 1.40.0 to 1.41.0 by @dependabot in #2237
  • deps: updated our fork of the csv crate with more perf optimizations eae7d76
  • deps: use calamine upstream with unreleased fixes 4cc7f37
  • deps: use our csvlens fork until PR removing unneeded arboard features is merged bb32322
  • deps: bump jsonschema from 0.25 to 0.26 #2251
  • deps: bump embedded Luau from 0.640 to 0.650 8c54b87 aca30b0
  • deps: bump mlua from 0.9 to 0.10 #2249
  • deps: bump Polars from 0.43.1 at py-1.11.0 tag to latest 0.44.2 upstream #2255 0e40a44
  • apply select clippy lint suggestions
  • updated indirect dependencies
  • aligned Rust nightly to Polars nightly - 2024-10-28 - 245bcb5

Fixed

Removed

  • removed need to set RAYON_NUM_THREADS env var and just call the Rayon API directly aa6ef89
  • removed unneeded create_dir_all_threadsafe helper now that std::fs::create_dir_all is threadsafe d0af83b

Full Changelog: 0.137.0...0.138.0
