github jqnatividad/qsv 0.124.1

8 months ago

Datapusher+ "Speed of Insight" Release! 🚀🚀🚀

This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near-instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!

We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Monthly Live on March 20, 2024, 13:00-14:00 UTC. Join us!

Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily used.

Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.

🏇🏽 Hi-ho Quicksilver, away! 🏇🏽


Added

  • count: automatically use Polars multithreaded, mem-mapped CSV reader when polars feature is enabled to get near-instant row counts of large CSV files even without an index #1656
  • qsvdp: added polars support to the Datapusher+-optimized binary variant, so we can do near-instant SQL queries and aggregations during DP+ processing #1664
  • fetchpost: added --disk-cache options and synced usage options with fetch #1671
  • extended .infile-list to skip empty and commented lines, and to validate file paths 20a45c8 and 2650930
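
The .infile-list behavior described above (skip empty and commented lines, validate file paths) can be sketched roughly as follows. This is an illustrative sketch only, not qsv's actual implementation: the function name `parse_infile_list` is hypothetical, and it assumes `#` marks a comment line and that a missing file is treated as an error.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not qsv's code): turn the text of an
/// `.infile-list` file into a list of validated input paths.
/// Assumption: lines starting with `#` are comments.
fn parse_infile_list(contents: &str) -> Result<Vec<PathBuf>, String> {
    let mut paths = Vec::new();
    for (idx, raw) in contents.lines().enumerate() {
        let line = raw.trim();
        // Skip empty lines and commented lines.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Validate that the path exists before accepting it.
        let path = Path::new(line);
        if !path.exists() {
            return Err(format!("line {}: file not found: {}", idx + 1, line));
        }
        paths.push(path.to_path_buf());
    }
    Ok(paths)
}
```

Calling `parse_infile_list` on the contents of a list file would return only the usable paths, or an error naming the first line whose file does not exist.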

Changed

  • sqlp: automatically disable read_csv() fast path optimization when a custom delimiter is specified #1648
  • refactored util::count_rows() helper to also use polars if available 1e09e17 and 8d321fe
  • publish: updated Windows MSI publish GH Action workflow to use Wix 3.14 from 3.11 75894ef
  • deps: bump polars from 0.38.1 to 0.38.2 5faf90e
  • deps: update Luau from 0.614 to 0.616 eb197fe and 52331da
  • build(deps): bump sysinfo from 0.30.6 to 0.30.7 by @dependabot in #1650
  • build(deps): bump chrono from 0.4.34 to 0.4.35 by @dependabot in #1651
  • build(deps): bump strum from 0.26.1 to 0.26.2 by @dependabot in #1658
  • build(deps): bump qsv-stats from 0.12.0 to 0.13.0 by @dependabot in #1663
  • build(deps): bump anyhow from 1.0.80 to 1.0.81 by @dependabot in #1662
  • build(deps): bump reqwest from 0.11.25 to 0.11.26 by @dependabot in #1667
  • applied select clippy recommendations
  • updated several indirect dependencies
  • added several benchmarks for new/changed commands

Fixed

  • dedup: fixed #1665 dedup not handling numeric values properly by adding a --numeric option #1666
  • joinp: reenable join validation tests now that Polars 0.38.2 join validation is working again 5faf90e and fcfc75b
  • count: broken in unreleased 0.124.0. Polars-powered count requires a "clean" CSV file, as it infers the schema from the first 1000 rows of a CSV. This sometimes results in a spurious error (e.g. it infers that a column is numeric when it's not). 0.124.1 fixes this by adding a fallback to the "regular" CSV reader if a Polars error occurs a2c0869
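
The count fix above follows a common "fast path with fallback" pattern: try the fast reader first, and if it reports an error, redo the work with the slower but more tolerant reader. A minimal sketch of that pattern, assuming nothing about qsv's internals (the name `count_with_fallback` and both closures are hypothetical stand-ins for the Polars-based and regular counting paths):

```rust
/// Hypothetical sketch (not qsv's code): run the fast counter first and
/// fall back to the regular counter if the fast one returns an error.
fn count_with_fallback<F, S, E1, E2>(fast: F, regular: S) -> Result<u64, E2>
where
    F: FnOnce() -> Result<u64, E1>,
    S: FnOnce() -> Result<u64, E2>,
{
    match fast() {
        // Fast path succeeded: use its row count.
        Ok(n) => Ok(n),
        // Fast path failed (e.g. a schema inference error): discard the
        // error and count again with the regular reader.
        Err(_) => regular(),
    }
}
```

The trade-off in this design is that a file that trips up the fast path is read twice, in exchange for never surfacing a fast-path-only error to the user.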

Removed

  • removed the patch.crates-io entry for gender_guesser now that gender_guesser 0.2.0 has been released 97873a5

Full Changelog: 0.123.0...0.124.1
