Datapusher+ "Speed of Insight" Release! 🚀🚀🚀
This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near-instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!
We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Monthly Live on March 20, 2024, 13:00-14:00 UTC. Join us!
Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily Used.
Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.
🏇🏽 Hi-ho Quicksilver, away! 🏇🏽
Added
- count: automatically use Polars multithreaded, mem-mapped CSV reader when polars feature is enabled to get near-instant row counts of large CSV files even without an index #1656
- qsvdp: added polars support to Datapusher+-optimized binary variant, so we can do near instant SQL queries and aggregations during DP+ processing #1664
- fetchpost: added --disk-cache options and synced usage options with fetch #1671
- extended .infile-list to skip empty and commented lines, and to validate file paths 20a45c8 and 2650930
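To illustrate the .infile-list change, here is a minimal Python sketch of the filtering it describes: skip empty lines, skip commented lines, and validate the paths that remain. This is not qsv's actual (Rust) implementation. The helper name `parse_infile_list`, the `#` comment prefix, and the choice to raise an error on a missing file are all assumptions made for illustration.

```python
from pathlib import Path


def parse_infile_list(list_path, comment_prefix="#"):
    """Return the input files named in an .infile-list style text file.

    Hypothetical helper: one path per line, blank lines and lines starting
    with `comment_prefix` are skipped, and every remaining path must exist.
    """
    files = []
    lines = Path(list_path).read_text().splitlines()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(comment_prefix):
            continue  # skip empty and commented lines
        candidate = Path(line)
        if not candidate.is_file():
            # Assumed behavior: fail loudly rather than silently drop the entry.
            raise FileNotFoundError(f"{list_path}:{lineno}: no such file: {line}")
        files.append(candidate)
    return files
```

Validating up front means a typo in the list is reported with its line number before any processing starts, rather than surfacing midway through a long run.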
Changed
- sqlp: automatically disable read_csv() fast path optimization when a custom delimiter is specified #1648
- refactored util::count_rows() helper to also use polars if available 1e09e17 and 8d321fe
- publish: updated Windows MSI publish GH Action workflow to use Wix 3.14 from 3.11 75894ef
- deps: bump polars from 0.38.1 to 0.38.2 5faf90e
- deps: update Luau from 0.614 to 0.616 eb197fe and 52331da
- build(deps): bump sysinfo from 0.30.6 to 0.30.7 by @dependabot in #1650
- build(deps): bump chrono from 0.4.34 to 0.4.35 by @dependabot in #1651
- build(deps): bump strum from 0.26.1 to 0.26.2 by @dependabot in #1658
- build(deps): bump qsv-stats from 0.12.0 to 0.13.0 by @dependabot in #1663
- build(deps): bump anyhow from 1.0.80 to 1.0.81 by @dependabot in #1662
- build(deps): bump reqwest from 0.11.25 to 0.11.26 by @dependabot in #1667
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
Fixed
- dedup: fixed #1665 dedup not handling numeric values properly by adding a --numeric option #1666
- joinp: reenable join validation tests now that Polars 0.38.2 join validation is working again 5faf90e and fcfc75b
- count: broken in unreleased 0.124.0. Polars-powered count requires a "clean" CSV file as it infers the schema based on the first 1000 rows of a CSV. This will sometimes result in an invalid "error" (e.g. it infers a column is a number column, when it's not). 0.124.1 fixes this by adding a fallback to the "regular" CSV reader if a Polars error occurs a2c0869
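The count fix is an instance of a "fast path with fallback" pattern. The Python sketch below reproduces the failure mode and the fix using only the standard library: a toy fast counter infers column types from the first rows and raises when a later row contradicts them, and a wrapper falls back to a plain CSV reader. All names here (`fast_count`, `regular_count`, `count_rows`, `SchemaError`) are hypothetical. The fast counter is a stand-in and not Polars, and qsv's real implementation is in Rust.

```python
import csv
import io


class SchemaError(ValueError):
    """Raised when a row contradicts the inferred schema."""


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def fast_count(text, infer_rows=1000):
    """Toy stand-in for a schema-inferring reader (not Polars itself).

    Assumes a header row and rectangular data.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    sample = data[:infer_rows]
    # A column is treated as numeric if every sampled value parses as a number.
    numeric = [all(_is_number(r[i]) for r in sample) for i in range(len(header))]
    for row in data[infer_rows:]:
        for i, is_num in enumerate(numeric):
            if is_num and not _is_number(row[i]):
                raise SchemaError(f"column {header[i]!r}: {row[i]!r} is not a number")
    return len(data)


def regular_count(text):
    """Count data rows without inferring any types."""
    return sum(1 for _ in csv.reader(io.StringIO(text))) - 1  # minus header


def count_rows(text, infer_rows=1000):
    """Try the fast path first; fall back to the regular reader on error."""
    try:
        return fast_count(text, infer_rows)
    except SchemaError:
        return regular_count(text)


sample = "id,zip\n1,10001\n2,10002\n3,N/A\n"
# With a 2-row inference window, "zip" looks numeric, so row 3 breaks the fast path.
print(count_rows(sample, infer_rows=2))  # prints 3
```

The fallback costs a second pass only on files that trip the inferred schema, so clean files keep the fast path while messy ones still get a correct count.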
Removed
- gender_guesser 0.2.0 has been released. Remove patch.crates-io entry 97873a5
Full Changelog: 0.123.0...0.124.1