github jqnatividad/qsv 0.131.0

3 months ago

Highlights

  • Refactored frequency to make it smarter and faster.
    frequency's core algorithm essentially compiles an in-memory hashmap to determine the frequency of each unique value for each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.
    However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory as it essentially compiles a hashmap of the ENTIRE column, with a hashmap entry for each column value with a count of 1.
    Now, with the new --stats-mode option (enabled by default), frequency can compile the dataset in a more intelligent way by looking up a column's cardinality in the stats cache.
    If the cardinality of a column is equal to the CSV's row count (indicating a column with ALL unique values), it short-circuits frequency calculations for that column. This dramatically reduces the time and memory required for such columns (e.g. ID columns), as it eliminates the need to maintain a hashmap for them.
    In practice, this lets frequency handle "real-world" datasets of any size.
    To ensure frequency is as fast as possible, be sure to index your datasets and compute their stats beforehand, so that the stats cache is available for the cardinality lookup.
  • Setting the stage for Datapusher+ v1 and...
    The "itches we've been scratching" over the past few months have been informed by our work with several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below), both targeted for release this month.
    DP+ is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!
    We're particularly excited about how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended DCAT-US v3 metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
  • ...qsv pro 1.0
    qsv pro is datHere's enterprise-grade data wrangling/curation workbench, planned for a v1.0 release this month.
    Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.
    We feel qsv pro may be a game-changer for data wranglers and data curators who work with spreadsheets and large datasets. It lets them view statistical data and metadata while performing complex data wrangling operations in a user-friendly way, without having to write code.
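
The short-circuit described for frequency in the first highlight can be sketched in a few lines. The Python below is a conceptual illustration only, not qsv's Rust implementation: the function name column_frequencies, the cardinality_by_column mapping (standing in for the stats cache), and the "<ALL_UNIQUE>" placeholder label are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical placeholder label for a column whose values are all unique.
ALL_UNIQUE_LABEL = "<ALL_UNIQUE>"


def column_frequencies(rows, header, cardinality_by_column=None):
    """Count value frequencies per column, skipping all-unique columns.

    rows: list of rows, each a list of string values.
    header: list of column names, in the same order as the row values.
    cardinality_by_column: optional {column name: number of distinct values},
        an assumed stand-in for a precomputed stats cache.
    """
    row_count = len(rows)
    cardinality_by_column = cardinality_by_column or {}

    # A column whose cardinality equals the row count has only unique values,
    # so every value would get a count of 1; no hashmap is needed for it.
    all_unique = {
        name
        for name in header
        if row_count > 0 and cardinality_by_column.get(name) == row_count
    }

    # Only the remaining columns get an in-memory counter (hashmap).
    counters = {name: Counter() for name in header if name not in all_unique}

    for row in rows:
        for name, value in zip(header, row):
            if name in counters:
                counters[name][value] += 1

    result = {}
    for name in header:
        if name in all_unique:
            # Short-circuit: a single summary entry instead of one per value.
            result[name] = [(ALL_UNIQUE_LABEL, row_count)]
        else:
            result[name] = counters[name].most_common()
    return result
```

Without the cardinality mapping, the sketch falls back to building a counter for every column, which mirrors why computing stats beforehand matters: the shortcut is only available when the cardinality is already known.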

Added

  • docs: added Shell Completion section 556a2ff
  • docs: add 🪄 emoji in legend to indicate "automagical" commands 2753c90
  • Add building deb package (WIP) by @tino097 in #2029
  • Added GitHub workflow to test debian package (WIP) by @tino097 in #2032
  • tests: added false positive to _typos.toml configuration d576af2
  • added more benchmarks
  • added more tests

Changed

  • fetch & fetchpost: remove expired diskcache entries on startup 9b6ab5d
  • frequency: smarter frequency compilation with new --stats-mode option #2030
  • json: refactored for maintainability & performance 62e9216 and 4e44b18
  • improved self-update messages 5c874e0 and 0aa0b13
  • contrib(completions): frequency updates & remove bashly/fish by @rzmk in #2031
  • Debian package update by @tino097 in #2017
  • publish: optimized the CPU features enabled when building release binaries in all GitHub Actions "publishing" workflows
  • publish: ensure latest Python patch release is used when building qsvpy binary variants 2ab03a0 and ec6f486
  • tests: also enabled CPU features in CI tests
  • docs: wordsmith qsv "elevator pitch" cc47fe6
  • docs: point to https://100.dathere.com in Whirlwind tour fc49aef
  • deps: bump polars to latest upstream post py-1.4.1 release at the time of this release
  • build(deps): bump bytes from 1.6.1 to 1.7.0 by @dependabot in #2018
  • build(deps): bump bytes from 1.7.0 to 1.7.1 by @dependabot in #2021
  • build(deps): bump flate2 from 1.0.30 to 1.0.31 by @dependabot in #2027
  • build(deps): bump indexmap from 2.2.6 to 2.3.0 by @dependabot in #2020
  • build(deps): bump jaq-parse from 1.0.2 to 1.0.3 by @dependabot in #2016
  • build(deps): bump redis from 0.26.0 to 0.26.1 by @dependabot in #2023
  • build(deps): bump regex from 1.10.5 to 1.10.6 by @dependabot in #2025
  • build(deps): bump serde_json from 1.0.121 to 1.0.122 by @dependabot in #2022
  • build(deps): bump sysinfo from 0.30.13 to 0.31.0 by @dependabot in #2019
  • build(deps): bump sysinfo from 0.31.0 to 0.31.2 by @dependabot in #2024
  • build(deps): bump tempfile from 3.11.0 to 3.12.0 by @dependabot in #2033
  • build(deps): bump serde from 1.0.204 to 1.0.205 by @dependabot in #2036
  • applied select clippy suggestions
  • updated several indirect dependencies
  • made various usage text improvements
  • bumped MSRV to 1.80.1

Fixed

Removed

  • docs: "Quicksilver" is the name of the logo horse, not how you pronounce "qsv" e4551ae

New Contributors

Full Changelog: 0.130.0...0.131.0
