github jqnatividad/qsv 0.137.0

22 hours ago

Highlights:

  • extdedup & extsort now support two modes: LINE mode and CSV mode. Previously, both commands operated only on a line-by-line basis (LINE mode).
    With CSV mode, you can now deduplicate or sort CSV files by column, using the --select option to specify which columns to deduplicate or sort on.
    This is especially useful for large CSV files with many columns when only a subset of those columns matters. And because both commands use disk-backed algorithms (an on-disk hash table for extdedup and an external merge sort for extsort), they can handle files larger than memory.
  • sqlp now has a --cache-schema option that caches the schema of the input CSV file, which can significantly speed up subsequent queries on the same file.
  • fetch and fetchpost now use the jaq crate (a jq clone for querying JSON) instead of the jql crate. This change was made to improve performance and to make both commands consistent with the json command, which also uses jaq. Because jaq follows the widely used jq syntax, it should also be more familiar to users.
  • stats is a tad faster as we keep squeezing more performance from this central command.
  • validate is now faster and more memory efficient due to optimizations in the jsonschema crate and minor performance improvements in the validate command itself.
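
To make the CSV-mode idea above concrete, here is a minimal Python sketch of an external merge sort keyed on a subset of columns. It is not qsv's implementation (qsv is written in Rust, and its internals are not described in these notes). The function name external_sort_csv, the chunk_rows parameter, and the choice to compare key values as plain strings are all assumptions made for illustration.

```python
import csv
import heapq
import os
import tempfile


def _spill(rows, key_idx):
    """Sort one in-memory chunk by the key columns and write it to a temp file."""
    rows.sort(key=lambda r: [r[i] for i in key_idx])
    tmp = tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False)
    with tmp:
        csv.writer(tmp).writerows(rows)
    return tmp.name


def _read_rows(path):
    """Stream rows back from a sorted run without loading the whole file."""
    with open(path, newline="") as f:
        yield from csv.reader(f)


def external_sort_csv(src, dst, key_columns, chunk_rows=100_000):
    """Hypothetical helper: sort src by key_columns (compared as strings) into dst."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the file has a header row
        key_idx = [header.index(c) for c in key_columns]
        runs, chunk = [], []
        # Phase 1: cut the input into sorted runs that each fit in memory.
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_rows:
                runs.append(_spill(chunk, key_idx))
                chunk = []
        if chunk:
            runs.append(_spill(chunk, key_idx))
    try:
        # Phase 2: k-way merge of the sorted runs, one row per run in memory.
        with open(dst, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            merged = heapq.merge(
                *(_read_rows(p) for p in runs),
                key=lambda r: [r[i] for i in key_idx],
            )
            writer.writerows(merged)
    finally:
        for p in runs:
            os.remove(p)
```

Only chunk_rows rows are held in memory during the first phase, and only one row per run during the merge, which is why this style of algorithm can handle inputs larger than memory. Sorting on a few key columns instead of the whole line is what distinguishes the column-based approach from a line-based one.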

Added

  • extdedup: now supports two modes - LINE mode and CSV mode #2208
  • extsort: now also has two modes - CSV mode and LINE mode #2210
  • sqlp: add --cache-schema option #2224
  • added sqlp --cache-schema benchmarks

Changed

  • apply & applydp: use smallvec for operations vector & other minor performance optimizations #2219 & bc837ae
  • apply & applydp: specify min_length for parallel iterators 7d6ce5e
  • fetch & fetchpost: replace jql with jaq #2222
  • stats: performance optimizations f205809 e26c27f 4579c1b
  • validate: specify min_length for parallel iterators a5b8185
  • deps: updated polars to 0.43.1 at the py-1.10.0 tag.
  • build(deps): bump calamine from 0.26.0 to 0.26.1 by @dependabot in #2204
  • build(deps): bump csvs_convert from 0.8.14 to 0.9.0 by @dependabot in #2215
  • build(deps): bump flexi_logger from 0.29.2 to 0.29.3 by @dependabot in #2209
  • build(deps): bump jsonschema from 0.23.0 to 0.24.0 by @dependabot in #2223
  • build(deps): bump pyo3 from 0.22.3 to 0.22.4 by @dependabot in #2207
  • build(deps): bump pyo3 from 0.22.4 to 0.22.5 by @dependabot in #2212
  • build(deps): bump redis from 0.27.3 to 0.27.4 by @dependabot in #2202
  • build(deps): bump redis from 0.27.4 to 0.27.5 by @dependabot in #2217
  • build(deps): bump serde_json from 1.0.129 to 1.0.130 by @dependabot in #2218
  • build(deps): bump serde_json from 1.0.131 to 1.0.132 by @dependabot in #2220
  • build(deps): bump uuid from 1.10.0 to 1.11.0 by @dependabot in #2213
  • apply select clippy lints
  • bumped indirect dependencies
  • bumped MSRV to 1.82

Fixed

  • fix performance regression in batched commands by refactoring optimal_batch_size to require indexed CSV files #2206

Removed

  • fetch & fetchpost: removed jql options; replaced with jaq #2222

Full Changelog: 0.136.0...0.137.0
