Highlights:
- extdedup & extsort now support two modes - LINE mode and CSV mode. Previously, both commands only operated on a line-by-line basis (LINE mode).
  With the addition of CSV mode, you can now deduplicate or sort CSV files on a column-by-column basis, with the powerful --select option to specify which columns to deduplicate or sort on.
  This is especially useful for large CSV files with many columns, where you only want to deduplicate or sort on a subset of columns. And since both commands use disk-backed algorithms (an on-disk hash table for extdedup, and an external merge sort for extsort), they can handle files larger than memory.
- sqlp now has a --cache-schema option that caches the schema of the input CSV file, which can significantly speed up subsequent queries on the same file.
- fetch and fetchpost have been updated to use the jaq crate instead of the jql crate. This change was made to improve performance and to make the commands consistent with the json command, which also uses jaq. Furthermore, jaq is a clone of jq - a widely used JSON parsing tool, so it should be more familiar to users.
- stats is a tad faster as we keep squeezing more performance from this central command.
- validate is now faster and more memory efficient due to optimizations in the jsonschema crate and minor performance improvements in the validate command itself.
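To illustrate what deduplicating on a subset of columns means, here is a minimal Python sketch. It is not qsv's implementation: the function name `dedup_csv` and the sample data are hypothetical, and an in-memory `set` stands in for the on-disk hash table that extdedup uses, so this version is limited by available memory.

```python
import csv
import io


def dedup_csv(src, dst, key_columns):
    """Copy rows from src to dst, keeping the first row seen for each key.

    The key is built only from key_columns, so rows that differ in the
    other columns still count as duplicates. Returns the number of rows
    dropped.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()  # assumption: in-memory stand-in for a disk-backed hash table
    dropped = 0
    for row in reader:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            dropped += 1
            continue
        seen.add(key)
        writer.writerow(row)
    return dropped


# Rows 1 and 2 differ in "id" but share the same (name, city) key.
sample = "id,name,city\n1,Ann,Oslo\n2,Ann,Oslo\n3,Bob,Rome\n"
out = io.StringIO()
dropped = dedup_csv(io.StringIO(sample), out, ["name", "city"])
result = out.getvalue()
print(result)
```

In LINE mode terms, the key would be the whole line. Selecting columns narrows the key, which is why two rows with different `id` values collapse into one here.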
Added
- extdedup: now supports two modes - LINE mode and CSV mode #2208
- extsort: now also has two modes - CSV mode and LINE mode #2210
- sqlp: add --cache-schema option #2224
- added sqlp --cache-schema benchmarks
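For readers unfamiliar with external merge sort, the idea behind sorting a CSV on selected columns can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not qsv's code: the function name `external_sort_csv`, the `chunk_rows` parameter and the sample data are hypothetical, and keys are compared as plain strings.

```python
import csv
import heapq
import io
import tempfile


def external_sort_csv(src, dst, key_columns, chunk_rows=2):
    """Sort CSV rows by key_columns while holding at most chunk_rows in memory."""
    reader = csv.reader(src)
    header = next(reader)
    idx = [header.index(col) for col in key_columns]

    def key(row):
        # Compare rows only on the selected columns (string comparison).
        return [row[i] for i in idx]

    runs = []

    def spill(rows):
        # Sort one chunk in memory and write it to a temporary file (a "run").
        rows.sort(key=key)
        run = tempfile.TemporaryFile(mode="w+", newline="")
        csv.writer(run).writerows(rows)
        run.seek(0)
        runs.append(run)

    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_rows:
            spill(chunk)
            chunk = []
    if chunk:
        spill(chunk)

    # Lazily merge the sorted runs; only one row per run is held at a time.
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(heapq.merge(*(csv.reader(run) for run in runs), key=key))
    for run in runs:
        run.close()


sample = "id,city,name\n1,Rome,Bob\n2,Oslo,Ann\n3,Rome,Al\n4,Bern,Cy\n5,Oslo,Zed\n"
out = io.StringIO()
external_sort_csv(io.StringIO(sample), out, ["city", "name"])
sorted_csv = out.getvalue()
print(sorted_csv)
```

The tiny `chunk_rows=2` default only exists to force several runs on the sample data. A real tool would size chunks against a memory budget.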
Changed
- apply & applydp: use smallvec for operations vector & other minor performance optimizations #2219 & bc837ae
- apply & applydp: specify min_length for parallel iterators 7d6ce5e
- fetch & fetchpost: replace jql with jaq #2222
- stats: performance optimizations f205809 e26c27f 4579c1b
- validate: specify min_length for parallel iterators a5b8185
- deps: updated polars to 0.43.1 at the py-1.10.0 tag.
- build(deps): bump calamine from 0.26.0 to 0.26.1 by @dependabot in #2204
- build(deps): bump csvs_convert from 0.8.14 to 0.9.0 by @dependabot in #2215
- build(deps): bump flexi_logger from 0.29.2 to 0.29.3 by @dependabot in #2209
- build(deps): bump jsonschema from 0.23.0 to 0.24.0 by @dependabot in #2223
- build(deps): bump pyo3 from 0.22.3 to 0.22.4 by @dependabot in #2207
- build(deps): bump pyo3 from 0.22.4 to 0.22.5 by @dependabot in #2212
- build(deps): bump redis from 0.27.3 to 0.27.4 by @dependabot in #2202
- build(deps): bump redis from 0.27.4 to 0.27.5 by @dependabot in #2217
- build(deps): bump serde_json from 1.0.129 to 1.0.130 by @dependabot in #2218
- build(deps): bump serde_json from 1.0.131 to 1.0.132 by @dependabot in #2220
- build(deps): bump uuid from 1.10.0 to 1.11.0 by @dependabot in #2213
- apply select clippy lints
- bumped indirect dependencies
- bumped MSRV to 1.82
Fixed:
- fix performance regression in batched commands by refactoring optimal_batch_size to require indexed CSV files #2206
Removed:
- fetch & fetchpost: removed jql options; replaced with jaq #2222
Full Changelog: 0.136.0...0.137.0