github dathere/qsv 10.0.0

Pre-release

[10.0.0] - 2025-11-23

Highlights:

  • Enhanced Data Dictionary: describegpt now features an expanded default prompt (v4.0) that generates more comprehensive data dictionaries.
  • Parallel Search/Replace Operations: search, searchset, and replace commands now support parallel execution when working with indexed CSV files, delivering significant performance improvements for large datasets.
  • Search/Replace Exact Match Options: Added --exact option to search, searchset, and replace commands for precise string matching without regex patterns.
  • Enhanced SQL Capabilities: sqlp now supports arbitrary expressions in SQL JOIN constraints, named window references, and new SQL functions including row_number, rank, dense_rank, and array_to_string.
  • Improved pivotp Performance: Updated to use Polars' new lazy pivot API with --maintain-order flag for predictable output ordering.
  • Luau 0.701: Updated embedded Luau from 0.697 to 0.701 with additional pattern matching documentation and tests.
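To illustrate the difference between regex matching and the literal matching that --exact is described as providing, here is a minimal Python sketch. It is not qsv's implementation: the helper names are hypothetical, and these notes do not say whether --exact compares whole fields or substrings, so substring matching is an assumption used only to show how the pattern is interpreted.

```python
import re

def regex_match(pattern: str, field: str) -> bool:
    """Interpret `pattern` as a regular expression."""
    return re.search(pattern, field) is not None

def literal_match(pattern: str, field: str) -> bool:
    """Interpret `pattern` as plain text by escaping regex metacharacters."""
    return re.search(re.escape(pattern), field) is not None

fields = ["a.b", "axb", "price (USD)"]

# As a regex, "." matches any character, so "a.b" also matches "axb".
regex_hits = [f for f in fields if regex_match("a.b", f)]

# As a literal, only the field containing the characters "a.b" matches.
literal_hits = [f for f in fields if literal_match("a.b", f)]

print(regex_hits)    # ['a.b', 'axb']
print(literal_hits)  # ['a.b']
```

Literal matching also lets a search term contain characters such as parentheses without any manual escaping, for example `literal_match("(USD)", "price (USD)")`.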

Added

  • search & searchset: add --exact option for literal string matching #3094
  • search: parallel search when file is indexed #3096
  • searchset: parallel execution when indexed #3097
  • replace: add --exact option e73d9bf
  • replace: parallel execution when indexed #3098
  • sqlp: added support for arbitrary expressions in SQL JOIN constraints d47c44e & 0d2402b
  • sqlp: added support for row_number, rank, and dense_rank SQL window functions #3115
  • sqlp: added support for named window references #3118
  • sqlp: added support for array_to_string list evaluation 64cbf34
  • pivotp: added --maintain-order flag for predictable output ordering 02dca12
  • describegpt: default-prompt-file v4.0 with expanded Data Dictionary generation 4db0d18
  • luau: expanded documentation for string functions using pattern matching a7344e3 & 2dcc9a4
  • util::mem_file_check: added platform adjustment factor 421be84
  • benchmarks: v7.0 added search & searchset indexed parallel benchmarks 55df784
  • benchmarks: v7.1.0 added replace_indexed_parallel benchmark 05c89d8
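For readers unfamiliar with the window functions and named window references listed above, the sketch below shows their general SQL semantics. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in, not sqlp or Polars SQL, so the table, column, and window names are made up for illustration and sqlp's exact syntax support may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("ann", 90), ("bob", 90), ("cy", 80), ("dee", 70)],
)

# Two named windows are declared once in the WINDOW clause and then
# referenced by name in each OVER, instead of repeating the definition.
rows = conn.execute(
    """
    SELECT name,
           row_number() OVER by_score_name AS rn,
           rank()       OVER by_score      AS rk,
           dense_rank() OVER by_score      AS drk
    FROM scores
    WINDOW by_score      AS (ORDER BY score DESC),
           by_score_name AS (ORDER BY score DESC, name)
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

The tied scores show how the three functions differ: row_number always increments, rank leaves a gap after a tie (1, 1, 3, 4), and dense_rank does not (1, 1, 2, 3).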

Changed

  • describegpt: refactored for improved reliability 1433bf1 & b6190a4
  • frequency: special rank of 0 now assigned to <ALL_UNIQUE> rows effa13b
  • frequency: microoptimizations 775bb88 & 29ec7af
  • search, searchset & replace: now parallelizable with an index, with significant performance improvements 45fc83d
  • search: use faster, non-allocating par_sort_unstable_by_key for improved performance 5f50f23
  • search: optimize --quick option 1fc1b85
  • search: --preview-match option forces sequential search 017ca6f
  • search, searchset & replace: sort chunks instead of raw data for better performance 5b58cb8
  • searchset: microoptimizations for performance c4ce324
  • replace: remove unneeded index rebuild logic cfdba60
  • pivotp: refactored to adapt to Polars' new lazy pivot API #3102
  • excel: microoptimize hot loop and formula retrieval f141c1b & 17780b5
  • stats: cache repetitive expensive env_var access in hot path a6ad0ce
  • stats: multiple microoptimizations 2f41c33 & 9bf43e5 & 00958a1
  • validate: updated to jsonschema 0.37.x with improved error handling f45693d & c7ad5d2 & b9ea447
  • luau: updated embedded Luau from 0.697 to 0.701 8885dce
  • deps: bump polars to latest upstream with numerous SQL and LazyFrame improvements
  • deps: bump jsonschema from 0.34 to 0.37.1
  • deps: bump syn from 2.0.109 to 2.0.110 d207524
  • deps: bump quick-xml from 0.38.3 to 0.38.4 11a5ae4
  • deps: bump geosuggest-core from 0.8.1 to 0.8.2 baf3194
  • deps: bump geosuggest-utils from 0.8.1 to 0.8.2 c5bcd1b
  • deps: bump governor from 0.10.1 to 0.10.2 b0068ef
  • deps: bump gzp from 2.0.1 to 2.0.2 2a0b901
  • deps: bump indexmap from 2.12.0 to 2.12.1 afa9c1f
  • deps: bump mlua from 0.11.4 to 0.11.5 49eedb9
  • deps: bump signal-hook-registry from 1.4.6 to 1.4.7 5c2e705
  • deps: bump calamine to 0.32 (removed git dependency) 449f162
  • deps: bump cached to latest upstream (removed patched fork) 508d1ce
  • deps: bump actions/checkout from 5 to 6 f76e009
  • deps: removed hashbrown patched fork ad30460
  • deps: removed grex patched fork 88cd3fc
  • deps: updated Cargo.lock file multiple times with indirect dependency updates
  • docs: updated rust-version requirement to 1.91 c288d4d
  • docs: prebuilt binaries on Linux and Windows x86_64 are no longer compiled with target-cpu=native 5f892a1
  • docs: expanded note about Illegal Instruction (SIGILL) faults and portable builds e4df784
  • docs: describegpt update with expanded Data Dictionary example and link to defaults d722afd & cedcd41 & bba4f76
  • applied select clippy lint suggestions
  • bumped several indirect dependencies
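The indexed parallel search and the "sort chunks instead of raw data" change can be pictured with the following conceptual sketch. It is an assumption about the general shape of the technique, not qsv's Rust code: the function names, chunk size, and use of a thread pool are all hypothetical, and Python threads are used here only to show the structure, not to gain real CPU parallelism.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_chunk(chunk_id, rows, pattern):
    """Return (chunk_id, matching rows) for one contiguous slice of the data."""
    return chunk_id, [row for row in rows if pattern.search(row)]

def parallel_search(rows, regex, chunk_size=3, workers=4):
    pattern = re.compile(regex)
    # Split the rows into contiguous chunks. With an index, the start of
    # each chunk can be located without scanning the whole file.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_chunk, chunk_id, chunk, pattern)
            for chunk_id, chunk in enumerate(chunks)
        ]
        # Chunks may finish in any order.
        results = [future.result() for future in as_completed(futures)]

    # Sort the per-chunk results by chunk id rather than sorting every
    # matched row: rows inside a chunk are already in file order.
    results.sort(key=lambda item: item[0])
    return [row for _, matched in results for row in matched]

rows = ["apple", "banana", "cherry", "avocado", "blueberry", "apricot", "grape"]
matches = parallel_search(rows, r"^a")
print(matches)  # ['apple', 'avocado', 'apricot']
```

Sorting a handful of chunk results is cheaper than sorting all matched rows, and the output stays identical to a sequential scan.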

Fixed

  • count: should still work with "broken" CSVs when polars feature is enabled #3104
  • describegpt: more robust SQL escaping to prevent SQL injection e958329
  • excel: formula retrieval bug on error b894515
  • excel: reverted mistaken alloc optimization for trim path b37361a
  • index: added check to confirm that only uncompressed CSV files can be indexed 1be485b
  • sqlp: unnest workaround for test compatibility 54d079b
  • sqlp: corrected array_to_string test 6c661ac
  • docs: fixed typo QSV_MEMORY_HEADROOM_PCT -> QSV_FREEMEMORY_HEADROOM_PCT f15d03e

Removed

  • deps: removed polars crates (polars-utils, polars-ops) that are no longer needed a7785f6
  • publish: removed target-cpu=native as it causes SIGILL on GitHub Action Runners fd74f8f

Full Changelog: 9.1.0...10.0.0
