github jqnatividad/qsv 0.126.0

latest release: 0.127.0
13 days ago

🤖 Expanded Metadata Inferencing 🤖

describegpt headlines this release, with its new ability to support other local Large Language Models (LLMs) using popular tools that serve them through APIs such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.

Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.

This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.


New Features

  • cat: Added --no-headers support to the rowskey subcommand.
  • describegpt: Added compatibility for other local Large Language Models (LLMs) such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
  • excel: Introduced new options in the excel command: --error-format for better error handling and a short --metadata JSON mode.
  • foreach: added a --dry-run option, allowing users to preview the results of scripts without executing them.
  • frequency: New options added such as --unq-limit for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a --lmt-threshold option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.
  • slice: Support for negative indices to slice from the end and new JSON output options.
  • sqlp: sqlp now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.

Changes and Optimizations

  • Performance Enhancements: Microoptimizations in datefmt and validate commands, and increased default length for --infer-len in sqlp for improved performance.
  • Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
  • Benchmarks Added: New performance benchmarks for sqlp vs duckdb added to ensure there are no performance regressions between releases. Right now, sqlp is faster than duckdb in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.

Security and Robustness

  • Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
  • Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.

Added

  • cat: add --no-headers support to rowskey subcommand #1762
  • describegpt: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in #1761
  • excel: add --error-format option #1721
  • excel: add --metadata short JSON mode #1738
  • foreach: add --dry-run option #1740
  • frequency: add --unq-limit option #1763
  • frequency: add support for negative --limits #1765
  • frequency: add --lmt-threshold option #1766
  • slice: add support for negative --index option values #1726
  • slice: implement --json output option #1729
  • sqlp: added support for single-line comments in SQL scripts bb52bce
  • sqlp: added SKIP_INPUT special value to short-circuit input processing if the user wants to
    load input files directly using table functions (e.g. read_csv(), read_parquet(), etc.) fe850ad
  • validate: add --valid-output option #1730
  • contrib: add sample Bashly completions implementation by @rzmk in #1731
  • benchmarks: added sqlp vs duckdb benchmarks.

Changed

  • datefmt: microoptimize formatting 0ee27e7
  • joinp: adapt to breaking change in Polars 0.39 for lazyframe sort c625ca9
  • sqlp: change --infer-len option default from 250 to 1000 for increased performance da1d215
  • validate: microoptimize to_json_instance() c2e4a1c
  • bump Luau from 0.616 to 0.622 9216ec3
  • build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in #1711
  • build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in #1712
  • build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in #1750
  • build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in #1715
  • build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in #1716
  • build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in #1732
  • build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in #1735
  • build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in #1755
  • build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in #1720
  • build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in #1724
  • build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in #1725
  • build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in #1759
  • build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in #1733
  • build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in #1734
  • build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in #1744
  • bump polars from 0.38 to 0.39 #1745
  • build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in #1746
  • build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in #1752
  • build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in #1747
  • build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in #1749
  • build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in #1751
  • build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in #1758
  • build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in #1767
  • build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in #1768
  • build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #1769
  • applied select clippy recommendations
  • updated several indirect dependencies
  • added several benchmarks for new/changed commands
  • pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
  • bumped MSRV to 1.77.2

Fixed

  • Make init_logger more robust #1717
  • count: empty CSVs count as zero also for polars. Fixes #1741 #1742
  • excel: fix #1682 by adding --error-format option #1689
  • fetch & fetchpost: more robust JSON response validation ebc7287
  • slice: use write! macro to get rid of GH Advanced Security lint c739097
  • sqlp: fixed docopt defaults that were not being parsed correctly fe850ad
  • deps: bump h2 from 0.4.3 to 0.4.4 to fix HTTP2 Continuation Flood vulnerability 6af0da2
  • deps: bump rustls from 0.22.3 to 0.22.4 to fix https://nvd.nist.gov/vuln/detail/CVE-2024-32650 #1758

Removed

  • fetch & fetch post: remove jsonxf crate; use serde_json to prettify JSON strings #1727
  • reverse: remove kludgy expansion of read/write buffers 46095cd

Full Changelog: 0.125.0...0.126.0

Don't miss a new qsv release

NewReleases is sending notifications on new releases.