🤖 Expanded Metadata Inferencing 🤖
describegpt
headlines this release, with its new ability to support other local Large Language Models (LLMs) using popular tools that serve them through APIs such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.
Several commands got additional options: cat
with --no-headers
support in the rowskey
subcommand; excel
with new options like --error-format
and short --metadata
mode; and foreach
with a --dry-run
option. frequency
also got new options, including --unq-limit
for limiting unique counts, support for negative limits, and a --lmt-threshold
option for compiling comprehensive frequencies below a threshold. slice
now supports negative indices and new JSON output options, providing more flexibility in data slicing.
This is all rounded out with sqlp
improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT
value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv()
and read_parquet()
) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.
New Features
cat
: Added--no-headers
support to therowskey
subcommand.describegpt
: Added compatibility for other local Large Language Models (LLMs) such as Ollama and Jan, broadening the tool's utility in diverse AI environments.excel
: Introduced new options in the excel command:--error-format
for better error handling and a short--metadata
JSON mode.foreach
: added a--dry-run
option, allowing users to preview the results of scripts without executing them.frequency
: New options added such as--unq-limit
for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a--lmt-threshold
option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.slice
: Support for negative indices to slice from the end and new JSON output options.sqlp
: sqlp now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.
Changes and Optimizations
- Performance Enhancements: Microoptimizations in
datefmt
andvalidate
commands, and increased default length for--infer-len
insqlp
for improved performance. - Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
- Benchmarks Added: New performance benchmarks for
sqlp
vs duckdb added to ensure there are no performance regressions between releases. Right now,sqlp
is faster thanduckdb
in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.
Security and Robustness
- Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
- Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.
Added
cat
: add--no-headers
support to rowskey subcommand #1762describegpt
: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in #1761excel
: add--error-format
option #1721excel
: add--metadata
short JSON mode #1738foreach
: add--dry-run
option #1740frequency
: add--unq-limit
option #1763frequency
: add support for negative--limit
s #1765frequency
: add--lmt-threshold
option #1766slice
: add support for negative--index
option values #1726slice
: implement--json
output option #1729sqlp
: added support for single-line comments in SQL scripts bb52bcesqlp
: added SKIP_INPUT special value to short-circuit input processing if the user wants to
load input files directly using table functions (e.g. read_csv(), read_parquet(), etc.) fe850advalidate
: add--valid-output
option #1730- contrib: add sample Bashly completions implementation by @rzmk in #1731
benchmarks
: addedsqlp
vsduckdb
benchmarks.
Changed
datefmt
: microoptimize formatting 0ee27e7joinp
: adapt to breaking change in Polars 0.39 for lazyframe sort c625ca9sqlp
: change--infer-len
option default from 250 to 1000 for increased performance da1d215validate
: microoptimizeto_json_instance()
c2e4a1c- bump Luau from 0.616 to 0.622 9216ec3
- build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in #1711
- build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in #1712
- build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in #1750
- build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in #1715
- build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in #1716
- build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in #1732
- build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in #1735
- build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in #1755
- build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in #1720
- build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in #1724
- build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in #1725
- build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in #1759
- build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in #1733
- build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in #1734
- build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in #1744
- bump polars from 0.38 to 0.39 #1745
- build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in #1746
- build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in #1752
- build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in #1747
- build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in #1749
- build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in #1751
- build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in #1758
- build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in #1767
- build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in #1768
- build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in #1769
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
- pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
- bumped MSRV to 1.77.2
Fixed
- Make init_logger more robust #1717
count
: empty CSVs count as zero also for polars. Fixes #1741 #1742excel
: fix #1682 by adding--error-format
option #1689fetch
&fetchpost
: more robust JSON response validation ebc7287slice
: usewrite!
macro to get rid of GH Advanced Security lint c739097sqlp
: fixed docopt defaults that were not being parsed correctly fe850addeps
: bump h2 from 0.4.3 to 0.4.4 to fix HTTP2 Continuation Flood vulnerability 6af0da2deps
: bump rustls from 0.22.3 to 0.22.4 to fix https://nvd.nist.gov/vuln/detail/CVE-2024-32650 #1758
Removed
fetch
&fetch post
: remove jsonxf crate; use serde_json to prettify JSON strings #1727reverse
: remove kludgy expansion of read/write buffers 46095cd
Full Changelog: 0.125.0...0.126.0