github dathere/qsv 7.1.0

[7.1.0] - 2025-09-06

🇮🇹 csv,conf,v9 edition 🍝

   
Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN!

For this feature release, we polished describegpt a bit more for the occasion...

Towards the "People's API"! Verso l'API del Popolo!
(Answering People/Policymaker Interface)

🚀 Enhanced describegpt Command

  • Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
  • Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
  • Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
  • Conditional SQL Results: Implement a conditional --sql-results format for more efficient "SQL RAG" processing. If the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If the query fails (a "SQL hallucination"), the file is saved with a .sql extension instead, so the user can tweak and edit the query.
  • TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
  • Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
  • Disk Cache by Default: The disk cache is now enabled by default for better performance
  • TOML Configuration: Migrate prompt files from JSON to the more readable, more easily edited TOML format.
    (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
  • Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
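
The conditional --sql-results behavior described above can be sketched in a few lines of Python. This is only an illustration of the "results on success, query on failure" idea, not describegpt's actual code: the function name save_sql_results is hypothetical, and sqlite3 stands in for the Polars SQL engine that qsv uses.

```python
import csv
import sqlite3
from pathlib import Path


def save_sql_results(conn, query, out_base):
    """Hypothetical helper: run `query` and save either its results or the query itself.

    On success the rows go to <out_base>.csv; on failure the query text
    goes to <out_base>.sql so it can be edited and retried.
    """
    try:
        cur = conn.execute(query)
        rows = cur.fetchall()
        # description is None for statements that return no result set
        header = [col[0] for col in (cur.description or [])]
    except sqlite3.Error:
        # Failed query ("SQL hallucination"): keep the SQL for manual fixing
        sql_path = Path(f"{out_base}.sql")
        sql_path.write_text(query + "\n", encoding="utf-8")
        return sql_path

    # Successful query: write a header row followed by the result rows
    csv_path = Path(f"{out_base}.csv")
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return csv_path
```

A caller can then branch on the returned path's suffix to decide whether it has usable results or a query that still needs editing.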

partition Command Enhancements

  • New --limit Option: Implement --limit option to set the maximum number of open files
  • Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
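
To show why batching helps on high-cardinality columns, here is a minimal Python sketch of one way a batched partition with a cap on open files could work. It is an assumption-laden illustration, not qsv's implementation: the function partition_csv and its parameters are hypothetical, the first scan collects the distinct keys, and the input is then rescanned once per batch of at most `limit` keys.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def partition_csv(src, column, out_dir, limit=2):
    """Hypothetical sketch: split `src` into one CSV per distinct value of `column`.

    At most `limit` output files are open at any time.
    Assumes every key is safe to use as a file name.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First scan: collect the distinct keys in first-seen order
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        keys = list(dict.fromkeys(row[column] for row in reader))

    # Batched scans: each batch opens at most `limit` output files
    for start in range(0, len(keys), limit):
        batch = keys[start:start + limit]
        with ExitStack() as stack, src.open(newline="", encoding="utf-8") as f:
            writers = {}
            for key in batch:
                out = stack.enter_context(
                    (out_dir / f"{key}.csv").open("w", newline="", encoding="utf-8")
                )
                writer = csv.DictWriter(out, fieldnames=fieldnames)
                writer.writeheader()
                writers[key] = writer
            # Rescan the input, keeping only rows that belong to this batch
            for row in csv.DictReader(f):
                writer = writers.get(row[column])
                if writer is not None:
                    writer.writerow(row)
        # Leaving the `with` block closes every output file in the batch
    return keys
```

The trade-off in this sketch is extra reads of the input in exchange for a bounded number of open file handles, which matters when a column has more distinct values than the operating system allows open files.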

Added

  • describegpt: add configurable frequency limit #2950
  • describegpt: migrate prompt file from JSON to easier-to-edit TOML format #2954
  • describegpt: refactor default prompt file; add --fewshot-examples option #2955
  • describegpt: add TogetherAI support for models endpoint #2965
  • partition: add --limit option #2960
  • added Windows ARM64 prebuilt binaries

Changed

  • describegpt: enable disk cache by default #2951
  • describegpt: Polars SQL generation tweaks #2958
  • python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
  • deps: bump embedded Luau from 0.688 to 0.690 #2967
  • deps: bump Polars to 0.50.0 at py-1.33.0 tag
  • build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
  • build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
  • build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
  • build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
  • build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
  • build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
  • build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
  • applied select clippy lints
  • updated indirect dependencies

Full Changelog: 7.0.1...7.1.0
