github jqnatividad/qsv 0.129.0

latest releases: 0.138.0, 0.137.0, 0.136.0...
4 months ago

This release is the biggest one ever!

Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:

📌 Highlights (click each dropdown for more info)

Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!

@rzmk has contributed to projects in the qsv ecosystem including qsv's describegpt, prompt, json, and clipboard commands; qsv's tab completion support; qsv.dathere.com including its online configurator and benchmarks page; 100.dathere.com with its qsv lessons and exercises; and qsv pro the spreadsheet data wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv!

With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv as we've seen in this release. Perhaps some aims may be put towards AI; "automagical" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.

Polars v0.41.3 - numerous sqlp and joinp improvements
  • sqlp: expanded SQL support
    • Natural Join support
    • DuckDB-like COLUMNS SQL function to select columns that match a pattern
    • ORDER BY ALL support
    • Support POSTGRESQL ^@ ("starts with"), ~~,~~*,!~~,!~~* ("like", "ilike") string-matching operators
    • Support for SQL SELECT * ILIKE wildcard syntax
    • Support SQL temporal functions STRFTIME and STRPTIME
  • sqlp: added --streaming option
New command qsv prompt - Use a file dialog for qsv file input and output

Be more interactive with qsv by using a file dialog to select a file for input and output.

qsv-prompt-0.129.0-demo

Here are a few key highlights:

  • Start with qsv prompt when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example: qsv prompt | qsv stats.
  • End with qsv prompt -f when piping commands to save the output to a file you choose with a save file dialog.

There are other options too, so feel free to explore more with qsv prompt --help.

This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!

New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter

The new json command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the --jaq option to provide a jq-like filter. See qsv json --help for more information and examples.

qsv-json-demo

Here are a few key highlights:

  • Specify the path to a JSON file to attempt conversion to CSV with qsv json <filepath>.
  • Attempt conversion of JSON to CSV data from stdin, for example: qsv slice <filepath.csv> --json | qsv json.
  • Write the output to a file with the --output <filepath> (or -o for short) option.
  • Use the --jaq <filter> option to try converting nested or complex JSON data into the intended format before parsing to CSV.

You may learn more by running qsv json --help.

Along with the jsonl command, we now have more options to convert JSON to CSV with qsv!

New command qsv clipboard - Provide input from your clipboard and save output to your clipboard

Provide your clipboard content using qsv clipboard and save output to your clipboard by piping into qsv clipboard --save (or -s for short).

qsv-clipboard-demo

100.dathere.com - Try out lessons and exercises with qsv from your browser!

You may run qsv commands from your browser without having to install it locally at 100.dathere.com.

Within the lesson (in-page) using Thebe In a Jupyter Lab environment
qsv Thebe demo qsv Jupyter Lab demo

Thanks to Jupyter Book, datHere has released a website available at 100.dathere.com where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. There are multiple exercises planned, but feel free to try out the first few available lessons/exercises by visiting 100.dathere.com and star the source code's repository here.

New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)

There's a draft of more qsv shell completion support including 7 different shells! The plan is to add the rest of the commands in this implementation since we can use one codebase to generate the 7 shell completion script files. Feel free to try out the various shell completions in the examples folder from contrib/completions to verify if the examples work (as of today's release date only qsv count and qsv clipboard may be available) and also contribute to adding the rest of the completions if you know a bit of Rust.

The existing Bash shell completions for v0.129.0 and fish shell completions draft are available for now as the multi-shell completions draft is being developed.

Bash completions demo Fish completions demo
qsv Bash completions demo qsv Fish completions demo

With shell completions enabled, you may identify qsv commands more easily when pressing the tab key on your keyboard in certain positions using the relevant Bash or fish shell from your terminal. You may follow the instructions from 100.dathere.com here to learn how to install the Bash completions and under the Usage section here for fish shell completions. Note that the fish shell completions are incomplete and both of the implementations may be replaced by the multi-shell completions implementation once complete.

qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.

qsv-pro-ckan-download-demo

In addition to importing local spreadsheet files and uploading to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the create_dataset permission to download a spreadsheet file from their CKAN instance and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.

qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance

This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
Also note that this video is sped up as you may see by the notes that pop up (you may pause the video to read them).

qsv-pro-ask-demo.mp4

Leveraging qsv describegpt's AI integration capabilities along with multiple other qsv commands, qsv pro's Workflow's existing SQL query tab now has a generator that may attempt to generate a SQL query natural language using an LLM API compatible with OpenAI's API specification such as running an Ollama (v0.2.0 or above) server locally and attempt to generate a SQL query by asking a question related to your spreadsheet data. Results may vary depending on your configuration and you may need to fix the generated output. For example in the demo we asked for who has the highest salary but extra information and only the highest salary was provided, though this does give a query we can modify and work with.

Note on Ask and qsv describegpt

We mention attempt since LLMs can produce incorrect output, even output that seems correct but is not. We mention that "inaccurate information" may be produced within qsv describegpt's usage text too along with AI-generated output potentially being incorrect within qsv pro, so make sure the output is fixed and verified before using it in production use cases.

🔁 Changelog

Added

  • clipboard: add qsv clipboard command for clipboard input/output by @rzmk in #1953
  • describegpt: add --prompt for custom prompt & update prompt file + docs by @rzmk in #1862
  • describegpt: add base_url, model, ollama, & timeout to prompt file by @rzmk in #1859
  • enum: add --hash option to create a platform-independent deterministic id #1902
  • enum: add --uuid7 option to create UUID v7 identifiers #1914
  • freq: add --no-trim option #1944
  • foreach: add sample Windows implementation by @rzmk in #1847
  • joinp: add --right outer join option #1945
  • json: change jsonp to json using new implementation by @rzmk in #1924
  • json: add --jaq option to allow jq-like filtering & test by @rzmk in #1959
  • jsonp: add jsonp command allowing non-nested JSON to CSV conversion with Polars by @rzmk in #1880
  • prompt: add qsv prompt to pick a file with a file dialog & write to stdout by @rzmk in #1860
  • prompt: add --fd-output (-f) & --output (-o) options by @rzmk in #1861
  • select: add --sort, --random & --seed options; also add 9999 sentinel value to indicate last column #1867
  • select: use underscore char (_) to indicate last column, replacing 9999 sentinel value #1873
  • sqlp: add --streaming option e8bee9a
  • stats: add Standard Error of the Mean (SEM) & Coefficient of Variation (CV) #1857
  • validate: added custom JSONschema format "currency" (decimal with 2 decimal places). Also, added check that only ascii characters are allowed in keys in JSONschema files.
  • added --batch zero option to all commands with batch processing. This sentinel value is used to indicate that the entire input should be processed in one batch feedbda
  • added typos check to CI 9fdf066
  • contrib(fish): add fish completions prototype with qsv.fish and docs by @rzmk in #1884
  • contrib(bashly): add --hash <columns> option to enum by @rzmk in #1905
  • contrib(bashly): add --uuid4 & --uuid7 for qsv enum by @rzmk in #1915
  • contrib(bashly): remove --ollama from qsv describegpt by @rzmk in #1951
  • contrib(bashly): add --no-trim to frequency & --right to joinp by @rzmk in #1952
  • tests: add tests for 100.dathere.com/lessons/1 by @rzmk in #1876
  • tests: add test_100 for 100.dathere.com & tests for lesson/exercise 0 by @rzmk in #1848
  • docs: add 👆 emoji to indicate commands with column selector support 40ac8a7
  • Incorporate typos check in CI #1930

Changed

  • stats: made several microoptimizations to Field Data Type inferencing 3500454 f829e0c

  • select: --sort & --random options now work with the initial selection, not just the entire CSV #1875

  • contrib(bashly): update contrib/bashly/completions.bash (prep for qsv v0.129.0) by @rzmk in #1885

  • jsonp: use print! instead of println! & add House.csv + tests by @rzmk in #1897

  • docs: add column selector emoji - 👆 #1906

  • upgrade to polars 0.41.0 #1907

  • describegpt: update dotenv.template variable with QSV_LLM_APIKEY by @rzmk in #1908

  • describegpt: change min Ollama version from 0.1.49 to 0.2.0 by @rzmk in #1954

  • describegpt: add {headers} replaced by qsv slice ... --len 1 -n by @rzmk in #1941

  • validate: validating against a JSONschema requires headers #1931

  • setting --batch to 0 loads all rows at once before parallel processing #1928

  • deps: add polars timezones support #1898

  • tests: update test_100/exercise_0.rs setup file data by @rzmk in #1858

  • build(deps): bump actions/setup-python from 5.1.0 to 5.1.1 by @dependabot in #1961

  • build(deps): bump actix-web from 4.6.0 to 4.7.0 by @dependabot in #1866

  • build(deps): bump actix-web from 4.7.0 to 4.8.0 by @dependabot in #1901

  • build(deps): bump atoi_simd from 0.15.6 to 0.16.0 by @dependabot in #1844

  • build(deps): bump cached from 0.51.3 to 0.51.4 by @dependabot in #1874

  • build(deps): bump cached from 0.51.4 to 0.52.0 by @dependabot in #1938

  • build(deps): bump csvs_convert from 0.8.10 to 0.8.11 by @dependabot in #1891

  • build(deps): bump csvs_convert from 0.8.11 to 0.8.12 by @dependabot in #1948

  • build(deps): bump curve25519-dalek from 4.1.2 to 4.1.3 by @dependabot in #1893

  • build(deps): bump flexi_logger from 0.28.0 to 0.28.1 by @dependabot in #1853

  • build(deps): bump flexi_logger from 0.28.1 to 0.28.2 by @dependabot in #1868

  • build(deps): bump flexi_logger from 0.28.2 to 0.28.3 by @dependabot in #1870

  • build(deps): bump flexi_logger from 0.28.3 to 0.28.4 by @dependabot in #1881

  • build(deps): bump flexi_logger from 0.28.4 to 0.28.5 by @dependabot in #1904

  • build(deps): bump geosuggest-core from 0.6.2 to 0.6.3 by @dependabot in #1883

  • build(deps): bump geosuggest-utils from 0.6.2 to 0.6.3 by @dependabot in #1882

  • build(deps): bump jql-runner from 7.1.9 to 7.1.10 by @dependabot in #1845

  • build(deps): bump jql-runner from 7.1.10 to 7.1.11 by @dependabot in #1856

  • build(deps): bump jql-runner from 7.1.11 to 7.1.12 by @dependabot in #1903

  • build(deps): bump jql-runner from 7.1.12 to 7.1.13 by @dependabot in #1960

  • build(deps): bump log from 0.4.21 to 0.4.22 by @dependabot in #1925

  • build(deps): bump mimalloc from 0.1.42 to 0.1.43 by @dependabot in #1911

  • build(deps): bump mlua from 0.9.8 to 0.9.9 by @dependabot in #1894

  • deps: apply latest polars upstream with unreleased fixes 261ede5

  • deps: we now track py-polars release, instead of rust-polars #1854

  • deps: update polars engine to use py-polars-1.0.0-beta1 #1896

  • build(deps): bump polars from 0.41.0 to 0.41.1 by @dependabot in #1909

  • build(deps): bump polars from 0.41.1 to 0.41.2 by @dependabot in #1916

  • deps: bump polars from 0.41.2 to 0.41.3 dc0492f

  • build(deps): bump pyo3 from 0.21.2 to 0.22.0 by @dependabot in #1918

  • build(deps): bump pyo3 from 0.22.0 to 0.22.1 by @dependabot in #1950

  • build(deps): bump regex from 1.10.4 to 1.10.5 by @dependabot in #1865

  • build(deps): bump redis from 0.25.3 to 0.25.4 by @dependabot in #1846

  • build(deps): bump reqwest from 0.12.4 to 0.12.5 by @dependabot in #1889

  • build(deps): bump self_update from 0.40.0 to 0.41.0 by @dependabot in #1939

  • build(deps): bump serde from 1.0.203 to 1.0.204 by @dependabot in #1949

  • build(deps): bump serde_json from 1.0.117 to 1.0.118 by @dependabot in #1920

  • build(deps): bump serde_json from 1.0.118 to 1.0.119 by @dependabot in #1932

  • build(deps): bump serde_json from 1.0.119 to 1.0.120 by @dependabot in #1935

  • build(deps): bump simple-expand-tilde from 0.1.6 to 0.1.7 by @dependabot in #1886

  • build(deps): bump strum from 0.26.2 to 0.26.3 by @dependabot in #1913

  • build(deps): bump strum_macros from 0.26.2 to 0.26.3 by @dependabot in #1855

  • build(deps): bump strum_macros from 0.26.3 to 0.26.4 by @dependabot in #1863

  • build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in #1957

  • build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in #1965

  • build(deps): bump titlecase from 3.2.0 to 3.3.0 by @dependabot in #1963

  • build(deps): bump tokio from 1.37.0 to 1.38.0 by @dependabot in #1850

  • build(deps): bump url from 2.5.0 to 2.5.1 by @dependabot in #1869

  • build(deps): bump url from 2.5.1 to 2.5.2 by @dependabot in #1895

  • build(deps): bump uuid from 1.8.0 to 1.9.0 by @dependabot in #1912

  • build(deps): bump uuid from 1.9.0 to 1.9.1 by @dependabot in #1919

  • build(deps): bump uuid from 1.9.1 to 1.10.0 by @dependabot in #1964

  • build(deps): bump xxhash-rust from 0.8.10 to 0.8.11 by @dependabot in #1942

  • apply select clippy suggestions

  • updated several indirect dependencies

  • made various usage text improvements

  • added several benchmarks

  • pin Rust nightly to 2024-06-23

Fixed

  • frequency: fix unique identifiers column detection #1966
  • json: add empty single JSON object logic & empty tests by @rzmk in #1958
  • json: fix typo in error message by @rzmk in #1929
  • sniff: fix doc typo sanple -> sample by @rzmk in #1947
  • validate: validating with a JSONSchema requires headers 6164382
  • Fixed several typos 9fdf066

Removed

  • describegpt: remove --ollama since Ollama v0.1.49 has endpoints by @rzmk in #1946
  • json: remove necessity for polars feature & fix --list formatting by @rzmk in #1936
  • jsonp: remove jsonp command in favor of json by @rzmk in #1924
  • deps: fine tune polars features and remove explicit polars-ops dependency ccfd000

Full Changelog: 0.128.0...0.129.0


To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter

Don't miss a new qsv release

NewReleases is sending notifications on new releases.