This release is the biggest one ever!
Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:
📌 Highlights (click each dropdown for more info)
Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!
@rzmk has contributed to projects in the qsv ecosystem including qsv's describegpt
, prompt
, json
, and clipboard
commands; qsv's tab completion support; qsv.dathere.com including its online configurator and benchmarks page; 100.dathere.com with its qsv lessons and exercises; and qsv pro the spreadsheet data wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv!
With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv as we've seen in this release. Perhaps some aims may be put towards AI; "automagical" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.
Polars v0.41.3 - numerous sqlp
and joinp
improvements
sqlp
: expanded SQL support- Natural Join support
- DuckDB-like
COLUMNS
SQL function to select columns that match a pattern - ORDER BY ALL support
- Support POSTGRESQL
^@
("starts with"),~~
,~~*
,!~~
,!~~*
("like", "ilike") string-matching operators - Support for SQL
SELECT * ILIKE
wildcard syntax - Support SQL temporal functions
STRFTIME
andSTRPTIME
sqlp
: added--streaming
option
New command qsv prompt
- Use a file dialog for qsv file input and output
Be more interactive with qsv by using a file dialog to select a file for input and output.
Here are a few key highlights:
- Start with
qsv prompt
when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example:qsv prompt | qsv stats
. - End with
qsv prompt -f
when piping commands to save the output to a file you choose with a save file dialog.
There are other options too, so feel free to explore more with qsv prompt --help
.
This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!
New command qsv json
- Convert JSON data to CSV and optionally provide a jq-like filter
The new json
command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the --jaq
option to provide a jq-like filter. See qsv json --help
for more information and examples.
Here are a few key highlights:
- Specify the path to a JSON file to attempt conversion to CSV with
qsv json <filepath>
. - Attempt conversion of JSON to CSV data from
stdin
, for example:qsv slice <filepath.csv> --json | qsv json
. - Write the output to a file with the
--output <filepath>
(or-o
for short) option. - Use the
--jaq <filter>
option to try converting nested or complex JSON data into the intended format before parsing to CSV.
You may learn more by running qsv json --help
.
Along with the jsonl
command, we now have more options to convert JSON to CSV with qsv!
New command qsv clipboard
- Provide input from your clipboard and save output to your clipboard
Provide your clipboard content using qsv clipboard
and save output to your clipboard by piping into qsv clipboard --save
(or -s
for short).
100.dathere.com - Try out lessons and exercises with qsv from your browser!
You may run qsv commands from your browser without having to install it locally at 100.dathere.com.
Within the lesson (in-page) using Thebe | In a Jupyter Lab environment |
---|---|
Thanks to Jupyter Book, datHere has released a website available at 100.dathere.com where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. There are multiple exercises planned, but feel free to try out the first few available lessons/exercises by visiting 100.dathere.com and star the source code's repository here.
New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)
There's a draft of more qsv shell completion support including 7 different shells! The plan is to add the rest of the commands in this implementation since we can use one codebase to generate the 7 shell completion script files. Feel free to try out the various shell completions in the examples
folder from contrib/completions
to verify if the examples work (as of today's release date only qsv count
and qsv clipboard
may be available) and also contribute to adding the rest of the completions if you know a bit of Rust.
The existing Bash shell completions for v0.129.0 and fish shell completions draft are available for now as the multi-shell completions draft is being developed.
Bash completions demo | Fish completions demo |
---|---|
With shell completions enabled, you may identify qsv commands more easily when pressing the tab
key on your keyboard in certain positions using the relevant Bash or fish shell from your terminal. You may follow the instructions from 100.dathere.com here to learn how to install the Bash completions and under the Usage section here for fish shell completions. Note that the fish shell completions are incomplete and both of the implementations may be replaced by the multi-shell completions implementation once complete.
qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow
This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
In addition to importing local spreadsheet files and uploading to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the create_dataset
permission to download a spreadsheet file from their CKAN instance and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.
qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance
This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
Also note that this video is sped up as you may see by the notes that pop up (you may pause the video to read them).
qsv-pro-ask-demo.mp4
Leveraging We mention attempt since LLMs can produce incorrect output, even output that seems correct but is not. We mention that "inaccurate information" may be produced within qsv describegpt
's AI integration capabilities along with multiple other qsv commands, qsv pro's Workflow's existing SQL query tab now has a generator that may attempt to generate a SQL query natural language using an LLM API compatible with OpenAI's API specification such as running an Ollama (v0.2.0 or above) server locally and attempt to generate a SQL query by asking a question related to your spreadsheet data. Results may vary depending on your configuration and you may need to fix the generated output. For example in the demo we asked for who has the highest salary but extra information and only the highest salary was provided, though this does give a query we can modify and work with.
Note on Ask and
qsv describegpt
qsv describegpt
's usage text too along with AI-generated output potentially being incorrect within qsv pro, so make sure the output is fixed and verified before using it in production use cases.
🔁 Changelog
Added
clipboard
: addqsv clipboard
command for clipboard input/output by @rzmk in #1953describegpt
: add--prompt
for custom prompt & update prompt file + docs by @rzmk in #1862describegpt
: add base_url, model, ollama, & timeout to prompt file by @rzmk in #1859enum
: add--hash
option to create a platform-independent deterministic id #1902enum
: add--uuid7
option to create UUID v7 identifiers #1914freq
: add--no-trim
option #1944foreach
: add sample Windows implementation by @rzmk in #1847joinp
: add--right
outer join option #1945json
: change jsonp to json using new implementation by @rzmk in #1924json
: add--jaq
option to allow jq-like filtering & test by @rzmk in #1959jsonp
: addjsonp
command allowing non-nested JSON to CSV conversion with Polars by @rzmk in #1880prompt
: addqsv prompt
to pick a file with a file dialog & write to stdout by @rzmk in #1860prompt
: add--fd-output
(-f
) &--output
(-o
) options by @rzmk in #1861select
: add--sort
,--random
&--seed
options; also add 9999 sentinel value to indicate last column #1867select
: use underscore char (_) to indicate last column, replacing 9999 sentinel value #1873sqlp
: add--streaming
option e8bee9astats
: add Standard Error of the Mean (SEM) & Coefficient of Variation (CV) #1857validate
: added custom JSONschema format "currency" (decimal with 2 decimal places). Also, added check that only ascii characters are allowed in keys in JSONschema files.- added
--batch
zero option to all commands with batch processing. This sentinel value is used to indicate that the entire input should be processed in one batch feedbda - added typos check to CI 9fdf066
contrib(fish)
: add fish completions prototype withqsv.fish
and docs by @rzmk in #1884- contrib(bashly): add
--hash <columns>
option toenum
by @rzmk in #1905 - contrib(bashly): add
--uuid4
&--uuid7
forqsv enum
by @rzmk in #1915 contrib(bashly)
: remove--ollama
fromqsv describegpt
by @rzmk in #1951contrib(bashly)
: add--no-trim
tofrequency
&--right
tojoinp
by @rzmk in #1952tests
: add tests for 100.dathere.com/lessons/1 by @rzmk in #1876tests
: add test_100 for 100.dathere.com & tests for lesson/exercise 0 by @rzmk in #1848docs
: add 👆 emoji to indicate commands with column selector support 40ac8a7- Incorporate typos check in CI #1930
Changed
-
stats
: made several microoptimizations to Field Data Type inferencing 3500454 f829e0c -
select
:--sort
&--random
options now work with the initial selection, not just the entire CSV #1875 -
contrib(bashly)
: updatecontrib/bashly/completions.bash
(prep for qsv v0.129.0) by @rzmk in #1885 -
jsonp
: useprint!
instead ofprintln!
& addHouse.csv
+ tests by @rzmk in #1897 -
docs
: add column selector emoji - 👆 #1906 -
upgrade to polars 0.41.0 #1907
-
describegpt
: updatedotenv.template
variable withQSV_LLM_APIKEY
by @rzmk in #1908 -
describegpt
: change min Ollama version from 0.1.49 to 0.2.0 by @rzmk in #1954 -
describegpt
: add{headers}
replaced byqsv slice ... --len 1 -n
by @rzmk in #1941 -
validate
: validating against a JSONschema requires headers #1931 -
setting
--batch
to 0 loads all rows at once before parallel processing #1928 -
deps
: add polars timezones support #1898 -
tests
: updatetest_100/exercise_0.rs
setup file data by @rzmk in #1858 -
build(deps): bump actions/setup-python from 5.1.0 to 5.1.1 by @dependabot in #1961
-
build(deps): bump actix-web from 4.6.0 to 4.7.0 by @dependabot in #1866
-
build(deps): bump actix-web from 4.7.0 to 4.8.0 by @dependabot in #1901
-
build(deps): bump atoi_simd from 0.15.6 to 0.16.0 by @dependabot in #1844
-
build(deps): bump cached from 0.51.3 to 0.51.4 by @dependabot in #1874
-
build(deps): bump cached from 0.51.4 to 0.52.0 by @dependabot in #1938
-
build(deps): bump csvs_convert from 0.8.10 to 0.8.11 by @dependabot in #1891
-
build(deps): bump csvs_convert from 0.8.11 to 0.8.12 by @dependabot in #1948
-
build(deps): bump curve25519-dalek from 4.1.2 to 4.1.3 by @dependabot in #1893
-
build(deps): bump flexi_logger from 0.28.0 to 0.28.1 by @dependabot in #1853
-
build(deps): bump flexi_logger from 0.28.1 to 0.28.2 by @dependabot in #1868
-
build(deps): bump flexi_logger from 0.28.2 to 0.28.3 by @dependabot in #1870
-
build(deps): bump flexi_logger from 0.28.3 to 0.28.4 by @dependabot in #1881
-
build(deps): bump flexi_logger from 0.28.4 to 0.28.5 by @dependabot in #1904
-
build(deps): bump geosuggest-core from 0.6.2 to 0.6.3 by @dependabot in #1883
-
build(deps): bump geosuggest-utils from 0.6.2 to 0.6.3 by @dependabot in #1882
-
build(deps): bump jql-runner from 7.1.9 to 7.1.10 by @dependabot in #1845
-
build(deps): bump jql-runner from 7.1.10 to 7.1.11 by @dependabot in #1856
-
build(deps): bump jql-runner from 7.1.11 to 7.1.12 by @dependabot in #1903
-
build(deps): bump jql-runner from 7.1.12 to 7.1.13 by @dependabot in #1960
-
build(deps): bump log from 0.4.21 to 0.4.22 by @dependabot in #1925
-
build(deps): bump mimalloc from 0.1.42 to 0.1.43 by @dependabot in #1911
-
build(deps): bump mlua from 0.9.8 to 0.9.9 by @dependabot in #1894
-
deps
: apply latest polars upstream with unreleased fixes 261ede5 -
deps
: we now track py-polars release, instead of rust-polars #1854 -
deps
: update polars engine to use py-polars-1.0.0-beta1 #1896 -
build(deps): bump polars from 0.41.0 to 0.41.1 by @dependabot in #1909
-
build(deps): bump polars from 0.41.1 to 0.41.2 by @dependabot in #1916
-
deps: bump polars from 0.41.2 to 0.41.3 dc0492f
-
build(deps): bump pyo3 from 0.21.2 to 0.22.0 by @dependabot in #1918
-
build(deps): bump pyo3 from 0.22.0 to 0.22.1 by @dependabot in #1950
-
build(deps): bump regex from 1.10.4 to 1.10.5 by @dependabot in #1865
-
build(deps): bump redis from 0.25.3 to 0.25.4 by @dependabot in #1846
-
build(deps): bump reqwest from 0.12.4 to 0.12.5 by @dependabot in #1889
-
build(deps): bump self_update from 0.40.0 to 0.41.0 by @dependabot in #1939
-
build(deps): bump serde from 1.0.203 to 1.0.204 by @dependabot in #1949
-
build(deps): bump serde_json from 1.0.117 to 1.0.118 by @dependabot in #1920
-
build(deps): bump serde_json from 1.0.118 to 1.0.119 by @dependabot in #1932
-
build(deps): bump serde_json from 1.0.119 to 1.0.120 by @dependabot in #1935
-
build(deps): bump simple-expand-tilde from 0.1.6 to 0.1.7 by @dependabot in #1886
-
build(deps): bump strum from 0.26.2 to 0.26.3 by @dependabot in #1913
-
build(deps): bump strum_macros from 0.26.2 to 0.26.3 by @dependabot in #1855
-
build(deps): bump strum_macros from 0.26.3 to 0.26.4 by @dependabot in #1863
-
build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in #1957
-
build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in #1965
-
build(deps): bump titlecase from 3.2.0 to 3.3.0 by @dependabot in #1963
-
build(deps): bump tokio from 1.37.0 to 1.38.0 by @dependabot in #1850
-
build(deps): bump url from 2.5.0 to 2.5.1 by @dependabot in #1869
-
build(deps): bump url from 2.5.1 to 2.5.2 by @dependabot in #1895
-
build(deps): bump uuid from 1.8.0 to 1.9.0 by @dependabot in #1912
-
build(deps): bump uuid from 1.9.0 to 1.9.1 by @dependabot in #1919
-
build(deps): bump uuid from 1.9.1 to 1.10.0 by @dependabot in #1964
-
build(deps): bump xxhash-rust from 0.8.10 to 0.8.11 by @dependabot in #1942
-
apply select clippy suggestions
-
updated several indirect dependencies
-
made various usage text improvements
-
added several benchmarks
-
pin Rust nightly to 2024-06-23
Fixed
frequency
: fix unique identifiers column detection #1966json
: add empty single JSON object logic & empty tests by @rzmk in #1958json
: fix typo in error message by @rzmk in #1929sniff
: fix doc typosanple
->sample
by @rzmk in #1947validate
: validating with a JSONSchema requires headers 6164382- Fixed several typos 9fdf066
Removed
describegpt
: remove--ollama
since Ollama v0.1.49 has endpoints by @rzmk in #1946json
: remove necessity forpolars
feature & fix--list
formatting by @rzmk in #1936jsonp
: removejsonp
command in favor ofjson
by @rzmk in #1924deps
: fine tune polars features and remove explicit polars-ops dependency ccfd000
Full Changelog: 0.128.0...0.129.0
To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter