## [20.0.0] - 2026-05-03 🧹 The "Spring Cleaning" Release 🌱
Using Claude Code, roborev, Serena, Context7 and GitHub Copilot, orchestrated in an adversarial review workflow, we systematically audited every command for correctness, safety, and performance. Over the past four weeks, we completed the first end-to-end pass over the qsv codebase. The result is the largest correctness-and-safety sweep in qsv's history: ALL commands were touched by review-driven cleanups, with dozens of latent bugs, panic paths, and performance cliffs swept out, and more than 250 new tests added across the board.
This is a major version bump because that sweep also surfaced four user-visible behaviors that were demonstrably wrong and could not be fixed without breaking compatibility:
- `safenames` verify-mode now correctly counts duplicate-suffix renames as unsafe (previously under-reported).
- `enum --hash` is now collision-resistant across multi-column inputs (previously `["ab","c"]` and `["a","bc"]` hashed identically).
- `excel --metadata csv` column ordering now actually matches its header row (previously the type, visible, and headers columns held each other's values).
- `util::safe_header_names` now enforces its 60-char cap in bytes end-to-end (previously chars-based, allowing UTF-8 names up to 240 bytes, past Postgres' NAMEDATALEN).
Plus a few smaller but breaking corrections: `headers --intersect` is renamed to `--union` (the flag never computed an intersection), `luau` `qsv_loadcsv` headers are now 1-indexed per Lua convention, and MSRV is bumped to Rust 1.95.
Beyond the cleanup, this release adds one new top-level command:
- NEW `implode` command: the inverse of `explode`. Groups rows by key column(s) and joins a value column into a single delimited string per group; useful for collapsing normalized output back into compact form.
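The core grouping step can be sketched in a few lines. This is a hypothetical in-memory `implode` helper for illustration only; the real command streams CSV records and supports multiple key columns:

```rust
use std::collections::BTreeMap;

// Sketch of implode's core idea: group rows by a key column and join the
// value column into one delimited string per group. BTreeMap keeps the
// output ordering deterministic for this illustration.
fn implode(rows: &[(&str, &str)], sep: &str) -> Vec<(String, String)> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(key, value) in rows {
        groups.entry(key).or_default().push(value);
    }
    groups
        .into_iter()
        .map(|(k, vs)| (k.to_string(), vs.join(sep)))
        .collect()
}
```

So three `("fruit", …)` rows produced by `explode` collapse back into a single `("fruit", "apple;pear;…")` row.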
And a notable performance win:
`frequency`: parallel tree-reduce of partial frequency tables delivers a ~1.3x speedup on multi-core machines. Smaller per-command perf wins also landed in `fill` (+22%), `datefmt` (+9%), `cat`, `dedup`, `replace`, `search`/`searchset`, and `transpose`.
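The tree-reduce idea behind the `frequency` speedup: count each chunk independently, then merge the partial tables pairwise rather than folding them one at a time into a single accumulator. A minimal sketch using std threads and `HashMap`; qsv's implementation uses its own FTable type and a parallel runtime, so all names here are illustrative:

```rust
use std::collections::HashMap;
use std::thread;

// Count one chunk into a partial frequency table.
fn count_chunk(chunk: &[&str]) -> HashMap<String, u64> {
    let mut m = HashMap::new();
    for v in chunk {
        *m.entry((*v).to_string()).or_insert(0) += 1;
    }
    m
}

// Merge two partial tables by summing counts.
fn merge(mut a: HashMap<String, u64>, b: HashMap<String, u64>) -> HashMap<String, u64> {
    for (k, v) in b {
        *a.entry(k).or_insert(0) += v;
    }
    a
}

fn frequency(values: &[&str], chunks: usize) -> HashMap<String, u64> {
    if values.is_empty() {
        return HashMap::new();
    }
    let chunk_size = values.len().div_ceil(chunks.max(1));
    // Map phase: count each chunk on its own scoped thread.
    let mut partials: Vec<HashMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|c| s.spawn(move || count_chunk(c)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Reduce phase: merge partial tables pairwise (a tree reduction)
    // instead of sequentially folding every table into one accumulator.
    while partials.len() > 1 {
        let mut next = Vec::with_capacity(partials.len().div_ceil(2));
        while partials.len() >= 2 {
            let b = partials.pop().unwrap();
            let a = partials.pop().unwrap();
            next.push(merge(a, b));
        }
        next.append(&mut partials);
        partials = next;
    }
    partials.pop().unwrap()
}
```

The pairwise merge keeps intermediate tables small and lets merges themselves proceed in parallel, which is where the multi-core win comes from.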
Detailed MCP Server and Cowork Plugin changes are documented in the MCP Server/Cowork Plugin CHANGELOG.
> [!IMPORTANT]
> This is a major release with breaking changes. Pipelines that consume `qsv excel --metadata csv` by column position, store `qsv enum --hash` digests across versions, parse `qsv safenames` verify-mode output, or invoke `qsv headers --intersect` will need updates. See the Changed and Removed sections below for migration notes.
### Added
- `implode`: new command, the inverse of `explode` #3733 (closes #917)
- `generators`: mark required options in help markdown and MCP skills #3734
- `sortcheck`: add `--numeric` and `--natural` flags; allocation-free streaming loop #3756
- `exclude`: add stdin support and memcheck #3749
### Changed
- **BREAKING** `excel`: `--metadata csv` column ordering for `type`, `visible`, and `headers` is corrected. Previously the CSV header row declared `type, visible, headers` but the data rows pushed values in the order `headers, typ, visible`, so under each named column the wrong values appeared (the `type` column held the headers list, `visible` held the type, and `headers` held the visibility). The CSV output now matches the `--metadata json` (`SheetMetadata` struct) field order: `index, sheet_name, type, visible, headers, column_count, …`. Pipelines that consumed `qsv excel --metadata csv` and indexed by column position must shift those three columns; consumers that indexed by header name see corrected values automatically.
- **BREAKING** `enum`: `--hash` digest values change. The hashed input now carries a `u64` length prefix per field (to fix the multi-column collision bug above), so every `--hash` digest differs from earlier qsv versions; single-column hashes change too, and stored hashes from earlier qsv versions will not match. The same input still hashes deterministically across rows and runs from this version onward.
- **BREAKING** `luau`: `qsv_loadcsv` now returns the headers table 1-indexed (per Lua convention). Scripts that accessed `headers[0]` or iterated `for i = 0, #headers - 1` must shift to `headers[1]` and `for i = 1, #headers` (or `ipairs(headers)`). Previously `headers[1]` returned the second header.
- **BREAKING** `headers`: rename `--intersect` to `--union`. The flag has always computed a deduplicated union of headers across inputs, not a true set intersection; the name was a long-standing misnomer. `--intersect` is removed entirely (no alias) given the surrounding breaking-change window. Migration: replace `qsv headers --intersect …` with `qsv headers --union …`; output is unchanged.
- **BREAKING** `safenames`: verify-mode (`--mode v / V / j / J`) outputs change. (1) Verify counts now include header positions that would be renamed by the duplicate-suffix pass; inputs containing duplicate column names will report higher unsafe counts than earlier qsv versions, and the count now matches what `--mode a` would actually rewrite. (2) `--mode V / j / J` displays unsafe-header strings with leading/trailing whitespace and surrounding `"` already trimmed (matching what the safe-rename pass actually evaluates), and `duplicate_headers` is now sorted alphabetically rather than appearing in undefined HashMap iteration order. Pipelines that parsed verbose/JSON output and depended on the old ordering or untrimmed strings must update.
- **BREAKING** `util::safe_header_names`: the 60-length cap is now enforced in bytes on the final name, including any duplicate-disambiguation suffix. Previously the truncation was chars-based (`take(60).sum()`) and only applied to the base, so non-ASCII headers could produce names up to ~240 bytes, and duplicate-disambiguated headers added `_<n>` after truncation, pushing past Postgres' `NAMEDATALEN` (63 bytes). Now the rewrite path lowercases and prepends the leading-`_` prefix before truncating, then snaps to a UTF-8 char boundary at ≤ 60 bytes. ASCII-only inputs see the same output as before for non-suffixed cases. Long ASCII headers that previously generated 61-63-char suffixed variants will be 1-2 chars shorter at the boundary. Headers containing multibyte UTF-8 (CJK, accented chars, emoji) that previously produced names > 60 bytes will now be aggressively trimmed to fit. Affects every caller (`safenames`, `applydp`, `apply`, `fetch`, `python`); stored mappings keyed on the old over-long forms will not match.
- **BREAKING** MSRV bumped to Rust 1.95
- `describegpt`: split `process_phase_output` into per-branch helpers (dictionary context-only, full dictionary, JSON, TSV, TOON, Markdown). No behavior change: same output, smaller functions.
- `luau`: `qsv_coalesce` now stringifies non-string values (numbers and booleans render via `to_string`; nil / arrays / objects are skipped). Previously, numbers and booleans were silently treated as missing values via `as_str().unwrap_or_default()`. Scripts relying on `qsv_coalesce(some_bool, fallback)` to skip booleans will now return `"true"`/`"false"` for the boolean.
- `describegpt`: per-phase helper split, widened cache key, ~21% LOC reduction #3720 #3721 #3722
- `frequency`: parallel tree-reduce of partial FTables (~1.3x speedup) #3728
- `moarstats`: collapse duplicated outlier bivariate scan; safety/perf cleanup, unit tests #3718 #3719
- `validate`: use `cold_hint` (stabilized in Rust 1.95) #3717; correctness, perf cleanup #3743 #3779
- `frequency`: correctness, perf, refactor cleanup #3745
- `apply`: review-driven cleanup, perf #3741
- `template`: subdir bug fix, lookup perf, render-error visibility, helper extraction #3740
- `dedup`: allocation-free ignore-case #3754
- `datefmt`: ~9% perf #3753
- `fill`: ~22% faster hot path #3762
- `replace`: streaming parallel write; dead match-flag tracking #3777
- `search`/`searchset`: parallel memory streaming; `--quick` fixes; USAGE alignment #3776
- `cat`: `rowskey` speedup #3750
- `transpose`: correctness, perf cleanup, polish #3781
- cleanup: rename `fail_oom_cli` error; surface `geocode` update-check error #3806
- applied select clippy lints
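The `enum --hash` length-prefix change above can be illustrated with a minimal sketch. Concatenating fields erases field boundaries, so `["ab","c"]` and `["a","bc"]` feed the hasher identical byte streams; prefixing each field with its `u64` length restores the boundary. qsv streams `Xxh3` over raw bytes; std's `DefaultHasher` stands in here, and the `row_hash` helper is illustrative, not qsv's API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hash a row's fields. Without the length prefix, consecutive write()
// calls are equivalent to hashing the concatenated bytes, so field
// boundaries vanish. With it, each field's u64 byte length is fed to the
// hasher first, making the boundary part of the digest.
fn row_hash(fields: &[&[u8]], length_prefix: bool) -> u64 {
    let mut h = DefaultHasher::new();
    for f in fields {
        if length_prefix {
            h.write(&(f.len() as u64).to_le_bytes());
        }
        h.write(f);
    }
    h.finish()
}
```

This also shows why the change is breaking: every prefixed digest, single-column included, differs from the unprefixed one.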
### Fixed
- `excel`: review-driven cleanup of `src/cmd/excel.rs`: fix four correctness bugs. (1) Negative `--sheet` indices that overshot the sheet count silently selected a wrong sheet because the `abs_diff` clamp "bounced" past zero (e.g. `--sheet -4` on a 3-sheet workbook returned the 2nd sheet); now errors with `usage error: negative sheet index N is out of range for K sheets`. (2) `get_requested_range` lower-cased the sheet name into the caller's `sheet` variable and never restored it, so `--range 'Sheet2!A1:B2' --error-format formula|both` failed because calamine's `worksheet_formula` is case-sensitive, and the success message printed the wrong case; now the canonical name from the workbook's sheet list is preserved. (3) The float-to-i64 conversion guard used `*float_val > i64::MAX as f64`, but `i64::MAX as f64` rounds up past `i64::MAX`, so a value of 2^63 slipped through, saturating-cast to `i64::MAX`, and was emitted as `9223372036854775807` (off by one). Replaced with `is_finite() && val >= i64::MIN as f64 && val < i64::MAX as f64 && fract() == 0.0`, with a NaN/Inf fallback to `Display` (the previous code would have hit `zmij::format_finite` UB on non-finite floats). (4) `--range` against an empty sheet reported "larger than sheet" because the bounds check used a `(0, 0)` fallback that swallowed the empty-sheet case; now reports "sheet is empty" distinctly. Also: hoisted `to_lowercase()` allocations out of the table-name and named-range search loops, fixed two USAGE typos (3nd→3rd, stray paren in the negative-index example), removed a stray blank line in `NamesMetadata`'s derive, and dropped the now-unused `cmp` import.
- `enum`: review-driven cleanup of `src/cmd/enumerate.rs`: fix silent multi-column hash collision and lossy UTF-8 hashing. Previously `--hash` over multiple columns concatenated the UTF-8 strings of each field into a single buffer before hashing, so distinct rows like `["ab","c"]` and `["a","bc"]` flattened to `"abc"` and hashed identically; non-UTF-8 fields were silently replaced with the empty string via `unwrap_or_default()`, masking different invalid byte sequences. Switch to streaming `Xxh3` over raw bytes with a per-field length prefix, eliminating both bugs and removing per-row `String` allocation. Also fix a degenerate-selection bug where a `--hash` selector resolving to only the existing `hash` column (e.g. `--hash hash` on `name,hash` headers) silently re-included the `hash` column: the post-filter parse/selection round-trip turned the empty selection into "all columns"; now constructed via `Selection::from_indices` and rejected with a clear error. Also: collapse three redundant `select()` calls in hash setup, avoid per-row `ByteRecord` reallocation when an existing `hash` column is dropped, fix docstring mode count (was "four", now "six") and duplicated `3.` numbering, and add `enumerate_hash_no_concat_collision` regression test.
- `util`: review-driven hardening of `src/util.rs`: defend `process_input` against zip-slip, fix panics in `optimal_batch_size`, `safe_header_names` (whitespace-only headers), `qsv_check_for_update` (empty release list), and `format_systemtime` (pre-1970 timestamps); drop UB-prone `unsafe { get_unchecked }` from `write_json`/`csv_to_jsonl` (this also fixes a long-standing trailing-comma bug when records were shorter than headers); log warnings (including the input path) when row counting hits a CSV read error instead of silently undercounting. Behavior change: `download_file` now propagates `timeout_secs` validation errors instead of silently coercing a bad timeout to 30s; passing `--timeout 0` or `--timeout` > 3600 to a downloading command will now error. `get_envvar_flag` now trims whitespace, accepts `on`/`off`, and warns at most once per unrecognized (key, value) pair.
- `describegpt`: widen cache key to cover all template-affecting flags (`--tag-vocab`, `--num-tags`, `--enum-threshold`, `--sample-size`, `--fewshot-examples`, the `QSV_DUCKDB_PATH` toggle, and the generated Data Dictionary). Previously, changing any of these between runs silently returned stale cached output. The first run after upgrade will re-invoke the LLM once per phase, as prior cache entries no longer match.
- `luau`: fix stale per-column globals when a row has fewer fields than headers (in both sequential and random-access modes); fix `_LASTROW`/`_ROWCOUNT` math under `--no-headers` in random-access mode (was off-by-one); fix `_LASTROW` underflow on empty input; fix `qsv_loadcsv` swallowing CSV parse errors; bound `qsv_lag`/`qsv_diff` history with `VecDeque` to prevent unbounded memory growth on large CSVs; reject `lag`/`periods <= 0` with clear errors; short-circuit MAIN on empty random-access input so END can still run.
- `headers`: review-driven cleanup of `src/cmd/headers.rs`. The docopt USAGE for `--intersect` claimed it computed a set intersection, but the implementation (and the test fixtures) have always emitted a deduplicated union across inputs; see the BREAKING `--union` rename in Changed. Tightened the multi-input path: now actually sets `flag_just_names` when more than one input is given (previously the docopt promised this but the code only suppressed the index prefix), removing a redundant `num_inputs == 1` guard and the wasted `TabWriter` wrap on that path. `--trim` now operates on raw bytes (no `String::from_utf8_lossy` round-trip, so non-UTF-8 header bytes pass through untouched) and additionally strips leading/trailing tab characters in addition to spaces and double-quotes.
- `safenames`: review-driven cleanup of `src/cmd/safenames.rs`: fix two correctness bugs in verify modes (`--mode v / V / j / J`). (1) Headers that the rename pass changed only via the duplicate-suffix step (e.g. `col1, col1` → `col1, col1_2`) were counted as safe because the verify loop checked membership against the rewritten list rather than comparing positionally; the count now agrees with always-mode's `changed_count` for the same input. See the BREAKING note in Changed for output impact. (2) The verify loop iterated the original (untrimmed, quote-included) headers but compared against the quote-and-space-trimmed list that was actually passed to `safe_header_names`, so a literally-quoted header like `"col"` was wrongly flagged unsafe; now compares trimmed-to-trimmed. Also: replaced two `simd_json::to_string{,_pretty}(...).unwrap()` calls with `?` propagation (a serialize failure now surfaces a CLI error instead of panicking); collapsed the verify-mode body into a single pass (replaces O(N²) `Vec::contains` lookups + four `String` allocations per header with a `HashSet` dedup and `entry().or_insert(0)` count update); sorted `duplicate_headers` for deterministic output (previously `foldhash::HashMap` iteration order leaked into stderr/JSON, forcing tests to accept two orderings); write `safe_headers` directly in the always/conditional path instead of clearing a `StringRecord` and re-pushing each field; rewrote the `--mode` docopt block to make case sensitivity explicit (the parser keys off only the first character, with the case-sensitive `v`/`V` and `j`/`J` distinctions previously buried) and dropped a misleading "verify does not count quoted identifiers as unsafe" note that contradicted the actual implementation and the in-tree example output. Tightened the `Some(reserved_names_vec).as_ref()` call to `Some(&reserved_names_vec)`.
- `util`: `safe_header_names` now enforces the length limit in bytes (≤ 60, snapped to a UTF-8 char boundary) end-to-end, including any disambiguation suffix. Previously the truncation was a hybrid: `is_safe_name` rejected names > 60 bytes, but the rewrite path used `chars().map(char::len_utf8).take(60).sum()` (chars-based, up to 240 bytes for 4-byte UTF-8) and the duplicate-suffix step appended `_2`/`_3` after truncation, so a non-ASCII or duplicate-disambiguated header could land at 62-240+ bytes, past the documented bound and Postgres' default `NAMEDATALEN` of 63 bytes. The function now lowercases and prepends the leading-`_` prefix before truncation (case-folding can change byte length, prefixing adds bytes) and snaps the truncation to a char boundary via `str::floor_char_boundary`. Affects every caller (`safenames`, `applydp`, `apply`, `fetch`, `python`); see the BREAKING note in Changed.
- `apply`: malformed CSV, init bugs, doc typos, perf #3741
- `blake3`: check-mode interop fixes, parser polish #3782
- `cat`: `--no-headers` fix #3750
- `clipboard`: surface error details, use `?` for clipboard ops #3783
- `color`: drop dead clones, mark in-memory #3785
- `config`: HumanCount overflow, env-var unwrap panics, sniff UTF-8 panic #3770
- `count`: quoting fix, stdin temp-file leak #3751
- `datefmt`: strict tz/flag validation #3753
- `dedup`: edge-case fixes #3754
- `diff`: surface builder error, dedupe index/colname parsing #3784
- `edit`: `--in-place`, bounds checks, silent no-ops #3786
- `explode`: validate separator and column selection #3787
- `extdedup`: key-collision and dupes-writer issues #3759
- `extsort`: CRLF off-by-one fix (line→record) #3790; error propagation, temp-file handle release #3789
- `fetch`/`fetchpost`: cache, panic, safety fixes #3747
- `fixlengths`: `--remove-empty` crash on flexible rows; widen insert arithmetic #3764
- `fmt`: `--no-final-newline` bugs, tempfile leak #3767
- `foreach`: dry-run truncation, panic, multi-column drop, child-failure propagation #3757
- `geocode`: remove latent panics, fix FIPS JSON shape, perf #3739
- `geoconvert`: lat/lon swap, tempfile leak, UTF-8 panic #3768
- `input`: panic, validation, clarity fixes #3791
- `join`: unify key transform; fix silent `--keys-output` drop #3769
- `joinp`: correctness, validation, schema-handling #3731
- `json`: preserve BigInt precision, surface `jaq` runtime errors #3794; clarify `--jaq` numeric precision in USAGE; defer BigInt `to_string` allocation #3795
- `jsonl`: honor `--ignore-errors` for header inference; fix line-number reporting #3796
- `lens`: fix `--streaming-stdin`; reject invalid `--wrap-mode` #3797
- `lookup`: harden cache, download errors, CKAN auth handling #3803
- `luau`: correctness and consistency #3742
- `moarstats`: Atkinson re-population bug; harden test coverage #3799
- `partition`: UTF-8 panic, O(N) collision check #3771
- `pivotp`: correctness, clarity, cleanup #3732
- `pragmastat`: Windows backup path; suppress meaningless date ratios #3805
- `prompt`: stream file I/O; avoid unnecessary clone #3798
- `pseudo`: reject `--increment 0`; preserve last-valid-counter row on overflow #3792
- `py`: hoist Python module setup; jagged-row panic #3758
- `reverse`: avoid u64 underflow on indexed reverse #3808
- `sample`: streaming bernoulli header bug; dead retry loop; cluster pre-alloc #3774; add integration tests for streaming Bernoulli URL path #3775
- `schema`: correctness, panic-safety #3746
- `scoresql`: USING panic, string-literal handling, filter heuristic + tests #3810
- `select`: review-driven cleanup, fix/panic, quoted-name CSV-escape, empty-name silent fall-through #3772; `--sort` round-trip (quotes, non-UTF-8, dup names) #3773
- `slice`: panic and underflow fixes #3748
- `snappy`: preserve validate error; guard decompress ratio on stdin #3809
- `sort`: `--numeric --natural --unique` consistency #3755
- `split`: correctness fixes, error propagation, tests #3780; Windows `--filter` quoted-arg corruption fix via `raw_arg` #3788
- `sqlp`: word-boundary alias replacement #3730
- `stats`: cache & boolean-pattern fixes #3744; close cache short-circuit gaps for select/round/typesonly/infer-boolean #3800
- `tojsonl`: guard non-finite Number; hoist header escaping; drop unused `unused_assignments` allows #3807
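The `excel` float-to-i64 off-by-one described in Fixed comes from `i64::MAX as f64` rounding up to 2^63, a value f64 can represent exactly while 2^63 - 1 cannot. A sketch of the corrected guard, using a hypothetical `exact_i64` helper that mirrors the check quoted above:

```rust
// Corrected float-to-i64 guard. A `val > i64::MAX as f64` rejection test
// lets exactly 2^63 slip through (it is not greater than the rounded-up
// bound) and the subsequent `as` cast saturates to i64::MAX, off by one.
// Requiring `val < i64::MAX as f64` (strictly below 2^63) rejects it.
fn exact_i64(val: f64) -> Option<i64> {
    if val.is_finite()
        && val >= i64::MIN as f64
        && val < i64::MAX as f64 // strictly below 2^63
        && val.fract() == 0.0
    {
        Some(val as i64) // in-range integral value: cast is exact
    } else {
        None // NaN, infinities, fractional, or out-of-range values
    }
}
```

The NaN/Inf branch is what replaces the previous path that could hand a non-finite float to the formatter.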
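The byte-capped truncation described in the `safe_header_names` fix can be sketched on stable Rust with an `is_char_boundary` scan, equivalent to the `str::floor_char_boundary` snap the fix uses; `truncate_to_boundary` is a hypothetical name for illustration:

```rust
// Cap a header name at max_bytes, snapping DOWN to a UTF-8 char boundary
// so a multibyte character is never split (which would make the result
// invalid UTF-8 if sliced byte-wise).
fn truncate_to_boundary(name: &str, max_bytes: usize) -> &str {
    if name.len() <= max_bytes {
        return name;
    }
    let mut idx = max_bytes;
    // A char boundary is at most 3 bytes back, so this loop is bounded.
    while !name.is_char_boundary(idx) {
        idx -= 1;
    }
    &name[..idx]
}
```

Counting chars instead of bytes is exactly the old bug: 60 four-byte chars is 240 bytes, well past Postgres' 63-byte `NAMEDATALEN`.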
### Removed
- **BREAKING** `headers`: removed `--intersect` flag (use `--union` instead) #3763
### Dependencies
- Bump polars to 0.53.0 (multiple bumps; latest tracks py-1.40.1)
- Bump `mlua` to 0.12.0-rc.1; Luau from 709 to 716
- Bump `zip` from 7 to 8
- Switch csv crate to qsv-tuned fork (replaces ryu with zmij)
- Bump `jsonschema` from 0.46.1 to 0.46.4 #3723 #3778 #3804
- Bump `mimalloc` from 0.1.49 to 0.1.50 #3729
- Bump `rayon` from 1.11.0 to 1.12.0 #3710
- Bump `tokio` from 1.51.1 to 1.52.0 #3712
- Bump `libc` from 0.2.184 to 0.2.186 #3709 #3736
- Bump `reqwest` from 0.13.2 to 0.13.3 #3766
- Bump `blake3` from 1.8.4 to 1.8.5 #3738
- Bump `magika` from 1.0.1 to 1.1.0 #3737
- Bump `robinraju/release-downloader` from 1.12 to 1.13 #3726
- Bump `qsv-stats` from 0.49.0 to 0.50.0 #3727
- Bump `qsv_docopt` from 1.9.0 to 1.10.0 #3724
- Update `self_update` to latest upstream (qsv PR merged)
- Update `geosuggest` to 0.8.3
- Update `csvs_convert`
- Removed `kiddo` patch fork now that 0.5.3 is released with our PR merged
Full Changelog: 19.1.0...20.0.0