github dathere/qsv 16.0.0


[16.0.0] - 2026-02-08 🤖 "The AI-Native Release" 🤖

This release makes qsv deeply AI-native: smarter date detection now flows through to Polars schemas, and an MCP plugin layer lets AI agents wield qsv as a first-class data tool.

Claude Desktop, Claude Code, and Claude Cowork users can now use qsv's powerful data-wrangling capabilities directly within their AI workflows, with intelligent guidance and seamless integration. Google Gemini is now also supported, thanks to @kulnor.


🌟 Major Features

Smarter Date/DateTime Detection

qsv can now automatically detect date and datetime columns and carry that knowledge through the entire pipeline:

  • stats --dates-whitelist sniff is now the default: qsv sniffs the first 1000 rows to identify candidate date/datetime fields, which then go through full date/datetime type inference to confirm their type
  • schema auto-detects Date/DateTime columns when generating Polars schemas (.pschema.json)
  • DateTime type support in Polars schema parsing — temporal types are preserved through sqlp, joinp, and Parquet conversion
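The sniff-then-confirm flow in the list above can be illustrated with a small Rust sketch. It is a hypothetical simplification, not qsv's code: `sniff_temporal`, `confirm`, and `Temporal` are invented names, only ISO-8601 formats are recognized, and the sample size is a parameter (qsv itself sniffs the first 1000 rows).

```rust
// Hypothetical sketch of "sniff a sample, then confirm on the full data".
// Not qsv's implementation: only YYYY-MM-DD dates and
// YYYY-MM-DD[T| ]HH:MM:SS datetimes are recognized, with loose range checks.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Temporal {
    Date,
    DateTime,
}

/// True for "YYYY-MM-DD" with month 1..=12 and day 1..=31 (no calendar check).
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    if !b.iter().enumerate().all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit()) {
        return false;
    }
    let month = (b[5] - b'0') * 10 + (b[6] - b'0');
    let day = (b[8] - b'0') * 10 + (b[9] - b'0');
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// True for "YYYY-MM-DDTHH:MM:SS" or "YYYY-MM-DD HH:MM:SS".
fn is_iso_datetime(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 19 || !(b[10] == b'T' || b[10] == b' ') || !is_iso_date(&s[..10]) {
        return false;
    }
    let t = &b[11..];
    t[2] == b':' && t[5] == b':'
        && t.iter().enumerate().all(|(i, c)| i == 2 || i == 5 || c.is_ascii_digit())
}

fn classify(s: &str) -> Option<Temporal> {
    if is_iso_date(s) {
        Some(Temporal::Date)
    } else if is_iso_datetime(s) {
        Some(Temporal::DateTime)
    } else {
        None
    }
}

/// Sniff pass: a column is a candidate if every non-empty value in the
/// first `sample_size` rows looks temporal. Any DateTime upgrades the column.
fn sniff_temporal(rows: &[Vec<&str>], sample_size: usize) -> Vec<(usize, Temporal)> {
    let ncols = rows.first().map_or(0, |r| r.len());
    let sample = &rows[..rows.len().min(sample_size)];
    let mut out = Vec::new();
    'col: for c in 0..ncols {
        let mut kind: Option<Temporal> = None;
        for row in sample {
            let v = row.get(c).copied().unwrap_or("").trim();
            if v.is_empty() {
                continue;
            }
            match classify(v) {
                None => continue 'col, // one non-temporal value disqualifies the column
                Some(Temporal::DateTime) => kind = Some(Temporal::DateTime),
                Some(Temporal::Date) => {
                    if kind.is_none() {
                        kind = Some(Temporal::Date);
                    }
                }
            }
        }
        if let Some(k) = kind {
            out.push((c, k));
        }
    }
    out
}

/// Confirm pass: re-check candidates against every row and settle the final type.
fn confirm(rows: &[Vec<&str>], candidates: &[(usize, Temporal)]) -> Vec<(usize, Temporal)> {
    candidates
        .iter()
        .filter_map(|&(c, _)| {
            let mut kind = Temporal::Date;
            for r in rows {
                let v = r.get(c).copied().unwrap_or("").trim();
                if v.is_empty() {
                    continue;
                }
                if classify(v)? == Temporal::DateTime {
                    kind = Temporal::DateTime;
                }
            }
            Some((c, kind))
        })
        .collect()
}

fn main() {
    let rows = vec![
        vec!["1", "2024-01-05", "2024-01-05T10:00:00"],
        vec!["2", "", "2024-02-11 08:30:00"],
        vec!["3", "2024-13-01", "2024-03-01T00:00:00"],
    ];
    let candidates = sniff_temporal(&rows, 2); // sniff only the first 2 rows
    println!("candidates: {candidates:?}"); // [(1, Date), (2, DateTime)]
    println!("confirmed:  {:?}", confirm(&rows, &candidates)); // [(2, DateTime)]
}
```

The sniff pass is cheap because it reads only a sample; the confirm pass is what makes the final type trustworthy, which is why column 1 is dropped once its invalid third value is seen.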

Hardened Stats Cache

The stats cache system that accelerates frequency, schema, tojsonl, sqlp, joinp, pivotp, diff, and sample is now more robust:

  • Simplified API: Removed dataset_stats from get_stats_records(), streamlining all downstream consumers
  • Safe fallback: Corrupted or unparsable cache files are gracefully handled instead of erroring out
  • Auto-regeneration: Stats cache regenerates on parse error rather than failing
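The safe-fallback and auto-regeneration behavior described above can be sketched as "parse the cache, and on any failure recompute and rewrite it instead of erroring out". This is a minimal illustration under stated assumptions: the one-line-per-column `field,cardinality` format, `StatsRecord`, and `load_or_regenerate` are hypothetical and do not reflect qsv's actual cache files or API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical cached stats record; qsv's real cache holds far more fields.
#[derive(Debug, Clone, PartialEq)]
struct StatsRecord {
    field: String,
    cardinality: u64,
}

/// Parse "field,cardinality" lines. Any malformed line fails the whole parse.
fn parse_cache(text: &str) -> Result<Vec<StatsRecord>, String> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<StatsRecord, String> {
            // rsplit_once keeps commas inside the field name intact.
            let (field, card) = line
                .rsplit_once(',')
                .ok_or_else(|| format!("missing comma in {line:?}"))?;
            let cardinality = card
                .trim()
                .parse::<u64>()
                .map_err(|e| format!("bad cardinality in {line:?}: {e}"))?;
            Ok(StatsRecord { field: field.to_string(), cardinality })
        })
        .collect()
}

fn serialize(records: &[StatsRecord]) -> String {
    records
        .iter()
        .map(|r| format!("{},{}\n", r.field, r.cardinality))
        .collect()
}

/// Use the cache when it parses; if it is missing, unreadable, or corrupt,
/// recompute the stats and rewrite the cache instead of returning an error.
fn load_or_regenerate<F>(cache: &Path, compute: F) -> io::Result<Vec<StatsRecord>>
where
    F: FnOnce() -> Vec<StatsRecord>,
{
    if let Ok(text) = fs::read_to_string(cache) {
        match parse_cache(&text) {
            Ok(records) => return Ok(records),
            Err(e) => eprintln!("stats cache unusable ({e}); regenerating"),
        }
    }
    let fresh = compute();
    fs::write(cache, serialize(&fresh))?;
    Ok(fresh)
}

fn main() -> io::Result<()> {
    let cache = std::env::temp_dir().join("qsv_sketch_demo.stats.cache");
    // Simulate a corrupted cache file.
    fs::write(&cache, "city,42\nstate,not-a-number\n")?;
    let compute = || {
        vec![
            StatsRecord { field: "city".into(), cardinality: 42 },
            StatsRecord { field: "state".into(), cardinality: 50 },
        ]
    };
    let first = load_or_regenerate(&cache, compute)?; // corrupt: regenerates
    // The cache was rewritten, so the second call never needs to compute.
    let second = load_or_regenerate(&cache, || unreachable!("cache should be valid now"))?;
    assert_eq!(first, second);
    println!("{second:?}");
    Ok(())
}
```

Treating a parse failure as a cache miss keeps the cache purely an optimization: the worst case is recomputing the stats, never a failed command.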

Enhanced MCP Server (16.0.0)

The qsv MCP Server receives its largest update yet; see the MCP CHANGELOG for full details.


Breaking Changes

  1. diff command: --force option removed
    • Was used for short-circuiting diffs based on dataset_stats
    • No longer needed after stats cache API simplification
  2. to command: parquet subcommand removed
    • Use the dedicated qsv_to_parquet MCP tool, or sqlp, for Parquet output

Added

  • feat: stats — add 'sniff' support for --dates-whitelist
  • feat: schema — auto-detect Date/DateTime columns for Polars schema via sniff
  • feat: Support DateTime type in Polars schema parsing

Changed

  • refactor: stats — make --dates-whitelist sniff the default
  • perf: Use foldhash HashMap/HashSet across codebase for faster hashing
    • Replaces std::collections HashMap/HashSet with their foldhash counterparts in 14 modules
    • foldhash is much faster than std's default SipHash-based hasher for non-cryptographic hashing
  • refactor: stats Remove dataset_stats from stats cache system
    • Simplified get_stats_records() API
    • Centralized rowcount handling in sample command
    • Adapted diff, pivotp, sample, and other commands to new API
  • refactor: stats Stats cache now regenerates on parse error (improved robustness)
  • refactor: stats Safe fallback on corrupted stats cache
  • refactor: pivotp use sparsity for suggestions and uniqueness_ratio for pivot heuristics
  • refactor: sample lazily compute row_count only for sampling methods that need it
  • deps: bump async-compression to 0.4.39
  • deps: bump bytes from 1.11.0 to 1.11.1
  • deps: bump calamine to 0.33
  • deps: bump csv-nose from 0.7.0 to 0.8.0
  • deps: bump csvlens to latest upstream (PR merged)
  • deps: bump geosuggest to latest upstream
  • deps: bump flate2 from 1.1.8 to 1.1.9
  • deps: bump jsonschema from 0.40.0 to 0.41 (latest upstream with unreleased perf improvements)
  • deps: bump polars from 0.52.0 at py-1.38.1 tag to 0.53
  • deps: bump pyo3 from 0.27.2 to 0.28.0
  • deps: bump redis from 1.0.2 to 1.0.3
  • deps: bump regex from 1.12.2 to 1.12.3
  • deps: bump reqwest from 0.13.1 to 0.13.2
  • deps: bump zerocopy from 0.8.35 to 0.8.36
  • deps: bump zip from 6 to 7
  • deps: bump zmij from 1.0.17 to 1.0.20
  • deps: bump bundled Luau from 0.706 to 0.708
  • deps: bump @modelcontextprotocol/sdk (MCP)
  • applied several clippy lint suggestions
  • applied several GH Copilot and Claude review suggestions
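The foldhash change works because Rust's standard `HashMap` and `HashSet` are generic over a `BuildHasher`, so the hasher can be swapped without touching the code that uses the containers. The real change uses the foldhash crate; to keep this sketch dependency-free, it hand-writes a 64-bit FNV-1a hasher as a stand-in, so its speed and quality differ from foldhash's. `FastMap`, `FastSet`, and `count_values` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a, standing in for foldhash: a fast, non-cryptographic hash.
/// Its fixed seed means no HashDoS resistance, unlike std's RandomState.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Same std containers, different hasher: only the type parameter changes.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
type FastSet<T> = HashSet<T, BuildHasherDefault<Fnv1a>>;

/// Count occurrences of each value, as a frequency table would.
fn count_values<'a>(values: &[&'a str]) -> FastMap<&'a str, usize> {
    // `HashMap::new()` only exists for RandomState, so use `default()`.
    let mut counts = FastMap::default();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_values(&["NY", "CA", "NY", "TX"]);
    let distinct: FastSet<&str> = counts.keys().copied().collect();
    println!("NY={:?} distinct={}", counts.get("NY"), distinct.len()); // NY=Some(2) distinct=3
}
```

The trade-off is the usual one for non-cryptographic hashers: faster hashing of short keys such as CSV cell values, in exchange for weaker protection against adversarially chosen keys.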

Fixed

  • fix: frequency column selection when --select lists columns in a different order
    • Now lookup cardinality by column name instead of index
    • Handles user-selected/reordered column subsets correctly
  • fix: sample handle missing min weight in stats cache
  • fix: validate adapt tests to jsonschema 0.40.2 error message format changes
  • fix: joinp switch pschema serialization to serde_json for compound type support
  • fix: excel adjust JSONL path usage for changes in the calamine 0.33 release
  • fix: stats return sentinel when sniff finds no date columns
  • fix: config QSV_NO_HEADERS environment variable being ignored; split no_headers into an explicit setter and a CLI flag method
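The frequency fix above comes down to looking up cached stats by column name rather than by position. The Rust sketch below contrasts the two approaches; `ColStats` and both helper functions are hypothetical and do not mirror qsv's internal types.

```rust
use std::collections::HashMap;

/// Hypothetical per-column stats, stored in the dataset's original column order.
struct ColStats {
    name: String,
    cardinality: u64,
}

/// Buggy pattern: assumes the i-th selected column is the i-th dataset column,
/// which breaks as soon as --select picks or reorders a subset of columns.
fn cardinality_by_index(stats: &[ColStats], selected: &[&str]) -> Vec<u64> {
    (0..selected.len()).map(|i| stats[i].cardinality).collect()
}

/// Fixed pattern: index the stats by column name, then look up each selected column.
fn cardinality_by_name(stats: &[ColStats], selected: &[&str]) -> Vec<Option<u64>> {
    let by_name: HashMap<&str, u64> = stats
        .iter()
        .map(|s| (s.name.as_str(), s.cardinality))
        .collect();
    selected.iter().map(|name| by_name.get(name).copied()).collect()
}

fn main() {
    let stats = vec![
        ColStats { name: "id".into(), cardinality: 1000 },
        ColStats { name: "city".into(), cardinality: 42 },
        ColStats { name: "state".into(), cardinality: 50 },
    ];
    // The user selected a reordered subset, e.g. --select state,city
    let selected = ["state", "city"];
    println!("by index: {:?}", cardinality_by_index(&stats, &selected)); // [1000, 42] (wrong)
    println!("by name:  {:?}", cardinality_by_name(&stats, &selected)); // [Some(50), Some(42)]
}
```

Returning `Option` from the name-based lookup also makes a column that is missing from the cache explicit instead of silently reading a neighboring column's stats.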

Removed

  • removed to parquet subcommand in favor of dedicated qsv_to_parquet MCP tool and sqlp Parquet output support
  • removed cargo install instructions from the README: qsv regularly depends on patched forks, and since cargo install doesn't support git dependencies, qsv is rarely installable via cargo install.

Full Changelog: 15.0.1...16.0.0
