The SIMD update.
Breaking
- Bumping MSRV to `1.83.0`.
- Dropping `xan plot -Y/--add-series`. It is now possible to select multiple columns as `<y>` in `xan plot <x> <y>` instead.
- Dropping the `-C/--force-colors` flag in `flatten`, `heatmap`, `hist`, `plot` and `view` in favor of the more standardized and flexible `--color=(auto|never|always)` flag.
- `xan join` will now automatically drop joined columns from one of the files when it is obviously safe to do so.
- `xan behead` & `xan rename` no longer normalize their output, in order to be as fast as possible.
- The new SIMD CSV parser might not deal with irregular CSV cases the same way `rust-csv` did. In any case, `xan input` will still continue to use `rust-csv`.
- `xan slice -B/--byte-offset` & `xan slice -A/--accumulate` are now mutually exclusive.
- `xan input` has been overhauled.
- Dropping `xan count --sample-size`.
- Overhauling `xan fixlengths` to accept streams by shifting the default from a double-pass read to buffering the whole stream into memory.
- `xan plot --x-scale log` & `--y-scale log` are now natural log. Use `log10` for the base10 log as before.
- Dropping the `xan reverse -m/--in-memory` flag. Behavior is now automatically detected.
- Dropping the `xan shuffle -m/--in-memory` flag. Loading the file into memory is now the default. The `xan shuffle -e/--external` flag has been added if you want the old default behavior.
- `xan bins` now outputs `<empty>` values instead of `<nulls>`.
- Overhauling `xan bins`. The default is now to find nice boundaries for the bins. Use `-e/--exact` to revert to the old behavior. The default number of bins is now `10`, and the Freedman-Diaconis rule is no longer used by default. A `-H/--heuristic` flag has been added if you want to automatically select a suitable number of bins.
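To give an intuition of the `xan bins` changes above, here is a minimal Python sketch contrasting exact boundaries, "nice" boundaries and the Freedman-Diaconis rule. It is not `xan`'s implementation: the function names and the 1/2/5 rounding strategy are assumptions made for illustration only. Only the Freedman-Diaconis bin width formula (`2 * IQR / n^(1/3)`) is the standard one.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Bin count suggested by the Freedman-Diaconis rule (width = 2 * IQR / n^(1/3))."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return 1  # degenerate spread: fall back to a single bin
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))


def exact_edges(lo, hi, bins):
    """Split [lo, hi] into `bins` equal parts, whatever the resulting numbers look like."""
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]


def nice_step(raw_step):
    """Round a step up to 1, 2, 5 or 10 times a power of ten (assumed strategy)."""
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for factor in (1, 2, 5):
        if factor * magnitude >= raw_step:
            return factor * magnitude
    return 10 * magnitude


def nice_edges(lo, hi, bins):
    """Edges aligned on multiples of a nice step, covering at least [lo, hi]."""
    step = nice_step((hi - lo) / bins)
    edges = [math.floor(lo / step) * step]
    while edges[-1] < hi:
        edges.append(edges[-1] + step)
    return edges


if __name__ == "__main__":
    print(exact_edges(3, 97, 10))  # boundaries every 9.4 units, starting at 3
    print(nice_edges(3, 97, 10))   # boundaries on multiples of 10
    print(freedman_diaconis_bins(list(range(1, 101))))
```

Nice boundaries trade a slightly wider covered range for bin edges that are easier to read, which is why the exact split remains useful when the bins must match the data range precisely.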
Features
- Adding `xan flatten -F/--flatter`.
- `xan pivot` can now target multiple columns.
- Adding the `xan grep` command for fast but coarse filtering.
- Adding `xan search -f/--flag`.
- Adding `xan map -F/--filter`.
- `xan search -B/--breakdown` now consolidates the results when multiple patterns have the same name.
- Adding `xan flatten --row-separator`.
- Adding `xan flatten --csv`.
- Adding `xan headers --color`.
- Adding the `xan join <columns> <input1> <input2>` arity as a convenience when joined column names are the same in both inputs.
- Adding `xan join -D/--drop-key=(none|both|left|right)`.
- Adding `xan fuzzy-join -D/--drop-key=(none|both|left|right)`.
- Adding `xan plot -A/--aggregate`.
- Adding support for plural selection clauses in both `xan select -e` & `xan map` e.g. `xan map 'full_name.split(" ") as (first_name, last_name)'`.
- Adding `xan search -P/--add-pattern`.
- Adding `xan groupby -M/--along-matrix`.
- Adding `xan groupby -T/--total`.
- Adding support for `.ndjson` & `.jsonl` files. Those are considered as headless TSV files with null byte quoting so you can easily use them with `xan` commands.
- Adding out-of-the-box support for `.vcf`, `.sam`, `.bed`, `.gtf` & `.gff2` files.
- Adding a `xan cat cols` alias to `xan cat columns`.
- Adding `zstd` support.
- Adding `earliest` & `latest` moonblade functions.
- Adding `xan dedup -f/--flag`.
- Adding `-k` short flag for `xan dedup --keep-duplicates`, and `-C` short flag for `xan dedup --choose`.
- Adding `xan fixlengths -H/--trust-header`.
- Adding `xan separate`.
- Adding full log scale support to `xan plot`.
- Adding `xan hist --scale`.
- `xan window` is now able to run total aggregations.
- Adding `thousands_sep`, `comma` and `significance` kwargs to the `numfmt` moonblade function.
Fixes
- Fixing `xan dedup --check` bug where the first record was ignored.
- Fixing `xan hist -D` when the same date is found multiple times.
- Fixing `xan from -f xls` datetime conversion.
- Fixing `xan flatten` & `xan view` when column names contain line breaks.
- Fixing invalid argument parsing error being printed to stdout instead of stderr.
- Fixing `xan progress` SIGINT corrupting output.
- Fixing `xan enum -A/--accumulate`.
- Fixing `xan from -f tar` when tarball archive is not gzipped.
- Fixing `min` & `max` moonblade functions when passing a list of numbers.
- Fixing `xan flatten -H` edge cases.
- Fixing commands requiring seekable streams mistakenly accepting unindexed compressed files.
- Fixing `xan plot --count --y-scale log`.
Performance
- Wildly improving performance of most `xan` commands by leveraging a novel SIMD CSV parser/writer.
- Improving performance of `xan from -f txt` & `xan from -f npy`.
- Improving memory footprint of hash-based commands (e.g. `frequency`, `groupby`, `dedup` etc.).
- Improving performance of `xan progress`, `xan range`, `xan enum`, `xan behead`, `xan rename`.
Quality of Life
- `xan parallel cat` now flushes more consistently.
- Better highlighting of problematic strings in `xan flatten`, `xan view` & `xan headers`.
- `xan parallel` will now generally stop as soon as an error is detected in a subprocess and cleanly report errors.
- Better argv parsing error UX in general.
- The `-p` flag will now avoid going further than 16 threads, to avoid issues on servers with many CPUs where hogging the resources is a problem and where using too many threads at once could hurt performance. The `-t` flag remains available to tweak the number of threads.
- `xan hist` will now dim bars having a `0` count so you can easily distinguish them from non-empty bars.
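The thread cap described for the `-p` flag can be pictured with the following Python sketch. It only illustrates the policy stated above (a default that never exceeds 16, and an explicit thread count that bypasses the cap). The function names are hypothetical, and deriving the default from the CPU count is an assumption, not a description of `xan`'s actual code.

```python
import os

# Cap stated in the notes above for the `-p` flag.
MAX_DEFAULT_THREADS = 16


def default_parallelism(available_cpus=None, cap=MAX_DEFAULT_THREADS):
    """Default thread count: the number of CPUs (assumed), never above the cap."""
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    return max(1, min(available_cpus, cap))


def resolve_threads(explicit_threads=None, available_cpus=None):
    """An explicit count (as given through `-t`) wins over the capped default."""
    if explicit_threads is not None:
        return explicit_threads
    return default_parallelism(available_cpus)


if __name__ == "__main__":
    print(resolve_threads(available_cpus=128))                       # capped default
    print(resolve_threads(available_cpus=4))                         # small machine, no cap needed
    print(resolve_threads(explicit_threads=64, available_cpus=128))  # explicit choice is kept
```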