github medialab/xan 0.51.0
v0.51.0

2 months ago

The parallel update.

Breaking

  • Dropping undocumented xan index and related interactions (in xan count, xan sample, xan slice & xan split --jobs).
  • Dropping the now-useless coalesce moonblade function.
  • xan split now accepts its output directory as an optional flag.
  • xan partition now accepts its output directory as an optional flag.
  • xan split -s becomes xan split -S to avoid confusion with the -s/--select flag used everywhere else.
  • Dropping useless xan count --csv flag.
  • Dropping xan freq -t/--threshold. Use xan freq | xan filter 'count >= n' instead.
  • Adding xan slice -I/--indices to take multiple indices. Previously xan slice -i was polymorphic and accepted either a single index or several.
  • xan parallel freq now follows xan freq behavior regarding limits.
  • Dropping xan url-join & xan regex-join. Both commands have been merged into a new xan fuzzy-join command using the -u/--url-prefix & -r/--regex flags respectively.
  • xan from --sheet becomes --sheet-name and is no longer the default. --sheet-index 0 becomes the default.
  • Dropping xan foreach. It is not distinctive enough: xan map serves the same purpose, and you can either keep its output to get useful information about the results of the evaluated side effects, or write it to /dev/null.
  • Renaming xan agg --cols to xan agg --along-rows.
  • Changing cell placeholder to anonymous _ value in xan agg -R/--along-rows.
  • Dropping moonblade commands -E/--errors flags. A lot has changed since they were created. They will be reevaluated in the future if required. You can rely on the try & warn moonblade functions instead, for now.
  • Dropping xan select -A/--append. The new xan map is now equivalent to the former xan select -eA.
  • Changing xan map to accept a selection expression able to create multiple columns at once rather than a single expression and a column name. This means xan map 'expr' col_name becomes xan map 'expr as col_name'.
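Two of the breaking changes above come with a direct migration path. The sketch below shows both on a throwaway file. The sample data, the column names (lang, stars), the threshold of 2 and the "doubled" column are hypothetical, and the xan calls only run when xan is found on the PATH.

```shell
# Throwaway sample data; file content and column names are hypothetical.
csv="$(mktemp)"
printf 'name,lang,stars\nxan,rust,10\nminet,python,7\nfog,python,3\nsigma,js,4\n' > "$csv"

if command -v xan >/dev/null 2>&1; then
  # Before 0.51.0: xan freq -t/--threshold (flag now dropped).
  # After: filter the frequency table on its count column instead.
  xan freq -s lang "$csv" | xan filter 'count >= 2'

  # Before 0.51.0: xan map 'stars * 2' doubled "$csv"
  # After: the new column is named inside the expression with "as".
  xan map 'stars * 2 as doubled' "$csv"
fi
```

Because xan map now takes a selection expression, several columns can be created in one call by listing more than one "expr as name" clause.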

Features

  • Adding xan count -a/--approx.
  • Adding xan slice --end-byte.
  • Adding xan slice -S/--start-condition & xan slice -E/--end-condition.
  • Adding xan slice -L/--last.
  • Allowing -n/--no-headers and -d/--delimiter flags to appear before subcommands.
  • Adding backtick quoted strings to moonblade.
  • Adding moonblade printf function.
  • Adding moonblade pad, lpad & rpad functions.
  • Adding xan select -f/--evaluate-file.
  • Adding multi-member gzip files support (to handle files compressed with bgzip notably).
  • Adding xan split -f & xan partition -f short flag for --filename.
  • Adding xan split -c/--chunks & xan split --segments.
  • Adding xan sample -§/--cursed.
  • Adding xan search -B/--breakdown and the related --name-column flag.
  • Adding CSV file chunking capabilities to xan parallel.
  • Adding xan from md.
  • Adding xan parallel map.
  • Adding -p/--parallel & -t/--threads to count, freq, stats, search, agg & groupby commands.
  • Adding piped column access to expression given to xan flatmap -r.
  • Adding xan rename -R/--replace & xan rename -x/--suffix.
  • Adding xan parallel freq -l/--limit, -A/--all, -a/--approx & -N/--no-extra.
  • Adding xan search -U/--unique-matches & --sep & --left.
  • Adding parallelization through novel file segmentation of files compressed with bgzip when a .gzi index can be found.
  • Adding the xan window command for window aggregations like rolling averages, cumulative sums, lags etc.
  • Adding xan help window.
  • Adding xan head & xan tail as aliases for xan slice -l & xan slice -L respectively.
  • Adding xan from --sheet-index & --list-sheets.
  • Adding xan flatten -H/--highlight & -i/--ignore-case.
  • Adding xan agg -C/--along-cols & xan agg -M/--along-matrix.
  • Adding xan groupby -C/--along-cols.
  • Adding support for xan search -l -p -t.
  • Adding rms moonblade aggregation function.
  • Adding xan scrape -E/--encoding.
  • Adding CDX files support.
  • Adding regex moonblade function.
  • Adding header, col_index & col_index? moonblade functions.
  • Adding find & find_index moonblade functions.
  • Adding -l/--limit support to xan search -p & xan filter -p.
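For readers unfamiliar with the window aggregations named in the xan window entry above (rolling averages, cumulative sums, lags), here is a minimal illustration of what such values look like, written in plain awk. It does not use xan window syntax, which is documented by xan help window. The series, the window size of 2 and the handling of the first row (which falls back to the value itself) are assumptions of this sketch, not a description of xan's behavior.

```shell
# Hypothetical series: one value per row, in order.
series="$(printf '3\n5\n4\n8\n')"

# For each row, emit: value, cumulative sum, previous value (lag of 1),
# and the mean over a rolling window of up to 2 rows.
windowed="$(printf '%s\n' "$series" | awk '
  {
    cumsum += $1
    lag = (NR > 1) ? prev : ""
    rolling = (NR > 1) ? (prev + $1) / 2 : $1
    print $1 "," cumsum "," lag "," rolling
    prev = $1
  }')"

printf 'value,cumsum,lag,rolling_mean\n%s\n' "$windowed"
```

Each output row depends on the rows before it, which is what distinguishes a window aggregation from a per-row expression or a whole-file aggregation.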

Fixes

  • Adding missing highlight for NULL values in xan view & xan flatten.
  • Fixing moonblade slicing wrt negative indexing and nontrivial inner expression.
  • Fixing moonblade get function for bytes.
  • Fixing xan sort -e skipping first record of each chunk.
  • Fixing xan sort -e stability.
  • More accurate xan sort -e memory usage calculations.
  • Fixing xan transform -n.
  • Fixing xan view -g -s.
  • Fixing moonblade concretization wrt branching.
  • Fixing xan behead -o and xan behead -Ao.
  • Reorganizing xan help functions.
  • Fixing lexicographic extent merging in xan parallel stats.
  • Fixing xan to md width alignment.
  • Renaming xan parallel --shell-preprocessing short flag to be -H because it was being overridden by -S/--source-column.
  • Adding missing subcommand completions for xan parallel & xan cat.
  • Better default threads count heuristics.
  • Better xan plot -T date parsing.
  • Fixing xan search replacements when using the -s/--select flag with a non-full selection.
  • Adding the xan view -r/--right flag to force right alignment for a selection of columns.
  • Fixing xan flatten broken pipe panics when piped.
  • Fixing xan plot -R/--regression-line when linear function endpoints are out of bounds.
  • xan parallel early exits when a target file does not exist.
  • Fixing moonblade list slicing.
  • Fixing cols() & headers() moonblade functions without arguments.
  • Fixing cols() & headers() not working with dynamic arguments.
  • Fixing moonblade indexing parsing.
  • Fixing aggregation arity validation.
  • Fixing xan agg & xan groupby behavior wrt -n/--no-headers.
  • Fixing short-circuiting of the and & or moonblade functions.
  • Fixing issue with degenerate cases in xan bins --nice.
  • Fixing bin allocation in xan bins --nice.
  • Fixing xan bins --nice first and last bound to stick to min & max.
  • Fixing negative indexing with col*(name, pos) moonblade functions.
  • Fixing argmin & argmax parallel stability.
  • Fixing panic with xan plot when using log scales and min/max are <= 0.

Performance

  • Switching hashmaps to ahash.
  • Optimizing moonblade pipelines with more than a single underscore substitution.
  • Improving xan reverse performance.
  • Reducing memory footprint of aggregators.
  • Optimizing xan select -e allocations.

Quality of Life

  • Prepending xan subcommand to error messages.
  • Better error messages when moonblade expressions cannot be parsed.
  • Displaying number of threads actually used when using xan parallel.
  • xan view now automatically right-aligns columns containing only integers.
  • Better moonblade casting errors.
  • xan bins formatted bounds will now be padded for better readability.
