github 01mf02/jaq v3.0.0-alpha
3.0 alpha

pre-release, 7 hours ago

jaq is a jq clone with a focus on correctness, performance, and simplicity.
This release brings a large number of new features, including a brand-new manual, support for YAML/CBOR/TOML/XML, an extended set of values, and several new filters.

This work has been sponsored by NLnet as part of the polyglot jaq project. Thanks a lot!

Given that my current NLnet funding for this project is nearly depleted, I will soon adapt my efforts in developing jaq to the financial support it receives from its users. My current position is that I want to continue maintaining jaq and integrating features developed by the community, but without financial support, I do not plan to actively develop new features myself. If you want me to develop a new feature in jaq for you (or simply say thanks), then your sponsorship would be much appreciated.

Manual

jaq now has a manual that can be read as HTML or as a man page. #332 This manual has been written from the ground up and features nearly 500 new examples that are automatically tested. It should also be usable as a manual for jq.

Feedback on the manual is particularly solicited. Thanks to @wader for the existing feedback!

Multiple format support

This release marks the advent of "polyglot jaq"; that is, jaq with built-in first-class support for data formats other than JSON. #284 In particular, this version supports reading and writing JSON, YAML, CBOR, TOML, and XML. Care has been taken to preserve the behaviour for existing JSON workflows as much as possible.

To read or write data in a non-JSON format, you can use the new command-line flags --from FORMAT and --to FORMAT. Alternatively, you can simply pass files with a file extension such as .cbor, .yaml etc.
This behaviour is modeled after Pandoc.
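
The selection rule described above can be pictured with a small Python sketch. This is not jaq's implementation: the function name, the extension table, the fallback to JSON, and the assumption that an explicit format takes precedence over the file extension are all hypothetical. Only .cbor and .yaml are named in these notes.

```python
from pathlib import Path

# Hypothetical extension table; only ".cbor" and ".yaml" appear in the notes.
EXTENSION_FORMATS = {
    ".json": "json",
    ".yaml": "yaml",
    ".cbor": "cbor",
    ".toml": "toml",
    ".xml": "xml",
}


def choose_format(path, explicit=None, default="json"):
    """Pick a data format for `path`.

    Assumption: an explicitly requested format (as with --from / --to)
    wins; otherwise the file extension decides; otherwise fall back to
    `default`.
    """
    if explicit is not None:
        return explicit
    return EXTENSION_FORMATS.get(Path(path).suffix.lower(), default)
```

For example, choose_format("data.yaml") picks "yaml", while choose_format("data.yaml", explicit="cbor") picks "cbor".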

XJON

This release supports reading/writing a superset of JSON called XJON. #284
XJON allows perfect round-tripping of the values that jaq calculates with. It adds the following constructs to JSON:

  • Numbers: Infinity, -Infinity, and NaN
  • Text strings with invalid UTF-8 code units
  • Byte strings, e.g. b"Hello\xFFWorld!\n"
  • Objects with non-string keys, e.g. {0: 1, [2]: 3, {}: 4}
  • Comments, starting with the character # and ending at the next \n (or EOF)

See the XJON section in the manual for more information.
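
To get a feel for two of these constructs, here is a minimal Python sketch that reads a small subset of XJON: # comments and the special numbers. It relies on Python's json module accepting NaN, Infinity, and -Infinity by default. Byte strings, invalid UTF-8, and non-string keys are not handled, and the helper name is hypothetical. It is not how jaq parses XJON.

```python
import json


def strip_hash_comments(text):
    """Remove `# ...` comments that run to the next newline or EOF.

    A `#` inside a double-quoted string is kept, and backslash escapes
    inside strings are skipped over so that `\"` does not end the string.
    """
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character verbatim.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            # Skip the comment body; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_commented_json(text):
    """Parse JSON extended with `#` comments and NaN/Infinity."""
    return json.loads(strip_hash_comments(text))
```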

The motivations for these additions are:

  • Special floating-point numbers can also be produced by jq, e.g. jq -n infinite,nan. However, jq prints them as 1.7976931348623157e+308 and null, thus making it difficult to recognise these numbers. jaq prints them as Infinity and NaN to make the output clearer to users and to catch such numbers.
  • Text strings with invalid UTF-8 code units, such as can be produced with the Linux command printf '"Hi\xFFthere"', can be loaded via --rawfile in constant time, thanks to memory mapping. Furthermore, converting from byte strings to such text strings also takes constant time. This change also improves interoperability with tools that output invalid UTF-8 code units as part of strings. Thanks to @Maxdamantus for having significantly shaped the design of this feature, see #309.
  • Byte strings can be created from traditional text strings with the tobytes filter (see new filters). Unlike text strings, they allow for indexing (e.g. .[0]) and slicing (e.g. .[:1]) in constant time, thus making it possible to write efficient binary format decoders in jaq.
  • Objects with non-string keys can be used as a generalised hash map, e.g. to store sparse arrays or sets of arbitrary values. Furthermore, like byte strings, they can occur in CBOR and YAML files.
  • Comments have been requested by jaq users in the past, and # ... \n can be parsed easily and efficiently. (In contrast to // ... \n and /* ... */, which introduce new error conditions for the parser.)
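
The first point can be checked outside of jq. The following Python sketch shows that the number jq prints for infinity is simply the largest finite double, so reading it back does not restore an infinity. Python's json module happens to use the same Infinity/NaN spellings that jaq prints. The variable names are illustrative only.

```python
import json
import math
import sys

# The text jq prints for `infinite`, as quoted above.
printed_by_jq = "1.7976931348623157e+308"

# Reading it back yields the largest finite double, not an infinity.
round_tripped = json.loads(printed_by_jq)
is_largest_finite = round_tripped == sys.float_info.max
is_still_infinite = math.isinf(round_tripped)

# With explicit spellings, the special values survive a round trip.
explicit = json.dumps([math.inf, -math.inf, math.nan])
restored = json.loads(explicit)
```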

Currently, wherever jaq previously read/wrote JSON, it now reads/writes XJON.
For existing, valid JSON files, the only impact on the output is when you output NaN/Infinity values.

As part of this change, jaq now also supports arbitrarily big integers.

New filters

This release brings support for the filter path/1, closing the last large remaining gap in functionality between jq and jaq. #296 Supporting path/1 required a big overhaul of jaq's filter execution engine, see API. At the same time, having support for path/1 paved the way towards implementing several other path-related filters #332, such as:

  • setpath/2
  • delpaths/1
  • pick/1
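
To make the semantics of one of these filters concrete, here is a simplified Python model of getpath and setpath/2, following jq's documented behaviour for string keys and non-negative array indices. It is a sketch rather than jaq's implementation: the function names mirror the filters but are otherwise hypothetical, and negative indices, slices, and non-string object keys are not handled.

```python
def getpath(value, path):
    """Follow `path` (a list of keys/indices) into `value`.

    Like jq, missing keys, out-of-range indices, and null yield None.
    """
    for key in path:
        if value is None:
            return None
        if isinstance(key, str):
            value = value.get(key)
        else:
            value = value[key] if key < len(value) else None
    return value


def setpath(value, path, new):
    """Return a copy of `value` with the element at `path` set to `new`.

    Null (None) is promoted to an object or array as needed, and arrays
    are padded with None up to the requested index.
    """
    if not path:
        return new
    key, rest = path[0], path[1:]
    if isinstance(key, str):
        obj = dict(value) if value is not None else {}
        obj[key] = setpath(obj.get(key), rest, new)
        return obj
    arr = list(value) if value is not None else []
    while len(arr) <= key:
        arr.append(None)
    arr[key] = setpath(arr[key], rest, new)
    return arr
```

Starting from null, setpath(None, ["a", 1], "x") builds an object whose "a" entry is a two-element array with null in the first slot, which mirrors what jq does for setpath(["a", 1]; "x").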

The filter tobytes is a new addition from fq. It allows converting several kinds of values (including text strings) to byte strings.

There are several new filters analogous to fromjson / tojson to (de-)serialise from/to various data formats:

  • fromcbor / tocbor
  • fromyaml / toyaml
  • fromtoml / totoml
  • fromxml / toxml

Finally a few more filters:

Changed behaviour

The fromjson filter can now read an arbitrary number of inputs; e.g. "1 [2] {\"3\": 4}" | fromjson yields 1, [2], and {"3": 4}. In jq, this fails with an error message because the string contains multiple JSON values. This change makes the behaviour of jaq . file.json identical to the behaviour of jaq -Rs fromjson file.json, while simplifying the implementation. The same behaviour is also implemented for fromcbor, fromyaml, and fromxml. (However, it is not implemented for TOML, because TOML documents always represent a single object.)
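
The difference between the two behaviours can be illustrated with Python's json module. Its json.loads rejects a string containing several values, much like jq's fromjson, while a loop around JSONDecoder.raw_decode yields every value in turn, much like the new fromjson in jaq. This is only an analogy in another language; the function name is hypothetical.

```python
import json


def parse_json_stream(text):
    """Parse zero or more whitespace-separated JSON values from `text`."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


def parse_single_json(text):
    """Strict variant: raises json.JSONDecodeError on extra values."""
    return json.loads(text)
```

Applied to the string from the example above, parse_json_stream returns the three values in order, whereas parse_single_json raises an error because of the trailing data.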

Indexing null previously yielded an error in jaq, whereas it yielded null in jq. Now this also yields null in jaq, which further increases jaq's compatibility with jq. #336

The explode and implode filters now return negative integers for invalid UTF-8 code units #317. This can never occur in jq, because it excludes invalid UTF-8 code units.

Nested repls are now indented. #306

API

This release enables passing arbitrary (Rust) data to native filters at runtime. #296 This finally makes jaq's core implement a pure programming language, and makes it possible to run jq filters without having to think about inputs. On the other hand, this change required breaking the compilation/execution/native filter API. To upgrade from jaq 2.0, it is advisable to study jaq/src/{filter,funs}.rs.

Furthermore, if you implemented native filters with support for updates (i.e. that can be used on the right-hand side of |=, such as debug), then you may enhance these filters with path support. See the implementation of the filter first in jaq-std, for example.

A non-breaking change is that enabling the sync feature on jaq-json makes jaq_json::Val implement Send + Sync, allowing such values to be used freely across multiple threads. This has already been used to achieve impressive performance results for a program running parallel jaq instances. Thanks to @I-Al-Istannen! #325

A few lifetimes have been improved by @jqnatividad #318.

The jaq_json::Map type is now public thanks to @jakobhellermann #350.

Bugs

Color output now works correctly again on Windows. #333

New Contributors

Full Changelog: v2.3.0...v3.0.0-alpha
