github simdjson/simdjson v1.0.0
Version 1.0

2 years ago

Release 1.0.0 of the simdjson library builds on an earlier pre-1.0 release that made the On Demand front-end our default. The On Demand front-end is a new way to build parsers. With On Demand, if you open a file containing 1000 numbers and you need just one of them, only that one number is parsed. If you need to put the numbers into your own data structure, they are materialized there directly, without first being written to a temporary tree. We therefore expect that On Demand will often provide superior performance when you do not need an intermediate materialized view of a DOM tree. The On Demand front-end was primarily developed by @jkeiser.

If you adopted simdjson in an earlier version and rely on the DOM approach, it remains available and works as before. Though On Demand is our new default, we remain committed to supporting the conventional DOM approach in the future, as there are instances where it is more appropriate.

Release 1.0.0 adds several key features:

  • In big data analytics, it is common to serialize large sets of records as multiple JSON documents separated by white space. You can now get the benefits of On Demand while parsing almost infinitely long streams of JSON records (see iterate_many). At each step, you have access to the current document while a secondary thread indexes the following block. You can thus process enormous files while using a small amount of memory and achieve record-breaking speeds. Credit: @NicolasJiaxin.
  • In some cases, JSON documents contain numbers embedded within strings (e.g., "3.1416"). You can access these numbers directly using methods such as get_double_in_string(). Credit: @NicolasJiaxin
  • Given an On Demand instance (value, array, object, etc.), you can now convert it to a JSON string using the to_json_string method, which returns a string view into the original document and therefore avoids copying. Credit: @NicolasJiaxin
  • The On Demand front-end now supports the JSON Pointer specification. You can request a specific value using a JSON Pointer within a large document. Credit: @NicolasJiaxin
  • Arrays in On Demand now have a count_elements() method. Objects have a count_fields() method. Arrays and objects have a reset method for when you need to iterate through them more than once. Document instances now have a rewind method for when you need to process the same document multiple times.

Other improvements include:

  • We have extended and improved our documentation and added many tests.
  • We have accelerated the JSON minification function (simdjson::minify) on ARM processors. Credit: @dougallj

We encourage users of previous versions of the simdjson library to update, and we encourage users to deploy this release in production.
