Merged PRs
dolt
- 7912: Add `IndexedJsonDocument`, a `JSONWrapper` implementation that stores JSON documents in a prolly tree with probabilistic hashing.
tl;dr: We store a JSON document in a prolly tree, where the leaf nodes of the tree are blob nodes that each contain a fragment of the document, and the intermediate nodes are address map nodes whose keys describe a JSONPath.
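To make the mapping concrete, here is a toy sketch (not Dolt's actual code; the `flatten` helper is hypothetical) of how a decoded JSON document can be flattened into JSONPath-keyed entries, the kind of keys the address map nodes describe:

```go
package main

import "fmt"

// flatten walks a decoded JSON value and records the JSONPath of every
// leaf value. This is only an illustration of the path->value mapping;
// Dolt keys tree nodes by JSONPath rather than materializing this map.
func flatten(path string, v interface{}, out map[string]interface{}) {
	switch val := v.(type) {
	case map[string]interface{}:
		for k, child := range val {
			flatten(path+"."+k, child, out)
		}
	case []interface{}:
		for i, child := range val {
			flatten(fmt.Sprintf("%s[%d]", path, i), child, out)
		}
	default:
		out[path] = val
	}
}

func main() {
	doc := map[string]interface{}{
		"a": map[string]interface{}{"b": 1.0},
		"c": []interface{}{true, "x"},
	}
	paths := map[string]interface{}{}
	flatten("$", doc, paths)
	for p, v := range paths {
		fmt.Println(p, "=>", v) // e.g. $.a.b => 1
	}
}
```

Because the keys are ordered paths, two versions of a document can be diffed by walking the two trees and skipping shared subtrees, which is where the fast diff and merge behavior comes from.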
The new logic for reading and writing JSON documents is cleanly separated into the following files:
- `IndexedJsonDocument` - The new `JSONWrapper` implementation. It holds the root hash of the prolly tree.
- `JsonChunker` - A wrapper around a regular chunker. Used to write new JSON documents or apply edits to existing documents.
- `JsonCursor` - A wrapper around a regular cursor, with added functionality allowing callers to seek to a specific location in the document.
- `JsonScanner` - A custom JSON parser that tracks the current JSONPath.
- `JsonLocation` - A custom representation of a JSON path suitable for use as a prolly tree key.
Each added file has additional documentation with more details about the individual components.
Throughout every iteration of this project, the core idea has always been to represent a JSON document as a mapping from JSONPath locations to the values stored at those locations. We can then store that map in a prolly tree and get all the benefits we currently get from storing tables in prolly trees: fast diffing and merging, fast point lookups and mutations, etc.
This goal has three major challenges:
- For deeply nested JSON documents, simply listing every JSONPath requires asymptotically more space than the original document.
- We need to do this in a way that doesn't compromise performance on simply reading JSON documents from a table, which I understand is the most common use pattern.
- Ideally, users should not need to migrate their databases, or update their clients in order to read newer dbs, or have to choose between different configurations based on their use case.
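The first challenge is easy to see with a back-of-the-envelope sketch (hypothetical code, not from Dolt): for a chain of n nested objects, every leaf's JSONPath repeats all of its ancestor keys, so the combined size of all paths grows quadratically while the document itself grows only linearly.

```go
package main

import "fmt"

// totalPathBytes sums the lengths of the JSONPaths $.k, $.k.k, ...,
// one per nesting level of a document shaped like {"k":{"k":{...}}}.
// Each path repeats its ancestors, so the total is O(n^2) even though
// the document text itself is O(n).
func totalPathBytes(n int) int {
	total := 0
	path := "$"
	for i := 0; i < n; i++ {
		path += ".k" // descend one level
		total += len(path)
	}
	return total
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		// Rough document size: n copies of `{"k":` plus n closing braces.
		docBytes := n*len(`{"k":`) + n
		fmt.Printf("depth=%4d  doc~%7dB  all paths=%8dB\n",
			n, docBytes, totalPathBytes(n))
	}
}
```

This is why the design stores document fragments at the leaves and only uses paths as tree keys, rather than materializing an entry per path.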
This design achieves all three of these requirements:
- While it requires additional storage, this additional storage cannot exceed the size of the original document, and is in practice much smaller.
- It has indistinguishable performance for reading JSON documents from storage, while also allowing asymptotically faster diff and merge operations when the size of the changes is much smaller than the size of the document. (There is a cost: initial inserts of JSON documents are currently around 20% slower, but this is a one-time cost that does not impact subsequent reads and could potentially be optimized further.)
- Documents written by the new `JsonChunker` are backwards compatible with current Dolt binaries and can be read back by existing versions of Dolt. (Although they will have different hashes than equivalent documents that those versions would write.)
go-mysql-server
vitess
- 352: Add support for the `CONSTRAINT` keyword when adding a foreign key without a constraint name.
Customer issue: #8008
- 350: Refactoring the `BinlogStream` type into `BinlogMetadata`
The `mysql.BinlogStream` type from Vitess was a little awkward to use, and seems to have been mostly intended as test code. This gives it a more descriptive name and makes it a little easier to pass around struct copies without concurrency issues from a shared instance.