Merged PRs

dolt

4609: use github actions for docker
Use GitHub Actions to push to dolthub/dolt and dolthub/dolt-sql-server images.
4608: go/doltcore/migrate: Patch schemas to use utf8mb4_0900_bin to match existing index order
fix for #4603
4605: Add dolt sql server docker image
Adds a Docker image for dolt sql-server. The server listens on 0.0.0.0:3306, which allows connections from outside the container through port mapping. Users can define the host and port through YAML configuration. The data directory defaults to /var/lib/dolt/ in the container and can be mounted to a directory on the host system. Defining data_dir in the configuration is not recommended, because it causes mounting the host system directory to fail.
4594: Make dolt log work with ^ ancestor spec
4585: Branch Control Pt. 9
This PR:
- Moves the branch controller into the context
- Adds the remaining stored procedures
- Handles declaring the super user via CLI arguments or YAML
- Fixes bugs found through additional testing
- Enables branch control globally
Things still missing:
- Bats tests (these will mainly copy the engine tests, but will make use of saving/loading from disk)
- Ways to interact with the binlog
- Support for roles
Role support is actually a bit more involved than originally anticipated. It requires coordination with the privilege tables, which were not built to be interacted with outside of their specific GMS context. I'm thinking it should be pushed back to a v2 implementation, unless it's high priority.
4516: join ordering GMS bump
companion PR: dolthub/go-mysql-server#1290

go-mysql-server

1343: allow adding new primary key to table with > 1 row iff it has auto_increment
fix for: #4581
Tests are in dolt because memory.Table doesn't implement RewriteableTable.
#4593
1310: Derived table outer scope visibility
Add support for outer scope visibility for derived tables, as introduced in MySQL 8.0.14.
References:
- MySQL Blog Post: Outer references for derived tables
- MySQL Worklog task and notes for derived table outer scope visibility
Resolves: #4534
Dolt CI tests: #4472
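Outer scope visibility is ultimately a name-resolution rule: when a column is not found in a derived table's own scope, lookup continues into the enclosing query's scope. A minimal sketch of that lookup, using a hypothetical `scope` type rather than go-mysql-server's actual scope implementation:

```go
package main

import "fmt"

// scope is a hypothetical name-resolution scope: the columns visible at
// one query level plus a link to the enclosing query's scope.
type scope struct {
	columns map[string]bool
	parent  *scope
}

// resolve walks from the innermost scope outward and returns how many
// levels up the column was found (0 = current scope), or -1 if unresolved.
// In this model, a derived table without outer scope visibility is simply
// a scope whose parent is nil.
func (s *scope) resolve(col string) int {
	depth := 0
	for cur := s; cur != nil; cur = cur.parent {
		if cur.columns[col] {
			return depth
		}
		depth++
	}
	return -1
}

func main() {
	outer := &scope{columns: map[string]bool{"a": true, "b": true}}
	derived := &scope{columns: map[string]bool{"u": true}, parent: outer}
	fmt.Println(derived.resolve("u"), derived.resolve("a"), derived.resolve("z"))
}
```

Returning the depth, not just a boolean, matters because an outer reference must be bound to a value from the enclosing row at execution time.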
1290: New join planning
Join planning has many high-level components:
- table order
- tree shape (left-deep, right-deep, bushy, etc.)
- indexes for lookup joins
- filter arrangement
- choosing between logically equivalent plans (costing)
This PR does not fix all of these components, but it does provide a structure for gracefully increasing the complexity of each.
The new memo data structure is a memory-efficient intermediate representation (IR) for SQL queries that 1) groups equivalent plans by their output schemas, and 2) generalizes child relationships. Here is an example memo for a join query:
```
select * from ab
inner join uv on a = u
left join pq on a = p
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (innerJoin 2 1) (innerJoin 1 2)
└── G5: (leftJoin 4 3)
```
Each relational expression has its own expression group, within which several physical (concrete) implementations can reside. An expression group is defined in terms of its relationships to other expression groups, not in terms of physical implementations, because every relational expression within a group produces the same output. Groups are usually keyed by a hash of the operator type and the child group keys or, for scalar expressions, by the literal parameter values (the latter will be useful for prepared statements).
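The grouping and keying rules can be sketched in Go. This is a minimal model, not the go-mysql-server memo: `groupID`, `relExpr` and `memo` are hypothetical, and a plain string key stands in for the hash of the operator type and child group keys. It rebuilds the first example memo.

```go
package main

import (
	"fmt"
	"strings"
)

// groupID identifies an expression group; IDs start at 1 to match G1, G2, ...
type groupID int

// relExpr is a hypothetical relational expression: an operator (for a leaf,
// the table scan itself) plus the groups of its children.
type relExpr struct {
	op       string
	children []groupID
}

// key stands in for a hash of the operator type and the child group keys.
func (e relExpr) key() string {
	var b strings.Builder
	b.WriteString(e.op)
	for _, c := range e.children {
		fmt.Fprintf(&b, " %d", c)
	}
	return b.String()
}

// memo stores each group's equivalent expressions and interns every
// expression by key, so an expression is never added twice.
type memo struct {
	groups [][]relExpr
	byKey  map[string]groupID
}

func newMemo() *memo { return &memo{byKey: map[string]groupID{}} }

// addGroup returns the group already holding e, or creates a new one.
func (m *memo) addGroup(e relExpr) groupID {
	if id, ok := m.byKey[e.key()]; ok {
		return id
	}
	m.groups = append(m.groups, []relExpr{e})
	id := groupID(len(m.groups))
	m.byKey[e.key()] = id
	return id
}

// addAlt records e as a logically equivalent alternative in group g.
func (m *memo) addAlt(g groupID, e relExpr) {
	if _, ok := m.byKey[e.key()]; ok {
		return // already memoized
	}
	m.groups[g-1] = append(m.groups[g-1], e)
	m.byKey[e.key()] = g
}

func main() {
	m := newMemo()
	ab := m.addGroup(relExpr{op: "tablescan: ab"})
	uv := m.addGroup(relExpr{op: "tablescan: uv"})
	pq := m.addGroup(relExpr{op: "tablescan: pq"})
	// Both child orders of the ab/uv inner join live in one group.
	j := m.addGroup(relExpr{op: "innerJoin", children: []groupID{uv, ab}})
	m.addAlt(j, relExpr{op: "innerJoin", children: []groupID{ab, uv}})
	m.addGroup(relExpr{op: "leftJoin", children: []groupID{j, pq}})

	for i, alts := range m.groups {
		parts := make([]string, len(alts))
		for k, e := range alts {
			parts[k] = "(" + e.key() + ")"
		}
		fmt.Printf("G%d: %s\n", i+1, strings.Join(parts, " "))
	}
}
```

Running it prints the same five groups as the first memo listing. Because parents refer to child groups rather than child expressions, adding an alternative to G4 automatically makes it available to G5.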
All valid reorderings of the join tree are added to the memo by joinOrderBuilder.
Applying indexes is a separate step: for every join implementation whose outer (right) table is an indexable data source, we also consider an indexed version. Adding indexed plans to the query above yields:
```
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (indexedJoin 1 2) (indexedJoin 2 1) (innerJoin 2 1) (innerJoin 1 2)
└── G5: (indexedJoin 4 3) (leftJoin 4 3)
```
Hash joins are added in the same manner:
```
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (hashJoin 1 2) (hashJoin 2 1) (indexedJoin 1 2) (indexedJoin 2 1) (innerJoin 2 1) (innerJoin 1 2)
└── G5: (hashJoin 4 3) (indexedJoin 4 3) (leftJoin 4 3)
```
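The indexed and hash join steps can be sketched as one pass over each group's logical joins. This assumes, hypothetically, that every base table group (G1 to G3) has an index usable for its join condition; `joinExpr`, `addPhysicalAlternatives` and `render` are illustrative names, not GMS APIs.

```go
package main

import (
	"fmt"
	"strings"
)

// joinExpr is a hypothetical join implementation in a memo group:
// an operator name plus left and right child group IDs.
type joinExpr struct {
	op          string
	left, right int
}

func (e joinExpr) String() string { return fmt.Sprintf("(%s %d %d)", e.op, e.left, e.right) }

// addPhysicalAlternatives extends a group's logical joins with an
// indexedJoin for each join whose right child group is indexable, then a
// hashJoin for every join. Only the joins passed in are expanded, so newly
// added alternatives are never expanded again.
func addPhysicalAlternatives(group []joinExpr, indexable map[int]bool) []joinExpr {
	out := append([]joinExpr(nil), group...)
	for _, e := range group {
		if indexable[e.right] {
			out = append(out, joinExpr{"indexedJoin", e.left, e.right})
		}
	}
	for _, e := range group {
		out = append(out, joinExpr{"hashJoin", e.left, e.right})
	}
	return out
}

// render formats a group the way the memo listings do.
func render(group []joinExpr) string {
	parts := make([]string, len(group))
	for i, e := range group {
		parts[i] = e.String()
	}
	return strings.Join(parts, " ")
}

func main() {
	// Assumption: each base table group (1 to 3) has a usable index.
	indexable := map[int]bool{1: true, 2: true, 3: true}
	g4 := []joinExpr{{"innerJoin", 2, 1}, {"innerJoin", 1, 2}}
	g5 := []joinExpr{{"leftJoin", 4, 3}}
	fmt.Println("G4:", render(addPhysicalAlternatives(g4, indexable)))
	fmt.Println("G5:", render(addPhysicalAlternatives(g5, indexable)))
}
```

Applied to G4 and G5, this yields the same alternatives as the final memo listing, appended in a different order. G5 gets only one indexed alternative because a left join is not commutative, so only the `4 3` orientation exists.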
This memo is missing several components that would make it more useful, including other node and expression types. For example, filter pushdown would be much easier in the memo. You could imagine extending the memo above with sql.Expression tree memo groups (scalar relations). Canonical (simplified) expression groups would require only one allocation, and scalar expression groups would cache properties such as table and column dependencies, similar to the relProp column and table dependencies. To use filters during join planning, we would need to build a memo before join planning and run other transformation rules on it. I'd expect the memo to absorb rules over time, which will simplify transform logic and reduce analyzer memory consumption. I added some starters to make this easier in the future, such as codegen'd memo expressions and a move towards visitor interfaces for IR transformations.
A fully expanded memo is converted back into a physical plan by first running a costing pass. Costing selects the fastest implementation for each child group bottom-up, using only the best child implementations when costing parent expressions, and ends at the root node.
The coster is designed to use histogram statistics to pick the fastest join plan, but at the moment we only provide a subset of index metadata and table sizes. Costing should also apply to generic SQL nodes and expressions, not just join trees and join leaves.
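The bottom-up costing described above can be sketched as a memoized recursion over groups. The costs are made-up numbers, and the model (an operator's local cost plus the best cost of each child group) is a simplification rather than the GMS coster; `alt`, `best` and `costGroup` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// alt is a hypothetical physical implementation in a memo group: a name,
// an illustrative local cost for the operator itself, and child group IDs.
type alt struct {
	name     string
	local    float64
	children []int
}

// best records the cheapest implementation chosen for a group.
type best struct {
	name string
	cost float64
}

// costGroup returns the cheapest implementation of group g. Each group is
// costed once (memoized in done), and parents only see their children's
// best costs, never the losing alternatives.
func costGroup(g int, groups map[int][]alt, done map[int]best) best {
	if b, ok := done[g]; ok {
		return b
	}
	b := best{cost: math.Inf(1)}
	for _, a := range groups[g] {
		total := a.local
		for _, c := range a.children {
			total += costGroup(c, groups, done).cost
		}
		if total < b.cost {
			b = best{name: a.name, cost: total}
		}
	}
	done[g] = b
	return b
}

func main() {
	// Illustrative costs only; a real coster would derive them from table
	// sizes, index metadata and, eventually, histograms.
	groups := map[int][]alt{
		1: {{"tablescan: ab", 10, nil}},
		2: {{"tablescan: uv", 100, nil}},
		3: {{"tablescan: pq", 50, nil}},
		4: {{"innerJoin 1 2", 1000, []int{1, 2}}, {"hashJoin 1 2", 120, []int{1, 2}}, {"indexedJoin 1 2", 40, []int{1, 2}}},
		5: {{"leftJoin 4 3", 500, []int{4, 3}}, {"hashJoin 4 3", 60, []int{4, 3}}},
	}
	done := map[int]best{}
	root := costGroup(5, groups, done)
	fmt.Println(root.name, root.cost)
	fmt.Println(done[4].name, done[4].cost)
}
```

With these numbers, `indexedJoin 1 2` wins G4 at cost 150 and `hashJoin 4 3` wins the root at 260. Memoizing per group keeps the work proportional to the number of alternatives, not to the number of complete plans.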
Other notes:
- we miss join plans that result from transitive predicates (e.g., ab.a = uv.u and uv.u = xy.x imply ab.a = xy.x).
- we miss join plan optimizations resulting from table functional dependencies (e.g., eliminating redundant filters if we already join on a primary key) and from null-allowing joins (e.g., IS NULL).
- we could more aggressively apply filters to join trees.
- converting from the memo to an execution plan is fraught for the same reasons it was before this PR, but tree building and fixup are condensed in exec_builder, which I'd expect to expand to include more exprGroup types in the future.
- there is a hack for fixing up field indexes in join trees not considered for reordering.
- right joins are converted to left joins after expandStars in transposeRightJoins.
- non-left-deep trees require passing parent rows down the right side of join trees, which previously held only table sources (left-deep).
- joinOrder hinting is rewritten to more easily fit within exprGroup costing.
- I started to use bitmaps to track expression group column dependencies, which should also be used for filter applicability.
- indexedJoinIter is a duplicate and should be deprecated, but I had a hard time quickly removing it.
- every join node is condensed into *plan.JoinNode and differentiated based on its join type, which lets me bucket different join types more easily (refer to JoinType type methods).
- applyHashLookups should be deprecated, given that we will preemptively build hash joins. If hash joins do not select the lookup, we should either not cache the subquery or improve the coster to make better decisions.
- there are probably bugs where we apply hash joins to non-cacheable subquery scopes for derived tables. Need more testing.
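The bitmap idea from these notes can be sketched with one `uint64` per set. This simplified version tracks table dependencies rather than column dependencies and is limited to 64 tables; `tableSet`, `filter` and `applicable` are hypothetical names. A filter can be evaluated at a join node when its dependencies are a subset of the tables that node provides, which is a single AND-NOT on the bitmaps.

```go
package main

import "fmt"

// tableSet is a hypothetical bitmap of table IDs (bit i = table i).
type tableSet uint64

func setOf(ids ...int) tableSet {
	var s tableSet
	for _, id := range ids {
		s |= 1 << uint(id)
	}
	return s
}

func (s tableSet) union(o tableSet) tableSet { return s | o }

// subsetOf reports whether every table in s is also in o.
func (s tableSet) subsetOf(o tableSet) bool { return s&^o == 0 }

// filter is a hypothetical predicate annotated with the tables it references.
type filter struct {
	text string
	deps tableSet
}

// applicable returns the filters whose dependencies are all provided by
// the given join subtree, i.e. filters that could be evaluated there.
func applicable(filters []filter, available tableSet) []string {
	var out []string
	for _, f := range filters {
		if f.deps.subsetOf(available) {
			out = append(out, f.text)
		}
	}
	return out
}

func main() {
	const ab, uv, pq = 0, 1, 2
	filters := []filter{
		{"a = u", setOf(ab, uv)},
		{"a = p", setOf(ab, pq)},
		{"u > 5", setOf(uv)},
	}
	left := setOf(ab).union(setOf(uv))
	fmt.Printf("%q\n", applicable(filters, left))                   // ["a = u" "u > 5"]
	fmt.Printf("%q\n", applicable(filters, left.union(setOf(pq)))) // ["a = u" "a = p" "u > 5"]
}
```

Because the subset test is a constant-time bit operation, it can be repeated for every candidate join node during reordering without noticeable cost.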
Additional refs:
- https://www.researchgate.net/publication/262216932_On_the_correct_and_complete_enumeration_of_the_core_search_space
- https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/xform/join_order_builder.go
- https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf

Closed Issues

4526: Duplicate tag error
4603: dolt migrate: differing rows
4534: Derived Table access to Outer Query Scope

Latency

| Read Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| covering_index_scan | 1.96 | 2.71 | 1.4 |
| groupby_scan | 12.3 | 17.95 | 1.5 |
| index_join | 1.47 | 4.65 | 3.2 |
| index_join_scan | 1.44 | 3.89 | 2.7 |
| index_scan | 30.81 | 54.83 | 1.8 |
| oltp_point_select | 0.15 | 0.48 | 3.2 |
| oltp_read_only | 3.02 | 8.74 | 2.9 |
| select_random_points | 0.31 | 0.77 | 2.5 |
| select_random_ranges | 0.36 | 1.14 | 3.2 |
| table_scan | 30.81 | 63.32 | 2.1 |
| types_table_scan | 70.55 | 189.93 | 2.7 |
| reads_mean_multiplier | | | 2.5 |

| Write Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| bulk_insert | 0.001 | 0.001 | 1.0 |
| oltp_delete_insert | 3.43 | 8.9 | 2.6 |
| oltp_insert | 1.61 | 2.76 | 1.7 |
| oltp_read_write | 5.47 | 16.41 | 3.0 |
| oltp_update_index | 1.7 | 4.1 | 2.4 |
| oltp_update_non_index | 1.64 | 4.25 | 2.6 |
| oltp_write_only | 2.43 | 7.84 | 3.2 |
| types_delete_insert | 3.3 | 10.65 | 3.2 |
| writes_mean_multiplier | | | 2.5 |

Overall Mean Multiple: 2.5
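Every Multiple value in these tables matches Dolt latency divided by MySQL latency, rounded to one decimal place. The sketch below recomputes the write results under that assumption, and also assumes the mean multiplier is the arithmetic mean of the rounded per-test multiples; the benchmark tooling's exact aggregation is not stated here. `result`, `multiple` and `meanMultiple` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// result is one benchmark row: MySQL and Dolt latencies from the table.
type result struct {
	name        string
	mysql, dolt float64
}

// multiple is Dolt latency divided by MySQL latency, rounded to one decimal.
func multiple(r result) float64 {
	return math.Round(r.dolt/r.mysql*10) / 10
}

// meanMultiple averages the rounded per-test multiples (assumed aggregation).
func meanMultiple(rs []result) float64 {
	sum := 0.0
	for _, r := range rs {
		sum += multiple(r)
	}
	return math.Round(sum/float64(len(rs))*10) / 10
}

func main() {
	writes := []result{
		{"bulk_insert", 0.001, 0.001},
		{"oltp_delete_insert", 3.43, 8.9},
		{"oltp_insert", 1.61, 2.76},
		{"oltp_read_write", 5.47, 16.41},
		{"oltp_update_index", 1.7, 4.1},
		{"oltp_update_non_index", 1.64, 4.25},
		{"oltp_write_only", 2.43, 7.84},
		{"types_delete_insert", 3.3, 10.65},
	}
	for _, r := range writes {
		fmt.Printf("%-22s %.1f\n", r.name, multiple(r))
	}
	// prints 2.5, matching writes_mean_multiplier
	fmt.Printf("writes_mean_multiplier %.1f\n", meanMultiple(writes))
}
```

The same functions reproduce the read column and its 2.5 mean when fed the read rows.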

dolthub/dolt v0.50.9 on GitHub
