Merged PRs

dolt

4661: Bats tests for sql-client fixes
4659: go/libraries/doltcore/sqle/cluster: Add the ability to configure SAN assertions on URIs and DNS names in the server certificate when configuring TLS.
4658: Remove server_query in favor of dolt sql-client in bats
This PR depends on #4640.
4656: Improve dump-docs error and add bats, fix diff docs
4647: Handle removed tables in //.../doltcore/migrate
Fixes #4602
Validated changes against repos that previously failed with duplicate tag errors:
- https://dolthub.awsdev.ld-corp.com/repositories/dolthub/pdap-datasets/data/master
- https://www.dolthub.com/repositories/dolthub/us-president-precinct-results
4646: Support two and three dot syntax in dolt_diff_summary table function
Two dot: dolt_diff_summary('main..feature', 'table') (equivalent to dolt_diff_summary('main', 'feature', 'table'))
Three dot: dolt_diff_summary('main...feature', 'table')
4644: Support two and three dot syntax in dolt_diff table function
Two dot: dolt_diff('main..feature', 'table') (equivalent to dolt_diff('main', 'feature', 'table'))
Three dot: dolt_diff('main...feature', 'table')
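The range forms above can be split into their two refs with a few lines of string handling. The following is a minimal sketch, not Dolt's actual parser: `DiffRange` and `parseDiffRange` are hypothetical names, and the `MergeBase` flag assumes the three-dot form compares from the merge base of the two refs, as the `--merge-base` variant in the CLI entry suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffRange is a hypothetical holder for the two refs of a range expression.
type DiffRange struct {
	From, To  string
	MergeBase bool // true for the three-dot form (assumed merge-base semantics)
}

// parseDiffRange splits "A..B" or "A...B" into its refs.
// The three-dot separator is checked first because "..." also contains "..".
func parseDiffRange(s string) (DiffRange, bool) {
	if from, to, ok := strings.Cut(s, "..."); ok {
		return DiffRange{From: from, To: to, MergeBase: true}, true
	}
	if from, to, ok := strings.Cut(s, ".."); ok {
		return DiffRange{From: from, To: to}, true
	}
	return DiffRange{}, false // a plain ref, not a range
}

func main() {
	for _, s := range []string{"main..feature", "main...feature", "main"} {
		r, ok := parseDiffRange(s)
		fmt.Printf("%q -> %+v (range=%v)\n", s, r, ok)
	}
}
```

An input with an empty right-hand side, such as `A..`, parses with an empty `To`; how that default is resolved is left to the caller in this sketch.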
4641: integration-tests/bats/sql-server.bats: Move the temporary table session specific test to go-sql-server-driver.
4640: SQL client --query fixes
Fixes the following issues regarding dolt sql-client --query
4639: Add sleeps to a couple bats tests that were starting a SQL Server
Give the server a chance to start up. Was failing weirdly on my Mac without this.
4638: go/libraries/doltcore/sqle/cluster: Add support for configured tls_{cert,key,ca} on the cluster.remotesapi.
4637: Add two and three dot diff syntax to CLI
Two dot:
dolt diff A..B
dolt diff A..
Three dot:
dolt diff A...B
dolt diff A...
dolt diff --merge-base A B
4634: go/store/nbs: Removed chunkReader.extract(), converted public methods to private methods for chunkReader, chunkSource and tableSet
4626: more permissive constraints and types when importing
There was a change added that samples the rows when doing large imports, but this can incorrectly give columns a NOT NULL constraint. We can either not sample tables or just not apply the NOT NULL constraint. I chose to not apply the NOT NULL constraint. Additionally, the sampling change makes it possible to incorrectly make certain columns UNSIGNED INT when they contain negative numbers. I changed it so infer will always treat columns as signed, so integers that are too large/small for signed int will become strings.
fix for: #4620
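The signed-only inference rule described above can be illustrated roughly as follows. This is a sketch under assumptions, not the importer's code: `inferColumnType` is a hypothetical helper, the type names "int" and "string" are placeholders, and "signed int" is taken to mean a 64-bit signed integer.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferColumnType is a hypothetical helper. It reports "int" only when every
// sampled value fits a signed 64-bit integer; anything else, including values
// that overflow the signed range, falls back to "string". It never infers an
// unsigned type and never implies a NOT NULL constraint.
func inferColumnType(samples []string) string {
	if len(samples) == 0 {
		return "string"
	}
	for _, v := range samples {
		if _, err := strconv.ParseInt(v, 10, 64); err != nil {
			return "string"
		}
	}
	return "int"
}

func main() {
	fmt.Println(inferColumnType([]string{"1", "-5", "42"}))
	fmt.Println(inferColumnType([]string{"1", "18446744073709551615"}))
}
```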
4622: Partially remove server_query in favor of dolt sql-client in bats tests
server_query requires python and has a non-descriptive interface. dolt sql-client allows for a native dolt interface to a running dolt sql-server that is easier to read and maps better to the rest of the bats tests.
4583: avoid hidden secondary indexes when creating foreign keys that prefix of pk
TODO: maybe use empty string instead of PRIMARY for name of index used by foreign key.
When creating foreign keys, we typically create a new secondary index so violations and cascades are handled quickly.
However, there is no need to create a secondary index when the foreign key references a prefix of an existing primary key.
This PR makes it so that the Primary Key shows up as an Index with the name "PRIMARY".

go-mysql-server

1354: Bug fix for pushdownSort handling missing cols qualified with a table name
The Dolt bump for my GMS change to fix an alias issue in sort node pushdown triggered an error with matching missing column names now that we can include qualified column names.
This PR adds a repro for that case to GMS and fixes the issue by ensuring we create a UnresolvedQualifiedColumn when the missing column is qualified with a table name. I've run Dolt tests locally and confirmed there shouldn't be any other test failures in the next bump.
1353: Fix panic for show keys from information_schema.columns
Does this by removing the special handling of information_schema.columns as a separate node type and treating it just like any other ResolvedTable.
In the process, effectively rewrote how we handle column default values by 1) moving most logic to happen in the OnceBefore batch, rather than default rules, and 2) splitting it up into multiple passes that each have a single purpose. I found in the process of 1) that the previous rules had a lot of side effects and unintended ordering constraints, so introduced new rules and tweaked others to eliminate those.
1352: Update sort node to use alias reference when it will resolve missing columns
Fixes: #3016
Other changes:
- Refactors the existing OrderBy/GroupBy tests into a ScriptTest.
- Introduces a new interface, sql.Projector, that unites GroupBy, Window, and Project if a caller just needs to get the projected expressions.
1350: Subquery row iter fix field indexes
Recent changes to subquery scope visibility use the scope to communicate column definition availability; i.e., we do not pass the scope into subqueries we have determined to not depend on the outer scope, marking the same scope as cacheable. This analysis change needs a corresponding runtime change to indicate whether the scope row is expected at execution time.
1349: Skip process tracking when prepreparing queries
When we prepare a statement, the QueryProcess node we create is bound to the current context's PID. This is not the same PID as the context that will execute a statement created from that template, which results in ProcessList metadata not being properly cleaned up after a query has finished processing.
Fixes: #4601
I couldn't find a great way to test this in the GMS package, but I'm working on a test in dolt that I'll link to shortly.
1290: New join planning
Join planning has many high level components:
- table order
- tree shape (left deep, right deep, bushy, etc)
- indexes for lookup joins
- filter arrangement
- choosing between logically equivalent plans (costing)
This PR does not fix all of these components, but it does provide a structure for gracefully increasing the complexity of each.
The new memo data structure is a memory-efficient IR for SQL queries that 1) groups equivalent plans by their output schemas, and 2) generalizes child relationships. Here is an example memo for a join query:
```
select * from ab
inner join uv on a = u
full join pq on a = p
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (innerJoin 2 1) (innerJoin 1 2)
└── G5: (leftJoin 4 3)
```
Each relational expression has its own expression group, within which several physical (concrete) implementations can reside. An expression group is defined in terms of expression group relationships, not physical implementations, because the output of every relational expression within a group is the same. Groups are usually keyed by a hash of the operator type and the child group keys or, in the case of scalar expressions, by the literal parameter values (this last point will be useful for prepared statements).
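To make the grouping idea concrete, here is a minimal sketch of a memo keyed by operator and child groups. It is illustrative only and does not mirror the GMS types: `Memo`, `Group`, `Expr`, `Memoize`, and `AddAlternative` are hypothetical names, and a string key stands in for the hash mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

type GroupID int

// Expr is one implementation inside a group; children are other groups,
// not concrete plans.
type Expr struct {
	Op       string
	Children []GroupID
}

type Group struct {
	ID    GroupID
	Exprs []Expr
}

type Memo struct {
	groups []*Group
	byKey  map[string]GroupID
}

func NewMemo() *Memo { return &Memo{byKey: map[string]GroupID{}} }

// key builds a lookup key from the operator and the child group IDs.
func key(op string, children []GroupID) string {
	var b strings.Builder
	b.WriteString(op)
	for _, c := range children {
		fmt.Fprintf(&b, "|%d", c)
	}
	return b.String()
}

// Memoize returns the group for (op, children), creating it if unseen.
func (m *Memo) Memoize(op string, children ...GroupID) GroupID {
	k := key(op, children)
	if id, ok := m.byKey[k]; ok {
		return id
	}
	id := GroupID(len(m.groups) + 1)
	m.groups = append(m.groups, &Group{ID: id, Exprs: []Expr{{Op: op, Children: children}}})
	m.byKey[k] = id
	return id
}

// AddAlternative registers an equivalent implementation in an existing group.
func (m *Memo) AddAlternative(id GroupID, op string, children ...GroupID) {
	g := m.groups[id-1]
	g.Exprs = append(g.Exprs, Expr{Op: op, Children: children})
	m.byKey[key(op, children)] = id
}

func main() {
	m := NewMemo()
	ab := m.Memoize("tablescan: ab")
	uv := m.Memoize("tablescan: uv")
	pq := m.Memoize("tablescan: pq")
	g4 := m.Memoize("innerJoin", uv, ab)
	m.AddAlternative(g4, "innerJoin", ab, uv) // commuted form joins the same group
	g5 := m.Memoize("leftJoin", g4, pq)
	for _, g := range m.groups {
		fmt.Printf("G%d: %v\n", g.ID, g.Exprs)
	}
	_ = g5
}
```

Because parents refer to child groups rather than to specific child plans, adding an alternative to G4 does not require touching G5.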
All valid reorderings of the join tree are added to the memo by joinOrderBuilder.
Applying indexes is a separate step, where we consider an indexed version for every join implementation where the outer (right) table is an indexable data source. Adding indexed plans to the query above yields:
```
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (indexedJoin 1 2) (indexedJoin 2 1) (innerJoin 2 1) (innerJoin 1 2)
└── G5: (indexedJoin 4 3) (leftJoin 4 3)
```
Hashed joins are added in the same manner:
```
memo:
├── G1: (tablescan: ab)
├── G2: (tablescan: uv)
├── G3: (tablescan: pq)
├── G4: (hashJoin 1 2) (hashJoin 2 1) (indexedJoin 1 2) (indexedJoin 2 1) (innerJoin 2 1) (innerJoin 1 2)
└── G5: (hashJoin 4 3) (indexedJoin 4 3) (leftJoin 4 3)
```
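The expansion shown in the last two memos can be sketched as a pass over one group's join expressions. This is a simplified illustration with hypothetical names (`Expr`, `addPhysicalJoins`, the `indexable` set); it assumes an indexed variant is added only when the right child is indexable and that a hash variant is added for every join, which the PR text does not spell out.

```go
package main

import "fmt"

// Expr is a join implementation over two child groups (hypothetical type).
type Expr struct {
	Op          string
	Left, Right int
}

// addPhysicalJoins returns the original expressions plus indexed and hash
// alternatives. indexable marks child groups that can serve index lookups.
func addPhysicalJoins(exprs []Expr, indexable map[int]bool) []Expr {
	out := append([]Expr(nil), exprs...) // copy so the input is not mutated
	for _, e := range exprs {
		if indexable[e.Right] {
			out = append(out, Expr{Op: "indexedJoin", Left: e.Left, Right: e.Right})
		}
		out = append(out, Expr{Op: "hashJoin", Left: e.Left, Right: e.Right})
	}
	return out
}

func main() {
	g4 := []Expr{{"innerJoin", 2, 1}, {"innerJoin", 1, 2}}
	indexable := map[int]bool{1: true, 2: true, 3: true}
	for _, e := range addPhysicalJoins(g4, indexable) {
		fmt.Printf("(%s %d %d) ", e.Op, e.Left, e.Right)
	}
	fmt.Println()
}
```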
This memo is missing several components that would make it more useful, including other node and expression types. For example, filter pushdown would be much easier in the memo. You could imagine extending the memo above with sql.Expression tree memo groups (scalar relations). Canonical (simplified) expression groups would require only one allocation, and the scalar expression group would cache properties like table and column dependencies, similar to relProp column and table dependencies. We would need to build a memo prior to join planning to use filters for join planning, and run other transformation rules on the memo. I'd expect the memo to absorb rules over time, which will simplify transform logic and improve analyzer memory consumption. I added some starters to make this easier in the future, like codegen'd memo expressions, and a move towards visitor interfaces for IR transformations.
A fully exhausted memo tree is converted back into a physical plan first through a costing process. Costing builds the fastest implementation for child group expressions bottom-up, using only the best child implementations for costing parent expressions and eventually the root node.
The coster is designed to use histogram statistics to pick the fastest join plan. We only provide a subset of index metadata and table sizes at this moment. Additionally, costing should apply to generic SQL nodes and expressions, not just join trees and join leaves.
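The bottom-up selection can be sketched as a memoized recursion over groups. This is a toy model, not the GMS coster: the `Expr` and `Group` types, the `optimize` function, and all cost numbers are invented for illustration, with each expression's total taken as its local cost plus the best cost of each child group.

```go
package main

import (
	"fmt"
	"math"
)

// Expr is one implementation in a group, with a made-up local cost.
type Expr struct {
	Op       string
	Children []int // indexes of child groups
	Cost     float64
}

type Group struct {
	Exprs []Expr
	best  *Expr
	cost  float64
	done  bool
}

// optimize returns the cheapest total cost for group g and records the
// winning implementation. Children are costed first, and only their best
// implementations contribute to the parent's total.
func optimize(groups []*Group, g int) float64 {
	grp := groups[g]
	if grp.done {
		return grp.cost
	}
	grp.cost = math.Inf(1)
	for i := range grp.Exprs {
		e := &grp.Exprs[i]
		total := e.Cost
		for _, c := range e.Children {
			total += optimize(groups, c)
		}
		if total < grp.cost {
			grp.cost, grp.best = total, e
		}
	}
	grp.done = true
	return grp.cost
}

func main() {
	groups := []*Group{
		{Exprs: []Expr{{Op: "tablescan: ab", Cost: 100}}},
		{Exprs: []Expr{{Op: "tablescan: uv", Cost: 10}}},
		{Exprs: []Expr{
			{Op: "innerJoin", Children: []int{0, 1}, Cost: 1000},
			{Op: "indexedJoin", Children: []int{0, 1}, Cost: 50},
			{Op: "hashJoin", Children: []int{0, 1}, Cost: 200},
		}},
	}
	cost := optimize(groups, 2)
	fmt.Println(groups[2].best.Op, cost)
}
```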
Other notes:
- we miss join plans that result from transitive predicates (ex: ab.a = uv.u + uv.u = xy.x => ab.a = xy.x).
- we miss join plan optimizations resulting from table functional dependencies (ex: eliminate redundant filters if we already join on a primary key), and null-allowing joins (ex: IS NULL).
- we could more aggressively apply filters to join trees.
- converting from the memo to an execution plan is fraught for the same reasons it was before this PR, but tree building and fixup is condensed in exec_builder, which I'd expect to expand to include more exprGroup types in the future.
- there is a hack for fixing up field indexes in join trees not considered for reordering.
- right joins are converted to left joins after expandStars in transposeRightJoins.
- non-left deep trees necessitate passing parent rows down the right side of join trees, which were previously only table sources (left deep).
- joinOrder hinting is rewritten to more easily fit within exprGroup costing.
- I started to use bitmaps to track expression group column dependencies, which should also be used for filter applicability.
- indexedJoinIter is a duplicate and should be deprecated, but I had a hard time quickly removing it.
- every join node is condensed into *plan.JoinNode and differentiated based on its join type, which lets me bucket different join types more easily (refer to JoinType type methods).
- applyHashLookups should be deprecated, given that we will preemptively build hash joins. If hash joins do not select the lookup, we should either not cache the subquery or improve the coster to make better decisions.
- there are probably bugs where we apply hash joins to non-cacheable subquery scopes for derived tables. Need more testing.
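As an illustration of the transitive-predicate note in the list above, equality predicates can be folded into equivalence classes so that ab.a = xy.x follows from ab.a = uv.u and uv.u = xy.x. The sketch below uses a small union-find over column names; `equivClasses` and its methods are hypothetical and are not part of this PR.

```go
package main

import "fmt"

// equivClasses is a tiny union-find keyed by qualified column name.
type equivClasses map[string]string

// find returns the representative column of c's class, compressing paths.
func (e equivClasses) find(c string) string {
	p, ok := e[c]
	if !ok {
		e[c] = c
		return c
	}
	if p == c {
		return c
	}
	root := e.find(p)
	e[c] = root
	return root
}

// union records the equality predicate a = b.
func (e equivClasses) union(a, b string) {
	ra, rb := e.find(a), e.find(b)
	e[ra] = rb
}

// equal reports whether a = b is implied by the recorded predicates.
func (e equivClasses) equal(a, b string) bool {
	return e.find(a) == e.find(b)
}

func main() {
	eq := equivClasses{}
	eq.union("ab.a", "uv.u")
	eq.union("uv.u", "xy.x")
	fmt.Println(eq.equal("ab.a", "xy.x"))
}
```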
Additional refs:
- https://www.researchgate.net/publication/262216932_On_the_correct_and_complete_enumeration_of_the_core_search_space
- https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/xform/join_order_builder.go
- https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf

Closed Issues

4652: upgrade dolt from 0.26.6 to 0.50.2, sql error
1345: SET @@mydb_head = HASHOF() panics when session variables is spelled wrong
4602: dolt migrate: Fails to migrate when a table has been renamed
2082: Don't create duplicate primary key index for referenced tables in foreign keys
4625: [dolt migrate]: Incorrect datetime value
3016: Strange error with Large Join Query
1897: Addition of Docker image
4620: import table returns error

dolthub/dolt v0.50.11 0.50.11 on GitHub
