This release contains a behavior change to LOAD DATA LOCAL
, which now defaults to have the same effect as IGNORE
in order to match MySQL.
Per Dolt's versioning policy, this may require users to update code that uses LOAD DATA LOCAL
. Thus, the minor (major.minor.patch) version bump.
Merged PRs
dolt
- 6092: Integration tests for load data ignore/replace
- 6086: Add support for
--skip-empty
todolt commit
anddolt_commit()
The new--skip-empty
flag can be passed todolt commit ...
orcall dolt_commit(...)
and have the commit operation be a no-op, instead of an error, if there are no changes staged to commit. It is an error to use--skip-empty
together with--allow-empty
.
Fixes: #5678 - 6077: Evaluate column default expressions during merge
Previously, we only supported evaluating column default expressions during merge when the expression only contained literals. This PR expands that support to evaluate column default expressions that contain functions and column references.
go-mysql-server
- 1812: Support load data ignore/replace
Here are the docs for load data with ignore/replace modifiers: https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-error-handling
This also changesLOCAL
to have the same effect asIGNORE
to match mysql - 1811: Hoist subquery filters bug
Hoist filters is supposed to move filters that do not reference tables in the current sope upwards. We did not descend subqueries when checking for that condition, mistakenly hoisting filters in some cases.
Re: #6089 - 1797: Add filters to memo
Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
Additionally:- scalar expressions added to memo along with scalar properties, expression id
- rewrites join planning and costing to use bitset representations of filters
- refactors codegen so definition files are yaml, source is compiled independently from target code
The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
Re: #5993
- 1791: Functional dependencies
Functional dependencies track 1) key uniqueness, constant propagation, column equivalence, nullability sets.
This information is built bottom-up from tables scans through projections, and is used to answer certain questions about relational nodes:- What is the equivalence closure for a join condition?
- Are a set of filters redundant?
- Do a set of index expressions comprise a strict key for a LOOKUP_JOIN?
- Does a subquery decorrelation scope have a strict/null-safe key for an ANTI_JOIN?
- Are the grouping columns a strict key of the table (only_full_group_by is unnecessary)
- Is the relation sorted on a given column set? (is a Sort already enforced)
- Is a relation constant? (Max1Row)
Questions (1) and (3) contribute towards fixing this issue: #5993. Question (2) contributes to filter pruning. Question (4) is relevant for this issue: #5954.
This master's thesis explains how to build the derivation graph starting at page 113: https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf. The graph is composed of(determinant) -> (dependent)
relationships on columns to track these properties. They color edges and nodes to differentiate constant, nullability, equivalence attributes. Any set of of columns uniquely determines the value of constants, so they have empty determinants:() -> (colSet)
. We differentiate strict keys (set of columns unique and non-nullable index) from lax keys (index that maybe be non-unique or nullable).
Cockroach implemented a version that uses flattened to/from sets rather than individual nodes for determinant/dependents, and makes optimizations for quickly computing candidate keys: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/props/func_dep.go.
My encoding is a little different. First, I assume that attributes trickle down from nullability -> constant -> equivalence -> functional dependencies. An FD built in this order simplifies the upstream additions in a way that avoids having to recompute dependency closures (ex: nullability, constant, and equiv columns don't recompute keys). Second, I assume FDs will be limited to primary and secondary key indexes; keys will have either strict or lax determinants, and the dependents are always assumed to be the rest of the table. So far this drops LEFT_JOIN right-equivalence relations that translate to lax-keys after the join, which could opportunistically be converted back to strict keys by downstream operators. If this was a mistake we can undo that, add back in dependent column sets.
We need to support a handful of operators to use FDs in the join memo:
- Table scan
- Cross join
- Inner join
- Left join
- Project (Distinct)
- Filter
Missing: - Full outer join
- Synthesized columns
Additionally: - the memo needs to embed equal filters in a format with expression ids
- join reordering should compute equivalence closures for join edges
- join selection should use functional dependencies to check if lookup expressions are valid
Missing practical considerations: - when we determine a lookup expression comprises a strict key for a table, we need a way to backfill constants and equivalences used to make that decision
- filters should maybe be represented in memo selection nodes to support redundancy elimination
vitess
- 243: Add ignore/replace modifiers to load data
- 241: Walking sub-nodes for
SHOW TABLE
statements
When preparing aSHOW TABLES
statement with a bound variable in the filter clause (e.g.SHOW TABLES FROM mydb WHERE Tables_in_mydb = ?;
) GMS and Vitess were identifying the bound variable parameters differently and causing the SQL client on the other end to panic. Vitess code inconn.go
walks the parsed tree and looks forSQLVal
instances to identify the parameters and then returns that metadata over the SQL connection. TheSHOW TABLES
statement above fails because thesqlparser
AST wasn't including all the members ofSHOW TABLES
node in the walk. This case is a little tricky to test directly in go-mysql-server, because it only repros in a running sql-server when running over a Vitessconn
.
The GMS and Vitess layers are both calculating bind variable metadata, with two different techniques, and whenever they get out of sync, we will see issues like this that only appear when running over a SQL connection. Longer term, we may want consider allowing GMS to return its bind variable metadata and avoid Vitess needing to re-calculate it, if we see more instances of this problem.
Fixes: dolthub/go-mysql-server#1793
Closed Issues
- 5982: LOAD DATA does not support replace or ignore options
- 5678: Feature request: Add --skip-empty to DOLT_COMMIT()
- 6089: Analyzer Error: failed to replan join: field "id" is not on schema
Latency
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 1.93 | 2.71 | 1.4 |
groupby_scan | 12.3 | 16.71 | 1.4 |
index_join | 1.18 | 4.18 | 3.5 |
index_join_scan | 1.14 | 2.11 | 1.9 |
index_scan | 30.26 | 54.83 | 1.8 |
oltp_point_select | 0.14 | 0.47 | 3.4 |
oltp_read_only | 2.86 | 8.13 | 2.8 |
select_random_points | 0.3 | 0.77 | 2.6 |
select_random_ranges | 0.35 | 1.1 | 3.1 |
table_scan | 30.26 | 55.82 | 1.8 |
types_table_scan | 70.55 | 158.63 | 2.2 |
reads_mean_multiplier | 2.4 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
bulk_insert | 0.001 | 0.001 | 1.0 |
oltp_delete_insert | 5.0 | 6.32 | 1.3 |
oltp_insert | 2.52 | 3.02 | 1.2 |
oltp_read_write | 6.43 | 15.55 | 2.4 |
oltp_update_index | 2.57 | 3.13 | 1.2 |
oltp_update_non_index | 2.66 | 3.07 | 1.2 |
oltp_write_only | 3.62 | 7.84 | 2.2 |
types_delete_insert | 5.0 | 6.91 | 1.4 |
writes_mean_multiplier | 1.5 |
Overall Mean Multiple | 2.0 |
---|