github dolthub/dolt v1.3.0
1.3.0

latest releases: v1.38.1, v1.38.0, v1.37.0...
11 months ago

This release contains a behavior change to LOAD DATA LOCAL, which now defaults to have the same effect as IGNORE in order to match MySQL.

Per Dolt's versioning policy, this may require users to update code that uses LOAD DATA LOCAL. Thus, the minor (major.minor.patch) version bump.

Merged PRs

dolt

  • 6092: Integration tests for load data ignore/replace
  • 6086: Add support for --skip-empty to dolt commit and dolt_commit()
    The new --skip-empty flag can be passed to dolt commit ... or call dolt_commit(...) and have the commit operation be a no-op, instead of an error, if there are no changes staged to commit. It is an error to use --skip-empty together with --allow-empty.
    Fixes: #5678
  • 6077: Evaluate column default expressions during merge
    Previously, we only supported evaluating column default expressions during merge when the expression only contained literals. This PR expands that support to evaluate column default expressions that contain functions and column references.

go-mysql-server

  • 1812: Support load data ignore/replace
    Here are the docs for load data with ignore/replace modifiers: https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-error-handling
    This also changes LOCAL to have the same effect as IGNORE to match mysql
  • 1811: Hoist subquery filters bug
    Hoist filters is supposed to move filters that do not reference tables in the current sope upwards. We did not descend subqueries when checking for that condition, mistakenly hoisting filters in some cases.
    Re: #6089
  • 1797: Add filters to memo
    Scalar expressions added to memo along with scalar properties, expression ids, filter closures. Goal here is equivalent behavior to before, just with filters represented differently. Filter organization mostly mirrors the plan package, except scalar and relational expressions are both represented as expression groups here. Done in a rush, still back and forth on whether there should be an interface there.
    Additionally:
    • scalar expressions added to memo along with scalar properties, expression id
    • rewrites join planning and costing to use bitset representations of filters
    • refactors codegen so definition files are yaml, source is compiled independently from target code
      The organization is bit wonky b/c this should be using my name resolution symbol tables, and the entire tree should be memoized not just the join tree (used temporary solutions for the problems created by both of these).
      Re: #5993
  • 1791: Functional dependencies
    Functional dependencies track 1) key uniqueness, constant propagation, column equivalence, nullability sets.
    This information is built bottom-up from tables scans through projections, and is used to answer certain questions about relational nodes:
    1. What is the equivalence closure for a join condition?
    2. Are a set of filters redundant?
    3. Do a set of index expressions comprise a strict key for a LOOKUP_JOIN?
    4. Does a subquery decorrelation scope have a strict/null-safe key for an ANTI_JOIN?
    5. Are the grouping columns a strict key of the table (only_full_group_by is unnecessary)
    6. Is the relation sorted on a given column set? (is a Sort already enforced)
    7. Is a relation constant? (Max1Row)
      Questions (1) and (3) contribute towards fixing this issue: #5993. Question (2) contributes to filter pruning. Question (4) is relevant for this issue: #5954.
      This master's thesis explains how to build the derivation graph starting at page 113: https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf. The graph is composed of (determinant) -> (dependent) relationships on columns to track these properties. They color edges and nodes to differentiate constant, nullability, equivalence attributes. Any set of of columns uniquely determines the value of constants, so they have empty determinants: () -> (colSet). We differentiate strict keys (set of columns unique and non-nullable index) from lax keys (index that maybe be non-unique or nullable).
      Cockroach implemented a version that uses flattened to/from sets rather than individual nodes for determinant/dependents, and makes optimizations for quickly computing candidate keys: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/props/func_dep.go.
      My encoding is a little different. First, I assume that attributes trickle down from nullability -> constant -> equivalence -> functional dependencies. An FD built in this order simplifies the upstream additions in a way that avoids having to recompute dependency closures (ex: nullability, constant, and equiv columns don't recompute keys). Second, I assume FDs will be limited to primary and secondary key indexes; keys will have either strict or lax determinants, and the dependents are always assumed to be the rest of the table. So far this drops LEFT_JOIN right-equivalence relations that translate to lax-keys after the join, which could opportunistically be converted back to strict keys by downstream operators. If this was a mistake we can undo that, add back in dependent column sets.
      We need to support a handful of operators to use FDs in the join memo:
    • Table scan
    • Cross join
    • Inner join
    • Left join
    • Project (Distinct)
    • Filter
      Missing:
    • Full outer join
    • Synthesized columns
      Additionally:
    • the memo needs to embed equal filters in a format with expression ids
    • join reordering should compute equivalence closures for join edges
    • join selection should use functional dependencies to check if lookup expressions are valid
      Missing practical considerations:
    • when we determine a lookup expression comprises a strict key for a table, we need a way to backfill constants and equivalences used to make that decision
    • filters should maybe be represented in memo selection nodes to support redundancy elimination

vitess

  • 243: Add ignore/replace modifiers to load data
  • 241: Walking sub-nodes for SHOW TABLE statements
    When preparing a SHOW TABLES statement with a bound variable in the filter clause (e.g. SHOW TABLES FROM mydb WHERE Tables_in_mydb = ?;) GMS and Vitess were identifying the bound variable parameters differently and causing the SQL client on the other end to panic. Vitess code in conn.go walks the parsed tree and looks for SQLVal instances to identify the parameters and then returns that metadata over the SQL connection. The SHOW TABLES statement above fails because the sqlparser AST wasn't including all the members of SHOW TABLES node in the walk. This case is a little tricky to test directly in go-mysql-server, because it only repros in a running sql-server when running over a Vitess conn.
    The GMS and Vitess layers are both calculating bind variable metadata, with two different techniques, and whenever they get out of sync, we will see issues like this that only appear when running over a SQL connection. Longer term, we may want consider allowing GMS to return its bind variable metadata and avoid Vitess needing to re-calculate it, if we see more instances of this problem.
    Fixes: dolthub/go-mysql-server#1793

Closed Issues

  • 5982: LOAD DATA does not support replace or ignore options
  • 5678: Feature request: Add --skip-empty to DOLT_COMMIT()
  • 6089: Analyzer Error: failed to replan join: field "id" is not on schema

Latency

Read Tests MySQL Dolt Multiple
covering_index_scan 1.93 2.71 1.4
groupby_scan 12.3 16.71 1.4
index_join 1.18 4.18 3.5
index_join_scan 1.14 2.11 1.9
index_scan 30.26 54.83 1.8
oltp_point_select 0.14 0.47 3.4
oltp_read_only 2.86 8.13 2.8
select_random_points 0.3 0.77 2.6
select_random_ranges 0.35 1.1 3.1
table_scan 30.26 55.82 1.8
types_table_scan 70.55 158.63 2.2
reads_mean_multiplier 2.4
Write Tests MySQL Dolt Multiple
bulk_insert 0.001 0.001 1.0
oltp_delete_insert 5.0 6.32 1.3
oltp_insert 2.52 3.02 1.2
oltp_read_write 6.43 15.55 2.4
oltp_update_index 2.57 3.13 1.2
oltp_update_non_index 2.66 3.07 1.2
oltp_write_only 3.62 7.84 2.2
types_delete_insert 5.0 6.91 1.4
writes_mean_multiplier 1.5
Overall Mean Multiple 2.0

Don't miss a new dolt release

NewReleases is sending notifications on new releases.