dolthub/dolt v1.78.8

Merged PRs

dolt

  • 10183: go: store/datas/pull: pull_chunk_tracker.go: Optimize memory use when backing up to an AWS remote.
    PullChunkTracker is responsible for making the HasMany calls against the destination and batching up absent hashes into HashSets, which are delivered to GetManyCompressed and eventually written into table files that get uploaded. This code is used for both pull and push, where the destination is the "local" database or the remote database respectively. It is used both when the remote is doltremoteapi, in which case every HasMany call is an RPC, and when the remote is something like file:// or aws://, in which case the table file indexes for the remote are in memory and HasMany calls are very quick.
    Different operational characteristics of the various dependencies mean that sometimes a Pull is prone to build up large sets of hashes waiting for HasMany calls, whereas at other times it is prone to build up large sets of absent hashes waiting for the fetcher thread(s) to take them.
    Previously, PullChunkTracker was structured to accumulate HasMany responses and wait until the fetcher threads asked for them before coalescing them into appropriately sized batches for GetManyCompressed. This meant that if HasMany batches were very small, because HasMany was very fast, we would accumulate a large number of very small HashSets, which took up large amounts of memory. Accumulating the batches as the HasMany responses come in is more memory efficient and should be no slower: we still accumulate full batches, and in basically the same order (see the first sketch after this list).
    Tested by pushing a large database to an AWS remote and memory profiling the result.
  • 10164: #10136: Fix dolt_backup to work in non-Dolt directories
    Fixes #10136
    • Fix dolt_backup to work in non-Dolt directories; this replaces the dolt.go boolean expression for commands that accept non-Dolt directories with a searchable map (see the second sketch after this list).
    • Remove sql-backup.bats from local-remote.bash so it also gets run against the server.
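
To illustrate the batching change in 10183, here is a minimal sketch of the idea, assuming made-up Hash/HashSet types and a hypothetical target batch size; it is not the actual PullChunkTracker code, which also coordinates the HasMany and fetcher goroutines.

```go
package pull

// Hash and HashSet stand in for dolt's hash types; the batch size is a
// hypothetical target, not the real one.
type Hash [20]byte
type HashSet map[Hash]struct{}

const targetBatchSize = 4096

// batcher folds absent hashes into full-sized batches as HasMany responses
// arrive, instead of keeping one small HashSet per response.
type batcher struct {
	current HashSet   // batch currently being filled
	full    []HashSet // completed batches, ready for GetManyCompressed
}

// addAbsent merges the absent hashes from one HasMany response into the
// current batch, closing it out whenever it reaches the target size.
// Memory use is proportional to the number of outstanding hashes, not the
// number of HasMany responses.
func (b *batcher) addAbsent(absent HashSet) {
	if b.current == nil {
		b.current = make(HashSet, targetBatchSize)
	}
	for h := range absent {
		b.current[h] = struct{}{}
		if len(b.current) >= targetBatchSize {
			b.full = append(b.full, b.current)
			b.current = make(HashSet, targetBatchSize)
		}
	}
}

// next hands a completed batch to a fetcher thread, falling back to the
// partially filled batch when nothing else is pending.
func (b *batcher) next() (HashSet, bool) {
	if len(b.full) > 0 {
		batch := b.full[0]
		b.full = b.full[1:]
		return batch, true
	}
	if len(b.current) > 0 {
		batch := b.current
		b.current = nil
		return batch, true
	}
	return nil, false
}
```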
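
And a minimal sketch of the searchable-map refactor from 10164; the command names, map, and helper below are illustrative, not the actual contents of dolt.go.

```go
package cli

// commandsThatWorkOutsideADoltDir stands in for the searchable map: a long
// boolean expression of the form cmd == "init" || cmd == "clone" || ...
// becomes a single lookup, so adding another command is a one-line change.
var commandsThatWorkOutsideADoltDir = map[string]struct{}{
	"init":   {},
	"clone":  {},
	"backup": {},
}

// requiresDoltDir reports whether a command needs to run inside a Dolt
// directory, by consulting the map above.
func requiresDoltDir(cmd string) bool {
	_, ok := commandsThatWorkOutsideADoltDir[cmd]
	return !ok
}
```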

go-mysql-server

  • 3332: Fix create view error message
    Fixes #10177
  • 3328: Avoid underestimating rows in outer joins.
    When computing row estimates for joins, if the join can't be optimized into a lookup join or a merge join, we use stats to predict the fraction of pairwise combinations of left and right rows that will match and estimate the number of result rows as leftRows * rightRows * selectivity.
    This is correct for inner joins, but not for outer joins, because left joins guarantee at least one result per left row, and full outer joins guarantee at least one result per left or right row.
    Consider a left join where left.RowCount() is much greater than right.RowCount(), and every value of the relevant column is distinct (so left.RowCount() == left.DistinctCount()). In that case, selectivity == 1.0 / left.RowCount(), and the estimated cardinality is equal to:
    left.RowCount() * right.RowCount() * selectivity == left.RowCount() * right.RowCount() * (1.0 / left.RowCount()) == right.RowCount().
    If the selectivity of the join is very small, this could result in a row estimate that is lower than the guaranteed minimum, which can cause the join planner to pick bad plans. In the worst case it could cause us to favor an unoptimizable join order over an optimizable one.
    A common impact of this change is that we now favor hash joins for left joins when the right side is much smaller than the left. This makes sense: iterating over the smaller right table once and building a hash table in memory is going to be much faster than doing a table lookup for each left row. A minimal sketch of the lower-bound idea follows this list.
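
To illustrate the lower-bound idea, here is a minimal sketch assuming made-up function and parameter names; the real estimator in go-mysql-server is more involved.

```go
package memo

import "math"

// joinRowEstimate sketches the cardinality fix: clamp the selectivity-based
// estimate to the number of rows an outer join is guaranteed to produce.
func joinRowEstimate(joinType string, leftRows, rightRows, selectivity float64) float64 {
	est := leftRows * rightRows * selectivity
	switch joinType {
	case "left":
		// A left join returns at least one row per left row.
		est = math.Max(est, leftRows)
	case "full":
		// A full outer join returns at least one row per left row and per
		// right row, so its result is never smaller than the larger side.
		est = math.Max(est, math.Max(leftRows, rightRows))
	}
	return est
}
```

In the example above, the clamp keeps the estimate from falling below left.RowCount(), so the planner no longer sees the left join as cheaper than it can possibly be.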

Closed Issues

  • 9520: PRIMARY KEY isn't always used in left joins
  • 10177: need better error message when creating view with conflicting name
  • 10176: panic during dolt_rebase: panic: expected false
  • 10136: DOLT_BACKUP Restore Requires Existing Database Context and Service Restart to Recognize New Database
  • 10157: Unexpected ANTI JOIN Result
  • 10086: Dolt Unable to Resolve Default Branch Head Error

Performance

| Read Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| covering_index_scan | 1.86 | 0.55 | 0.3 |
| groupby_scan | 13.7 | 12.08 | 0.88 |
| index_join | 1.52 | 1.96 | 1.29 |
| index_join_scan | 1.47 | 1.34 | 0.91 |
| index_scan | 35.59 | 22.28 | 0.63 |
| oltp_point_select | 0.2 | 0.28 | 1.4 |
| oltp_read_only | 3.82 | 5.28 | 1.38 |
| select_random_points | 0.35 | 0.58 | 1.66 |
| select_random_ranges | 0.39 | 0.57 | 1.46 |
| table_scan | 35.59 | 27.66 | 0.78 |
| types_table_scan | 80.03 | 65.65 | 0.82 |
| reads_mean_multiplier | | | 1.05 |

| Write Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| oltp_delete_insert | 8.43 | 6.55 | 0.78 |
| oltp_insert | 4.18 | 3.19 | 0.76 |
| oltp_read_write | 9.22 | 11.65 | 1.26 |
| oltp_update_index | 4.18 | 3.25 | 0.78 |
| oltp_update_non_index | 4.25 | 3.19 | 0.75 |
| oltp_write_only | 5.28 | 6.32 | 1.2 |
| types_delete_insert | 8.58 | 6.91 | 0.81 |
| writes_mean_multiplier | | | 0.91 |

| TPC-C TPS Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| tpcc-scale-factor-1 | 93.62 | 36.19 | 2.59 |
| tpcc_tps_multiplier | | | 2.59 |

Overall Mean Multiple: 1.52
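
Judging from the figures above, the Multiple column is the ratio of the Dolt figure to the MySQL figure for the read and write tests, and of MySQL TPS to Dolt TPS for TPC-C, so a value of 1.0 would mean parity with MySQL. The summary rows appear to be simple arithmetic means: reads_mean_multiplier and writes_mean_multiplier average each table's Multiple column, and the Overall Mean Multiple averages the three per-table multipliers, (1.05 + 0.91 + 2.59) / 3 ≈ 1.52.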
