dolthub/dolt v1.78.8

Merged PRs

dolt

  • 10183: go: store/datas/pull: pull_chunk_tracker.go: Optimize memory use when backing up to an AWS remote.
    PullChunkTracker is responsible for making the HasMany calls against the destination and batching up absent hashes into HashSets, which are delivered to GetManyCompressed and eventually written into table files that get uploaded. This code is used for both pull and push, where the destination is the "local" database or the remote database respectively. It is used both when the remote is doltremoteapi, in which case every HasMany call is an RPC, and when the remote is something like file:// or aws://, in which case the table file indexes for the remote are in memory and HasMany calls are very quick.
    Different operational characteristics of the various dependencies mean that sometimes a Pull is prone to build up large sets of hashes waiting for HasMany calls, whereas at other times it is prone to build up large sets of absent hashes waiting for the fetcher thread(s) to take them.
    Previously, PullChunkTracker was structured to accumulate HasMany responses and wait until the fetcher threads asked for them before coalescing them into appropriately sized batches for GetManyCompressed. This meant that if HasMany batches were very small, because HasMany was very fast, we would accumulate a large number of very small HashSets, which took up large amounts of memory. Accumulating the batches as the HasMany responses come in is more memory efficient and should be no slower: we still accumulate full batches, and in basically the same order (see the first sketch after this list).
    Tested by pushing a large database to an AWS remote and memory profiling the result.
  • 10164: #10136: Fix dolt_backup to work in non-Dolt directories
    Fixes #10136
    • Fix dolt_backup to work in non-Dolt directories; this replaces the dolt.go boolean expression for commands that accept non-Dolt directories with a searchable map (see the second sketch after this list).
    • Remove sql-backup.bats from local-remote.bash so it also gets run against the server.
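
To illustrate the batching change in 10183, here is a minimal sketch of the idea, assuming made-up Hash/HashSet types and a hypothetical target batch size; it is not the actual PullChunkTracker code, which also coordinates the HasMany and fetcher goroutines.

```go
package pull

// Hash and HashSet stand in for dolt's hash types; the batch size is a
// hypothetical target, not the real one.
type Hash [20]byte
type HashSet map[Hash]struct{}

const targetBatchSize = 4096

// batcher folds absent hashes into full-sized batches as HasMany responses
// arrive, instead of keeping one small HashSet per response.
type batcher struct {
	current HashSet   // batch currently being filled
	full    []HashSet // completed batches, ready for GetManyCompressed
}

// addAbsent merges the absent hashes from one HasMany response into the
// current batch, closing it out whenever it reaches the target size.
// Memory use is proportional to the number of outstanding hashes, not the
// number of HasMany responses.
func (b *batcher) addAbsent(absent HashSet) {
	if b.current == nil {
		b.current = make(HashSet, targetBatchSize)
	}
	for h := range absent {
		b.current[h] = struct{}{}
		if len(b.current) >= targetBatchSize {
			b.full = append(b.full, b.current)
			b.current = make(HashSet, targetBatchSize)
		}
	}
}

// next hands a completed batch to a fetcher thread, falling back to the
// partially filled batch when nothing else is pending.
func (b *batcher) next() (HashSet, bool) {
	if len(b.full) > 0 {
		batch := b.full[0]
		b.full = b.full[1:]
		return batch, true
	}
	if len(b.current) > 0 {
		batch := b.current
		b.current = nil
		return batch, true
	}
	return nil, false
}
```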
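
And a minimal sketch of the searchable-map refactor from 10164; the command names, map, and helper below are illustrative, not the actual contents of dolt.go.

```go
package cli

// commandsThatWorkOutsideADoltDir stands in for the searchable map: a long
// boolean expression of the form cmd == "init" || cmd == "clone" || ...
// becomes a single lookup, so adding another command is a one-line change.
var commandsThatWorkOutsideADoltDir = map[string]struct{}{
	"init":   {},
	"clone":  {},
	"backup": {},
}

// requiresDoltDir reports whether a command needs to run inside a Dolt
// directory, by consulting the map above.
func requiresDoltDir(cmd string) bool {
	_, ok := commandsThatWorkOutsideADoltDir[cmd]
	return !ok
}
```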

go-mysql-server

  • 3332: Fix create view error message
    Fixes #10177
  • 3328: Avoid underestimating rows in outer joins.
    When computing row estimates for joins, if the join can't be optimized into a lookup join or a merge join, we use stats to predict the fraction of pairwise combinations of left and right rows that will match and estimate the number of result rows as leftRows * rightRows * selectivity.
    This is correct for inner joins, but not for outer joins, because left joins guarantee at least one result per left row, and full outer joins guarantee at least one result per left or right row.
    Consider a left join where left.RowCount() is much greater than right.RowCount(), and every value of the relevant column is distinct (so left.RowCount() == left.DistinctCount()). In that case, selectivity == 1.0 / left.RowCount(), and the estimated cardinality is equal to:
    left.RowCount() * right.RowCount() * selectivity == left.RowCount() * right.RowCount() * (1.0 / left.RowCount()) == right.RowCount().
    If the selectivity of the join is very small, this could result in a row estimate that is lower than the guaranteed minimum, which can cause the join planner to pick bad plans. In the worst case it could cause us to favor an unoptimizable join order over an optimizable one.
    A common impact of this change is that we now favor hash joins for left joins when the right side is much smaller than the left. This makes sense: iterating over the smaller right table once and building a hash table in memory is going to be much faster than doing a table lookup for each left row. A minimal sketch of the lower-bound idea follows this list.
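
To illustrate the lower-bound idea, here is a minimal sketch assuming made-up function and parameter names; the real estimator in go-mysql-server is more involved.

```go
package memo

import "math"

// joinRowEstimate sketches the cardinality fix: clamp the selectivity-based
// estimate to the number of rows an outer join is guaranteed to produce.
func joinRowEstimate(joinType string, leftRows, rightRows, selectivity float64) float64 {
	est := leftRows * rightRows * selectivity
	switch joinType {
	case "left":
		// A left join returns at least one row per left row.
		est = math.Max(est, leftRows)
	case "full":
		// A full outer join returns at least one row per left row and per
		// right row, so its result is never smaller than the larger side.
		est = math.Max(est, math.Max(leftRows, rightRows))
	}
	return est
}
```

In the example above, the clamp keeps the estimate from falling below left.RowCount(), so the planner no longer sees the left join as cheaper than it can possibly be.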

Closed Issues

  • 9520: PRIMARY KEY isn't always used in left joins
  • 10177: need better error message when creating view with conflicting name
  • 10176: panic during dolt_rebase: panic: expected false
  • 10136: DOLT_BACKUP Restore Requires Existing Database Context and Service Restart to Recognize New Database
  • 10157: Unexpected ANTI JOIN Result
  • 10086: Dolt Unable to Resolve Default Branch Head Error

Performance

| Read Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| covering_index_scan | 1.86 | 0.55 | 0.3 |
| groupby_scan | 13.7 | 12.08 | 0.88 |
| index_join | 1.52 | 1.96 | 1.29 |
| index_join_scan | 1.47 | 1.34 | 0.91 |
| index_scan | 35.59 | 22.28 | 0.63 |
| oltp_point_select | 0.2 | 0.28 | 1.4 |
| oltp_read_only | 3.82 | 5.28 | 1.38 |
| select_random_points | 0.35 | 0.58 | 1.66 |
| select_random_ranges | 0.39 | 0.57 | 1.46 |
| table_scan | 35.59 | 27.66 | 0.78 |
| types_table_scan | 80.03 | 65.65 | 0.82 |
| reads_mean_multiplier | | | 1.05 |

| Write Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| oltp_delete_insert | 8.43 | 6.55 | 0.78 |
| oltp_insert | 4.18 | 3.19 | 0.76 |
| oltp_read_write | 9.22 | 11.65 | 1.26 |
| oltp_update_index | 4.18 | 3.25 | 0.78 |
| oltp_update_non_index | 4.25 | 3.19 | 0.75 |
| oltp_write_only | 5.28 | 6.32 | 1.2 |
| types_delete_insert | 8.58 | 6.91 | 0.81 |
| writes_mean_multiplier | | | 0.91 |

| TPC-C TPS Tests | MySQL | Dolt | Multiple |
| --- | --- | --- | --- |
| tpcc-scale-factor-1 | 93.62 | 36.19 | 2.59 |
| tpcc_tps_multiplier | | | 2.59 |

Overall Mean Multiple: 1.52
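
Judging from the figures above, the Multiple column is the ratio of the Dolt figure to the MySQL figure for the read and write tests, and of MySQL TPS to Dolt TPS for TPC-C, so a value of 1.0 would mean parity with MySQL. The summary rows appear to be simple arithmetic means: reads_mean_multiplier and writes_mean_multiplier average each table's Multiple column, and the Overall Mean Multiple averages the three per-table multipliers, (1.05 + 0.91 + 2.59) / 3 ≈ 1.52.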
