Merged PRs
dolt
- 7386: Cleaner clone CLI
Fixes #7043 for the CLI.
The `dolt_clone()` procedure has not been touched by this PR.
- 7377: Push tags on remote replication
Fixes #7375
- 7243: Write stats to disk and `dolt_statistics` table
Adds a new flatbuffer type for statistics, which contains an address reference to a prolly.Map with a hardcoded statistics schema. Statistics for a table are rewritten on every `ANALYZE TABLE` call.
Because the statistics schema is client-dependent, backwards compatibility will be fairly brittle for now. We could use a version identifier to invalidate a set of statistics if the client code creating a SQL engine has a different version than the statistics read from disk.
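The version-identifier idea from 7243 could look roughly like the sketch below. This is not Dolt's code: `STATS_FORMAT_VERSION`, `StoredStats`, and `load_stats` are hypothetical names, and a plain dataclass stands in for the flatbuffer that references the prolly.Map.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the statistics schema version this client writes and understands.
STATS_FORMAT_VERSION = 2


@dataclass
class StoredStats:
    """Stand-in for statistics read from disk (assumed shape, for illustration only)."""
    version: int
    table: str
    row_count: int


def load_stats(stored: StoredStats,
               client_version: int = STATS_FORMAT_VERSION) -> Optional[StoredStats]:
    """Return the stored statistics only if their version matches the client's."""
    if stored.version != client_version:
        # Treat mismatched statistics as missing rather than misreading them.
        return None
    return stored
```

With this shape, statistics written by an older client are ignored until the next `ANALYZE TABLE` rewrites them, instead of being decoded with the wrong schema.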
go-mysql-server
- 2279: fix count distinct with decimals
Swapped out the library used in the CountDistinct aggregation function, because the previous one would hash different decimals to the same value.
Correctness: #7374
- 2278: RangeHeapJoin should consistently sort NULL values before non-NULL values while managing its heap.
Fixes #7260
This was ultimately caused by dolthub/go-mysql-server#1903. I didn't think it was possible for that issue to cause user-facing problems, but I was wrong. Because of that issue, RangeHeapJoins considered all NULL values in their child iterators to come after all non-NULL values. However, if the child node was an index, the child iterator would order its rows with the NULL values first. This caused the RangeHeapIterator to mismanage the heap and skip rows that should have been in the results.
I updated the range heap code to manually check for NULL values when manipulating the heap. I also updated the plan tests to include NULL values in the test tables, which should now catch this issue.
- 2274: Fixup index selection when prefix not complete
Consider a query `SELECT * from t where b = 1 and c = 1` and two indexes, `(a,b,c)` and `(b,c)`. We want to use the `(b,c)` index as a lookup, because `(a,b,c)` will be disjoint on a `(b,c)` key. This PR fixes index costing to record and prefer non-zero prefix matches. We only differentiate zero and non-zero cases here because it is easier and I think pretty reliable.
- 2247: Merge joins populate join stats
When merge joins are a join operator for a memo group, use the two indexes in the merge to estimate the join cardinality. Includes small updates so that join cardinality estimates work in the coster, plus a few tests that make use of join statistics. The tests are affected both by stat estimates and costing methodology. It's a bit hard to separate the two, since more accurate stat estimates so often identify issues with costing. The join statistic tests are subject to shifting based on whether the smallest table is estimated to be smaller than the smallest join cardinality estimate. Better tests would be less subject to noise. Tests for avoiding anti-patterns for specific join operators would also be useful.
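To illustrate the failure mode behind 2279, here is a minimal sketch of a distinct count whose result depends on the key used for deduplication. It is not the go-mysql-server implementation, and the PR does not say how the replaced library collided; `lossy_key` is a hypothetical stand-in that simply truncates, chosen only to show distinct decimals collapsing into one bucket.

```python
from decimal import Decimal


def count_distinct(values, key):
    """Count distinct non-NULL values, deduplicating on key(value)."""
    seen = set()
    for v in values:
        if v is None:  # COUNT(DISTINCT ...) ignores NULLs
            continue
        seen.add(key(v))
    return len(seen)


def lossy_key(v):
    """Hypothetical bad key: truncation maps 1.10 and 1.25 to the same bucket."""
    return int(v)


def exact_key(v):
    """Use the decimal itself; Decimal('1.10') and Decimal('1.1') compare and hash equal."""
    return v


values = [Decimal("1.10"), Decimal("1.25"), Decimal("1.1"), None, Decimal("2.00")]
```

Here `count_distinct(values, lossy_key)` undercounts (it returns 2), while `count_distinct(values, exact_key)` returns 3, treating `1.10` and `1.1` as the same value.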
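The ordering rule from 2278 (NULL values sort before all non-NULL values while the heap is managed) can be sketched with a sort key that checks for NULL explicitly. This is an illustration of the comparison only, not the RangeHeapJoin code; `nulls_first_key` and `heap_order` are hypothetical helpers, and Python's `None` stands in for SQL NULL.

```python
import heapq


def nulls_first_key(value):
    """Key that orders None before every non-None value.

    The first tuple element is False for None and True otherwise, so a None
    is never compared directly against a number.
    """
    return (value is not None, value)


def heap_order(values):
    """Pop every value from a min-heap built with the NULLs-first key."""
    heap = [nulls_first_key(v) for v in values]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

`heap_order([3, None, 1, None, 2])` yields `[None, None, 1, 2, 3]`, which matches the NULLs-first order an index-backed child iterator produces. A heap that placed NULLs last would disagree with that child order, which is the kind of mismatch the PR describes.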
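The prefix rule from 2274 can be sketched as follows: count how many leading index columns are covered by equality filters, and prefer an index whose count is non-zero. This is a deliberate simplification, not the go-mysql-server coster; `matched_prefix_len` and `pick_index` are hypothetical names, and a real coster weighs more than this single signal.

```python
def matched_prefix_len(index_cols, eq_filter_cols):
    """Number of leading index columns that have an equality filter."""
    n = 0
    for col in index_cols:
        if col not in eq_filter_cols:
            break  # the prefix ends at the first unfiltered column
        n += 1
    return n


def pick_index(indexes, eq_filter_cols):
    """Return the first index with a non-zero prefix match, or None.

    Mirrors the zero versus non-zero distinction only; it does not rank
    non-zero matches against each other.
    """
    for idx in indexes:
        if matched_prefix_len(idx, eq_filter_cols) > 0:
            return idx
    return None


# Filters from: SELECT * from t where b = 1 and c = 1
filters = {"b", "c"}
indexes = [("a", "b", "c"), ("b", "c")]
```

For the query in the PR description, `(a,b,c)` has a prefix match of 0 because `a` is unfiltered, while `(b,c)` matches 2 columns, so `pick_index(indexes, filters)` selects `("b", "c")`.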
Closed Issues
- 7043: `dolt clone` can include extraneous remote refs
- 7375: new tags not automatically being pushed to remote for remote primary
- 7260: Unexpected Results when Using `BETWEEN AND` after `CREATE INDEX`
- 7348: Add "Alter User" and "Set Password" SQL Statements
Latency
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.11 | 2.81 | 1.3 |
groupby_scan | 13.22 | 17.63 | 1.3 |
index_join | 1.32 | 5.0 | 3.8 |
index_join_scan | 1.25 | 2.14 | 1.7 |
index_scan | 34.33 | 62.19 | 1.8 |
oltp_point_select | 0.17 | 0.46 | 2.7 |
oltp_read_only | 3.36 | 7.98 | 2.4 |
select_random_points | 0.33 | 0.74 | 2.2 |
select_random_ranges | 0.39 | 0.9 | 2.3 |
table_scan | 34.33 | 63.32 | 1.8 |
types_table_scan | 74.46 | 170.48 | 2.3 |
reads_mean_multiplier | | | 2.1 |

Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 5.67 | 6.09 | 1.1 |
oltp_insert | 2.76 | 2.97 | 1.1 |
oltp_read_write | 7.3 | 15.27 | 2.1 |
oltp_update_index | 2.81 | 3.07 | 1.1 |
oltp_update_non_index | 2.97 | 3.02 | 1.0 |
oltp_write_only | 4.03 | 7.43 | 1.8 |
types_delete_insert | 5.47 | 6.67 | 1.2 |
writes_mean_multiplier | | | 1.3 |

Overall Mean Multiple | 1.7 |
---|---|
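The derived columns above can be reproduced with a few lines of arithmetic. The sketch assumes that each Multiple is the Dolt latency divided by the MySQL latency, rounded to one decimal; that each mean multiplier averages those rounded values; and that the overall figure averages the two means. These rules are inferred from the numbers in the tables, not taken from the benchmark tooling.

```python
from statistics import mean

# (MySQL, Dolt) latencies copied from the tables above.
reads = {
    "covering_index_scan": (2.11, 2.81),
    "groupby_scan": (13.22, 17.63),
    "index_join": (1.32, 5.0),
    "index_join_scan": (1.25, 2.14),
    "index_scan": (34.33, 62.19),
    "oltp_point_select": (0.17, 0.46),
    "oltp_read_only": (3.36, 7.98),
    "select_random_points": (0.33, 0.74),
    "select_random_ranges": (0.39, 0.9),
    "table_scan": (34.33, 63.32),
    "types_table_scan": (74.46, 170.48),
}
writes = {
    "oltp_delete_insert": (5.67, 6.09),
    "oltp_insert": (2.76, 2.97),
    "oltp_read_write": (7.3, 15.27),
    "oltp_update_index": (2.81, 3.07),
    "oltp_update_non_index": (2.97, 3.02),
    "oltp_write_only": (4.03, 7.43),
    "types_delete_insert": (5.47, 6.67),
}


def multiples(results):
    """Dolt latency as a multiple of MySQL latency, rounded to one decimal."""
    return {name: round(dolt / mysql, 1) for name, (mysql, dolt) in results.items()}


reads_mean = round(mean(multiples(reads).values()), 1)
writes_mean = round(mean(multiples(writes).values()), 1)
overall = round(mean([reads_mean, writes_mean]), 1)
```

Under these assumptions the sketch gives 2.1 for reads, 1.3 for writes, and 1.7 overall, matching the tables.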