Contained in this release
- remote performance improvements (clone, push, and pull)
- better support for MySQL in server mode, including `DROP`, `UPDATE`, and `INSERT` (see the example after this list)
- SQL performance improvement
- diff summary
- more metrics
- other assorted bug fixes and improvements
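As a quick illustration of the server-mode support, a stock MySQL client library should be able to issue these statements against a running Dolt SQL server. The sketch below uses Go's database/sql with the go-sql-driver/mysql driver; the connection string, database name, and table are placeholders for illustration, not values from this release:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL wire-protocol driver
)

func main() {
	// Assumed DSN: adjust user, address, and database name for your server.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:3306)/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Exercise the newly supported INSERT, UPDATE, and DROP statements.
	stmts := []string{
		"INSERT INTO people (id, name) VALUES (1, 'Ada')",
		"UPDATE people SET name = 'Ada Lovelace' WHERE id = 1",
		"DROP TABLE people",
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatalf("%s: %v", s, err)
		}
	}
}
```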
If you find any bugs, have a feature request, or an interesting use-case, please raise an issue.
Merged PRs
- 114: go/libraries/doltcore/sqle: types: Make SqlValToNomsVal compile for 32bit by checking for overflow on uint -> int64 differently.
- 112: Zachmu/drop table
- 110: go/utils/checkcommitters: Oscar is an allowed committer and author.
- 109: attempted deadlock fix
- 108: Correct the installation instructions
- 105: dolt diff --summary
Fixes #77
Example output using Liquidata/tatoeba-sentence-translations:
```
$ dolt diff --summary rnfm50gmumlettuebt2latmer617ni3t
diff --dolt a/sentences b/sentences
--- a/sentences @ gd1v6fsc04k5676c105d046m04hla3ia
+++ b/sentences @ 2ttci8id13mijhv8u94qlioqegh7lgpo
7,800,102 Rows Unmodified (99.99%)
15,030 Rows Added (0.19%)
108 Rows Deleted (0.00%)
960 Rows Modified (0.01%)
1,888 Cells Modified (0.00%)
(7,801,170 Entries vs 7,816,092 Entries)
diff --dolt a/translations b/translations
--- a/translations @ p2355o6clst8ssvr9jha2bfgqbrstkmm
+++ b/translations @ 62ri8lmohbhs1mc01m9o4rbvj6rbl8ee
5,856,845 Rows Unmodified (90.91%)
468,173 Rows Added (7.27%)
578,242 Rows Deleted (8.98%)
7,626 Rows Modified (0.12%)
7,626 Cells Modified (0.06%)
(6,442,713 Entries vs 6,332,494 Entries)
```
- 104: Bh/output updates3
- 103: dolt/go/store: Stop panicing on sequence walks when expected hashes are not in the ValueReader.
- 101: go/{store,libraries/doltcore/remotestorage}: Make the code peddling in nbs table file formats a little more explicit about it.
- 100: newline changes
- 99: Implemented UPDATE
I think we should delete the old SQL methods that are in the `sql.go` file. I know at first you mentioned keeping them there for reference, but they're not being used at all at this point, and they're still in git history if we want to look at them again in the future for some reason. It's clutter at this point.
I'm skipping that one test at the end because of a WHERE decision in `go-mysql-server`. The code looks intentional, in that converting strings to ints will return 0 if the string is not parsable. I'll file it as a non-conforming bug on their end, but for now I'm skipping the test.
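For illustration only (this is not the go-mysql-server code), the conversion semantics described above look roughly like this: an unparsable string becomes 0, so a WHERE comparison against 0 can match rows you would not expect.

```go
package main

import (
	"fmt"
	"strconv"
)

// toInt64 mirrors the described behavior: unparsable strings
// become 0 rather than producing an error.
func toInt64(s string) int64 {
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0
	}
	return n
}

func main() {
	fmt.Println(toInt64("42"))    // 42
	fmt.Println(toInt64("hello")) // 0 -- so a filter like `WHERE str_col = 0` can match this row
}
```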
- 98: Bh/output updates
- 97: store/{nbs,chunks}: Make ChunkStore#GetMany{,Compressed} take send-only channels.
- 96: update status messages for push/pull
- 94: Update README.md
Ensure that installing from source is properly documented, including go-gotchas.
- 93: Reverts the revert of my push/pull changes with fixes.
- 92: content length fix
- 91: go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation.
When we do things like push, pull or (soon-to-be) garbage collection, we have large sets of Chunk addresses that we pass into `ChunkStore#GetMany` and then go off and process. Clients largely try to control the memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the implementation of `GetMany` will have an amount of data in flight at any given time that is in some reasonable way proportional to the channel size.
In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one goroutine spawned for each byte range we want to read. This results in absolutely massive (virtual) heap utilization and unreasonable I/O parallelism and context switch thrashing in large repo push/pull situations.
This is a small PR to change the concurrency paradigm inside `getManyAtOffsetsWithReadFunc` so that we only have 4 concurrent dispatched reads per `table_reader` instance at a time.
This is still not the behavior we actually want:
- I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of `tableReader`s), and not depend on the number of `tableReader`s which happen to back the chunk store.
- Memory overhead is still not correctly bounded here, since read-ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
I'm landing this as a big incremental improvement over the status quo. Here are some non-reproducible one-shot test results from a test program. The test program walks the entire chunk graph, assembles every chunk address, and then does a `GetManyCompressed` on every chunk address and copies their contents to `/dev/null`. It was run on a ~10GB (compressed) data set.
Before:
```
$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 16628128568
161.29 real  67.29 user  456.38 sys
  5106425856  maximum resident set size
           0  average shared memory size
           0  average unshared data size
           0  average unshared stack size
    10805008  page reclaims
       23881  page faults
           0  swaps
           0  block input operations
           0  block output operations
           0  messages sent
           0  messages received
           8  signals received
      652686  voluntary context switches
    21071339  involuntary context switches
```
After:
```
$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 4590759160
32.17 real  30.53 user  29.62 sys
  4561879040  maximum resident set size
           0  average shared memory size
           0  average unshared data size
           0  average unshared stack size
     1228770  page reclaims
       67100  page faults
           0  swaps
           0  block input operations
           0  block output operations
           0  messages sent
           0  messages received
          14  signals received
      456898  voluntary context switches
     2954503  involuntary context switches
```
On these runs, sys time, wallclock time, vm page reclaims and virtual memory used are all improved pretty substantially.
Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.
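As a rough sketch of the new concurrency paradigm (hypothetical names and a simplified read callback, not the actual table_reader code), the one-goroutine-per-byte-range pattern is replaced with a small fixed pool of workers pulling ranges from a channel:

```go
package nbs // illustrative only; not the actual dolt package layout

import "sync"

type byteRange struct{ off, length int64 } // hypothetical read request

// readAllRanges dispatches reads with a fixed worker pool instead of one
// goroutine per byte range, so in-flight I/O stays proportional to the
// worker count rather than to the number of requested ranges.
func readAllRanges(ranges []byteRange, read func(byteRange) error) error {
	const maxConcurrentReads = 4 // the per-reader bound this PR introduces

	work := make(chan byteRange)
	errs := make(chan error, 1)

	var wg sync.WaitGroup
	for i := 0; i < maxConcurrentReads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range work {
				if err := read(r); err != nil {
					select {
					case errs <- err: // keep the first error
					default:
					}
				}
			}
		}()
	}

	for _, r := range ranges {
		work <- r
	}
	close(work)
	wg.Wait()

	select {
	case err := <-errs:
		return err
	default:
		return nil
	}
}
```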
- 90: Implemented REPLACE
Mostly tests, since this just uses the `Delete` and `Insert` functions that we already have. The previous delete would ignore a delete on a non-existent row, so I just changed it to throw the correct error if the row does not exist so that `REPLACE` works properly now (else it would always say a `REPLACE` did both a delete & insert).
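A minimal, self-contained sketch of those semantics (a toy in-memory table, not the actual dolt code): the delete step has to report "row not found" so REPLACE can tell whether it overwrote an existing row or only inserted a new one.

```go
package main

import (
	"errors"
	"fmt"
)

var errRowNotFound = errors.New("row not found")

type table map[string]string // toy table: primary key -> row value

func (t table) del(key string) error {
	if _, ok := t[key]; !ok {
		return errRowNotFound // previously this case was silently ignored
	}
	delete(t, key)
	return nil
}

// replaceRow reports whether an existing row was deleted before the insert.
func (t table) replaceRow(key, val string) (deleted bool, err error) {
	switch err := t.del(key); {
	case err == nil:
		deleted = true
	case errors.Is(err, errRowNotFound):
		deleted = false
	default:
		return false, err
	}
	t[key] = val // insert
	return deleted, nil
}

func main() {
	t := table{"1": "old"}
	fmt.Println(t.replaceRow("1", "new")) // true <nil>  -> delete + insert
	fmt.Println(t.replaceRow("2", "new")) // false <nil> -> insert only
}
```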
- 89: Push and Pull v2
- 88: Add metrics attributes
Similar to the previous PR db/event-metrics, but this time there are no byte measurements on `clone`, as the implementation is different. Some things in the events package have been refactored to prevent circular dependencies. Adding `StandardAttributes` will help me generate the info for my new metrics.
- 87: {go, bats}: Replace table works with file with schema in different order
- 86: dolt table import -r
Fixes #76
Replaces an existing table with the contents of the file while preserving the original schema.
- 85: Bh/cmp chunks
- 84: revert nil check and always require stats to match aws behavior
- 83: Bh/clone2
This version of clone works on the table files directly. It enumerates all the table files and downloads them. It does not inspect the chunks as v1 did.
- 82: Naked deletes now just delete everything instead of iterating
I mean, this works, but it's ugly and I'm not sure of a better way to do it, really.
- 81: Progress on switching deletes to new engine
Currently works for deletes but is not thoroughly tested.
- 80: go/store/nbs: store.go: Make global index cache 64MB instead of 8MB.
- 79: Removed skips for tests that will now work
This will fail for now; waiting on dolthub/go-mysql-server#10 to be approved before I merge this in. Super small stuff though.
- 73: go/libraries/doltcore/remotestorage: Add the ability to have a noop cache on DoltChunkStore.
- 72: proto: Use fully qualified paths for go_packages.
This allows cross-package references within proto files to work appropriately.
- 71: Db/events dir lock
Initial implementation of making event flush concurrency-safe.
- 70: go/store/spec: Move to aws://[table:bucket] for NBS on AWS specs because of Go URL parsing changes.
See https://go.googlesource.com/go/+/61bb56ad63992a3199acc55b2537c8355ef887b6 for context on the changes.
- 69: proto: remotesapi: chunkstore: Update message names and fields to clarify between chunk hashes on downloads and table file hashes on uploads.
- 68: doltcore: commitwalk: Implement GetDotDotRevisions.
Roughly mimics `git log master..feature`. Useful for displaying the commit log of a pull request, for example.
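To illustrate the two-dot semantics (a simplified sketch over a toy in-memory commit graph, not the actual commitwalk code): `master..feature` is the set of commits reachable from `feature` but not from `master`.

```go
package main

import "fmt"

// graph maps each commit to its parent commits (a toy commit graph).
type graph map[string][]string

// reachable returns every commit reachable from head, including head itself.
func (g graph) reachable(head string) map[string]bool {
	seen := map[string]bool{}
	stack := []string{head}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[c] {
			continue
		}
		seen[c] = true
		stack = append(stack, g[c]...)
	}
	return seen
}

// dotDot returns commits reachable from feature but not from master,
// i.e. the `master..feature` set.
func (g graph) dotDot(master, feature string) []string {
	exclude := g.reachable(master)
	var out []string
	for c := range g.reachable(feature) {
		if !exclude[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	g := graph{
		"A": nil, "B": {"A"}, "C": {"B"}, // master history:  A <- B <- C
		"D": {"B"}, "E": {"D"}, //           feature branch:  B <- D <- E
	}
	fmt.Println(g.dotDot("C", "E")) // [D E] (order may vary)
}
```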
- 67: Add file emitter that writes event data file
Added a file emitter that saves event data to files, and a flush that parses the files and sends them to the gRPC server.
- 63: Update README.md
@timsehn pointed out a shortcoming in the README file.
- 7: Merge upstream master
- 6: Fixed bug in comparisons for negative float literals
- 5: Zachmu/is true
- 4: Instead of adding offset to rowCount, just reverse the wrapping between offset and limit nodes.
- 3: Zachmu/float bugfixes
- 2: Zachmu/limit bug fixes
- 1: Replace of vitess dependency with our forked one, and commented local override