Merged PRs
dolt
- 11124: integration-tests/go-sql-server-driver: no_table_files_read_test.go: Add test cases for multi-database usage.
- 11119: Do not make
dolt_difftable indexes on nil schemas
CallingMakeDiffTableIndexon a nil schema was causing a panic. This would happen if a table existed on one branch but not on the other. - 11118: enforce foreign keys when DELETING through dolt_workspace_TABLE
Workspace table deletes were previously using a simple tableWriter, which went around the FK checks we make during typical updates.
Depends On: dolthub/go-mysql-server#3566
Fixes: #11112 - 11117: go/store/nbs: Lazily load chunk journal database resources when constructing a NomsBlockStore.
Most Dolt commands currently construct a MultiRepoEnv and then call
into the SQL layer to perform the actual work. In turn, the SQL layer
has some logic which looks at the state of the loaded databases and
sees whether the process has ended up loading some of them in ReadOnly
mode. If it has, then in some cases the Dolt process will attempt to
connect to a runningdolt sql-serverinstead of doing the work
against the database files itself.
This PR makes is to that the application code paths which inspect
AccessMode == ReadOnly can get that signal and make the attempt to
reach out to the server before all of the database state has to be
loaded from disk. This greatly improves the performance of running
doltCLI commands against a runningdolt sql-server, from the same
directory as the server data-dir. - 11101: go/store/nbs: Clean up some manifest access patterns.
Removes unnecessary manifestManager and its global cache and in-process locks. - 11100: handle extended types during prollyRange construction
benchmarks: dolthub/doltgresql#2758 (comment) - 11085: add where predicate feature to index (partial index)
go-mysql-server
- 3567: fix for division on decimals
BenchmarkPlusHighScaleDecimals:
BenchmarkMinusHighScaleDecimals:before (decimal.Decimal) 2505679 460.8 ns/op after (apd.Decimal) 3746318 304.5 ns/op
BenchmarkMultHighScaleDecimals:before (decimal.Decimal) 2619691 449.0 ns/op after (apd.Decimal) 3723363 312.1 ns/op
BenchmarkDivHighScaleDecimals:before (decimal.Decimal) 2537882 454.6 ns/op after (apd.Decimal) 3608662 321.3 ns/op
BenchmarkDivManyDecimals:before (decimal.Decimal) 793515 1414 ns/op before fix (apd.Decimal) 367 3112350 ns/op after fix (apd.Decimal) 946606 1246 ns/opbefore (decimal.Decimal) 130582 7691 ns/op before fix (apd.Decimal) 54 21534736 ns/op after fix (apd.Decimal) 110416 9347 ns/op - 3566: analyzer: export BuildForeignKeyEditor for callers outside DML analysis
Refactors the internal FK-editor builder chain (getForeignKeyEditor, getForeignKeyReferences, getForeignKeyRefActions, cache.GetUpdater, getForeignKeyHandlerFromUpdateTarget) to take *Catalog instead of *Analyzer since only a.Catalog was ever used. Adds an exported BuildForeignKeyEditor entry point so integrators can wrap a writer in a ForeignKeyEditor when their write path doesn't go through the analyzer (e.g. dolt's workspace tables). - 3565: Allow unparenthesized function expressions in column defaults.
MariaDB allows the use of a function expression as a column default expressions without requiring it to be wrapped in parentheses. MySQL does not allow this.
In order to parse scripts created for MariaDB, we should also accept unwrapped function expressions as column defaults. However, in order to ensure that dumps created by Dolt are compatible with MySQL, we normalize these expressions by wrapping them in parentheses internally. - 3561: fix(rowexec): dispose CachedResults child when closing hashLookupGeneratingIter (#3560)
Summary
hashLookupGeneratingIter.Closereturnednilwithout closing the child iter chain or disposing the wrapped*plan.CachedResultsnode. Every executed hash join over aCachedResultssubtree could therefore leak one entry intoAnalyzer.CachedResultsManager.cachedResultsCaches, freed only by process restart.
This PR makesClose:- Dispose the
CachedResultschild if present (releases the global Manager entry). - Propagate
Closedown the rest of the child iter chain.
Fix
func (h *hashLookupGeneratingIter) Close(c *sql.Context) error { - return nil + if cr, ok := h.n.Child.(*plan.CachedResults); ok { + cr.Dispose(c) + } + return h.childIter.Close(c) }Why
Full RCA, reproducer, and pprof evidence are in #3560. Headline numbers from the 3-minute, ~30 req/sec workload reported there (Dolt 2.0.6 sql-server vendoring this commit):Metric Vanilla Δ Patched Δ Improvement RSS +5069 MB +44 MB ~115× HeapAlloc (live) +4.3 GB +82 MB ~52× HeapObjects (live) +90 million +1.5 million ~60× Mallocs - Frees(net live)+204 million +4 million ~50× Allocation rate is unchanged (~1.1B mallocs in 3 min both runs); the patched build services the same workload but Freeskeeps up withMallocs.Tests
go test ./sql/rowexec/...passes locally with the patch applied.
I did not include a unit-level regression test in this PR. In a minimalengine.Query→RowIterToRowstest,TrackedRowIter.Close → done → disposeNodealready walks the plan tree and disposes theCachedResultsviasql.Dispose(ctx, node), so the leak doesn't reproduce that way; faithfully exercising the production path (Dolt sql-server, prepared statements via the MySQL wire protocol) needs significantly more harness scaffolding. Happy to add a test if maintainers can point me at the right entry point.
Closes #3560 - Dispose the
- 3553: add partial index on WHERE predicate
- 3537: sql: opt-in SQL trace redaction
Summary
Adds an opt-in redaction layer for the SQL text and identifiers that get attached to OpenTelemetry span attributes — the planbuilder parse-spanqueryattribute and the rowexectable/left/right/indexattributes. When enabled via a newsql.Contextoption, identifiers and literal values are rewritten into stable, low-entropy tokens (n1,n2, ...,v1,v2, ...) that repeat across queries of the same shape so trace storage compresses well.
Use case: multi-tenant SQL servers (Dolt is one, our own dsabstraction layer is another) where tracing data flows to shared long-term storage. Operators want privacy regulation compliance and tenant isolation without giving up trace visibility entirely.
Default off to preserve backward compatibility — existing trace consumers readingquery,table,left,right, etc. verbatim see no change. Callers opt in withsql.WithTraceRedaction(true).How it works
Two passes over the same SQL, deliberately combined:- Parse to AST, walk it to collect every
TableIdentandColIdentstring. The grammar is the authority on which lexemes are identifiers — necessary because vitess treats hundreds of words (NAME,USER,DATA,STATUS, ...) as non-reserved keywords: the lexer emits a keyword token type for them, but real queries use them as bare column / table names. - Lex the SQL and emit token-by-token. Literal token types (
STRING,INTEGRAL,FLOAT,HEX,HEXNUM,BIT_LITERAL) redact as values.IDtokens redact as identifiers.COMMENTtokens drop.VALUE_ARG/LIST_ARGpass through. Any other token (keyword, punctuation, multi-char operator) emits structurally — unless itsvalis in the identifier set from step 1, in which case it redacts as an identifier (this catches the non-reserved-keyword-as-column-name case).
Coverage is bounded by the grammar's notion of identifier (TableIdent+ColIdent) and the lexer's fixed set of literal token types — not by an evolving list of AST node fields. Future grammar / AST shape changes are covered automatically as long as they go throughWalkand produce the same lexer token classes.
This is deliberately distinct fromsqlparser.Normalize/RedactSQLQuery, which is a plan-cache primitive that parameterizes literals only, dedupes only inside SELECT, and skips long values for CPU. The redactor here treats traces as a privacy boundary: every high-cardinality token is rewritten, always dedupes, drops comments, and falls back to<unparseable>on parse failure rather than risk a partial redaction.
Examples
Before: SELECT * FROM t WHERE c1 = 'could_be_sensitive' After: SELECT * FROM `n1` WHERE `n2` = 'v1'Before: SELECT u.name, u.email FROM users AS u After: SELECT `n1` . `n2` , `n1` . `n3` FROM `n4` AS `n1`nameis a non-reserved keyword in the vitess grammar — lexed with a keyword token type, notID. A pure lexer-driven redactor would leak it. The hybrid grammar+lex approach catches it.
CTE column-rename list, USING column list — both areBefore: WITH x (renamed_col) AS (SELECT a FROM t) SELECT * FROM x JOIN y USING (sensitive_join_col) After: WITH `n1` ( `n2` ) AS ( SELECT `n3` FROM `n4` ) SELECT * FROM `n1` JOIN `n5` USING ( `n6` )[]ColIdentslices on pointer parents. Picked up via the AST walk's identifier collection, redacted at lex time.
Both margin and inline comments are dropped (the lexer emitsBefore: /* PII: alice@example.com */ SELECT 1 FROM t /* inline */ After: SELECT :v1 FROM `n1`COMMENTfor both — drop happens at the lex pass).Concurrency
TheMappingis eagerly allocated byNewContextwhen redaction is enabled (never lazy after construction), so concurrent rowexec spans firing while the parser populates the mapping never race on lazy initialization. The Mapping itself uses an internalsync.RWMutex— the fast path for already-minted tokens takesRLock, mints takeLockwith a re-check after upgrade.
A newTestMapping_ConcurrentRedactIsRaceFreeexercises the production usage pattern (parent populates during parse, many goroutines read with occasional mint-on-miss) under-racefor 500 iterations × 32 goroutines.Public API
// sql.Context option (default off) sql.WithTraceRedaction(enabled bool) ContextOption // Methods on *Context (*Context).TraceRedactionEnabled() bool (*Context).RedactionMapping() *sqlredact.Mapping (*Context).RedactQueryForTrace(query string) string (*Context).RedactNameForTrace(name string) string (*Context).RedactStringerForTrace(v interface{ String() string }) string // New subpackage sqlredact.RedactSQLForTrace(sql string) (redacted string, m *Mapping, err error) sqlredact.RedactSQLForTraceInto(sql string, m *Mapping) (string, error) sqlredact.NewMapping() *Mapping
Commits
sqlredact: SQL trace redaction for span attributes (opt-in)— newsql/sqlredactpackage (parse + lex hybrid, theMappingwithsync.RWMutex), newsql.Contextoption and helpers, tests including the concurrency stress.trace: redact SQL identifiers in span attributes— wiresplanbuilder/parse.go,rowexec/rel.go,rowexec/join_iters.go,rowexec/range_heap_iter.go, androwexec/ddl_iters.gothrough the new helpers.
Test plan
-
go build ./...for./sqland./sql/sqlredactclean -
go test ./sql ./sql/sqlredactall pass (incl. testify assertions and the concurrency test under-race) -
gofmt -lclean on touched files -
enginetestend-to-end coverage — not yet added; perCONTRIBUTING.mdthis is expected. Happy to add a query+expected-redacted-trace fixture if there's an existing pattern for span-attribute assertions, or open to guidance on the right scaffolding.
Open questions for review
- Token format
n1/v1/:v1/X'v1'— chosen to be short and repeating for compression density. Happy to adopt a different convention (e.g._redacted_n1) if the project prefers more obviously synthetic markers. - The
<redacted>placeholder forLIMIT/OFFSETStringer attrs drops the value entirely under redaction (see commit message rationale). Open to alternative: re-tokenize the fragment text against the mapping, accepting that pushdown-generated synthetic identifiers won't have entries. - Any existing convention for end-to-end tests of span attributes that I should hook into?
🤖 Generated with Claude Code
- Parse to AST, walk it to collect every
vitess
- 470: Expand the set of function call expressions that can be used as a column default value without requiring parentheses.
These forms are not valid MySQL but are accepted by MariaDB. We should also accept them in order to have better compatibility with schemas created for MariaDB. - 469: add predicate to indexspec