github dolthub/dolt v2.0.8
2.0.8

16 hours ago

Merged PRs

dolt

  • 11124: integration-tests/go-sql-server-driver: no_table_files_read_test.go: Add test cases for multi-database usage.
  • 11119: Do not make dolt_diff table indexes on nil schemas
    Calling MakeDiffTableIndex on a nil schema was causing a panic. This would happen if a table existed on one branch but not on the other.
  • 11118: enforce foreign keys when DELETING through dolt_workspace_TABLE
    Workspace table deletes were previously using a simple tableWriter, which went around the FK checks we make during typical updates.
    Depends On: dolthub/go-mysql-server#3566
    Fixes: #11112
  • 11117: go/store/nbs: Lazily load chunk journal database resources when constructing a NomsBlockStore.
    Most Dolt commands currently construct a MultiRepoEnv and then call
    into the SQL layer to perform the actual work. In turn, the SQL layer
    has some logic which looks at the state of the loaded databases and
    sees whether the process has ended up loading some of them in ReadOnly
    mode. If it has, then in some cases the Dolt process will attempt to
    connect to a running dolt sql-server instead of doing the work
    against the database files itself.
    This PR makes is to that the application code paths which inspect
    AccessMode == ReadOnly can get that signal and make the attempt to
    reach out to the server before all of the database state has to be
    loaded from disk. This greatly improves the performance of running
    dolt CLI commands against a running dolt sql-server, from the same
    directory as the server data-dir.
  • 11101: go/store/nbs: Clean up some manifest access patterns.
    Removes unnecessary manifestManager and its global cache and in-process locks.
  • 11100: handle extended types during prollyRange construction
    benchmarks: dolthub/doltgresql#2758 (comment)
  • 11085: add where predicate feature to index (partial index)

go-mysql-server

  • 3567: fix for division on decimals
    BenchmarkPlusHighScaleDecimals:
    before (decimal.Decimal)   2505679               460.8 ns/op
    after (apd.Decimal)        3746318               304.5 ns/op
    
    BenchmarkMinusHighScaleDecimals:
    before (decimal.Decimal)   2619691               449.0 ns/op
    after (apd.Decimal)        3723363               312.1 ns/op
    
    BenchmarkMultHighScaleDecimals:
    before (decimal.Decimal)   2537882               454.6 ns/op
    after (apd.Decimal)        3608662               321.3 ns/op
    
    BenchmarkDivHighScaleDecimals:
    before (decimal.Decimal)   793515               1414 ns/op
    before fix (apd.Decimal)   367                  3112350 ns/op
    after fix (apd.Decimal)    946606               1246 ns/op
    
    BenchmarkDivManyDecimals:
    before (decimal.Decimal)   130582               7691 ns/op
    before fix (apd.Decimal)   54                   21534736 ns/op
    after fix (apd.Decimal)    110416               9347 ns/op
    
  • 3566: analyzer: export BuildForeignKeyEditor for callers outside DML analysis
    Refactors the internal FK-editor builder chain (getForeignKeyEditor, getForeignKeyReferences, getForeignKeyRefActions, cache.GetUpdater, getForeignKeyHandlerFromUpdateTarget) to take *Catalog instead of *Analyzer since only a.Catalog was ever used. Adds an exported BuildForeignKeyEditor entry point so integrators can wrap a writer in a ForeignKeyEditor when their write path doesn't go through the analyzer (e.g. dolt's workspace tables).
  • 3565: Allow unparenthesized function expressions in column defaults.
    MariaDB allows the use of a function expression as a column default expressions without requiring it to be wrapped in parentheses. MySQL does not allow this.
    In order to parse scripts created for MariaDB, we should also accept unwrapped function expressions as column defaults. However, in order to ensure that dumps created by Dolt are compatible with MySQL, we normalize these expressions by wrapping them in parentheses internally.
  • 3561: fix(rowexec): dispose CachedResults child when closing hashLookupGeneratingIter (#3560)

    Summary

    hashLookupGeneratingIter.Close returned nil without closing the child iter chain or disposing the wrapped *plan.CachedResults node. Every executed hash join over a CachedResults subtree could therefore leak one entry into Analyzer.CachedResultsManager.cachedResultsCaches, freed only by process restart.
    This PR makes Close:
    1. Dispose the CachedResults child if present (releases the global Manager entry).
    2. Propagate Close down the rest of the child iter chain.

    Fix

    func (h *hashLookupGeneratingIter) Close(c *sql.Context) error {
    -    return nil
    +    if cr, ok := h.n.Child.(*plan.CachedResults); ok {
    +        cr.Dispose(c)
    +    }
    +    return h.childIter.Close(c)
    }

    Why

    Full RCA, reproducer, and pprof evidence are in #3560. Headline numbers from the 3-minute, ~30 req/sec workload reported there (Dolt 2.0.6 sql-server vendoring this commit):
    Metric Vanilla Δ Patched Δ Improvement
    RSS +5069 MB +44 MB ~115×
    HeapAlloc (live) +4.3 GB +82 MB ~52×
    HeapObjects (live) +90 million +1.5 million ~60×
    Mallocs - Frees (net live) +204 million +4 million ~50×
    Allocation rate is unchanged (~1.1B mallocs in 3 min both runs); the patched build services the same workload but Frees keeps up with Mallocs.

    Tests

    go test ./sql/rowexec/... passes locally with the patch applied.
    I did not include a unit-level regression test in this PR. In a minimal engine.QueryRowIterToRows test, TrackedRowIter.Close → done → disposeNode already walks the plan tree and disposes the CachedResults via sql.Dispose(ctx, node), so the leak doesn't reproduce that way; faithfully exercising the production path (Dolt sql-server, prepared statements via the MySQL wire protocol) needs significantly more harness scaffolding. Happy to add a test if maintainers can point me at the right entry point.
    Closes #3560
  • 3553: add partial index on WHERE predicate
  • 3537: sql: opt-in SQL trace redaction

    Summary

    Adds an opt-in redaction layer for the SQL text and identifiers that get attached to OpenTelemetry span attributes — the planbuilder parse-span query attribute and the rowexec table/left/right/index attributes. When enabled via a new sql.Context option, identifiers and literal values are rewritten into stable, low-entropy tokens (n1, n2, ..., v1, v2, ...) that repeat across queries of the same shape so trace storage compresses well.
    Use case: multi-tenant SQL servers (Dolt is one, our own dsabstraction layer is another) where tracing data flows to shared long-term storage. Operators want privacy regulation compliance and tenant isolation without giving up trace visibility entirely.
    Default off to preserve backward compatibility — existing trace consumers reading query, table, left, right, etc. verbatim see no change. Callers opt in with sql.WithTraceRedaction(true).

    How it works

    Two passes over the same SQL, deliberately combined:
    1. Parse to AST, walk it to collect every TableIdent and ColIdent string. The grammar is the authority on which lexemes are identifiers — necessary because vitess treats hundreds of words (NAME, USER, DATA, STATUS, ...) as non-reserved keywords: the lexer emits a keyword token type for them, but real queries use them as bare column / table names.
    2. Lex the SQL and emit token-by-token. Literal token types (STRING, INTEGRAL, FLOAT, HEX, HEXNUM, BIT_LITERAL) redact as values. ID tokens redact as identifiers. COMMENT tokens drop. VALUE_ARG / LIST_ARG pass through. Any other token (keyword, punctuation, multi-char operator) emits structurally — unless its val is in the identifier set from step 1, in which case it redacts as an identifier (this catches the non-reserved-keyword-as-column-name case).
      Coverage is bounded by the grammar's notion of identifier (TableIdent + ColIdent) and the lexer's fixed set of literal token types — not by an evolving list of AST node fields. Future grammar / AST shape changes are covered automatically as long as they go through Walk and produce the same lexer token classes.
      This is deliberately distinct from sqlparser.Normalize / RedactSQLQuery, which is a plan-cache primitive that parameterizes literals only, dedupes only inside SELECT, and skips long values for CPU. The redactor here treats traces as a privacy boundary: every high-cardinality token is rewritten, always dedupes, drops comments, and falls back to <unparseable> on parse failure rather than risk a partial redaction.

    Examples

    Before: SELECT * FROM t WHERE c1 = 'could_be_sensitive'
    After:  SELECT * FROM `n1` WHERE `n2` = 'v1'
    
    Before: SELECT u.name, u.email FROM users AS u
    After:  SELECT `n1` . `n2` , `n1` . `n3` FROM `n4` AS `n1`
    
    name is a non-reserved keyword in the vitess grammar — lexed with a keyword token type, not ID. A pure lexer-driven redactor would leak it. The hybrid grammar+lex approach catches it.
    Before: WITH x (renamed_col) AS (SELECT a FROM t) SELECT * FROM x JOIN y USING (sensitive_join_col)
    After:  WITH `n1` ( `n2` ) AS ( SELECT `n3` FROM `n4` ) SELECT * FROM `n1` JOIN `n5` USING ( `n6` )
    
    CTE column-rename list, USING column list — both are []ColIdent slices on pointer parents. Picked up via the AST walk's identifier collection, redacted at lex time.
    Before: /* PII: alice@example.com */ SELECT 1 FROM t /* inline */
    After:  SELECT :v1 FROM `n1`
    
    Both margin and inline comments are dropped (the lexer emits COMMENT for both — drop happens at the lex pass).

    Concurrency

    The Mapping is eagerly allocated by NewContext when redaction is enabled (never lazy after construction), so concurrent rowexec spans firing while the parser populates the mapping never race on lazy initialization. The Mapping itself uses an internal sync.RWMutex — the fast path for already-minted tokens takes RLock, mints take Lock with a re-check after upgrade.
    A new TestMapping_ConcurrentRedactIsRaceFree exercises the production usage pattern (parent populates during parse, many goroutines read with occasional mint-on-miss) under -race for 500 iterations × 32 goroutines.

    Public API

    // sql.Context option (default off)
    sql.WithTraceRedaction(enabled bool) ContextOption
    // Methods on *Context
    (*Context).TraceRedactionEnabled() bool
    (*Context).RedactionMapping() *sqlredact.Mapping
    (*Context).RedactQueryForTrace(query string) string
    (*Context).RedactNameForTrace(name string) string
    (*Context).RedactStringerForTrace(v interface{ String() string }) string
    // New subpackage
    sqlredact.RedactSQLForTrace(sql string) (redacted string, m *Mapping, err error)
    sqlredact.RedactSQLForTraceInto(sql string, m *Mapping) (string, error)
    sqlredact.NewMapping() *Mapping

    Commits

    1. sqlredact: SQL trace redaction for span attributes (opt-in) — new sql/sqlredact package (parse + lex hybrid, the Mapping with sync.RWMutex), new sql.Context option and helpers, tests including the concurrency stress.
    2. trace: redact SQL identifiers in span attributes — wires planbuilder/parse.go, rowexec/rel.go, rowexec/join_iters.go, rowexec/range_heap_iter.go, and rowexec/ddl_iters.go through the new helpers.

    Test plan

    • go build ./... for ./sql and ./sql/sqlredact clean
    • go test ./sql ./sql/sqlredact all pass (incl. testify assertions and the concurrency test under -race)
    • gofmt -l clean on touched files
    • enginetest end-to-end coverage — not yet added; per CONTRIBUTING.md this is expected. Happy to add a query+expected-redacted-trace fixture if there's an existing pattern for span-attribute assertions, or open to guidance on the right scaffolding.

    Open questions for review

    • Token format n1/v1/:v1/X'v1' — chosen to be short and repeating for compression density. Happy to adopt a different convention (e.g. _redacted_n1) if the project prefers more obviously synthetic markers.
    • The <redacted> placeholder for LIMIT/OFFSET Stringer attrs drops the value entirely under redaction (see commit message rationale). Open to alternative: re-tokenize the fragment text against the mapping, accepting that pushdown-generated synthetic identifiers won't have entries.
    • Any existing convention for end-to-end tests of span attributes that I should hook into?
      🤖 Generated with Claude Code

vitess

  • 470: Expand the set of function call expressions that can be used as a column default value without requiring parentheses.
    These forms are not valid MySQL but are accepted by MariaDB. We should also accept them in order to have better compatibility with schemas created for MariaDB.
  • 469: add predicate to indexspec

Closed Issues

  • 11112: Manipulating dolt_workspace_TABLE permits committing foreign violations
  • 3560: CachedResults / HashLookup chain leaks rows-cache for every hash join until process restart

Don't miss a new dolt release

NewReleases is sending notifications on new releases.