RocksDB 9.1.0 (facebook/rocksdb, tag v9.1.0)


9.1.0 (03/22/2024)

New Features

  • Added an option, GetMergeOperandsOptions::continue_cb, that lets users end GetMergeOperands()'s lookup process before all merge operands are found.
  • Added sanity checks for ingesting external files. They currently check whether the user key comparator used to create the file is compatible with the column family's user key comparator.
  • Added support for ingesting external files into column families that have user-defined timestamps enabled in memtable only.
  • On file systems that support storage-level data checksums and reconstruction, SST block reads for point lookups, scans, flush, and compaction are now retried if the initial read hits a checksum mismatch.
  • Enhancements and fixes to the experimental Temperature handling features, including a new default_write_temperature CF option and support for opening an SstFileWriter with a temperature.
  • WriteBatchWithIndex now supports wide-column point lookups via the GetEntityFromBatch API. See the API comments for more details.
  • Implemented experimental features: Iterator::GetProperty("rocksdb.iterator.write-time") returns the approximate write time (Unix time) of the current data, and the WriteBatch::TimedPut API writes data with a specific write time.
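GetMergeOperandsOptions::continue_cb lets a caller cut a merge-operand lookup short. The sketch below is a standalone model of that control flow, not RocksDB code. It assumes operands are visited from newest to oldest, that a null callback means "keep going", and that the operand on which the callback returns false is still included in the result; these semantics are assumptions to verify against the option's header comments. CollectMergeOperands and the use of std::string instead of Slice are hypothetical simplifications.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model (not the RocksDB API) of GetMergeOperands() with an
// early-exit callback. The real option passes operands as Slice.
using ContinueCallback = std::function<bool(const std::string&)>;

// Visits operands in the order given (assumed newest to oldest) and stops
// as soon as the callback returns false.
std::vector<std::string> CollectMergeOperands(
    const std::vector<std::string>& operands_newest_first,
    const ContinueCallback& continue_cb) {
  std::vector<std::string> collected;
  for (const std::string& operand : operands_newest_first) {
    collected.push_back(operand);
    // A null callback behaves as if it always returned true.
    if (continue_cb && !continue_cb(operand)) {
      break;  // Stop at the operand the callback was just invoked on.
    }
  }
  return collected;
}
```

A typical use is a merge operator where some operand (here a hypothetical "reset" operand) makes every older operand irrelevant: the callback returns false on that operand, and older operands are never read.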

Public API Changes

  • Best-effort recovery (best_efforts_recovery == true) may now be used together with atomic flush (atomic_flush == true). The all-or-nothing recovery guarantee for atomically flushed data will be upheld.
  • Removed the deprecated option bottommost_temperature, which was already replaced by last_level_temperature.
  • Added new PerfContext counters for bytes read from the block cache: block_cache_index_read_byte, block_cache_filter_read_byte, block_cache_compression_dict_read_byte, and block_cache_read_byte.
  • Deprecated the experimental Remote Compaction APIs StartV2() and WaitForCompleteV2() and introduced Schedule() and Wait(). The new APIs do essentially the same thing as the old ones: they accept an externally generated unique ID that is used to wait for the remote compaction to complete.
  • For the API WriteCommittedTransaction::GetForUpdate, if the column family enables user-defined timestamps (UDT), it was previously mandated that the do_validate argument could not be false, and UDT-based validation had to be done with a user-set read timestamp. This is now relaxed: UDT-based validation is optional if the user sets do_validate to false and does not set a read timestamp. In that case GetForUpdate skips UDT-based validation, and it is the user's responsibility to enforce the UDT invariant. DO NOT skip UDT-based validation unless you have a way to enforce the UDT invariant. Ways to enforce the invariant on the user side include managing a monotonically increasing timestamp, committing transactions in a single thread, etc.
  • Defined a new PerfLevel, kEnableWait, to measure time that user threads spend blocked in RocksDB for reasons other than mutex waits, such as a write thread waiting to be added to a write group, or a write thread being delayed or stalled.
  • RateLimiter's API no longer requires the burst size to equal the refill size. Users of NewGenericRateLimiter() can now specify the burst size via single_burst_bytes. Implementors of RateLimiter::SetSingleBurstBytes() need to adapt their implementations to match the changed API documentation.
  • Added write_memtable_time to the newly introduced PerfLevel kEnableWait.
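The GetForUpdate entry names two user-side ways to keep the UDT invariant when do_validate is false: managing a monotonically increasing timestamp and committing transactions one at a time. A minimal sketch combining both, assuming a single process owns all commits for the column family; SerializedTimestampCommitter is a hypothetical helper, not a RocksDB class, and the callback stands in for whatever code sets the commit timestamp and commits the transaction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>

// Hypothetical helper (not a RocksDB API). It hands out strictly increasing
// commit timestamps and runs each commit while holding a lock, so commits
// are applied in the same order as their timestamps.
class SerializedTimestampCommitter {
 public:
  explicit SerializedTimestampCommitter(uint64_t last_ts = 0)
      : last_ts_(last_ts) {}

  // Runs `commit_fn` with the next timestamp and returns that timestamp.
  uint64_t Commit(const std::function<void(uint64_t)>& commit_fn) {
    std::lock_guard<std::mutex> lock(mu_);
    const uint64_t ts = ++last_ts_;  // Never reused, never decreases.
    commit_fn(ts);  // e.g. set the commit timestamp, then commit.
    return ts;
  }

 private:
  std::mutex mu_;
  uint64_t last_ts_;
};
```

Holding the lock across the commit is what ties commit order to timestamp order; a bare atomic counter would hand out increasing timestamps but could not stop a later timestamp from committing first. The cost is that commits are fully serialized.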

Behavior Changes

  • RateLimiters created by NewGenericRateLimiter() no longer modify the refill period when SetSingleBurstBytes() is called.
  • Merge writes will only keep the merge operand count within ColumnFamilyOptions::max_successive_merges when all of the key's merge operands are found in memory, unless strict_max_successive_merges is explicitly set.
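The rate limiter entries separate two quantities that used to be tied together: the bytes refilled each period and the largest single request (the burst size). The standalone model below illustrates that split. It is not RocksDB's GenericRateLimiter, and it assumes that a single_burst_bytes of 0 falls back to the per-period refill size. RateLimiterModel and its members are hypothetical; SetSingleBurstBytes here mirrors only the 9.1.0 behavior of leaving the refill period untouched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a rate limiter whose burst size is independent of
// its refill size (not RocksDB's implementation).
class RateLimiterModel {
 public:
  RateLimiterModel(int64_t rate_bytes_per_sec, int64_t refill_period_us,
                   int64_t single_burst_bytes = 0)
      : rate_bytes_per_sec_(rate_bytes_per_sec),
        refill_period_us_(refill_period_us),
        single_burst_bytes_(single_burst_bytes) {
    assert(rate_bytes_per_sec > 0 && refill_period_us > 0);
  }

  // Bytes added to the budget every refill period.
  int64_t RefillBytesPerPeriod() const {
    return rate_bytes_per_sec_ * refill_period_us_ / 1000000;
  }

  // Largest single request. Assumption: 0 means "same as the refill size".
  int64_t SingleBurstBytes() const {
    return single_burst_bytes_ > 0 ? single_burst_bytes_
                                   : RefillBytesPerPeriod();
  }

  // Changes only the burst size; the refill period is left as-is.
  void SetSingleBurstBytes(int64_t bytes) { single_burst_bytes_ = bytes; }

  int64_t refill_period_us() const { return refill_period_us_; }

  // Minimum number of requests needed to move `total_bytes` when no single
  // request may exceed the burst size.
  int64_t RequestsNeeded(int64_t total_bytes) const {
    const int64_t burst = SingleBurstBytes();
    return (total_bytes + burst - 1) / burst;
  }

 private:
  int64_t rate_bytes_per_sec_;
  int64_t refill_period_us_;
  int64_t single_burst_bytes_;
};
```

With 10,000,000 bytes/s and a 100,000 us refill period, the refill size is 1,000,000 bytes, so a 4,000,000-byte write needs four requests by default; raising the burst size to 4,000,000 allows one request while the refill period and refill size stay the same.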

Bug Fixes

  • Fixed kBlockCacheTier reads to return Status::Incomplete when I/O is needed to fetch a merge chain's base value from a blob file.
  • Fixed kBlockCacheTier reads to return Status::Incomplete on table cache miss rather than incorrectly returning an empty value.
  • Fixed a data race in WalManager that may affect how frequently PurgeObsoleteWALFiles() runs.
  • Re-enabled the recycle_log_file_num option in DBOptions for the kPointInTimeRecovery WAL recovery mode; it was previously disabled due to a bug in the recovery logic. This option is incompatible with WriteOptions::disableWAL, and Status::InvalidArgument() will be returned if disableWAL is specified.
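The two kBlockCacheTier fixes matter for a common pattern: try a cache-only read first, and fall back to a read that may do I/O only when the first attempt reports Incomplete. Before the fixes, a table cache miss could come back as an empty value, which this pattern would wrongly accept as the answer. The sketch models the pattern with hypothetical types (Code, ReadResult, CacheOnlyGet, FullGet); in real RocksDB code the first read would set ReadOptions::read_tier to kBlockCacheTier and check Status::IsIncomplete(). As a simplification, the model treats every cache miss as Incomplete.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the Status codes relevant here.
enum class Code { kOk, kNotFound, kIncomplete };

struct ReadResult {
  Code code;
  std::string value;  // Meaningful only when code == Code::kOk.
};

using KeyValueMap = std::map<std::string, std::string>;

// Cache-only read: never does I/O. A miss reports kIncomplete rather than
// an empty kOk value.
ReadResult CacheOnlyGet(const KeyValueMap& cache, const std::string& key) {
  auto it = cache.find(key);
  if (it == cache.end()) return {Code::kIncomplete, ""};
  return {Code::kOk, it->second};
}

// Full read: allowed to do I/O (modeled as a lookup in `storage`).
ReadResult FullGet(const KeyValueMap& storage, const std::string& key) {
  auto it = storage.find(key);
  if (it == storage.end()) return {Code::kNotFound, ""};
  return {Code::kOk, it->second};
}

// Tries the cache first; pays for I/O only if the cache-only read was
// incomplete. `did_io` reports whether the fallback ran.
ReadResult GetWithFallback(const KeyValueMap& cache,
                           const KeyValueMap& storage,
                           const std::string& key, bool* did_io) {
  ReadResult result = CacheOnlyGet(cache, key);
  *did_io = (result.code == Code::kIncomplete);
  if (*did_io) result = FullGet(storage, key);
  return result;
}
```

If CacheOnlyGet returned an empty kOk value on a miss (the pre-fix behavior for table cache misses), GetWithFallback would return that empty value and never consult storage.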

Performance Improvements

  • Java API multiGet() variants now take advantage of the underlying batched multiGet() performance improvements.
    Before:

    Benchmark                          (columnFamilyTestType)  (keyCount)  (keySize)  (multiGetSize)  (valueSize)   Mode  Cnt     Score      Error  Units
    MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100           64  thrpt   25  6315.541  ±   8.106  ops/s
    MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100         1024  thrpt   25  6975.468  ±  68.964  ops/s

    After:

    Benchmark                          (columnFamilyTestType)  (keyCount)  (keySize)  (multiGetSize)  (valueSize)   Mode  Cnt     Score      Error  Units
    MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100           64  thrpt   25  7046.739  ±  13.299  ops/s
    MultiGetBenchmarks.multiGetList10        no_column_family       10000         16             100         1024  thrpt   25  7654.521  ±  60.121  ops/s
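The relative throughput gain implied by the Before and After scores can be computed directly. The small helper below only does that arithmetic and adds no new measurements. Treating the ± Error values as intervals and checking that they do not overlap is a rough sanity check, not a statistical test; PercentGain and IntervalsSeparated are illustrative names.

```cpp
#include <cassert>

// Percentage change in throughput from `before` to `after` (both in ops/s).
double PercentGain(double before, double after) {
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}

// True if the interval [before - err_before, before + err_before] lies
// entirely below [after - err_after, after + err_after].
bool IntervalsSeparated(double before, double err_before, double after,
                        double err_after) {
  return before + err_before < after - err_after;
}
```

For the 64-byte and 1024-byte value sizes this works out to roughly 11.6% and 9.7% higher throughput, and in both cases the Before and After intervals are well separated.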
