9.1.0 (03/22/2024)
New Features
- Added an option, GetMergeOperandsOptions::continue_cb, to give users the ability to end GetMergeOperands()'s lookup process before all merge operands were found.
- Added a sanity check for ingesting external files; it currently checks whether the user key comparator used to create the file is compatible with the column family's user key comparator.
- Support ingesting external files for a column family that has user-defined timestamps in memtable only enabled.
- On file systems that support storage level data checksum and reconstruction, retry SST block reads for point lookups, scans, and flush and compaction if there's a checksum mismatch on the initial read.
- Some enhancements and fixes to experimental Temperature handling features, including a new default_write_temperature CF option and opening an SstFileWriter with a temperature.
- WriteBatchWithIndex now supports wide-column point lookups via the GetEntityFromBatch API. See the API comments for more details.
- Implemented experimental features: the API Iterator::GetProperty("rocksdb.iterator.write-time") allows users to get data's approximate write unix time, and the WriteBatch::TimedPut API allows writing data with a specific write time.
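The early-termination idea behind GetMergeOperandsOptions::continue_cb can be pictured with a small standalone model. This is only a sketch and does not use RocksDB: the names CollectOperands and CollectAtMostTwo are hypothetical, and both the callback signature and the choice to keep the operand that triggers the stop are assumptions made for illustration. Consult the API comments for the real contract.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for one key's merge operands, ordered newest first.
// Hypothetical type: this is not RocksDB code.
using Operands = std::vector<std::string>;

// Collects operands from newest to oldest. After each operand is collected,
// the callback decides whether the lookup continues (true) or stops (false).
// An empty callback means "never stop early".
Operands CollectOperands(
    const Operands& newest_first,
    const std::function<bool(const std::string&)>& continue_cb) {
  Operands found;
  for (const std::string& op : newest_first) {
    found.push_back(op);
    if (continue_cb && !continue_cb(op)) {
      break;  // the caller asked to end the lookup early
    }
  }
  return found;
}

// Example policy: stop as soon as two operands have been seen.
Operands CollectAtMostTwo(const Operands& newest_first) {
  int seen = 0;
  return CollectOperands(newest_first, [&seen](const std::string&) {
    return ++seen < 2;
  });
}
```

With three operands available, CollectAtMostTwo returns only the two newest, which is the kind of saving the option is meant to allow when older operands are not needed.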
Public API Changes
- Best-effort recovery (best_efforts_recovery == true) may now be used together with atomic flush (atomic_flush == true). The all-or-nothing recovery guarantee for atomically flushed data will be upheld.
- Remove deprecated option bottommost_temperature, already replaced by last_level_temperature.
- Added new PerfContext counters for block cache bytes read - block_cache_index_read_byte, block_cache_filter_read_byte, block_cache_compression_dict_read_byte, and block_cache_read_byte.
- Deprecate experimental Remote Compaction APIs StartV2() and WaitForCompleteV2(), and introduce Schedule() and Wait(). The new APIs do essentially the same thing as the old APIs. They allow passing an externally generated unique ID to wait for remote compaction to complete.
- For API WriteCommittedTransaction::GetForUpdate, if the column family enables user-defined timestamp, it was mandated that argument do_validate cannot be false, and UDT based validation has to be done with a user set read timestamp. It's updated to make the UDT based validation optional if the user sets do_validate to false and does not set a read timestamp. With this, GetForUpdate skips UDT based validation and it's the users' responsibility to enforce the UDT invariant. SO DO NOT skip this UDT-based validation if users do not have ways to enforce the UDT invariant. Ways to enforce the invariant on the users' side include managing a monotonically increasing timestamp, committing transactions in a single thread, etc.
- Defined a new PerfLevel kEnableWait to measure time spent by user threads blocked in RocksDB other than mutex, such as a write thread waiting to be added to a write group, a write thread delayed or stalled, etc.
- RateLimiter's API no longer requires the burst size to be the refill size. Users of NewGenericRateLimiter() can now provide burst size in single_burst_bytes. Implementors of RateLimiter::SetSingleBurstBytes() need to adapt their implementations to match the changed API doc.
- Add write_memtable_time to the newly introduced PerfLevel kEnableWait.
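The GetForUpdate item above mentions managing a monotonically increasing timestamp as one way to enforce the UDT invariant on the user side. A minimal sketch of such a source is shown here. MonotonicTimestampSource is a hypothetical helper, not a RocksDB class, and the sketch assumes the invariant is about timestamps never going backwards for later writes. Handing out increasing values is only part of the job: the application still has to make sure transactions commit in the same order in which their timestamps were issued.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not part of RocksDB): hands out strictly increasing
// timestamps, even when several threads ask for one concurrently.
class MonotonicTimestampSource {
 public:
  explicit MonotonicTimestampSource(std::uint64_t start = 0) : last_(start) {}

  // Atomically advances the counter and returns the new value, so no two
  // callers ever receive the same timestamp.
  std::uint64_t Next() { return last_.fetch_add(1) + 1; }

 private:
  std::atomic<std::uint64_t> last_;
};
```

A single shared instance would typically be consulted right before each commit; how the resulting value is encoded into the timestamp bytes depends on the comparator in use and is left out of this sketch.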
Behavior Changes
- RateLimiters created by NewGenericRateLimiter() no longer modify the refill period when SetSingleBurstBytes() is called.
- Merge writes will only keep merge operand count within ColumnFamilyOptions::max_successive_merges when the key's merge operands are all found in memory, unless strict_max_successive_merges is explicitly set.
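To make the RateLimiter change concrete, the toy token bucket below keeps the refill settings and the per-request burst cap as independent values. It is a sketch under stated assumptions, not RocksDB's implementation: ToyRateLimiter, TryRequest, and Refill are hypothetical names, and time is simulated by calling Refill() by hand.

```cpp
#include <algorithm>
#include <cstdint>

// Toy token bucket (hypothetical, not RocksDB's RateLimiter). The refill
// settings and the per-request burst cap are stored independently.
class ToyRateLimiter {
 public:
  ToyRateLimiter(std::int64_t refill_bytes, std::int64_t refill_period_us,
                 std::int64_t single_burst_bytes)
      : refill_bytes_(refill_bytes),
        refill_period_us_(refill_period_us),
        single_burst_bytes_(single_burst_bytes),
        available_(refill_bytes) {}

  // Changing the burst cap leaves the refill amount and period untouched.
  void SetSingleBurstBytes(std::int64_t bytes) { single_burst_bytes_ = bytes; }

  std::int64_t GetSingleBurstBytes() const { return single_burst_bytes_; }
  std::int64_t GetRefillBytes() const { return refill_bytes_; }
  std::int64_t GetRefillPeriodUs() const { return refill_period_us_; }

  // Grants up to `bytes`, limited by the burst cap and by the tokens left in
  // the current period. Returns the number of bytes actually granted.
  std::int64_t TryRequest(std::int64_t bytes) {
    std::int64_t granted = std::min({bytes, single_burst_bytes_, available_});
    available_ -= granted;
    return granted;
  }

  // Simulates the start of a new refill period.
  void Refill() { available_ = refill_bytes_; }

 private:
  std::int64_t refill_bytes_;
  std::int64_t refill_period_us_;
  std::int64_t single_burst_bytes_;
  std::int64_t available_;
};
```

With a refill of 1000 bytes and a burst cap of 100, a 500-byte request is granted 100 bytes; raising the cap to 400 changes what a single request can obtain without touching the refill amount or period.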
Bug Fixes
- Fixed kBlockCacheTier reads to return Status::Incomplete when I/O is needed to fetch a merge chain's base value from a blob file.
- Fixed kBlockCacheTier reads to return Status::Incomplete on table cache miss rather than incorrectly returning an empty value.
- Fixed a data race in WalManager that may affect how frequently PurgeObsoleteWALFiles() runs.
- Re-enable the recycle_log_file_num option in DBOptions for kPointInTimeRecovery WAL recovery mode, which was previously disabled due to a bug in the recovery logic. This option is incompatible with WriteOptions::disableWAL. A Status::InvalidArgument() will be returned if disableWAL is specified.
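The two kBlockCacheTier fixes above share one contract: a read that is not allowed to do I/O should report that it could not finish, not pretend that it found an empty value. The standalone model below illustrates that contract only. All names (ToyStatus, ToyReadTier, ToyStore) are hypothetical, and the two maps are stand-ins for "already in memory" and "needs I/O"; none of this is RocksDB code.

```cpp
#include <map>
#include <string>

// Hypothetical names throughout: this models the contract, not RocksDB.
enum class ToyStatus { kOk, kNotFound, kIncomplete };
enum class ToyReadTier { kReadAllTier, kBlockCacheTier };

struct ToyStore {
  std::map<std::string, std::string> cache;  // data already in memory
  std::map<std::string, std::string> disk;   // data that needs I/O to read

  ToyStatus Get(ToyReadTier tier, const std::string& key,
                std::string* value) const {
    auto hit = cache.find(key);
    if (hit != cache.end()) {
      *value = hit->second;
      return ToyStatus::kOk;
    }
    if (tier == ToyReadTier::kBlockCacheTier) {
      // Answering would require I/O, which this tier forbids. Say so
      // explicitly instead of returning kOk with an empty value.
      return ToyStatus::kIncomplete;
    }
    auto on_disk = disk.find(key);
    if (on_disk == disk.end()) {
      return ToyStatus::kNotFound;
    }
    *value = on_disk->second;
    return ToyStatus::kOk;
  }
};
```

A caller that receives the incomplete status can then decide whether to retry with a tier that permits I/O.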
Performance Improvements
- Java API multiGet() variants now take advantage of the underlying batched multiGet() performance improvements.
Before
Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units
MultiGetBenchmarks.multiGetList10 no_column_family 10000 16 100 64 thrpt 25 6315.541 ± 8.106 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 16 100 1024 thrpt 25 6975.468 ± 68.964 ops/s
After
Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units
MultiGetBenchmarks.multiGetList10 no_column_family 10000 16 100 64 thrpt 25 7046.739 ± 13.299 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 16 100 1024 thrpt 25 7654.521 ± 60.121 ops/s