6.27.3 (2021-12-10)
Bug Fixes
- Fixed a bug in TableOptions.prepopulate_block_cache that caused a segmentation fault when used with TableOptions.partition_filters = true and TableOptions.cache_index_and_filter_blocks = true.
- Fixed a bug affecting custom memtable factories which are not registered with the ObjectRegistry. The bug could result in failure to save the OPTIONS file.
6.27.2 (2021-12-01)
Bug Fixes
- Fixed a bug in RocksDB's automatic implicit prefetching, which was broken by the new adaptive_readahead feature: internal prefetching was disabled when an iterator moved from one file to the next.
6.27.1 (2021-11-29)
Bug Fixes
- Fixed a bug that could, with WAL enabled, cause backups, checkpoints, and GetSortedWalFiles() to fail randomly with an error like IO error: 001234.log: No such file or directory
6.27.0 (2021-11-19)
New Features
- Added new ChecksumType kXXH3 which is faster than kCRC32c on almost all x86_64 hardware.
- Added a new online consistency check for BlobDB which validates that the number/total size of garbage blobs does not exceed the number/total size of all blobs in any given blob file.
- Provided support for tracking per-sst user-defined timestamp information in MANIFEST.
- Added new option "adaptive_readahead" in ReadOptions. For iterators, RocksDB does auto-readahead on noticing sequential reads. With this option enabled, the readahead_size of the current file (if reads are sequential) is carried forward to the next file instead of starting from scratch at each level (except L0 level files). If reads are not sequential, it falls back to 8KB. This option applies only to the RocksDB internal prefetch buffer and isn't supported with underlying file system prefetching.
- Added the read count and read bytes related stats to Statistics for tiered storage hot, warm, and cold file reads.
- Added an option to dynamically charge an updating estimate of the memory usage of block-based table building to the block cache, if a block cache is available. It currently only includes charging memory usage of constructing (new) Bloom Filter and Ribbon Filter to block cache. To enable this feature, set BlockBasedTableOptions::reserve_table_builder_memory = true.
- Added a new API OnIOError in listener.h that notifies listeners when an IO error occurs during a FileSystem operation, along with filename, status, etc.
- Added compaction readahead support for blob files to the integrated BlobDB implementation, which can improve compaction performance when the database resides on higher-latency storage like HDDs or remote filesystems. Readahead can be configured using the column family option blob_compaction_readahead_size.
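The adaptive_readahead entry above can be pictured with a small standalone model. This is only a sketch and not RocksDB code: the type and function names are hypothetical, and the doubling step and the 256KB cap are assumptions made for illustration. Only the 8KB fallback and the carry-forward behavior come from the note itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative model only; these names are hypothetical, not RocksDB internals.
constexpr std::size_t kInitialReadahead = 8 * 1024;   // 8KB fallback (from the note)
constexpr std::size_t kMaxReadahead = 256 * 1024;     // assumed upper bound

struct ReadaheadState {
  std::size_t readahead_size = kInitialReadahead;

  // Update the readahead size after one read on the current file.
  // Assumption: sequential reads double the size up to the cap,
  // and a non-sequential read resets it to the initial size.
  void OnRead(bool sequential) {
    if (sequential) {
      readahead_size = std::min(readahead_size * 2, kMaxReadahead);
    } else {
      readahead_size = kInitialReadahead;
    }
  }
};

// Readahead size to start with when the iterator moves to the next file.
// With adaptive_readahead enabled and sequential reads, the previous
// file's size is carried forward; otherwise the next file starts at 8KB.
inline std::size_t InitialReadaheadForNextFile(const ReadaheadState& prev,
                                               bool adaptive_readahead,
                                               bool reads_sequential) {
  if (adaptive_readahead && reads_sequential) {
    return prev.readahead_size;
  }
  return kInitialReadahead;
}
```

In this model, a file that has grown its readahead to 32KB hands 32KB to the next file when the option is on, and 8KB when it is off.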
Bug Fixes
- Prevent a CompactRange() with CompactRangeOptions::change_level == true from possibly causing corruption to the LSM state (overlapping files within a level) when run in parallel with another manual compaction. Note that setting force_consistency_checks == true (the default) would cause the DB to enter read-only mode in this scenario and return Status::Corruption, rather than committing any corruption.
- Fixed a bug in CompactionIterator when write-prepared transaction is used. A released earliest write conflict snapshot may cause assertion failure in dbg mode and unexpected key in opt mode.
- Fixed the ticker WRITE_WITH_WAL ("rocksdb.write.wal"). The bug was caused by an extra RecordTick(stats_, WRITE_WITH_WAL) in two places; this fix removes the extra RecordTicks and fixes the corresponding test case.
- EventListener::OnTableFileCreated was previously called with OK status and file_size==0 in cases of no SST file contents written (because there was no content to add) and the empty file deleted before calling the listener. Now the status is Aborted.
- Fixed a bug in CompactionIterator when write-prepared transaction is used. Releasing earliest_snapshot during compaction may cause a SingleDelete to be output after a PUT of the same user key whose seq has been zeroed.
- Added input sanitization on negative bytes passed into GenericRateLimiter::Request.
- Fixed an assertion failure in CompactionIterator when write-prepared transaction is used. We prove that certain operations can lead to a Delete being followed by a SingleDelete (same user key). We can drop the SingleDelete.
- Fixed a bug in timestamp-based GC which can cause all versions of a key under full_history_ts_low to be dropped. This bug is triggered when some of the ikeys' timestamps are lower than full_history_ts_low, while others are newer.
- In some cases outside of the DB read and compaction paths, SST block checksums are now checked where they were not before.
- Explicitly check for and disallow the BlockBasedTableOptions if insertion into one of {block_cache, block_cache_compressed, persistent_cache} can show up in another of these. (RocksDB expects to be able to use the same key for different physical data among tiers.)
- Users who configured a dedicated thread pool for bottommost compactions by explicitly adding threads to the Env::Priority::BOTTOM pool will no longer see RocksDB schedule automatic compactions exceeding the DB's compaction concurrency limit. For details on per-DB compaction concurrency limit, see API docs of max_background_compactions and max_background_jobs.
- Fixed a bug of background flush thread picking more memtables to flush and prematurely advancing column family's log_number.
- Fixed an assertion failure in ManifestTailer.
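The "overlapping files within a level" condition mentioned in the CompactRange() fix above can be sketched as a simple invariant check. This is a standalone illustration, not RocksDB's consistency-check code: FileRange and LevelIsConsistent are hypothetical names, and keys are compared bytewise as plain std::string, which assumes a bytewise comparator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a file's key range within one level.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Returns true if the files of a sorted (non-L0) level are well formed:
// each file has smallest <= largest, and every file starts strictly
// after the previous file ends. Input is expected in level order.
inline bool LevelIsConsistent(const std::vector<FileRange>& files) {
  for (std::size_t i = 0; i < files.size(); ++i) {
    if (files[i].smallest > files[i].largest) {
      return false;  // malformed range
    }
    if (i > 0 && files[i - 1].largest >= files[i].smallest) {
      return false;  // overlap (or wrong order) between neighbors
    }
  }
  return true;
}
```

A level holding the ranges [a, c] and [d, f] passes this check, while [a, e] followed by [d, f] fails it, which is the kind of state the note describes as being reported rather than committed.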
Behavior Changes
- NUM_FILES_IN_SINGLE_COMPACTION previously counted only the first input level's files; it now includes all input files.
- TransactionUtil::CheckKeyForConflicts can also perform conflict-checking based on user-defined timestamps in addition to sequence numbers.
- Removed GenericRateLimiter's minimum refill bytes per period previously enforced.
Public API change
- When options.ttl is used with leveled compaction with compaction priority kMinOverlappingRatio, files exceeding half of TTL value will be prioritized more, so that by the time TTL is reached, fewer extra compactions will be scheduled to clear them up. At the same time, when compacting files with data older than half of TTL, output files may be cut off based on those files' boundaries, in order for the early TTL compaction to work properly.
- Made FileSystem extend the Customizable class and added a CreateFromString method. Implementations need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method.
- Clarified in API comments that RocksDB is not exception safe for callbacks and custom extensions. An exception propagating into RocksDB can lead to undefined behavior, including data loss, unreported corruption, deadlocks, and more.
- Marked WriteBufferManager as final because it is not intended for extension.
- Removed unimportant implementation details from table_properties.h
- Add API FSDirectory::FsyncWithDirOptions(), which provides extra information like directory fsync reason in DirFsyncOptions. File systems like btrfs use that to skip directory fsync for creating a new file, or when renaming a file, fsync the target file instead of the directory, which improves the DB::Open() speed by ~20%.
- DB::Open() is not going to be blocked by obsolete file purge if DBOptions::avoid_unnecessary_blocking_io is set to true.
- In builds where glibc provides gettid(), info log ("LOG" file) lines now print a system-wide thread ID from gettid() instead of the process-local pthread_self(). For all users, the thread ID format is changed from hexadecimal to decimal integer.
- In builds where glibc provides pthread_setname_np(), the background thread names no longer contain an ID suffix. For example, "rocksdb:bottom7" (and all other threads in the Env::Priority::BOTTOM pool) are now named "rocksdb:bottom". Previously large thread pools could breach the name size limit (e.g., naming "rocksdb:bottom10" would fail).
- Deprecating ReadOptions::iter_start_seqnum and DBOptions::preserve_deletes; please try using the user-defined timestamp feature instead. The options will be removed in a future release; currently a warning message is logged when they are used.
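The thread-name size limit mentioned above comes down to simple arithmetic: on Linux, pthread_setname_np() accepts at most 16 bytes including the terminating NUL. The sketch below checks names against that limit without calling the real API. FitsThreadNameLimit and OldStyleName are hypothetical helpers written for this illustration, not RocksDB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Linux limits thread names to 16 bytes, including the terminating NUL.
constexpr std::size_t kMaxThreadNameBytes = 16;

// True if the name (plus its NUL terminator) fits within the limit.
inline bool FitsThreadNameLimit(const std::string& name) {
  return name.size() + 1 <= kMaxThreadNameBytes;
}

// Hypothetical helper mirroring the old scheme: pool prefix + numeric ID.
inline std::string OldStyleName(const std::string& prefix, int id) {
  return prefix + std::to_string(id);
}
```

Under this check, "rocksdb:bottom" (14 characters) and "rocksdb:bottom7" (15 characters) fit, while "rocksdb:bottom10" (16 characters plus the NUL) does not, which matches the failure described in the note.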
Performance Improvements
- Released some memory related to filter construction earlier in BlockBasedTableBuilder for FullFilter and PartitionedFilter case (#9070)