7.5.2 (08/02/2022)
Bug Fixes
- Fix a bug starting in 7.4.0 in which some fsync operations might be skipped in a DB after any DropColumnFamily on that DB, until it is re-opened. This can lead to data loss on power loss. (For custom FileSystem implementations, this could lead to `FSDirectory::Fsync` or `FSDirectory::Close` after the first `FSDirectory::Close`. Also, valgrind could report a call to `close()` with `fd=-1`.)
7.5.1 (08/01/2022)
Bug Fixes
- Fix a bug where rate_limiter_parameter is not passed into `PartitionedFilterBlockReader::GetFilterPartitionBlock`.
7.5.0 (07/15/2022)
New Features
- Mempurge option flag `experimental_mempurge_threshold` is now a ColumnFamilyOptions and can now be dynamically configured using `SetOptions()`.
- Support backward iteration when `ReadOptions::iter_start_ts` is set.
- Provide support for ReadOptions.async_io with direct_io to improve Seek latency by using async IO to parallelize child iterator seek and doing asynchronous prefetching on sequential scans.
- Added support for blob caching in order to cache frequently used blobs for BlobDB.
  - User can configure the new ColumnFamilyOptions `blob_cache` to enable/disable blob caching.
  - Either sharing the backend cache with the block cache or using a completely separate cache is supported.
  - A new abstraction interface called `BlobSource` for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (`Get`, `MultiGet`, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through `Version::GetBlob` or, for MultiGet, `Version::MultiGetBlob` (and then get dispatched to the interface -- `BlobSource`).
- Add experimental tiered compaction feature `AdvancedColumnFamilyOptions::preclude_last_level_data_seconds`, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete).
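
The layered read path described in the blob caching item above can be sketched with a toy model. This is not RocksDB's `BlobSource`: the class `ToyBlobSource`, its methods, and the map-based "storage" are hypothetical names invented for this illustration, and the secondary cache tier is left out. The sketch only shows the idea of checking a cache first and falling back to storage on a miss.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Toy stand-in for a layered blob reader. NOT RocksDB's BlobSource;
// every name here is hypothetical and exists only for illustration.
class ToyBlobSource {
 public:
  explicit ToyBlobSource(std::unordered_map<uint64_t, std::string> storage)
      : storage_(std::move(storage)) {}

  // Looks up `blob_id`, checking the cache before storage.
  // Returns false if the blob does not exist in either tier.
  bool GetBlob(uint64_t blob_id, std::string* value) {
    auto cached = cache_.find(blob_id);
    if (cached != cache_.end()) {
      *value = cached->second;  // cache hit: storage is not touched
      return true;
    }
    auto stored = storage_.find(blob_id);
    if (stored == storage_.end()) {
      return false;  // unknown blob
    }
    ++storage_reads_;
    cache_[blob_id] = stored->second;  // warm the cache for the next read
    *value = stored->second;
    return true;
  }

  // Number of lookups that had to go to the simulated storage tier.
  int storage_reads() const { return storage_reads_; }

 private:
  std::unordered_map<uint64_t, std::string> storage_;  // simulated blob files
  std::unordered_map<uint64_t, std::string> cache_;    // simulated blob cache
  int storage_reads_ = 0;
};
```

In this model, reading the same blob twice increments `storage_reads()` only once, which is the effect a blob cache is meant to have for frequently used blobs.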
Public API changes
- Add metadata related structs and functions in C API, including
  - `rocksdb_get_column_family_metadata()` and `rocksdb_get_column_family_metadata_cf()` to obtain `rocksdb_column_family_metadata_t`.
  - `rocksdb_column_family_metadata_t` and its get functions & destroy function.
  - `rocksdb_level_metadata_t` and its get functions & destroy function.
  - `rocksdb_file_metadata_t` and its get functions & destroy function.
- Add suggest_compact_range() and suggest_compact_range_cf() to C API.
- When using block cache strict capacity limit (`LRUCache` with `strict_capacity_limit=true`), DB operations now fail with Status code `kAborted` subcode `kMemoryLimit` (`IsMemoryLimit()`) instead of `kIncomplete` (`IsIncomplete()`) when the capacity limit is reached, because Incomplete can mean other specific things for some operations. In more detail, `Cache::Insert()` now returns the updated Status code and this usually propagates through RocksDB to the user on failure.
- NewClockCache calls temporarily return an LRUCache (with similar characteristics as the desired ClockCache). This is because ClockCache is being replaced by a new version (the old one had unknown bugs) but this is still under development.
- Add two functions `int ReserveThreads(int threads_to_be_reserved)` and `int ReleaseThreads(threads_to_be_released)` into the `Env` class. In the default implementation, both return 0. A newly added `xxxEnv` class that inherits from `Env` should implement these two functions for thread reservation/releasing features.
- Removed Customizable support for RateLimiter and removed its CreateFromString() and Type() functions.
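
The status change for strict capacity limits described above can be illustrated with a toy model. This is a sketch under stated assumptions, not RocksDB code: `ToyStatus`, `ToyStrictCache`, and the two enums are hypothetical miniatures that only mirror the names `kAborted`, `kMemoryLimit`, `IsMemoryLimit()`, and `IsIncomplete()` from the entry. The toy cache never evicts, which is roughly the situation where every entry is pinned.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical miniature status type; the real Status has many more codes.
enum class Code { kOk, kIncomplete, kAborted };
enum class SubCode { kNone, kMemoryLimit };

struct ToyStatus {
  Code code;
  SubCode subcode;
  ToyStatus() : code(Code::kOk), subcode(SubCode::kNone) {}
  ToyStatus(Code c, SubCode s) : code(c), subcode(s) {}
  bool ok() const { return code == Code::kOk; }
  bool IsIncomplete() const { return code == Code::kIncomplete; }
  bool IsMemoryLimit() const {
    return code == Code::kAborted && subcode == SubCode::kMemoryLimit;
  }
};

// Toy cache with a strict capacity limit. Simplification: nothing is ever
// evicted, so an insert that does not fit is rejected outright.
class ToyStrictCache {
 public:
  explicit ToyStrictCache(std::size_t capacity) : capacity_(capacity) {}

  ToyStatus Insert(const std::string& key, std::size_t charge) {
    std::size_t existing = 0;
    auto it = charges_.find(key);
    if (it != charges_.end()) existing = it->second;
    // Re-inserting a key replaces its old charge.
    const std::size_t new_usage = usage_ - existing + charge;
    if (new_usage > capacity_) {
      // Reported as Aborted/MemoryLimit rather than Incomplete.
      return ToyStatus(Code::kAborted, SubCode::kMemoryLimit);
    }
    usage_ = new_usage;
    charges_[key] = charge;
    return ToyStatus();
  }

  std::size_t usage() const { return usage_; }

 private:
  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::unordered_map<std::string, std::size_t> charges_;
};
```

A caller that previously checked `IsIncomplete()` to detect a full cache would, in this model, need to check `IsMemoryLimit()` instead.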
Bug Fixes
- Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB.
- Fix a bug where concurrent compactions might cause unnecessary further write stalling. In some cases, this might cause write rate to drop to minimum.
- Fix a bug in Logger where, if dbname and db_log_dir are on different filesystems, dbname creation would fail with respect to the db_log_dir path, returning an error and failing to open the DB.
- Fix a CPU and memory efficiency issue introduced by #8336, which made InternalKeyComparator configurable as an unintended side effect.
- Fix a bug where `GenericRateLimiter` could revert the bandwidth set dynamically using `SetBytesPerSecond()` when a user configures a structure enclosing it, e.g., using `GetOptionsFromString()` to configure an `Options` that references an existing `RateLimiter` object.
Behavior Change
- In leveled compaction with dynamic levelling, the level multiplier is no longer adjusted due to an oversized L0. Instead, the compaction score is adjusted by increasing the level size target by the incoming bytes from upper levels. This deprioritizes compactions from upper levels if more data from L0 is coming. This fixes some unnecessary full stalling caused by drastic changes in level targets, while not wasting write bandwidth on compaction when writes are overloaded.
- For track_and_verify_wals_in_manifest, revert to the original behavior before #10087: syncing of live WAL file is not tracked, and we track only the synced sizes of closed WALs. (PR #10330).
- WAL compression now computes/verifies checksum during compression/decompression.
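
The score adjustment in the first item above can be sketched as a small formula. This is a simplified reading of the entry, not RocksDB's scoring code: the function name `ToyCompactionScore` and its parameters are hypothetical, and the assumption is that the bytes expected from upper levels are simply added to the level's size target before dividing.

```cpp
#include <cstdint>

// Toy compaction score. Assumption for this sketch: bytes about to arrive
// from upper levels are added to the level's size target, which lowers the
// score of that level. The actual computation in RocksDB is more involved.
double ToyCompactionScore(uint64_t level_bytes, uint64_t target_bytes,
                          uint64_t incoming_bytes) {
  const uint64_t adjusted_target = target_bytes + incoming_bytes;
  if (adjusted_target == 0) {
    return 0.0;  // avoid dividing by zero for an empty target
  }
  return static_cast<double>(level_bytes) /
         static_cast<double>(adjusted_target);
}
```

With a 100-byte target and 300 bytes in the level, the score is 3.0 when nothing is incoming and drops to 1.0 when 200 more bytes are expected from above, so the level becomes a less urgent compaction candidate.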
Performance Improvements
- Rather than doing a total sort against all files in a level, SortFileByOverlappingRatio() now only finds the top 50 files based on score. This can improve write throughput for use cases where data is loaded in increasing key order and there are a lot of files in one LSM-tree, where applying compaction results is the bottleneck.
- In leveled compaction, L0->L1 trivial move now allows more than one file to be moved in one compaction. This allows L0 files to be moved down faster when data is loaded in sequential order, making the slowdown or stop condition harder to hit. Also seek an L0->L1 trivial move when only some files qualify.
- In leveled compaction, try to trivial move more than one file if possible, up to 4 files or max_compaction_bytes. This allows higher write throughput for some use cases where data is loaded in sequential order, where applying compaction results is the bottleneck.
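
The first item above replaces a full sort with a search for only the best-scoring files. A minimal sketch of that general technique uses `std::partial_sort`. It is not the code of `SortFileByOverlappingRatio()`: the `ToyFile` struct, the `PickTopFiles` function, and the convention that a lower score is better are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical file record; in this toy model a lower score is better.
struct ToyFile {
  uint64_t file_number;
  uint64_t overlap_score;
};

// Keeps only the `k` best-scoring files, in sorted order. The remaining
// files are never fully sorted, which is the point of the technique.
void PickTopFiles(std::vector<ToyFile>* files, std::size_t k) {
  const std::size_t n = std::min(k, files->size());
  const auto middle = files->begin() + static_cast<std::ptrdiff_t>(n);
  std::partial_sort(files->begin(), middle, files->end(),
                    [](const ToyFile& a, const ToyFile& b) {
                      if (a.overlap_score != b.overlap_score) {
                        return a.overlap_score < b.overlap_score;
                      }
                      return a.file_number < b.file_number;  // stable tie-break
                    });
  files->erase(middle, files->end());
}
```

For a level with N files and a fixed k such as 50, a partial sort does roughly O(N log k) comparisons instead of O(N log N), which matters when a level holds many files.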