github facebook/rocksdb v7.5.3
RocksDB 7.5.3

latest releases: v9.6.1, v9.5.2, v9.4.0...
2 years ago

7.5.2 (08/02/2022)

Bug Fixes

  • Fix a bug starting in 7.4.0 in which some fsync operations might be skipped in a DB after any DropColumnFamily on that DB, until it is re-opened. This can lead to data loss on power loss. (For custom FileSystem implementations, this could lead to FSDirectory::Fsync or FSDirectory::Close after the first FSDirectory::Close; Also, valgrind could report call to close() with fd=-1.)

7.5.1 (08/01/2022)

Bug Fixes

  • Fix a bug where rate_limiter_parameter is not passed into PartitionedFilterBlockReader::GetFilterPartitionBlock.

7.5.0 (07/15/2022)

New Features

  • Mempurge option flag experimental_mempurge_threshold is now a ColumnFamilyOptions and can now be dynamically configured using SetOptions().
  • Support backward iteration when ReadOptions::iter_start_ts is set.
  • Provide support for ReadOptions.async_io with direct_io to improve Seek latency by using async IO to parallelize child iterator seek and doing asynchronous prefetching on sequential scans.
  • Added support for blob caching in order to cache frequently used blobs for BlobDB.
    • User can configure the new ColumnFamilyOptions blob_cache to enable/disable blob caching.
    • Either sharing the backend cache with the block cache or using a completely separate cache is supported.
    • A new abstraction interface called BlobSource for blob read logic gives all users access to blobs, whether they are in the blob cache, secondary cache, or (remote) storage. Blobs can be potentially read both while handling user reads (Get, MultiGet, or iterator) and during compaction (while dealing with compaction filters, Merges, or garbage collection) but eventually all blob reads go through Version::GetBlob or, for MultiGet, Version::MultiGetBlob (and then get dispatched to the interface -- BlobSource).
  • Add experimental tiered compaction feature AdvancedColumnFamilyOptions::preclude_last_level_data_seconds, which makes sure the new data inserted within preclude_last_level_data_seconds won't be placed on cold tier (the feature is not complete).

Public API changes

  • Add metadata related structs and functions in C API, including
    • rocksdb_get_column_family_metadata() and rocksdb_get_column_family_metadata_cf() to obtain rocksdb_column_family_metadata_t.
    • rocksdb_column_family_metadata_t and its get functions & destroy function.
    • rocksdb_level_metadata_t and its and its get functions & destroy function.
    • rocksdb_file_metadata_t and its and get functions & destroy functions.
  • Add suggest_compact_range() and suggest_compact_range_cf() to C API.
  • When using block cache strict capacity limit (LRUCache with strict_capacity_limit=true), DB operations now fail with Status code kAborted subcode kMemoryLimit (IsMemoryLimit()) instead of kIncomplete (IsIncomplete()) when the capacity limit is reached, because Incomplete can mean other specific things for some operations. In more detail, Cache::Insert() now returns the updated Status code and this usually propagates through RocksDB to the user on failure.
  • NewClockCache calls temporarily return an LRUCache (with similar characteristics as the desired ClockCache). This is because ClockCache is being replaced by a new version (the old one had unknown bugs) but this is still under development.
  • Add two functions int ReserveThreads(int threads_to_be_reserved) and int ReleaseThreads(threads_to_be_released) into Env class. In the default implementation, both return 0. Newly added xxxEnv class that inherits Env should implement these two functions for thread reservation/releasing features.
  • Removed Customizable support for RateLimiter and removed its CreateFromString() and Type() functions.

Bug Fixes

  • Fix a bug in which backup/checkpoint can include a WAL deleted by RocksDB.
  • Fix a bug where concurrent compactions might cause unnecessary further write stalling. In some cases, this might cause write rate to drop to minimum.
  • Fix a bug in Logger where if dbname and db_log_dir are on different filesystems, dbname creation would fail wrt to db_log_dir path returning an error and fails to open the DB.
  • Fix a CPU and memory efficiency issue introduce by #8336 which made InternalKeyComparator configurable as an unintended side effect
  • Fix a bug where GenericRateLimiter could revert the bandwidth set dynamically using SetBytesPerSecond() when a user configures a structure enclosing it, e.g., using GetOptionsFromString() to configure an Options that references an existing RateLimiter object.

Behavior Change

  • In leveled compaction with dynamic levelling, level multiplier is not anymore adjusted due to oversized L0. Instead, compaction score is adjusted by increasing size level target by adding incoming bytes from upper levels. This would deprioritize compactions from upper levels if more data from L0 is coming. This is to fix some unnecessary full stalling due to drastic change of level targets, while not wasting write bandwidth for compaction while writes are overloaded.
  • For track_and_verify_wals_in_manifest, revert to the original behavior before #10087: syncing of live WAL file is not tracked, and we track only the synced sizes of closed WALs. (PR #10330).
  • WAL compression now computes/verifies checksum during compression/decompression.

Performance Improvements

  • Rather than doing total sort against all files in a level, SortFileByOverlappingRatio() to only find the top 50 files based on score. This can improve write throughput for the use cases where data is loaded in increasing key order and there are a lot of files in one LSM-tree, where applying compaction results is the bottleneck.
  • In leveled compaction, L0->L1 trivial move will allow more than one file to be moved in one compaction. This would allow L0 files to be moved down faster when data is loaded in sequential order, making slowdown or stop condition harder to hit. Also seek L0->L1 trivial move when only some files qualify.
  • In leveled compaction, try to trivial move more than one files if possible, up to 4 files or max_compaction_bytes. This is to allow higher write throughput for some use cases where data is loaded in sequential order, where appying compaction results is the bottleneck.

Don't miss a new rocksdb release

NewReleases is sending notifications on new releases.