RocksDB 7.1.1 releases, persistent key-value storage system

RocksDB is a C++ library providing an embedded key-value store, where keys and values are arbitrary byte streams. It was developed at Facebook based on LevelDB and provides backwards-compatible support for LevelDB APIs.

It is optimized for Flash with extremely low latencies. RocksDB uses a Log Structured Database Engine for storage, written entirely in C++. A Java version called RocksJava is currently in development. See RocksJava Basics.

RocksDB features highly flexible configuration settings that can be tuned for a variety of production environments, including pure memory, Flash, hard disks or HDFS. It supports various compression algorithms and ships with good tools for production support and debugging.

Features

  • Designed for application servers wanting to store up to a few terabytes of data on locally attached Flash drives or in RAM
  • Optimized for storing small to medium size key-values on fast storage — flash devices or in-memory
  • Scales linearly with number of CPUs so that it works well on processors with many cores

Changelog

7.1.1 (04/07/2022)

Bug Fixes

  • Fix segfault in FilePrefetchBuffer with async_io as it doesn’t wait for pending jobs to complete on destruction.

7.1.0 (03/23/2022)

New Features

  • Allow WriteBatchWithIndex to index a WriteBatch that includes keys with user-defined timestamps. The index itself does not have timestamps.
  • Add support for user-defined timestamps to write-committed transactions without API change. The TransactionDB layer APIs do not allow timestamps because we require that all user-defined-timestamps-aware operations go through the Transaction APIs.
  • Added BlobDB options to ldb
  • BlockBasedTableOptions::detect_filter_construct_corruption can now be dynamically configured using DB::SetOptions.
  • Automatically recover from retryable read IO errors during background flush/compaction.
  • Experimental support for preserving file Temperatures through backup and restore, and for updating DB metadata for outside changes to file Temperature (UpdateManifestForFilesState or ldb update_manifest --update_temperatures).
  • Experimental support for async_io in ReadOptions, which is used by FilePrefetchBuffer to prefetch some of the data asynchronously if reads are sequential and auto readahead is enabled by RocksDB internally.

Bug Fixes

  • Fixed a major performance bug in which Bloom filters generated by pre-7.0 releases are not read by early 7.0.x releases (and vice-versa) due to changes to FilterPolicy::Name() in #9590. This can severely impact read performance and read I/O on upgrade or downgrade with existing DB, but not data correctness.
  • Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
  • Fixed a bug caused by a race among flush, incoming writes and taking snapshots. Queries to snapshots created under this race condition can return incorrect results, e.g. resurfacing deleted data.
  • Fixed a bug where DB flush uses options.compression even if options.compression_per_level is set.
  • Fixed a bug where DisableManualCompaction may assert when disabling an unscheduled manual compaction.
  • Fixed a race condition when canceling a manual compaction with DisableManualCompaction. Also, DB close can cancel the manual compaction thread.
  • Fixed a potential timer crash when opening and closing DBs concurrently.
  • Fixed a race condition for alive_log_files_ in non-two-write-queues mode. The race is between the write_thread_ in WriteToWAL() and another thread executing FindObsoleteFiles(). The race condition will be caught if __glibcxx_requires_nonempty is enabled.
  • Fixed a bug where Iterator::Refresh() reads stale keys after DeleteRange() is performed.
  • Fixed a race condition when disabling and re-enabling manual compaction.
  • Fixed automatic error recovery failure in atomic flush.
  • Fixed a race condition when mmaping a WritableFile on POSIX.
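
To make the compression fix above concrete, here is a minimal, self-contained sketch of the intended selection rule: per-level settings take precedence over the global setting, and a flush (which writes level-0 files) should use the level-0 entry. This is a toy model, not RocksDB code. ToyOptions, CompressionForLevel and CompressionForFlush are hypothetical names, compression types are modeled as plain strings, and the real library's level mapping may differ in details.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the relevant options.
struct ToyOptions {
  std::string compression = "snappy";              // models options.compression
  std::vector<std::string> compression_per_level;  // models options.compression_per_level
};

// Pick the compression for a file written to `level`.
// Per-level settings win when present; levels past the end of the
// vector reuse its last entry (an assumption of this sketch).
std::string CompressionForLevel(const ToyOptions& opts, int level) {
  assert(level >= 0);
  if (opts.compression_per_level.empty()) {
    return opts.compression;
  }
  const int last = static_cast<int>(opts.compression_per_level.size()) - 1;
  return opts.compression_per_level[std::min(level, last)];
}

// A flush writes level-0 files, so it consults the level-0 entry
// rather than falling back to the global setting.
std::string CompressionForFlush(const ToyOptions& opts) {
  return CompressionForLevel(opts, 0);
}
```

In this model, with compression_per_level set to {"none", "none", "zstd"}, a flush picks "none" even though the global setting is "snappy"; the bug described above corresponds to the flush path returning the global setting instead.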

Public API changes

  • Added pure virtual FilterPolicy::CompatibilityName(), which is needed to fix a major performance bug involving FilterPolicy naming in SST metadata without affecting the Customizable aspect of FilterPolicy. This change only affects those with their own custom or wrapper FilterPolicy classes.
  • options.compression_per_level is dynamically changeable with SetOptions().
  • Added WriteOptions::rate_limiter_priority. When set to something other than Env::IO_TOTAL, the internal rate limiter (DBOptions::rate_limiter) will be charged at the specified priority for writes associated with the API to which the WriteOptions was provided. Currently the support covers automatic WAL flushes, which happen during live updates (Put(), Write(), Delete(), etc.) when WriteOptions::disableWAL == false and DBOptions::manual_wal_flush == false.
  • Added the DB::OpenAndTrimHistory API. It opens the DB and trims data to the timestamp specified by trim_ts (data with a timestamp larger than the specified trim bound is removed). It should only be used when recovering timestamp-enabled column families. If a column family doesn’t have timestamps enabled, this API won’t trim any data in that column family. This API is not compatible with the avoid_flush_during_recovery option.
  • Removed BlockBasedTableOptions.hash_index_allow_collision, which already had no effect.
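
The trimming rule described for DB::OpenAndTrimHistory above (versions with a timestamp larger than trim_ts are removed) can be illustrated with a small self-contained model. This is only a sketch of the semantics, not the RocksDB implementation or its API. ToyStore and TrimHistory are hypothetical names, and timestamps are assumed to be plain 64-bit integers for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Toy model: each user key maps to its versions, ordered by timestamp.
using Versions = std::map<std::uint64_t, std::string>;
using ToyStore = std::map<std::string, Versions>;

// Drop every version whose timestamp is greater than trim_ts.
// Keys left with no versions are removed entirely.
// Returns the number of versions removed.
std::size_t TrimHistory(ToyStore& store, std::uint64_t trim_ts) {
  std::size_t removed = 0;
  for (auto it = store.begin(); it != store.end();) {
    Versions& versions = it->second;
    // First version strictly newer than the trim bound.
    auto first_newer = versions.upper_bound(trim_ts);
    removed += static_cast<std::size_t>(
        std::distance(first_newer, versions.end()));
    versions.erase(first_newer, versions.end());
    // Postcondition: nothing newer than trim_ts survives.
    assert(versions.empty() || versions.rbegin()->first <= trim_ts);
    if (versions.empty()) {
      it = store.erase(it);
    } else {
      ++it;
    }
  }
  return removed;
}
```

For example, a key with versions at timestamps 1, 5 and 9 keeps only the versions at 1 and 5 after trimming with a bound of 5, and a key whose only version is at timestamp 7 disappears.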

Download