Dragonfly v1.38.0
This is a major release with many significant improvements and stability fixes.
Because its core data structures have changed, Dragonfly may behave differently from previous versions.
We worked hard to avoid regressions, so please let us know if you observe any degradation in item expiry behavior, replication, or backups.
Important stability fixes and changes:
- Top-K command family (TOPK.*): RedisBloom-compatible heavy-hitter sketch with RDB persistence. (#6920, #6931, #6932, #6950)
- Count-Min Sketch family (CMS.*): Full CMS with INITBYDIM, INITBYPROB, INCRBY, QUERY, MERGE, INFO. RDB persistence. (#6867, #6888, #6896, #6897)
- ~26% memory reduction for TTL workloads: Expire table eliminated, expiry embedded in CompactKey via SDS_TTL_TAG. 10M keys: 900 MB -> 665 MB. (#6923, #6925, #6933)
- HTTL command: Returns remaining TTL for specific hash fields. (#6879)
- VECTOR_RANGE search: Radius-based vector similarity queries in FT.SEARCH and FT.AGGREGATE for both FLAT and HNSW indexes, with optional YIELD_DISTANCE_AS. (#6880, #6898, #6938)
- FT.AGGREGATE: FILTER, APPLY, DIALECT, KNN: Full expression-based filtering, computed fields, dialect selection, and KNN steps in the aggregation pipeline. (#6968, #6982, #7066)
- Smarter KNN prefilter: Falls back to brute-force scan when prefilter result set is small. (#6730)
- Replica expired-key deletion: Replicas proactively delete expired keys on reads. --replica_delete_expired (default true). (#6985)
- New Prometheus metrics: Pipeline latency histogram (dragonfly_pipeline_latency_seconds), TLS handshake counter (tls_handshakes_total), stream access patterns (stream_accesses_total), pipeline blocking counter. (#6688, #7015, #6767, #6685)
- EKS Pod Identity for S3: AWS credential chain supports EKS Pod Identity and ECS task roles. (#6917)
- Early TLS filter and TCP_DEFER_ACCEPT: Drops zombie connections at OS level before allocating fiber/OpenSSL state. (#6857)
- Memcached io_loop_v2 enabled by default: Async I/O with deferred replies. (#6700)
- Incremental SBF replication: Scalable Bloom Filter chunks flushed incrementally, bounding peak memory. (#7034)
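For readers unfamiliar with the data structure behind the CMS.* commands, here is a minimal Python sketch of a Count-Min Sketch. It is an illustration only and not Dragonfly's implementation: the class and method names are hypothetical, the per-row hashing (salted BLAKE2b) is an assumption, and the methods only loosely mirror what INITBYDIM, INCRBY, QUERY and MERGE do conceptually.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters (hypothetical class)."""

    def __init__(self, width: int, depth: int) -> None:
        # Conceptually similar to initializing a sketch by dimensions.
        if width <= 0 or depth <= 0:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting with the row number.
        # The hash choice is an assumption made for this sketch.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def incrby(self, item: str, increment: int = 1) -> int:
        # Bump one counter in every row, then return the new estimate.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += increment
        return self.query(item)

    def query(self, item: str) -> int:
        # The minimum across rows is the estimate. Collisions can only
        # inflate counters, so with non-negative increments the estimate
        # never falls below the true count.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with identical dimensions merge by adding counters.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("dimension mismatch")
        for r in range(self.depth):
            for c in range(self.width):
                self.rows[r][c] += other.rows[r][c]


a = CountMinSketch(width=2000, depth=5)
a.incrby("apple", 3)
b = CountMinSketch(width=2000, depth=5)
b.incrby("apple", 2)
a.merge(b)
print(a.query("apple"))  # 5: only one item was inserted, so no collisions
```

The trade-off is the usual one for sketches: memory is fixed at width x depth counters regardless of how many distinct items are counted, at the cost of possible overestimates when items collide.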
What's Changed
- chore(server): Extend script slowlog stats by @dranikpg in #6656
- test(ci): exclude flaky BullMQ tests by @vyavdoshenko in #6673
- feat(fuzz): Add LLM-guided PR fuzzing with targeted seed generation by @vyavdoshenko in #6666
- refactor(journal): replace Journal class with free functions by @romange in #6672
- fix(search): Fix HNSW index creation failure, logs, drop by @dranikpg in #6662
- refactor: clean includes by @BorysTheDev in #6679
- chore: remove facade/acl_commands.h by @kostasrim in #6682
- fix: improve fuzz-pr crash detection and artifact packaging by @vyavdoshenko in #6686
- feat(tiering): Use Future::IsResolved to check if delayed entry is resolved by @mkaruza in #6665
- test: mark heavy skipped tests as large instead of skip by @vyavdoshenko in #6675
- feat: add pipeline latency histogram and Prometheus SUMMARY export by @romange in #6688
- refactor: move methods from OAHEntry to OAHSet by @BorysTheDev in #6692
- chore: remove duplicate info reported by "memory stats" by @romange in #6696
- server: Collect stats for why defragmentation was skipped by @abhijat in #6680
- fix: shrink memory usage for journaling items by @romange in #6695
- ci: exclude flaky BullMQ getWorkers shared connection test by @vyavdoshenko in #6697
- fix(tiering): Track all keys mutation callback for cluster migration by @mkaruza in #6691
- refactor: remove static fields from rdb_load by @BorysTheDev in #6684
- fix(tiering): Increase write depth by @dranikpg in #6705
- refactor(mc): async MGET/GAT by @glevkovich in #6674
- chore(ci): Replace direct push with PR for helm chart updates by @vyavdoshenko in #6717
- fix: validate stream command arguments to prevent crashes by @vyavdoshenko in #6718
- fix(fuzz): skip missing RECORD files in replay_crash.py by @vyavdoshenko in #6719
- chore: Skip KNN search if prefilter is empty by @mkaruza in #6720
- perf(seeder): batch STRING hashing with MGET to reduce tiering latency by @romange in #6721
- test: add test for replication HNSW index by @BorysTheDev in #6722
- skip tiering tests by @dranikpg in #6723
- ci: migrate ioloop-v2-regtests to OIDC for AWS auth by @AdiCloudbit-DevOps in #6725
- test: skip test_hnsw_search_replication_with_network_disruptions by @BorysTheDev in #6729
- feat: add RM command skeleton (phase 1) by @romange in #6731
- chore: add pipeline_blocking_commands_total counter by @kostasrim in #6685
- feat(set_family): Reserve vector on OpInter in advance by @abhijat in #6736
- ci: add memcache fuzzing to regular campaign by @vyavdoshenko in #6735
- test: consolidate slow marker into large by @vyavdoshenko in #6734
- feat(dash): segment merging by @kostasrim in #6676
- server: move blob loading logic into respective family classes by @romange in #6713
- feat(facade): enable experimental_io_loop_v2 by default and clean CI by @glevkovich in #6700
- core: Fix page usage accounting by @abhijat in #6737
- chore: do not reconcile slots after dfly takeover if cluster config updated by @kostasrim in #6728
- fix: orphan streams on error in OpCreate and OpAdd by @kostasrim in #6744
- fix: crash in SetFullJson when overwriting string key with JSON by @vyavdoshenko in #6750
- fix(server): flush replies on encountering blocking command in pipeline by @abhijat in #6743
- server: Add summary view to memory arena by @abhijat in #6677
- chore: stream family boilerplate to a function by @kostasrim in #6748
- fix: prevent SIGFPE in DEBUG POPULATE when val_size is 0 by @vyavdoshenko in #6754
- feat: implement RM scan-and-delete logic (phase 2) by @romange in #6732
- refactor: move stream functions from t_stream.c to their call sites by @romange in #6703
- ci: add optional memcache smoke fuzzing to PR workflow by @vyavdoshenko in #6758
- feat(server): Get and set byte at index position in string object by @mkaruza in #6752
- refactor: move stream alloc/free into compact_object and fix rdb_load cleanup by @romange in #6759
- fix: return WOULD_BLOCK for STATS/VERSION in MC pipeline by @vyavdoshenko in #6762
- ci: extract BullMQ skipped tests into a documented file by @vyavdoshenko in #6756
- chore: Retrieve single byte for GETBIT command by @mkaruza in #6757
- fix(bitops): Fetch only single byte for SETBIT operation by @mkaruza in #6745
- fix: prevent empty hash on HINCRBYFLOAT with NaN/Inf by @vyavdoshenko in #6772
- chore: revert our custom check-fails in mimalloc by @romange in #6774
- json: Add stats for interned string pool by @abhijat in #6701
- feat(server): add stream access pattern metrics by @romange in #6767
- feat: add deferred queue for hnsw write ops by @BorysTheDev in #6746
- feat(search): Use filtered search when prefilter result size is small by @mkaruza in #6730
- chore: upload coverage by @kostasrim in #6747
- tests: Protect connections with mutex by @abhijat in #6777
- fix(search): use 2D vectors in HnswSubsetKnnTest/CompareWithFilteredKnn for COSINE (macOS build fix) by @vyavdoshenko in #6790
- Fix pipeline starvation in AsyncFiber during async v2 loop transition by @glevkovich in #6788
- test(tiering): add ABA stash-delete-restash test for OpManager by @romange in #6789
- fix(fuzz): treat AFL++ hangs as failures and upload artifacts by @vyavdoshenko in #6782
- feat: implement debug compact-table command by @kostasrim in #6741
- chore(tiering): move TieredColdRecord to tiering namespace, update op_manager types by @romange in #6792
- fix(ci): fix no-op cluster tests and move slow tests to large CI by @vyavdoshenko in #6795
- chore: simplify AsyncFiber routing logic and improve VLOGs by @glevkovich in #6797
- CI: make sure jobs only run on the org repo by @abhijat in #6791
- fix(facade): improve connection setup and buffer capacity tracking by @romange in #6794
- fix: test_network_disconnect_small_buffer by @kostasrim in #6793
- fix: restore HNSW index during replication with different shard counts by @vyavdoshenko in #6733
- fix(fuzz): fix log spam from RECORD files and hang artifact upload failure by @vyavdoshenko in #6798
- fix(pipeline): fix missing SetDeferredReply bugs by @glevkovich in #6806
- refactor: deduplicate build steps into reusable composite action by @romange in #6804
- CI: Remove the docker image and binary downloads by @abhijat in #6812
- refactor(tiering): introduce FragmentRef abstraction for serialization by @romange in #6796
- feat: add pending keys to HNSW index during full sync phase by @BorysTheDev in #6805
- ci(fuzz): switch AFL++ compiler from afl-clang-lto to afl-clang-fast by @vyavdoshenko in #6817
- chore: remove shard_cnt/EXPIRED arguments in journal by @romange in #6818
- doc(serialization): introduce the serialization design document by @romange in #6820
- fix: improve HNSW index restoration correctness during replication by @BorysTheDev in #6815
- fix: clear stale expiry when overwriting key with no expiration in AddOrUpdateInternal by @vyavdoshenko in #6821
- chore: Set HNSW label with setExternalLabel by @mkaruza in #6847
- fix: finalize memory accounting before tiered stash during RDB load by @romange in #6823
- fix(search): Don't apply limit on results in prefilter knn search by @mkaruza in #6724
- feat(server): First coroutine for Del by @dranikpg in #6706
- chore: Fix clang compilation issues for ZSetFamily structs by @mkaruza in #6849
- fix(tiering): fix flaky op_manager_test by waiting for cancelled stash IOs by @romange in #6852
- chore(tiering): pass FragmentRef instead of PrimeValue in tiered storage by @romange in #6784
- fix(cluster): Handle number overflow in DFLYCLUSTER CONFIG parsing by @vyavdoshenko in #6853
- feat(facade): add early TLS filter and TCP_DEFER_ACCEPT by @glevkovich in #6857
- Add Claude Code skill for reproducing fuzz crashes by @romange in #6859
- fix(cluster): Validate slot IDs in DFLYCLUSTER FLUSHSLOTS to prevent out-of-bounds crash by @vyavdoshenko in #6861
- fix: make SwitchState return previous state to prevent double DFLY LOAD by @romange in #6855
- doc(serialization): document ordering invariants in code comments by @romange in #6845
- doc(serialization): expand shard serialization design with tiered analysis and roadmap by @romange in #6827
- fix(tiering): Serialize delayed entries from SerializeBucket by @mkaruza in #6807
- chore: introduce SerializerBase by @kostasrim in #6854
- feat(server): Count unique strings across shards by @abhijat in #6811
- test: make test_hnsw_search_replication_with_network_disruptions stable by @BorysTheDev in #6863
- core: Yield while traversing by @abhijat in #6846
- Revert "refactor(string): migrate SET, APPEND, PREPEND to async SimpleContext" by @dranikpg in #6851
- Revert "core: Add data structure for estimate which decays over time … by @romange in #6866
- chore: another way to track per object memory by @romange in #6864
- chore(core): add CMS implementation with unit tests by @romange in #6867
- Document cluster node health feature by @Copilot in #6502
- fix(rdb_load): Fix dry run with shard optimization by @dranikpg in #6727
- fix(ci): Skip AWS auth step on forked PR by @dranikpg in #6875
- feat: add extract method into StringMap by @BorysTheDev in #6872
- Add Celery integration tests by @romange in #6876
- chore(server): Use coroutines for MGET/GAT/INCR by @dranikpg in #6869
- Add HTTL command for hash field TTL queries by @romange in #6879
- feat(search): implement VECTOR_RANGE operator for FLAT index by @vyavdoshenko in #6880
- fix: json family memory tracking and orphans by @kostasrim in #6785
- fix(search): Fix null vector detection in FlatVectorIndex by @vyavdoshenko in #6892
- chore: add CMS type in CompactObject and AclFamily by @kostasrim in #6888
- chore: add cms family by @kostasrim in #6896
- chore(facade): Simplify parsed command replies by @dranikpg in #6878
- chore: update pytests for more modern redis-py client by @romange in #6889
- fix: macOS build by @vyavdoshenko in #6900
- fix(facade): Support RESP in ioloopv2 by @dranikpg in #6870
- chore: disk backpressure class utility by @kostasrim in #6020
- chore: switch CI to use ubuntu:24 by @romange in #6873
- fix(search): don't crash on wrong-sized vector/numeric field values in FT.SEARCH by @vyavdoshenko in #6908
- chore: remove unneeded http pages by @romange in #6905
- fix(search): two crash fixes — non-finite NUMERIC values and KeepTopKSorted OOB by @vyavdoshenko in #6911
- chore: CMS serialization by @kostasrim in #6897
- fix(CI): conditionally use pip flags by @abhijat in #6914
- chore(server): Remove non-coroutine async structs by @dranikpg in #6877
- chore: add BullMQ/sidekiq tests, fix celery worker teardown timeout by @romange in #6901
- chore: stack reduction and small helio clean-ups by @romange in #6916
- chore: switch actions to ubuntu-dev:24 by @romange in #6922
- CI: allow commit to repeat test by @abhijat in #6919
- feat(core): embed TTL directly in CompactKey via SDS_TTL_TAG by @romange in #6925
- core: add dict_builder module with HLL compressibility estimation by @romange in #6903
- feat(search): Add HNSW vector range search (FT.SEARCH) by @vyavdoshenko in #6898
- feat(core): introduce TOPK data structure (Stage 1) by @glevkovich in #6920
- chore: improve naming and get rid of references for mcflag by @romange in #6930
- feat(core): integrate TOPK into CompactObj by @glevkovich in #6931
- chore(server): remove expire table by @romange in #6923
- fix(server): fix active expiry dilution after expire table removal by @romange in #6934
- perf(server): optimize OpStrLen for tiered storage by @asherlaau in #6913
- chore: macos stub for DiskBackedQueue by @kostasrim in #6935
- chore(server): remove expire table by @romange in #6933
- chore: fix release pipeline by @romange in #6937
- feat(aws): support EKS Pod Identity for S3 credentials by @kanaevad in #6917
- fix(search): guard LoadEntry against freed DocIds in global HNSW KNN by @vyavdoshenko in #6936
- feat(server): Use more coroutines in string_family by @claude in #6910
- fix(core): Optimize SortedMap::GetRange by @dranikpg in #6942
- chore: SliceSnapshot inherits from SerializerBase by @kostasrim in #6860
- feat(topk): implement RDB serialization for Top-K sketches by @glevkovich in #6932
- Exclude fork PRs from AWS credential steps in CI workflows by @Copilot in #6946
- fix(search): cap GROUPBY nargs before reserve to prevent DoS crash by @vyavdoshenko in #6949
- feat(search): support VECTOR_RANGE operator in FT.AGGREGATE with YIELD_DISTANCE_AS by @vyavdoshenko in #6938
- chore(core): retire LZ4 compression from qlist by @romange in #6952
- test: add logs for redis and valkey instances during tests by @BorysTheDev in #6954
- feat(topk): add TOPK command family by @glevkovich in #6950
- chore: fix qlist memory accounting bug by @romange in #6958
- chore(deps): bump the actions group across 1 directory with 10 updates by @dependabot[bot] in #6957
- fix: ft.dropindex deadlock by @BorysTheDev in #6956
- chore(server): Simplify SerializerBase interface by @dranikpg in #6948
- fix(test): Disable heartbeat during keys hash calculation in tiering replication test by @mkaruza in #6904
- fix(search): Simplify hnsw indexing by @dranikpg in #6962
- fix: update helio submodule by @vyavdoshenko in #6961
- chore(fuzz): bundle repro.env with crash archives for reliable reproduction by @vyavdoshenko in #6953
- fix: disable BullMQ flaky test by @vyavdoshenko in #6963
- chore(core): add ZSTD dictionary compression infrastructure to qlist by @romange in #6955
- test(fakeredis): add topk (fakeredis) test suite by @glevkovich in #6959
- refactor: DenseSet entry removing by @BorysTheDev in #6966
- feat(server): Locking control from lua scripts by @dranikpg in #6277
- fix: FT.AGGREGATE SORTBY crash with huge nargs value by @vyavdoshenko in #6972
- feat(search): add FILTER expression parser and evaluator for FT.AGGREGATE by @vyavdoshenko in #6968
- chore: Disable memory_test::test_throttle_on_commands_squashing_replies_bytes by @mkaruza in #6976
- fix: null ptr deref in JournalXReadGroupIfNeeded for XREAD BLOCK with replication by @romange in #6980
- fix: JSON.NUMINCRBY negative result overflow by @romange in #6981
- feat(search): add FILTER clause support in FT.AGGREGATE by @vyavdoshenko in #6982
- fix: memory calculations for sets by @BorysTheDev in #6983
- chore(core): implement ZSTD dict based compression by @romange in #6967
- fix: crash for lazy expired sets by @BorysTheDev in #6979
- fix: SPOP crash when set members have expired per-member TTLs by @vyavdoshenko in #6990
- fix(test): Increase RSS memory usage in test_eviction_on_rss_treshold by @mkaruza in #6986
- fix: heap buffer overflow in StringSet::AddBatch when updating TTL by @vyavdoshenko in #6988
- fix: SPOP crash when randomly picked set members have expired TTLs by @vyavdoshenko in #6997
- test: Add missing fuzz seeds and mutator commands for missing functionality by @vyavdoshenko in #6969
- Use SerializerBase in RestoreStreamer, unite tiered handling by @dranikpg in #6970
- fix: crash in srandmember during lazy expiration by @BorysTheDev in #6999
- feat: add distroless Docker image with built-in healthcheck by @vyavdoshenko in #6902
- Add compression stats and ratio threshold for ZSTD dict by @romange in #7000
- feat(facade): Basic RESP support in ioloopv2, part 2 by @dranikpg in #6885
- fix: replication of set lazy expiration by @BorysTheDev in #6994
- chore(common): Some C++20 simplifications by @dranikpg in #6883
- fix: preserve M and EF_CONSTRUCTION in HNSW index restore command by @vyavdoshenko in #7014
- feat(server): Retire RdbSerializerBase by @abhijat in #7016
- chore(facade): improve IoLoopV2 readability and simplify recv flow by @romange in #7011
- chore(server): Move search serialization to own file by @dranikpg in #6996
- feat: add tls_handshakes_total labeled metric to /metrics by @romange in #7015
- fix: coverage yml by @kostasrim in #6987
- fix: remove race condition in proxy during close by @BorysTheDev in #7019
- fix(cms): reject NaN in CMS.INITBYPROB to prevent RENAME crash by @vyavdoshenko in #7026
- fix: remove set if lazy expiration by @BorysTheDev in #7005
- Add --showlocals option to pytest configuration by @dranikpg in #7029
- chore: remove dead code by @BorysTheDev in #7032
- fix: prevent master connection closing during await_synced_all by @BorysTheDev in #7035
- feat(rdb_saver): Flush each SBF chunk during serialization by @mkaruza in #7034
- fix: increase timeout for test_redis_replication_all by @BorysTheDev in #7037
- CI: Add script to handle pip install by @abhijat in #7038
- feat: upload logs for failed test on coverage by @BorysTheDev in #7042
- chore: Increase kFiberStackBase by 8KB by @mkaruza in #7039
- chore: Extend QList::Node to support tiered storage by @mkaruza in #7021
- fix: handle properly nan in geosearch by @kostasrim in #7043
- chore: use ranges and concepts by @kostasrim in #7040
- fix: reject FT.CREATE with oversized VECTOR DIM to prevent search crash by @vyavdoshenko in #7049
- fix: increase timeout for test_network_disconnect_during_migration by @BorysTheDev in #7050
- fix: prevent OOM crash in FT.CREATE with huge PREFIX/STOPWORDS count by @vyavdoshenko in #7046
- fix(search): emit complete field params in FT.INFO and BuildRestoreCommand by @vyavdoshenko in #7044
- fix: dev container build failure on ARM runners by @vyavdoshenko in #7051
- fix: support VECTOR_RANGE query without YIELD_DISTANCE_AS clause by @vyavdoshenko in #7053
- chore: Extend TieredStorage to support QList::Node by @mkaruza in #7022
- chore(deps): bump the actions group with 3 updates by @dependabot[bot] in #7028
- fix: migrate glog references to helio wrappers by @romange in #7002
- fix(search): add missing FinalizeInitialization in numeric benchmarks by @vyavdoshenko in #7059
- chore: add fiber name assertion in SerializerBase::OnChange by @romange in #7064
- feat(search): add DIALECT, APPLY, and KNN support to FT.AGGREGATE by @vyavdoshenko in #7066
- chore: add WITH_AWS_CLOUD support for S3 snapshot storage by @romange in #6992
- fix(core): prevent MergeNodes from merging non-adjacent head and tail nodes by @vyavdoshenko in #7079
- fix(server): SIGSEGV in HRANDFIELD with expired hash fields by @vyavdoshenko in #7076
- docs(pubsub): add detailed Pub/Sub architecture guide by @glevkovich in #7075
- chore(core): extract HllEstimator class from dict_builder by @romange in #7067
- feat(server): Make parallel delayed entry processing safe by @dranikpg in #7055
- feat(tiering): Experimental list node tiering by @mkaruza in #7023
- fix: proactively delete expired keys on replicas via replica_delete_expired flag by @shahyash2609 in #6985
- tests: Add a test that verifies metrics for nested eval by @abhijat in #7085
- Consolidate duplicate replication tests into parameterized variants by @Copilot in #7086
- chore: add replication stream tracing for double-apply bug by @romange in #7071
- fix: move FT.CREATE before data insertion in SearchSortByOptionNonSortableFieldJson test by @vyavdoshenko in #7094
- fix(core): Update CVCUponInsert by @dranikpg in #7068
- chore: Add pubsub-stress.py tool by @mkaruza in #7072
- fix(proactor_threads): fix that proactor_threads is not respected in k8s env by @starek4 in #7030
- chore: Disable clang format for src/redis directory by @mkaruza in #7097
- chore(snapshot): add version DCHECK invariant by @romange in #7096
- fix(hset): SIGABRT in HSETEX when called without field-value pairs by @vyavdoshenko in #7100
- feat(search): add optional term frequency storage to CompressedSortedSet by @vyavdoshenko in #7093
- fix(server): Improve async eval command handling by @dranikpg in #7020
- fix: incorrect memory access for StringSet ttl entries by @BorysTheDev in #7102
- fix: handle STORE in unsorted path (SORT BY nosort STORE) by @vyavdoshenko in #7114
- feat: preserve indexdata during serialization by @BorysTheDev in #7061
- fix(sort): delete destination key when SORT STORE produces empty result by @vyavdoshenko in #7106
- Skip unstable test_hnsw_search_replication_with_network_disruptions by @Copilot in #7121
- fix: replica stuck during flushall by @BorysTheDev in #7120
- fix: prevent inline execution while shard lock is held by @vyavdoshenko in #7116
- feat(fuzz): add optional snapshot serialization testing to AFL++ fuzzing by @vyavdoshenko in #7119
- docs: Add GPG signing guidance to CONTRIBUTING.md by @romange in #7122
- feat(memory): add MEMORY DECOMMIT COOL subcommand by @romange in #7124
- feat: search memory now more accurately reported by @BorysTheDev in #7115
- fix(server): allow DFLY LOAD from cloud storage when --dir is local by @romange in #7081
- refactor(rdb): use std::bit_cast (Fixes private #188) by @glevkovich in #7130
- fix: use after free in HNSW by @BorysTheDev in #7129
Huge thanks to all the contributors! ❤️
New Contributors
- @asherlaau made their first contribution in #6913
- @kanaevad made their first contribution in #6917
- @shahyash2609 made their first contribution in #6985
- @starek4 made their first contribution in #7030
Full Changelog: v1.37.0...v1.38.0