github ArcadeData/arcadedb 26.5.1

ArcadeDB 26.5.1 Release Notes

Overview

ArcadeDB 26.5.1 is a major release with over 270 commits and 128 resolved issues. The headline features are a new sparse vector index with server-side hybrid retrieval, end-to-end INT8 quantization, a large batch of OpenCypher correctness fixes, query partitioning, and a new EXTERNAL property storage layout for heavy values, along with a long list of HA, wire-protocol, and Studio improvements.

Major Highlights

Sparse Vector Index + Hybrid Retrieval

A brand-new LSM_SPARSE_VECTOR index brings sparse-embedding retrieval (BM25/SPLADE-style) directly into the engine, with server-side fusion of dense and sparse results and diversified top-K. (#4065, #4066, #4067, #4068, #4070, #4078, #4119, #4130)

  • New index type LSM_SPARSE_VECTOR for sparse-embedding retrieval.
  • vector.fuse(...) server-side hybrid fusion with RRF, DBSF and LINEAR strategies.
  • vector.neighbors(...) gains groupBy / groupSize options for diversified retrieval, including dotted nested-field grouping (#4072) and traversal-integrated grouping (#4071).
  • WAND / BlockMax-WAND dynamic pruning to scale sparse retrieval to 100M+ documents.
  • Sparse-vector partitioning so a single index can be sharded by tenant / domain.
  • Reranker SQL functions for two-stage retrieval pipelines.
  • Bolt and HTTP wire support for the new sparse vector type, including $bytes / $int8 markers for INT8 query vectors.
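
To make the RRF strategy concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over a dense and a sparse result list. It illustrates the general technique only: the function name rrf_fuse, the constant k=60 and the sample document ids are assumptions for illustration and are not taken from ArcadeDB's vector.fuse(...) implementation or signature.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Hypothetical result lists from a dense and a sparse index.
dense_hits = ["d3", "d1", "d7"]
sparse_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([dense_hits, sparse_hits], top_k=3)
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a common default for hybrid search.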

INT8 Quantization for Dense Vectors

End-to-end INT8 support across the dense vector pipeline: ingest, storage and query paths now share the same 8-bit representation, substantially reducing disk usage and resident memory (RSS) without going through the FP32 path. (#3143, #4132, #4133, #4135, #4136)
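
As a rough illustration of why an 8-bit representation saves space (one byte per dimension instead of four for FP32), the following Python sketch shows symmetric scalar quantization with a single scale per vector. The scheme, the function names and the sample vectors are assumptions; these notes do not describe the quantization scheme ArcadeDB actually uses.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] using one scale per vector."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


def int8_dot(codes_a, scale_a, codes_b, scale_b):
    """Dot product computed on the integer codes, rescaled once at the end."""
    return sum(a * b for a, b in zip(codes_a, codes_b)) * scale_a * scale_b


# Hypothetical sample vectors.
vec_a = [0.5, -1.0, 0.25]
vec_b = [1.0, 0.5, -0.75]
codes_a, scale_a = quantize_int8(vec_a)
codes_b, scale_b = quantize_int8(vec_b)
restored_a = dequantize_int8(codes_a, scale_a)
approx_dot = int8_dot(codes_a, scale_a, codes_b, scale_b)
exact_dot = sum(a * b for a, b in zip(vec_a, vec_b))
```

The per-component reconstruction error is bounded by half the scale, so similarity scores computed on the codes stay close to the FP32 result while the stored vector shrinks by roughly 4x.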

EXTERNAL Property Storage

New paired-bucket layout for heavy property values (vectors, large strings, JSON). Hot row data stays compact in the main bucket while bulky payloads live in a sibling external bucket, sharply reducing scan cost on wide records. (#4027, #4028)

Query Partitioning

Query-level partitioning lands together with a partition-aware planner that prunes unneeded partitions from SQL and Cypher plans, plus integrity guardrails for partitioned types. (#4087, #4088)
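
The idea behind partition pruning can be sketched in a few lines of Python: when a query carries an equality predicate on the partition key, only partitions that can hold that key value are scanned. This is a conceptual toy model; the Partition class, plan_scan, execute and the tenant key are hypothetical names and do not reflect ArcadeDB's planner internals.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    key_values: frozenset  # partition-key values stored in this partition
    rows: list = field(default_factory=list)


def plan_scan(partitions, partition_key, where):
    """Return only the partitions that may contain rows matching `where`.

    `where` is a dict of column -> required value (equality predicates only).
    """
    if partition_key not in where:
        return list(partitions)  # no predicate on the key: nothing to prune
    wanted = where[partition_key]
    return [p for p in partitions if wanted in p.key_values]


def execute(partitions, partition_key, where):
    """Scan the surviving partitions and apply the full predicate."""
    hits = []
    for partition in plan_scan(partitions, partition_key, where):
        for row in partition.rows:
            if all(row.get(col) == val for col, val in where.items()):
                hits.append(row)
    return hits


# Hypothetical data set partitioned by tenant.
parts = [
    Partition("p_a", frozenset({"a"}), [{"tenant": "a", "n": 1}, {"tenant": "a", "n": 2}]),
    Partition("p_b", frozenset({"b"}), [{"tenant": "b", "n": 1}]),
]
scanned = [p.name for p in plan_scan(parts, "tenant", {"tenant": "a", "n": 2})]
rows = execute(parts, "tenant", {"tenant": "a", "n": 2})
```

Queries without a predicate on the partition key fall back to scanning every partition, which is the case the planner cannot improve.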

HA: Offline Cluster Bootstrap

You can now bootstrap a fresh HA cluster from a pre-seeded database (snapshot-and-restore), avoiding a full re-replication of large datasets when expanding or rebuilding a cluster. Includes regression integration tests for the cluster-formation edge cases. (#4147, #4205)

Production-Ready Helm Chart

The Helm chart has been reworked to align with the Raft-based HA subsystem introduced in 26.4.2 and is now suitable for production rollouts. (#4035)

Cypher: SHOW INDEXES / SHOW CONSTRAINTS

Standard Cypher administrative commands SHOW INDEXES and SHOW CONSTRAINTS are now supported. (#3972)

SQL: FIND REFERENCES

Restores the OrientDB-compatible FIND REFERENCES command for locating all records pointing to a given RID. (#4146)

C# End-to-End Tests

A new e2e suite exercises ArcadeDB over the Postgres wire via Npgsql and Testcontainers, validating the C# client path on every build. (#4036, #4038)

HA Operational Improvements

  • Optional human-readable peer names in HA_SERVER_LIST for friendlier cluster topology. (#3974)
  • Studio gains peer add/remove controls in the HA cluster panel. (#4145)

Studio Improvements

  • Full-screen mode for graph view. (#4032)
  • Clear query button / textbox. (#4121)
  • Session reset on token expiry instead of silent failure. (#4082)
  • Error messages now persist instead of disappearing after a few seconds. (#4124)
  • Query history no longer auto-submits on selection. (#4022)
  • Inherited indexes are now visible. (#4140)

Major Fixes

OpenCypher Correctness

A large batch of correctness fixes landed across pattern matching, write clauses, subqueries, list/temporal expressions and the optimizer fast-path. Highlights:

  • valueType(...) now reports the NOT NULL suffix for non-null values. (#3991)
  • point(...) WGS-84-3D exposes .height as an alias for .z. (#3992)
  • CALL ... YIELD no longer nullifies variables carried in from WITH. (#3996, #4094)
  • collect(r) followed by a variable-length match no longer drops all rows. (#3997)
  • Variable-length pattern segments no longer re-traverse a relationship bound in a prior MATCH. (#4006)
  • MERGE with an unbound label-only endpoint now creates a fresh node instead of reusing an existing one. (#3998)
  • SET now propagates to all aliases bound to the same node within the same query. (#4000)
  • Self-referential property updates and SET :Label are now idempotent across row fanout. (#4016, #4017)
  • Temporal component access on date/datetime values no longer returns null. (#4018)
  • WITH-carried node variables are no longer nulled out by a later CREATE / MERGE. (#4019)
  • allReduce(...) no longer evaluates true cases as false. (#4043)
  • Anonymous middle nodes in multi-hop chains now match rows. (#4092)
  • Backslashes in string literals and property values are preserved. (#4093)
  • Consecutive directed and undirected relationship patterns no longer reuse the same edge. (#4095, #4096)
  • EXISTS { ... } subqueries returning an outer-variable expression no longer evaluate as false. (#4097)
  • List literals containing duration(...) no longer drop rows. (#4099)
  • List subscript with an inline aggregate index no longer returns null. (#4100)
  • MATCH immediately after CREATE now sees newly created labeled nodes. (#4101)
  • MATCH on a null carried variable now correctly filters out the row. (#4102)
  • MERGE ... ON MATCH SET no longer returns the pre-update property value. (#4103)
  • MERGE patterns no longer reuse a newly created endpoint across input rows. (#4104)
  • Node label-union patterns now match when either label exists. (#4105)
  • Pattern comprehensions over existing relationships are no longer empty. (#4106)
  • reduce(...) over an inline aggregate expression is now evaluated correctly. (#4107)
  • Relationship type predicates on bound relationship variables no longer evaluate as false. (#4108)
  • Repeated relationship variables in WHERE patterns now match no rows (as expected). (#4109)
  • Uncorrelated pattern predicates now correctly reflect existing relationships. (#4110)
  • Variable-length pattern comprehensions no longer duplicate projected elements. (#4111)
  • WHERE false literal predicates are no longer ignored. (#4112)
  • datetime() values are now persisted on DATETIME-typed properties. (#4125)
  • OR EXISTS + AND NOT (EXISTS ... OR EXISTS ...) returns the correct rows under two-outer-MATCH binding. (#4126)
  • Optimizer fast path is now skipped when a write clause precedes MATCH. (#4131)
  • CALL subquery SET no longer leaves the carried outer variable stale. (#4182)
  • id(...) is now numeric and no longer breaks numeric predicates. (#4183)
  • shortestPath / allShortestPaths with variable-length relationship type alternation now match. (#4190)
  • Write-only CALL subqueries no longer return an extra empty row. (#4191)
  • MATCH on a parent edge type now matches sub-typed edges (polymorphic edge traversal). (#4192)
  • Batch fixes for #4184, #4185, #4186, #4188, #4189. (#4196)

SQL

  • CONTAINSALL now works when comparing a list of Identifiables against a list of RID strings. (#4002)
  • Correlated COLLECT { ... } / COUNT { ... } subqueries with outer-variable access now evaluate correctly instead of always returning empty/zero. (#4014, #4015)
  • SEARCH_INDEX and SEARCH_FIELDS now propagate return values in filters and correctly handle wildcards. (#4023, #4030)
  • SELECT with a non-unique LSM index no longer returns zero rows after partial deletes (per-RID tombstones no longer suppress the whole key). (#4024)
  • Edge creation with CONTENT no longer silently ignores properties. (#4033)
  • algo.dijkstra no longer yields a weight of zero. (#4042)
  • LIST of STRING in GraphBatch works again. (#4069)
  • UPDATE EDGE SET @in/@out correctly rewires the vertex edge lists. (#4074)
  • = combined with LIKE on time-series types no longer returns zero results. (#4128)
  • Range queries no longer raise a spurious "Non-existent edge type" error. (#4199)
  • point.withinBBox(...) now supports cross-meridian bounding boxes. (#3994)
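
For the bounding-box fix, the Python sketch below shows the usual way to handle a box that wraps around the 180° meridian: when the minimum longitude is greater than the maximum, the longitude test becomes an OR instead of an AND. It assumes "cross-meridian" refers to the antimeridian; the function name within_bbox and its argument order are illustrative and are not the signature of point.withinBBox(...).

```python
def within_bbox(lat, lon, min_lat, min_lon, max_lat, max_lon):
    """Return True if (lat, lon) lies inside the box, inclusive of its edges.

    A box with min_lon > max_lon is treated as wrapping across +/-180 degrees.
    """
    if not (min_lat <= lat <= max_lat):
        return False
    if min_lon <= max_lon:
        # Ordinary box: longitude must fall between the two bounds.
        return min_lon <= lon <= max_lon
    # Wrapping box: valid longitudes are [min_lon, 180] or [-180, max_lon].
    return lon >= min_lon or lon <= max_lon


# Hypothetical box spanning 170 E to 170 W around the antimeridian.
east_side = within_bbox(-17.0, 179.0, -20.0, 170.0, -10.0, -170.0)
west_side = within_bbox(-17.0, -179.0, -20.0, 170.0, -10.0, -170.0)
far_away = within_bbox(-17.0, 0.0, -20.0, 170.0, -10.0, -170.0)
```

A naive min_lon <= lon <= max_lon check returns False for every point in such a box, which is the kind of behaviour a cross-meridian fix has to address.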

Storage, Indexing and Schema

  • HASH index lookups now return rows when data encryption is enabled (keys are kept deterministic). (#4137)
  • Orphan TypeIndex wrapper is now dropped when its last bucket child is removed. (#4179)
  • Indexes on a subclass are no longer incorrectly related to superclass indexes. (#4120)
  • Manual index names are now respected on creation. (#4139)
  • Inherited indexes are now shown in Studio. (#4140)

High Availability

  • Schema changes now ship to followers, closing a WAL-gap source. (#4077)
  • Cluster inconsistency reports after node shutdowns are resolved. (#4081)
  • Massive inserts over gRPC now replicate correctly. (#4076)
  • Correct leader is now reported in the resume table. (#4075)
  • ClassCastException (RaftReplicatedDatabase to LocalDatabase) on the leader during import / read-only property writes fixed. (#4144)
  • /api/v1/batch no longer fails with "Error on updating dictionary" on follower nodes. (#4039, #4122)
  • /batch endpoint no longer returns HTTP 500 NPE after a successful commit. (#4123)
  • Spurious index warnings from cluster followers removed. (#4063)
  • e2e-ha integration tests stabilized, with on-demand Toxiproxy support. (#4013, #4020)

Wire Protocols

  • PostgreSQL: empty SELECT results now include the RowDescription schema (#3971); SHOW server_version returns a proper value for SQLAlchemy (#4116); Cypher WHERE id(n) IN $array round-trips correctly after id() became numeric (#4200); binary array deserialization implemented to unblock JDBC setArray (#4203); named and positional parameters work via Npgsql (C#) (#4036).
  • Bolt: EXPLAIN / PROFILE plans are now included in PULL SUCCESS metadata, fixing Neo4j drivers' summary.Plan() (#4129); Bolt executor recognises the new sparse vector type (#4079).
  • gRPC: InsertStream throughput no longer collapses 20-30x after a few hundred unary executeQuery calls (leaked ResultSets closed) (#4197); commit-time constraint violations are surfaced as a stream-level INTERNAL error instead of being silently absorbed (#4198); DATE columns are no longer corrupted via parameter binding (#4181); ARRAY_OF_LONGS and DATETIME parameter binding preserve int64 / fractional-second precision (#4148, #4149); cluster replication on massive inserts via gRPC fixed (#4076).
  • HTTP: INT8 query vectors are routed to byte[] via $bytes / $int8 markers for end-to-end INT8 payload savings (#4135); RemoteGraphBatch now honors unique edge constraints (#4113); edge DATETIME parser accepts ISO suffixes (#4142).

Database Lifecycle

  • GraphAnalyticalView async restore no longer fails with "Transaction not started on current thread" when reopening a database. (#4180)
  • Database restore process and error logging improved. (#4026)
  • GraphBatch no longer errors on transaction commit in mixed-batch scenarios. (#4080)

Python Bindings

Python bindings refreshed with Codacy/Bandit cleanup, formatting fixes and updated workflow triggers. (#4011, #4041, #4084)

Dependencies

Notable upgrades in this release include:

  • Netty 4.2.13.Final
  • Undertow 2.4.0.Final
  • PostgreSQL JDBC 42.7.11
  • Neo4j Java Driver 6.1.0
  • Jackson Databind 2.21.3
  • Gson 2.14.0
  • Swagger 2.2.49
  • JLine 4.1.0
  • GraalVM 25.0.3
  • TestContainers 2.0.5
  • Apache TinkerPop / Gremlin compatibility maintained

Plus the usual round of Studio frontend updates (Cytoscape, ApexCharts, SwaggerUI, Marked, PostCSS, Terser, pdfmake and webpack toolchain), CI / GitHub Actions bumps, Docker base image refresh (Eclipse Temurin) and several security-critical Studio dependency updates.
