ArcadeDB 26.5.1 Release Notes

Overview

ArcadeDB 26.5.1 is a major release with over 270 commits and 128 resolved issues. The headline features are a new sparse vector index with server-side hybrid retrieval, end-to-end INT8 quantization for dense vectors, a large batch of OpenCypher correctness fixes, query partitioning, and a new EXTERNAL property storage layout for heavy values, along with a long list of HA, wire-protocol, and Studio improvements.

Major Highlights

Sparse Vector Index + Hybrid Retrieval

A brand-new LSM_SPARSE_VECTOR index brings sparse-embedding retrieval (BM25/SPLADE-style) directly into the engine, with server-side fusion of dense and sparse results and diversified top-K. (#4065, #4066, #4067, #4068, #4070, #4078, #4119, #4130)

New index type LSM_SPARSE_VECTOR for sparse-embedding retrieval.
vector.fuse(...) server-side hybrid fusion with RRF, DBSF and LINEAR strategies.
vector.neighbors(...) gains groupBy / groupSize options for diversified retrieval, including dotted nested-field grouping (#4072) and traversal-integrated grouping (#4071).
WAND / BlockMax-WAND dynamic pruning to scale sparse retrieval to 100M+ documents.
Sparse-vector partitioning so a single index can be sharded by tenant / domain.
Reranker SQL functions for two-stage retrieval pipelines.
Bolt and HTTP wire support for the new sparse vector type, including $bytes / $int8 markers for INT8 query vectors.
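
To illustrate what rank-based hybrid fusion does, here is a minimal Python sketch of Reciprocal Rank Fusion (RRF), one of the strategies named above. It is not ArcadeDB's implementation and does not use the vector.fuse(...) signature; the function name rrf_fuse, the constant k=60 (a common default in the literature) and the document ids are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse several best-first lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed smoothing constant, not a value taken from ArcadeDB.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


# Hypothetical result lists from a dense and a sparse index.
dense_hits = ["d3", "d1", "d7"]
sparse_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF only looks at ranks, it needs no score normalization between the dense and sparse sides; documents that appear in both lists (d1 and d3 here) rise above documents found by only one retriever.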

INT8 Quantization for Dense Vectors

End-to-end INT8 support across the dense vector pipeline: ingest, storage and the query path now share the same 8-bit representation without going through the FP32 path, dramatically reducing disk usage and resident memory (RSS). (#3143, #4132, #4133, #4135, #4136)
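
As a rough illustration of why an 8-bit representation saves space, the following Python sketch applies symmetric per-vector quantization and computes a dot product directly on the integer codes. The scheme, the function names and the per-vector scale are assumptions for the example; the release notes do not describe the exact quantization ArcadeDB uses.

```python
def to_signed(byte_value):
    """Interpret an unsigned byte (0..255) as a two's-complement int8."""
    return byte_value - 256 if byte_value > 127 else byte_value


def quantize_int8(vector):
    """Symmetric quantization (assumed scheme): map floats to codes in [-127, 127]."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # One byte per dimension, versus four bytes for an FP32 component.
    codes = bytes(round(x / scale) & 0xFF for x in vector)
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the stored codes."""
    return [to_signed(b) * scale for b in codes]


def dot_int8(codes_a, scale_a, codes_b, scale_b):
    """Dot product accumulated on the integer codes, rescaled once at the end."""
    acc = sum(to_signed(a) * to_signed(b) for a, b in zip(codes_a, codes_b))
    return acc * scale_a * scale_b


vector = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(vector)
restored = dequantize_int8(codes, scale)
```

Each component is stored in one byte instead of four, and the rounding error per component is bounded by half the scale.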

EXTERNAL Property Storage

New paired-bucket layout for heavy property values (vectors, large strings, JSON). Hot row data stays compact in the main bucket while bulky payloads live in a sibling external bucket, sharply reducing scan cost on wide records. (#4027, #4028)
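
The idea behind a paired-bucket layout can be sketched with a toy in-memory model in Python. Everything here (the PairedBucketStore class, the ExternalRef pointer, the field names) is hypothetical and only mirrors the description above; it says nothing about ArcadeDB's on-disk format.

```python
from collections import namedtuple

# Hypothetical pointer stored in the main row in place of a bulky value.
ExternalRef = namedtuple("ExternalRef", ["slot"])


class PairedBucketStore:
    """Toy model: compact rows in a main list, heavy values in a sibling list."""

    def __init__(self, external_fields):
        self.external_fields = set(external_fields)
        self.main = []      # compact rows; heavy fields hold only an ExternalRef
        self.external = []  # bulky payloads (vectors, large strings, JSON)

    def insert(self, record):
        row = {}
        for name, value in record.items():
            if name in self.external_fields:
                self.external.append(value)
                row[name] = ExternalRef(len(self.external) - 1)
            else:
                row[name] = value
        self.main.append(row)
        return len(self.main) - 1

    def scan(self, field):
        """Scan a hot field; the external bucket is never read."""
        return [row[field] for row in self.main]

    def load(self, row_id, field):
        """Resolve a field, following the pointer only when it is requested."""
        value = self.main[row_id][field]
        if isinstance(value, ExternalRef):
            return self.external[value.slot]
        return value


store = PairedBucketStore(external_fields=["embedding"])
row_id = store.insert({"name": "doc-1", "embedding": [0.1, 0.2, 0.3, 0.4]})
names = store.scan("name")
```

A scan over name touches only the small main rows, and the embedding is fetched from the sibling bucket only when load asks for it.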

Query Partitioning

Query-level partitioning lands together with a partition-aware planner that prunes partitions that cannot match from SQL and Cypher plans, plus integrity guardrails for partitioned types. (#4087, #4088)
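
Partition pruning in general can be sketched as follows in Python, assuming hash partitioning on a single key and equality predicates only. The function names, the CRC32-based hash and the tenant key are assumptions for illustration; the release notes do not specify how ArcadeDB assigns or prunes partitions.

```python
import zlib


def partition_for(value, partition_count):
    """Assign a value to a partition with a stable hash (assumed scheme)."""
    return zlib.crc32(str(value).encode("utf-8")) % partition_count


def plan_scan(predicates, partition_key, partition_count):
    """Return the partition ids a query has to scan.

    predicates: dict of field -> required value (equality filters only).
    With an equality filter on the partition key, one partition is enough;
    without it, every partition has to be scanned.
    """
    if partition_key in predicates:
        return [partition_for(predicates[partition_key], partition_count)]
    return list(range(partition_count))


# Hypothetical type partitioned by "tenant" into 8 partitions.
pruned_plan = plan_scan({"tenant": "acme", "status": "open"}, "tenant", 8)
full_plan = plan_scan({"status": "open"}, "tenant", 8)
```

In this sketch the first query reads one partition out of eight, while the second, which does not constrain the partition key, still reads all eight.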

HA: Offline Cluster Bootstrap

You can now bootstrap a fresh HA cluster from a pre-seeded database (snapshot-and-restore), avoiding a full re-replication of large datasets when expanding or rebuilding a cluster. Includes regression integration tests for the cluster-formation edge cases. (#4147, #4205)

Production-Ready Helm Chart

The Helm chart has been reworked to align with the Raft-based HA subsystem introduced in 26.4.2 and is now suitable for production rollouts. (#4035)

Cypher: SHOW INDEXES / SHOW CONSTRAINTS

Standard Cypher administrative commands SHOW INDEXES and SHOW CONSTRAINTS are now supported. (#3972)

SQL: FIND REFERENCES

Restores the OrientDB-compatible FIND REFERENCES command for locating all records pointing to a given RID. (#4146)

C# End-to-End Tests

A new e2e suite exercises ArcadeDB over the Postgres wire via Npgsql and Testcontainers, validating the C# client path on every build. (#4036, #4038)

HA Operational Improvements

Optional human-readable peer names in HA_SERVER_LIST for friendlier cluster topology. (#3974)
Studio gains peer add/remove controls in the HA cluster panel. (#4145)

Studio Improvements

Full-screen mode for graph view. (#4032)
Clear query button / textbox. (#4121)
Session is now reset on token expiry instead of failing silently. (#4082)
Error messages now persist instead of disappearing after a few seconds. (#4124)
Query history no longer auto-submits on selection. (#4022)
Inherited indexes are now visible. (#4140)

Major Fixes

OpenCypher Correctness

A large batch of correctness fixes landed across pattern matching, write clauses, subqueries, list/temporal expressions and the optimizer fast-path. Highlights:

valueType(...) now reports the NOT NULL suffix for non-null values. (#3991)
point(...) WGS-84-3D exposes .height as an alias for .z. (#3992)
CALL ... YIELD no longer nullifies variables carried in from WITH. (#3996, #4094)
collect(r) followed by a variable-length match no longer drops all rows. (#3997)
Variable-length pattern segments no longer re-traverse a relationship bound in a prior MATCH. (#4006)
MERGE with an unbound label-only endpoint now creates a fresh node instead of reusing an existing one. (#3998)
SET now propagates to all aliases bound to the same node within the same query. (#4000)
Self-referential property updates and SET :Label are now idempotent across row fanout. (#4016, #4017)
Temporal component access on date/datetime values no longer returns null. (#4018)
WITH-carried node variables are no longer nulled out by a later CREATE / MERGE. (#4019)
allReduce(...) no longer evaluates true cases as false. (#4043)
Anonymous middle nodes in multi-hop chains now match rows. (#4092)
Backslashes in string literals and property values are preserved. (#4093)
Consecutive directed and undirected relationship patterns must not reuse the same edge. (#4095, #4096)
EXISTS { ... } subqueries returning an outer-variable expression no longer evaluate as false. (#4097)
List literals containing duration(...) no longer drop rows. (#4099)
List subscript with an inline aggregate index no longer returns null. (#4100)
MATCH immediately after CREATE now sees newly created labeled nodes. (#4101)
MATCH on a null carried variable now correctly filters out the row. (#4102)
MERGE ... ON MATCH SET no longer returns the pre-update property value. (#4103)
MERGE patterns no longer reuse a newly created endpoint across input rows. (#4104)
Node label-union patterns now match when either label exists. (#4105)
Pattern comprehensions over existing relationships are no longer empty. (#4106)
reduce(...) over an inline aggregate expression is now evaluated correctly. (#4107)
Relationship type predicates on bound relationship variables no longer evaluate as false. (#4108)
Repeated relationship variables in WHERE patterns now match no rows (as expected). (#4109)
Uncorrelated pattern predicates now correctly reflect existing relationships. (#4110)
Variable-length pattern comprehensions no longer duplicate projected elements. (#4111)
WHERE false literal predicates are no longer ignored. (#4112)
datetime() values are now persisted on DATETIME-typed properties. (#4125)
OR EXISTS + AND NOT (EXISTS ... OR EXISTS ...) returns the correct rows under two-outer-MATCH binding. (#4126)
Optimizer fast path is now skipped when a write clause precedes MATCH. (#4131)
CALL subquery SET no longer leaves the carried outer variable stale. (#4182)
id(...) is now numeric and no longer breaks numeric predicates. (#4183)
shortestPath / allShortestPaths with variable-length relationship type alternation now match. (#4190)
Write-only CALL subqueries no longer return an extra empty row. (#4191)
MATCH on a parent edge type now matches sub-typed edges (polymorphic edge traversal). (#4192)
Batch fixes for #4184, #4185, #4186, #4188, #4189. (#4196)

SQL

CONTAINSALL now works when comparing a list of Identifiables against a list of RID strings. (#4002)
Correlated COLLECT { ... } / COUNT { ... } subqueries with outer-variable access now evaluate correctly instead of always returning empty/zero. (#4014, #4015)
SEARCH_INDEX and SEARCH_FIELDS now propagate return values in filters and correctly handle wildcards. (#4023, #4030)
SELECT with a non-unique LSM index no longer returns zero rows after partial deletes (per-RID tombstones no longer suppress the whole key). (#4024)
Edge creation with CONTENT no longer silently ignores properties. (#4033)
algo.dijkstra no longer yields a weight of zero. (#4042)
LIST of STRING in GraphBatch works again. (#4069)
UPDATE EDGE SET @in/@out correctly rewires the vertex edge lists. (#4074)
= combined with LIKE on time-series types no longer returns zero results. (#4128)
Range queries no longer raise a spurious "Non-existent edge type" error. (#4199)
point.withinBBox(...) now supports cross-meridian bounding boxes. (#3994)
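
For the bounding-box fix above, the following Python sketch shows one common way to test a point against a box that wraps around the 180° meridian. It assumes that "cross-meridian" refers to the antimeridian and that such a box is expressed with a minimum longitude greater than its maximum longitude; the function within_bbox and its argument order are hypothetical and are not the signature of point.withinBBox(...).

```python
def within_bbox(lat, lon, min_lat, min_lon, max_lat, max_lon):
    """Return True if (lat, lon) lies inside the box, in degrees.

    Assumption: when min_lon > max_lon the box wraps around the 180° meridian,
    so it covers [min_lon, 180] plus [-180, max_lon].
    """
    if not (min_lat <= lat <= max_lat):
        return False
    if min_lon <= max_lon:
        # Ordinary box: a single contiguous longitude range.
        return min_lon <= lon <= max_lon
    # Wrapping box: the point may fall on either side of the antimeridian.
    return lon >= min_lon or lon <= max_lon


# Hypothetical box spanning 170°E to 170°W.
east_side = within_bbox(-17.7, 179.5, -20.0, 170.0, -10.0, -170.0)
west_side = within_bbox(-17.7, -175.0, -20.0, 170.0, -10.0, -170.0)
far_away = within_bbox(-17.7, 0.0, -20.0, 170.0, -10.0, -170.0)
```

A naive min_lon <= lon <= max_lon check would reject every point for such a box, because no longitude is both at least 170 and at most -170.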

Storage, Indexing and Schema

HASH index lookups now return rows when data encryption is enabled (keys are kept deterministic). (#4137)
Orphan TypeIndex wrapper is now dropped when its last bucket child is removed. (#4179)
Indexes on a subclass are no longer incorrectly related to superclass indexes. (#4120)
Manual index names are now respected on creation. (#4139)
Inherited indexes are now shown in Studio. (#4140)

High Availability

Schema changes now ship to followers, closing a WAL-gap source. (#4077)
Resolved cluster inconsistency reports after node shutdowns. (#4081)
Massive inserts over gRPC now replicate correctly. (#4076)
Correct leader is now reported in the resume table. (#4075)
ClassCastException (RaftReplicatedDatabase to LocalDatabase) on the leader during import / read-only property writes fixed. (#4144)
/api/v1/batch no longer fails with "Error on updating dictionary" on follower nodes. (#4039, #4122)
/batch endpoint no longer returns HTTP 500 NPE after a successful commit. (#4123)
Spurious index warnings from cluster followers removed. (#4063)
e2e-ha integration tests stabilized, with on-demand Toxiproxy support. (#4013, #4020)

Wire Protocols

PostgreSQL: empty SELECT results now include the RowDescription schema (#3971); SHOW server_version returns a proper value for SQLAlchemy (#4116); Cypher WHERE id(n) IN $array round-trips correctly after id() became numeric (#4200); binary array deserialization implemented to unblock JDBC setArray (#4203); named and positional parameters work via Npgsql (C#) (#4036).
Bolt: EXPLAIN / PROFILE plans are now included in PULL SUCCESS metadata, fixing Neo4j drivers' summary.Plan() (#4129); Bolt executor recognises the new sparse vector type (#4079).
gRPC: InsertStream throughput no longer collapses 20-30x after a few hundred unary executeQuery calls (leaked ResultSets closed) (#4197); commit-time constraint violations are surfaced as a stream-level INTERNAL error instead of being silently absorbed (#4198); DATE columns are no longer corrupted via parameter binding (#4181); ARRAY_OF_LONGS and DATETIME parameter binding preserve int64 / fractional-second precision (#4148, #4149); cluster replication on massive inserts via gRPC fixed (#4076).
HTTP: INT8 query vectors are routed to byte[] via $bytes / $int8 markers for end-to-end INT8 payload savings (#4135); RemoteGraphBatch now honors unique edge constraints (#4113); edge DATETIME parser accepts ISO suffixes (#4142).

Database Lifecycle

GraphAnalyticalView async restore no longer fails with "Transaction not started on current thread" when reopening a database. (#4180)
Database restore process and error logging improved. (#4026)
GraphBatch no longer errors on transaction commit in mixed-batch scenarios. (#4080)

Python Bindings

Python bindings refreshed with Codacy/Bandit cleanup, formatting fixes and updated workflow triggers. (#4011, #4041, #4084)

Dependencies

Notable upgrades in this release include:

Netty 4.2.13.Final
Undertow 2.4.0.Final
PostgreSQL JDBC 42.7.11
Neo4j Java Driver 6.1.0
Jackson Databind 2.21.3
Gson 2.14.0
Swagger 2.2.49
JLine 4.1.0
GraalVM 25.0.3
TestContainers 2.0.5
Apache TinkerPop / Gremlin compatibility maintained

Plus the usual round of Studio frontend updates (Cytoscape, ApexCharts, SwaggerUI, Marked, PostCSS, Terser, pdfmake and webpack toolchain), CI / GitHub Actions bumps, Docker base image refresh (Eclipse Temurin) and several security-critical Studio dependency updates.

ArcadeData/arcadedb 26.5.1 on GitHub