4.1.0
Release Date: April 21, 2026
Shared-data Architecture
-
New Multi-Tenant Data Management
Shared-data clusters now support range-based data distribution and automatic splitting and merging of tablets. Tablets are split automatically when they grow too large or become hotspots, without requiring schema changes, SQL modifications, or data re-ingestion. This improves usability and directly addresses data skew and hotspot issues in multi-tenant workloads. #65199 #66342 #67056 #67386 #68342 #68569 #66743 #67441 #68497 #68591 #66672 #69155
-
Large-Capacity Tablet Support (Phase 1)
Supports significantly larger per-tablet data capacity for shared-data clusters, with a long-term target of 100 GB per tablet. Phase 1 focuses on enabling parallel Compaction and parallel MemTable finalization within a single Lake tablet, reducing ingestion and Compaction overhead as tablet size grows. #66586 #68677
-
Fast Schema Evolution V2
Shared-data clusters now support Fast Schema Evolution V2, which lets schema-change DDL operations complete within seconds and extends this support to materialized views. #65726 #66774 #67915
-
[Beta] Inverted Index on shared-data
Enables built-in inverted indexes for shared-data clusters to accelerate text filtering and full-text search workloads. #66541
-
Cache Observability
Query-level cache hit ratio is now exposed in audit logs and the monitoring system for better cache transparency and latency diagnosis. Additional Data Cache metrics include memory and disk quota usage, and page cache statistics. #63964
-
Added segment metadata filter for Lake tables to skip irrelevant segments based on sort key range during scans, reducing I/O for range-predicate queries. #68124
-
Supports fast cancel for Lake DeltaWriter, reducing latency for cancelled ingestion jobs in shared-data clusters. #68877
-
Added support for interval-based scheduling for automated cluster snapshots. #67525
-
Supports pipeline execution for MemTable flush and merge, improving ingestion throughput for cloud-native tables in shared-data clusters. #67878
-
Supports dry_run mode for repairing cloud-native tables, allowing users to preview repair actions before execution. #68494
-
Added a thread pool for publish transactions in shared-nothing clusters, improving publish throughput. #67797
-
Supports dynamically modifying the datacache.enable property for cloud-native tables. #69011
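The segment metadata filter listed above (#68124) skips segments whose sort key range cannot satisfy a range predicate. The following is a minimal sketch of that idea, assuming each segment records the minimum and maximum sort key it contains. `SegmentMeta` and `prune_segments` are hypothetical names used for illustration only and do not reflect StarRocks internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical per-segment metadata: smallest and largest sort key stored."""
    name: str
    min_key: int
    max_key: int


def prune_segments(segments: List[SegmentMeta], lo: int, hi: int) -> List[SegmentMeta]:
    """Keep only segments whose [min_key, max_key] overlaps the predicate range [lo, hi]."""
    return [s for s in segments if s.max_key >= lo and s.min_key <= hi]


segments = [
    SegmentMeta("seg_0", 0, 99),
    SegmentMeta("seg_1", 100, 199),
    SegmentMeta("seg_2", 200, 299),
]

# Predicate: sort_key BETWEEN 150 AND 220, so seg_0 cannot contain matches.
kept = prune_segments(segments, 150, 220)
print([s.name for s in kept])  # ['seg_1', 'seg_2']
```

Only the two overlapping segments would be read in this toy case, which is where the I/O reduction for range-predicate queries comes from.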
Data Lake Analytics
-
Iceberg DELETE Support
Supports writing position delete files for Iceberg tables, enabling DELETE operations on Iceberg tables directly from StarRocks. The support covers the full pipeline of Plan, Sink, Commit, and Audit. #67259 #67277 #67421 #67567
-
TRUNCATE for Hive and Iceberg Tables
Supports TRUNCATE TABLE on external Hive and Iceberg tables. #64768 #65016
-
Incremental materialized view on Iceberg
Extends the support for incremental materialized view refresh to Iceberg append-only tables, enabling query acceleration without full table refresh. #65469 #62699
-
VARIANT Type for Semi-Structured Data in Iceberg
Supports the VARIANT data type in Iceberg Catalog for flexible, schema-on-read storage and querying of semi-structured data. Supports read, write, type casting, and Parquet integration. #63639 #66539
-
Iceberg v3 Support
Added support for Iceberg v3 default value feature and row lineage. #69525 #69633
-
Iceberg Table Maintenance Procedures
Added support for the rewrite_manifests procedure and extended the expire_snapshots and remove_orphan_files procedures with additional arguments for finer-grained table maintenance. #68817 #68898
-
Iceberg $properties Metadata Table
Added support for querying Iceberg table properties via the $properties metadata table. #68504
-
Supports reading file path and row position metadata columns from Iceberg tables. #67003
-
Supports reading _row_id from Iceberg v3 tables, and supports global late materialization for Iceberg v3. #62318 #64133
-
Supports creating Iceberg views with custom properties, and displays properties in SHOW CREATE VIEW output. #65938
-
Supports querying Paimon tables with a specific branch, tag, version, or timestamp. #63316
-
Supports complex types (ARRAY, MAP, STRUCT) for Paimon tables. #66784
-
Supports Paimon views. #56058
-
Supports TRUNCATE for Paimon tables. #67559
-
Supports Partition Transforms with parentheses syntax when creating Iceberg tables. #68945
-
Supports ALTER TABLE REPLACE PARTITION COLUMN for Iceberg tables. #70508
-
Supports Iceberg global shuffle based on Transform Partition for improved data organization. #70009
-
Supports dynamically enabling global shuffle for Iceberg table sink. #67442
-
Introduced a Commit queue for Iceberg table sink to avoid concurrent Commit conflicts. #68084
-
Added host-level sorting for Iceberg table sink to improve data organization and reading performance. #68121
-
Enabled additional optimizations in ETL execution mode by default, improving performance for INSERT INTO SELECT, CREATE TABLE AS SELECT, and similar batch operations without explicit configuration. #66841
-
Added commit audit information for INSERT and DELETE operations on Iceberg tables. #69198
-
Supports enabling or disabling view endpoint operations in Iceberg REST Catalog. #66083
-
Optimized cache lookup efficiency in CachingIcebergCatalog. #66388
-
Supports EXPLAIN on various Iceberg catalog types. #66563
-
Supports partition projection for tables in AWS Glue Catalog. #67601
-
Added resource share type support for the AWS Glue GetDatabases API. #69056
-
Supports Azure ABFS/WASB path mapping with endpoint injection (azblob/adls2). #67847
-
Added a database metadata cache for JDBC catalog to reduce remote RPC overhead and impact of external system failures. #68256
-
Added the schema_resolver property for JDBC catalog to support custom schema resolution. #68682
-
Supports column comments for PostgreSQL tables in information_schema. #70520
-
Improved Oracle and PostgreSQL JDBC type mapping. #70315 #70566
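The Iceberg DELETE entry above relies on position delete files. In the Iceberg table format, a position delete record names a data file and the zero-based position of a deleted row in that file. The sketch below shows how a reader could apply such records. It is a simplified in-memory illustration, not the StarRocks implementation; the helper names and the file path are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_delete_index(position_deletes: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    """Group (file_path, pos) delete records by data file."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in position_deletes:
        index[file_path].add(pos)
    return index


def read_live_rows(file_path: str, rows: List[str], index: Dict[str, Set[int]]) -> List[str]:
    """Return the rows of one data file, skipping positions marked as deleted."""
    deleted = index.get(file_path, set())
    return [row for pos, row in enumerate(rows) if pos not in deleted]


# Toy data file with four rows, and delete records marking rows 1 and 3.
data_file = "s3://bucket/tbl/data/00000.parquet"  # hypothetical path
rows = ["alice", "bob", "carol", "dave"]
deletes = [(data_file, 1), (data_file, 3)]

index = build_delete_index(deletes)
print(read_live_rows(data_file, rows, index))  # ['alice', 'carol']
```

Because deletes are written as separate files, the data files themselves stay untouched until a later rewrite or compaction.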
Query Engine
-
Recursive CTE
Supports Recursive Common Table Expressions (CTEs) for hierarchical traversals, graph queries, and iterative SQL computations. #65932
-
Improved Skew Join v2 rewrite with statistics-based skew detection, histogram support, and NULL-skew awareness. #68680 #68886
-
Improved COUNT DISTINCT over windows and added support for fused multi-distinct aggregations. #67453
-
Supports an explicit skew hint for window functions, and automatically optimizes window functions with skewed partition keys by splitting them into a UNION. #68739 #67944
-
Supports materialization hints for CTEs. #70802
-
Enabled Global Lazy Materialization by default, improving query performance by deferring column reads until needed. #70412
-
Supports EXPLAIN and EXPLAIN ANALYZE for INSERT statements in Trino Parser. #70174
-
Supports EXPLAIN for query queue visibility. #69933
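As an illustration of the Recursive CTE entry above, the following sketch walks a small management hierarchy with a standard `WITH RECURSIVE` query. It runs against SQLite only because SQLite ships with Python; the table and column names are made up for the example, and StarRocks-specific syntax details or session settings may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "ceo", None), (2, "vp", 1), (3, "engineer", 2), (4, "intern", 3)],
)

# The anchor member selects the root; the recursive member joins each
# level of the hierarchy to its direct reports.
query = """
WITH RECURSIVE org(id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, org.depth + 1
    FROM employees e JOIN org ON e.manager_id = org.id
)
SELECT name, depth FROM org ORDER BY depth
"""
result = conn.execute(query).fetchall()
print(result)  # [('ceo', 0), ('vp', 1), ('engineer', 2), ('intern', 3)]
```

The recursion stops on its own once a level produces no new rows, which is what makes this form suitable for hierarchy and graph traversals.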
Functions and SQL Syntax
- Added the following functions:
  - array_top_n: Returns the top N elements from an array ranked by value. #63376
  - arrays_zip: Combines multiple arrays element-wise into an array of structs. #65556
  - json_pretty: Formats a JSON string with indentation. #66695
  - json_set: Sets a value at a specified path within a JSON string. #66193
  - initcap: Converts the first letter of each word to uppercase. #66837
  - sum_map: Sums MAP values across rows with the same key. #67482
  - current_timezone: Returns the current session timezone. #63653
  - current_warehouse: Returns the name of the current warehouse. #66401
  - sec_to_time: Converts the number of seconds to a TIME value. #62797
  - ai_query: Calls an external AI model from SQL for inference workloads. #61583
  - min_n/max_n: Aggregate functions that return the top N minimum/maximum values. #63807
  - regexp_position: Returns the position of a regular expression match in a string. #67252
  - is_json_scalar: Returns whether a JSON value is a scalar. #66050
  - get_json_scalar: Extracts a scalar value from a JSON string. #68815
  - raise_error: Raises a user-defined error in SQL expressions. #69661
  - uuid_v7: Generates time-ordered UUID v7 values. #67694
  - STRING_AGG: Syntactic sugar for GROUP_CONCAT. #64704
- Provides the following function or syntactic extensions:
  - Supports a lambda comparator in array_sort for custom sort ordering. #66607
  - Supports USING clause for FULL OUTER JOIN with SQL-standard semantics. #65122
  - Supports DISTINCT aggregation over framed window functions with ORDER BY/PARTITION BY. #65815 #65030 #67453
  - Supports ARRAY type in lead/lag/first_value/last_value window functions. #63547
  - Supports VARBINARY for count distinct-like aggregate functions. #68442
  - Supports MULTIPLY/DIVIDE for interval operations. #68407
  - Supports date and string type casting in IN expressions. #61746
  - Supports WITH LABEL syntax for BEGIN/START TRANSACTION. #68320
  - Supports WHERE/ORDER/LIMIT clauses in SHOW statements. #68834
  - Supports ALTER TASK statements for task management. #68675
  - Supports SQL UDF creation via CREATE FUNCTION ... AS <sql_body>. #67558
  - Supports loading UDFs from S3. #64541
  - Supports named parameters in Scala functions. #66344
  - Supports multiple compression formats (GZIP/SNAPPY/ZSTD/LZ4/DEFLATE/ZLIB/BZIP2) for CSV file exports. #68054
  - Supports STRUCT_CAST_BY_NAME SQL mode for name-based struct field matching. #69845
  - Supports last_query_id() in ANALYZE PROFILE for easy query profile analysis. #64557
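To show why the values produced by a function such as uuid_v7 are time-ordered, the sketch below builds a UUID v7 following the bit layout in RFC 9562: a 48-bit Unix millisecond timestamp in the most significant bits, then the version and variant fields, with the remaining bits random. The helper `uuid_v7_sketch` is a hypothetical illustration; how StarRocks fills the random bits is not stated in these notes.

```python
import os
import time
import uuid


def uuid_v7_sketch(unix_ms: int, rand: bytes) -> uuid.UUID:
    """Build a UUID v7 from a millisecond timestamp and 10 random bytes."""
    if len(rand) != 10:
        raise ValueError("need exactly 10 random bytes")
    value = (unix_ms & ((1 << 48) - 1)) << 80   # bits 127..80: timestamp
    value |= int.from_bytes(rand, "big")        # bits 79..0: random payload
    value &= ~(0xF << 76)                       # clear the version nibble
    value |= 0x7 << 76                          # version = 7
    value &= ~(0x3 << 62)                       # clear the variant bits
    value |= 0x2 << 62                          # variant = 0b10
    return uuid.UUID(int=value)


now_ms = int(time.time() * 1000)
earlier = uuid_v7_sketch(now_ms, os.urandom(10))
later = uuid_v7_sketch(now_ms + 1, os.urandom(10))
print(earlier.version)   # 7
print(earlier < later)   # True: a later millisecond always sorts after
```

Since the timestamp occupies the leading bits, values generated in later milliseconds compare greater, which keeps inserts roughly sequential when such IDs are used as keys.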
Management & Observability
- Supports warehouses, cpu_weight_percent, and exclusive_cpu_weight attributes for resource groups to improve multi-warehouse CPU resource isolation. #66947
- Introduces the information_schema.fe_threads system view to inspect the FE thread state. #65431
- Supports SQL Digest Blacklist to block specific query patterns at the cluster level. #66499
- Supports Arrow Flight Data Retrieval from nodes that are otherwise inaccessible due to network topology constraints. #66348
- Introduces the REFRESH CONNECTIONS command to propagate global variable changes to existing connections without reconnecting. #64964
- Added built-in UI functions to analyze query profiles and view formatted SQL, making query tuning more accessible. #63867
- Implements ClusterSummaryActionV2 API endpoint to provide a structured cluster overview. #68836
- Added a global read-only system variable @@run_mode to query the current cluster run mode (shared-data or shared-nothing). #69247
- Enabled query_queue_v2 by default for improved query queue management. #67462
- Supports user-level default warehouse for Stream Load and Merge Commit operations. #68106 #68616
- Added skip_black_list session variable to bypass backend blacklist verification when needed. #67467
- Added enable_table_metrics_collect option for the metrics API. #68691
- Added impersonate user support for query detail HTTP API. #68674
- Added table_query_timeout as a table-level property. #67547
- Added FE profile logging with configurable latency threshold. #69396
- Supports adding FE observer nodes. #67778
- Supports Merge Commit information in information_schema.loads for better load job visibility. #67879
- Supports showing tablet status in cloud-native tables for better troubleshooting. #69616
- Added per-catalog-type query metrics for external catalog observability. #70533
- Added Debian (.deb) packaging support for FE and BE. #68821
Security
- [CVE-2026-33870] [CVE-2026-33871] Replaced AWS bundle and bumped Netty to 4.1.132.Final. #71017
- [CVE-2025-27821] Upgraded Hadoop to v3.4.2. #68529
- [CVE-2025-54920] Upgraded spark-core_2.12 to 3.5.7. #70862
Bug Fixes
The following issues have been fixed:
- Fixed data loss after tablet split by skipping data file deletion for range distribution tablets. #71135
- Fixed a memory leak in DefaultValueColumnIterator for complex types. #71142
- Fixed a memory leak caused by a shared_ptr cycle between BatchUnit and FetchTaskContext. #71126
- Fixed use-after-free in parallel segment/rowset loading on error path. #71083
- Fixed potential hash table data loss in aggregation spill set_finishing. #70851
- Fixed double-free crash in SystemMetrics due to concurrent getline access. #71040
- Fixed crash in SpillMemTableSink when eager merge consumes all blocks. #69046
- Fixed NPE in visitDictionaryGetExpr when dictionary backing table is dropped. #71109
- Fixed NPE when analyzing generated columns in Stream Load/Broker Load if a referenced column is missing. #71116
- Fixed NPE when auto-created partition is dropped by TTL cleaner. #68257
- Fixed NPE in IcebergCatalog.getPartitionLastUpdatedTime when snapshot is expired. #68925
- Fixed incorrect predicate rewrite for outer join with constant-side column reference. #67072
- Fixed PK tablet rowset meta loss caused by GC race during disk re-migration (A→B→A). #70727
- Fixed DB read lock leak in SharedDataStorageVolumeMgr. #70987
- Fixed incorrect query results after modifying CHAR column length in shared-data clusters. #68808
- Fixed MV refresh bug in the case of multiple tables. #61763
- Fixed incorrect MV recycle time if force refreshed. #68673
- Fixed all-null value handling bug in sync MV. #69136
- Fixed duplicate column id error when querying MV after fast schema change ADD COLUMN. #71072
- Fixed IVM refresh recording incomplete PCT partition metadata. #71092
- Fixed low-cardinality rewrite NPE caused by shared DecodeInfo. #68799
- Fixed low-cardinality join predicate type mismatch. #68568
- Fixed Segfault in Parquet Page Index Filter when null_counts is empty. #68463
- Fixed JSON flatten array and object conflict on identical paths. #68804
- Fixed Iceberg cache weigher inaccuracies. #69058
- Fixed Iceberg table cache memory limit. #67769
- Fixed Iceberg delete column nullability issue. #68649
- Fixed Azure ABFS/WASB FileSystem cache key to include container. #68901
- Fixed deadlock when the HMS connection pool is full. #68033
- Fixed incorrect length for VARCHAR field type in Paimon Catalog. #68383
- Fixed Paimon catalog refresh crash with ClassCastException on ObjectTable. #70224
- Fixed PaimonView resolving table references against default_catalog instead of the Paimon catalog. #70217
- Fixed FULL OUTER JOIN USING with constant subqueries. #69028
- Fixed join on clause bug with CTE scope. #68809
- Fixed missing partition predicate in short-circuit point lookup. #71124
- Fixed ConnectContext memory leaks by using bindScope() pattern. #68215
- Fixed memory leak in CatalogRecycleBin.asyncDeleteForTables for shared-nothing clusters. #68275
- Fixed an issue where the Thrift accept thread exited when it encountered an exception. #68644
- Fixed UDF resolution in routine load column mappings. #68201
- Fixed DROP FUNCTION IF EXISTS ignoring the ifExists flag. #69216
- Fixed scan result error when dict page is too large. #68258
- Fixed range partition overlap. #68255
- Fixed query queue allocation time and pending timeout. #65802
- Fixed array_map crash when processing null literal array. #70629
- Fixed stack overflow for to_base64. #70623
- Fixed optimizer timeout issue. #70605
- Fixed case-insensitive username normalization for LDAP authentication. #67966
- Mitigated SSRF risk for API proc_file. #68997
- Masked user auth strings in audit and SQL redaction. #70360
Behavior Changes
- ETL execution mode optimizations are now enabled by default. This benefits INSERT INTO SELECT, CREATE TABLE AS SELECT, and similar batch workloads without explicit configuration changes. #66841
- The third argument of lag/lead window functions now supports column references in addition to constant values. #60209
- FULL OUTER JOIN USING now follows SQL-standard semantics: the USING column appears once in the output instead of twice. #65122
- Global Lazy Materialization is now enabled by default. #70412
- query_queue_v2 is now enabled by default. #67462
- SQL transactions are gated behind the session variable enable_sql_transaction by default. #63535
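The FULL OUTER JOIN USING change listed above affects the shape of the result set. The sketch below emulates the SQL-standard behavior for two small tables, assuming unique, non-NULL join keys and using plain dictionaries in place of tables. `full_outer_join_using` is a hypothetical helper for illustration; the single key column in its output behaves like COALESCE(l.k, r.k).

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]


def full_outer_join_using(left: Dict[int, str], right: Dict[int, str]) -> List[Row]:
    """Emulate SELECT * FROM l FULL OUTER JOIN r USING (k) for unique keys.

    Each output row carries the join key once, followed by the left and
    right values, with None standing in for NULL on the unmatched side.
    """
    out: List[Row] = []
    # Sorting is only for deterministic output; SQL guarantees no order here.
    for k in sorted(set(left) | set(right)):
        out.append((k, left.get(k), right.get(k)))
    return out


left = {1: "a", 2: "b"}
right = {2: "x", 3: "y"}
print(full_outer_join_using(left, right))
# [(1, 'a', None), (2, 'b', 'x'), (3, None, 'y')]
```

Queries that previously referenced both copies of the USING column may need to be adjusted, since only one key column is now returned.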