github StarRocks/starrocks 4.1.0

4.1.0

Release Date: April 21, 2026

Shared-data Architecture

  • New Multi-Tenant Data Management

    Shared-data clusters now support range-based data distribution and automatic splitting and merging of tablets. Tablets are split automatically when they grow oversized or become hotspots, with no schema changes, SQL modifications, or data re-ingestion required. This directly addresses data skew and hotspot issues in multi-tenant workloads. #65199 #66342 #67056 #67386 #68342 #68569 #66743 #67441 #68497 #68591 #66672 #69155

  • Large-Capacity Tablet Support (Phase 1)

    Supports significantly larger per-tablet data capacity for shared-data clusters, with a long-term target of 100 GB per tablet. Phase 1 focuses on enabling parallel Compaction and parallel MemTable finalization within a single Lake tablet, reducing ingestion and Compaction overhead as tablet size grows. #66586 #68677

  • Fast Schema Evolution V2

    Shared-data clusters now support Fast Schema Evolution V2, which completes schema-change DDL within seconds and extends this support to materialized views. #65726 #66774 #67915

  • [Beta] Inverted Index on shared-data

    Enables built-in inverted indexes for shared-data clusters to accelerate text filtering and full-text search workloads. #66541

  • Cache Observability

    Query-level cache hit ratio is now exposed in audit logs and the monitoring system for better cache transparency and latency diagnosis. Additional Data Cache metrics include memory and disk quota usage, and page cache statistics. #63964

  • Added segment metadata filter for Lake tables to skip irrelevant segments based on sort key range during scans, reducing I/O for range-predicate queries. #68124

  • Supports fast cancel for Lake DeltaWriter, reducing latency for cancelled ingestion jobs in shared-data clusters. #68877

  • Added support for interval-based scheduling for automated cluster snapshots. #67525

  • Supports pipeline execution for MemTable flush and merge, improving ingestion throughput for cloud-native tables in shared-data clusters. #67878

  • Supports dry_run mode for repairing cloud-native tables, allowing users to preview repair actions before execution. #68494

  • Added a thread pool for publish transactions in shared-nothing clusters, improving publish throughput. #67797

  • Supports dynamically modifying the datacache.enable property for cloud-native tables. #69011
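The segment metadata filter listed above skips segments whose sort key range cannot match a range predicate. The idea can be pictured as an interval-overlap check. The following is a minimal Python sketch of that idea only, not StarRocks's implementation: `SegmentMeta`, `segments_to_scan`, and the integer sort keys are hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical per-segment metadata: the min and max sort key it holds."""
    name: str
    min_key: int
    max_key: int


def segments_to_scan(segments: List[SegmentMeta],
                     lower: Optional[int] = None,
                     upper: Optional[int] = None) -> List[str]:
    """Return names of segments whose key range overlaps [lower, upper].

    None means the predicate is unbounded on that side.
    """
    selected = []
    for seg in segments:
        if lower is not None and seg.max_key < lower:
            continue  # every key in the segment is below the predicate range
        if upper is not None and seg.min_key > upper:
            continue  # every key in the segment is above the predicate range
        selected.append(seg.name)
    return selected


segments = [
    SegmentMeta("seg_0", 0, 99),
    SegmentMeta("seg_1", 100, 199),
    SegmentMeta("seg_2", 200, 299),
]
# A predicate like "k BETWEEN 150 AND 210" only needs seg_1 and seg_2.
print(segments_to_scan(segments, lower=150, upper=210))  # → ['seg_1', 'seg_2']
```

Only metadata is consulted to decide which segments to open, which is where the I/O saving for range-predicate queries comes from.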

Data Lake Analytics

  • Iceberg DELETE Support

    Supports writing position delete files for Iceberg tables, enabling DELETE operations on Iceberg tables directly from StarRocks. The support covers the full pipeline of Plan, Sink, Commit, and Audit. #67259 #67277 #67421 #67567

  • TRUNCATE for Hive and Iceberg Tables

    Supports TRUNCATE TABLE on external Hive and Iceberg tables. #64768 #65016

  • Incremental materialized view on Iceberg

    Extends the support for incremental materialized view refresh to Iceberg append-only tables, enabling query acceleration without full table refresh. #65469 #62699

  • VARIANT Type for Semi-Structured Data in Iceberg

    Supports the VARIANT data type in Iceberg Catalog for flexible, schema-on-read storage and querying of semi-structured data. Supports read, write, type casting, and Parquet integration. #63639 #66539

  • Iceberg v3 Support

    Added support for Iceberg v3 default value feature and row lineage. #69525 #69633

  • Iceberg Table Maintenance Procedures

    Added support for rewrite_manifests procedure and extended expire_snapshots and remove_orphan_files procedures with additional arguments for finer-grained table maintenance. #68817 #68898

  • Iceberg $properties Metadata Table

    Added support for querying Iceberg table properties via the $properties metadata table. #68504

  • Supports reading file path and row position metadata columns from Iceberg tables. #67003

  • Supports reading _row_id from Iceberg v3 tables, and supports global late materialization for Iceberg v3. #62318 #64133

  • Supports creating Iceberg views with custom properties, and displays properties in SHOW CREATE VIEW output. #65938

  • Supports querying Paimon tables with a specific branch, tag, version, or timestamp. #63316

  • Supports complex types (ARRAY, MAP, STRUCT) for Paimon tables. #66784

  • Supports Paimon views. #56058

  • Supports TRUNCATE for Paimon tables. #67559

  • Supports Partition Transforms with parentheses syntax when creating Iceberg tables. #68945

  • Supports ALTER TABLE REPLACE PARTITION COLUMN for Iceberg tables. #70508

  • Supports Iceberg global shuffle based on Transform Partition for improved data organization. #70009

  • Supports dynamically enabling global shuffle for Iceberg table sink. #67442

  • Introduced a Commit queue for Iceberg table sink to avoid concurrent Commit conflicts. #68084

  • Added host-level sorting for Iceberg table sink to improve data organization and reading performance. #68121

  • Enabled additional optimizations in ETL execution mode by default, improving performance for INSERT INTO SELECT, CREATE TABLE AS SELECT, and similar batch operations without explicit configuration. #66841

  • Added commit audit information for INSERT and DELETE operations on Iceberg tables. #69198

  • Supports enabling or disabling view endpoint operations in Iceberg REST Catalog. #66083

  • Optimized cache lookup efficiency in CachingIcebergCatalog. #66388

  • Supports EXPLAIN on various Iceberg catalog types. #66563

  • Supports partition projection for tables in AWS Glue Catalog. #67601

  • Added resource share type support for AWS Glue GetDatabases API. #69056

  • Supports Azure ABFS/WASB path mapping with endpoint injection (azblob/adls2). #67847

  • Added a database metadata cache for JDBC catalog to reduce remote RPC overhead and impact of external system failures. #68256

  • Added schema_resolver property for JDBC catalog to support custom schema resolution. #68682

  • Supports column comments for PostgreSQL tables in information_schema. #70520

  • Improved Oracle and PostgreSQL JDBC type mapping. #70315 #70566
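The Iceberg DELETE support above writes position delete files. In the Iceberg table format, such a file records pairs of data file path and 0-based row position, and readers skip those rows when scanning the data file. The sketch below models that read-side behavior in plain Python. It is an illustration of the format's idea, not StarRocks or Iceberg library code; the function names, the file path, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def index_position_deletes(
    deletes: Iterable[Tuple[str, int]],
) -> Dict[str, Set[int]]:
    """Group (file_path, pos) delete records by data file."""
    deleted: Dict[str, Set[int]] = defaultdict(set)
    for file_path, pos in deletes:
        deleted[file_path].add(pos)
    return deleted


def read_live_rows(file_path: str, rows: List[str],
                   deleted: Dict[str, Set[int]]) -> List[str]:
    """Return the rows of one data file, skipping deleted positions."""
    skip = deleted.get(file_path, set())
    return [row for pos, row in enumerate(rows) if pos not in skip]


# Hypothetical data file with four rows; positions are 0-based.
data_file = "s3://bucket/tbl/data/part-0.parquet"
rows = ["alice", "bob", "carol", "dave"]

# A DELETE that matched "bob" and "dave" would record these positions.
delete_records = [(data_file, 1), (data_file, 3)]

deleted = index_position_deletes(delete_records)
print(read_live_rows(data_file, rows, deleted))  # → ['alice', 'carol']
```

The data file itself is never rewritten; the delete is expressed entirely through the extra records, which is why the operation can be committed as a new snapshot.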

Query Engine

  • Recursive CTE

    Supports Recursive Common Table Expressions (CTEs) for hierarchical traversals, graph queries, and iterative SQL computations. #65932

  • Improved Skew Join v2 rewrite with statistics-based skew detection, histogram support, and NULL-skew awareness. #68680 #68886

  • Improved COUNT DISTINCT over windows and added support for fused multi-distinct aggregations. #67453

  • Supports an explicit skew hint for window functions, and automatically optimizes window functions with skewed partition keys by splitting them into a UNION. #68739 #67944

  • Supports materialization hints for CTEs. #70802

  • Enabled Global Lazy Materialization by default, improving query performance by deferring column reads until needed. #70412

  • Supports EXPLAIN and EXPLAIN ANALYZE for INSERT statements in Trino Parser. #70174

  • Supports EXPLAIN for query queue visibility. #69933
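To illustrate the kind of hierarchical traversal that Recursive CTEs make possible, here is a small sketch that runs a standard `WITH RECURSIVE` query. It uses Python's built-in `sqlite3` module as a stand-in engine so the example is self-contained; the `employees` table and its rows are invented, and the release notes do not confirm that StarRocks accepts this exact statement or whether the feature must be enabled first.

```python
import sqlite3

# SQLite is only a stand-in engine here, not StarRocks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'ceo', NULL),
        (2, 'vp', 1),
        (3, 'manager', 2),
        (4, 'engineer', 3);
""")

query = """
WITH RECURSIVE chain (id, name, depth) AS (
    -- Anchor member: the root of the hierarchy.
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: direct reports of rows found so far.
    SELECT e.id, e.name, c.depth + 1
    FROM employees e JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth
"""
org_chart = conn.execute(query).fetchall()
print(org_chart)
# → [('ceo', 0), ('vp', 1), ('manager', 2), ('engineer', 3)]
```

The anchor member seeds the result, and the recursive member is re-evaluated against newly produced rows until it yields nothing more.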

Functions and SQL Syntax

  • Added the following functions:
    • array_top_n: Returns the top N elements from an array ranked by value. #63376
    • arrays_zip: Combines multiple arrays element-wise into an array of structs. #65556
    • json_pretty: Formats a JSON string with indentation. #66695
    • json_set: Sets a value at a specified path within a JSON string. #66193
    • initcap: Converts the first letter of each word to uppercase. #66837
    • sum_map: Sums MAP values across rows with the same key. #67482
    • current_timezone: Returns the current session timezone. #63653
    • current_warehouse: Returns the name of the current warehouse. #66401
    • sec_to_time: Converts the number of seconds to a TIME value. #62797
    • ai_query: Calls an external AI model from SQL for inference workloads. #61583
    • min_n / max_n: Aggregate functions that return the N smallest/largest values. #63807
    • regexp_position: Returns the position of a regular expression match in a string. #67252
    • is_json_scalar: Returns whether a JSON value is a scalar. #66050
    • get_json_scalar: Extracts a scalar value from a JSON string. #68815
    • raise_error: Raises a user-defined error in SQL expressions. #69661
    • uuid_v7: Generates time-ordered UUID v7 values. #67694
    • STRING_AGG: Syntactic sugar for GROUP_CONCAT. #64704
  • Provides the following function or syntactic extensions:
    • Supports a lambda comparator in array_sort for custom sort ordering. #66607
    • Supports USING clause for FULL OUTER JOIN with SQL-standard semantics. #65122
    • Supports DISTINCT aggregation over framed window functions with ORDER BY/PARTITION BY. #65815 #65030 #67453
    • Supports ARRAY type in lead/lag/first_value/last_value window functions. #63547
    • Supports VARBINARY for count distinct-like aggregate functions. #68442
    • Supports MULTIPLY/DIVIDE for interval operations. #68407
    • Supports date and string type casting in IN expressions. #61746
    • Supports WITH LABEL syntax for BEGIN/START TRANSACTION. #68320
    • Supports WHERE/ORDER/LIMIT clauses in SHOW statements. #68834
    • Supports ALTER TASK statements for task management. #68675
    • Supports SQL UDF creation via CREATE FUNCTION ... AS <sql_body>. #67558
    • Supports loading UDFs from S3. #64541
    • Supports named parameters in Scala functions. #66344
    • Supports multiple compression formats (GZIP/SNAPPY/ZSTD/LZ4/DEFLATE/ZLIB/BZIP2) for CSV file exports. #68054
    • Supports STRUCT_CAST_BY_NAME SQL mode for name-based struct field matching. #69845
    • Supports last_query_id() in ANALYZE PROFILE for easy query profile analysis. #64557
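The one-line descriptions of `arrays_zip`, `array_top_n`, and `sum_map` above can be made concrete with small Python models. These are sketches of the described semantics only, not the StarRocks implementations. Two points are assumptions: that `array_top_n` returns the largest elements first, and that `arrays_zip` is called with equal-length arrays (the notes do not say how mismatched lengths or NULLs are handled).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def arrays_zip(*arrays: List[Any]) -> List[Tuple[Any, ...]]:
    """Pair up elements by index; each tuple stands in for one struct."""
    return list(zip(*arrays))


def array_top_n(values: List[Any], n: int) -> List[Any]:
    """Return the n largest elements, largest first (assumed ordering)."""
    return sorted(values, reverse=True)[:n]


def sum_map(rows: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Add up the values that share a key across all input maps."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        for key, value in row.items():
            totals[key] += value
    return dict(totals)


print(arrays_zip([1, 2], ["a", "b"]))         # → [(1, 'a'), (2, 'b')]
print(array_top_n([3, 9, 1, 7], 2))           # → [9, 7]
print(sum_map([{"x": 1, "y": 2}, {"x": 5}]))  # → {'x': 6, 'y': 2}
```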

Management & Observability

  • Supports warehouses, cpu_weight_percent, and exclusive_cpu_weight attributes for resource groups to improve multi-warehouse CPU resource isolation. #66947
  • Introduces the information_schema.fe_threads system view to inspect the FE thread state. #65431
  • Supports SQL Digest Blacklist to block specific query patterns at the cluster level. #66499
  • Supports Arrow Flight Data Retrieval from nodes that are otherwise inaccessible due to network topology constraints. #66348
  • Introduces the REFRESH CONNECTIONS command to propagate global variable changes to existing connections without reconnecting. #64964
  • Added built-in UI functions to analyze query profiles and view formatted SQL, making query tuning more accessible. #63867
  • Implements ClusterSummaryActionV2 API endpoint to provide a structured cluster overview. #68836
  • Added a global read-only system variable @@run_mode to query the current cluster run mode (shared-data or shared-nothing). #69247
  • Enabled query_queue_v2 by default for improved query queue management. #67462
  • Supports user-level default warehouse for Stream Load and Merge Commit operations. #68106 #68616
  • Added skip_black_list session variable to bypass backend blacklist verification when needed. #67467
  • Added enable_table_metrics_collect option for the metrics API. #68691
  • Added impersonate user support for query detail HTTP API. #68674
  • Added table_query_timeout as a table-level property. #67547
  • Added FE profile logging with configurable latency threshold. #69396
  • Supports adding FE observer nodes. #67778
  • Supports Merge Commit information in information_schema.loads for better load job visibility. #67879
  • Supports showing tablet status in cloud-native tables for better troubleshooting. #69616
  • Added per-catalog-type query metrics for external catalog observability. #70533
  • Added Debian (.deb) packaging support for FE and BE. #68821
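The SQL Digest Blacklist above blocks query patterns rather than exact query texts. One common way to get a pattern-level fingerprint is to replace literals with placeholders before hashing. The sketch below shows that general technique under loud assumptions: the normalization rules, the use of SHA-256, and the names `sql_digest` and `is_blocked` are all hypothetical, and the release notes do not describe how StarRocks actually computes a digest.

```python
import hashlib
import re
from typing import Set


def sql_digest(sql: str) -> str:
    """Fingerprint a statement with its literals replaced by '?'.

    Hypothetical normalization, not StarRocks's digest algorithm.
    """
    text = re.sub(r"'[^']*'", "?", sql)             # string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)    # numeric literals
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_blocked(sql: str, blacklist: Set[str]) -> bool:
    """Check a statement's digest against the set of blocked digests."""
    return sql_digest(sql) in blacklist


# Block every query of this shape, whatever the id value is.
blacklist = {sql_digest("SELECT * FROM orders WHERE id = 1")}

print(is_blocked("select *  from orders where id = 42", blacklist))   # → True
print(is_blocked("SELECT * FROM orders WHERE name = 'a'", blacklist))  # → False
```

Because literals are erased before hashing, a single blacklist entry covers all queries with the same structure.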

Security

Bug Fixes

The following issues have been fixed:

  • Fixed data loss after tablet split by skipping data file deletion for range distribution tablets. #71135
  • Fixed a memory leak in DefaultValueColumnIterator for complex types. #71142
  • Fixed a memory leak caused by shared_ptr cycle between BatchUnit and FetchTaskContext. #71126
  • Fixed use-after-free in parallel segment/rowset loading on error path. #71083
  • Fixed potential hash table data loss in aggregation spill set_finishing. #70851
  • Fixed double-free crash in SystemMetrics due to concurrent getline access. #71040
  • Fixed crash in SpillMemTableSink when eager merge consumes all blocks. #69046
  • Fixed NPE in visitDictionaryGetExpr when dictionary backing table is dropped. #71109
  • Fixed NPE when analyzing generated columns in Stream Load/Broker Load if a referenced column is missing. #71116
  • Fixed NPE when auto-created partition is dropped by TTL cleaner. #68257
  • Fixed NPE in IcebergCatalog.getPartitionLastUpdatedTime when snapshot is expired. #68925
  • Fixed incorrect predicate rewrite for outer join with constant-side column reference. #67072
  • Fixed PK tablet rowset meta loss caused by GC race during disk re-migration (A→B→A). #70727
  • Fixed DB read lock leak in SharedDataStorageVolumeMgr. #70987
  • Fixed incorrect query results after modifying CHAR column length in shared-data clusters. #68808
  • Fixed MV refresh bug in the case of multiple tables. #61763
  • Fixed incorrect MV recycle time after a forced refresh. #68673
  • Fixed all-null value handling bug in sync MV. #69136
  • Fixed duplicate column id error when querying MV after fast schema change ADD COLUMN. #71072
  • Fixed IVM refresh recording incomplete PCT partition metadata. #71092
  • Fixed low-cardinality rewrite NPE caused by shared DecodeInfo. #68799
  • Fixed low-cardinality join predicate type mismatch. #68568
  • Fixed a segfault in Parquet Page Index Filter when null_counts is empty. #68463
  • Fixed JSON flatten array and object conflict on identical paths. #68804
  • Fixed Iceberg cache weigher inaccuracies. #69058
  • Fixed Iceberg table cache memory limit. #67769
  • Fixed Iceberg delete column nullability issue. #68649
  • Fixed Azure ABFS/WASB FileSystem cache key to include container. #68901
  • Fixed deadlock when the HMS connection pool is full. #68033
  • Fixed incorrect length for VARCHAR field type in Paimon Catalog. #68383
  • Fixed Paimon catalog refresh crash with ClassCastException on ObjectTable. #70224
  • Fixed PaimonView resolving table references against default_catalog instead of the Paimon catalog. #70217
  • Fixed FULL OUTER JOIN USING with constant subqueries. #69028
  • Fixed a JOIN ON clause bug with CTE scope. #68809
  • Fixed missing partition predicate in short-circuit point lookup. #71124
  • Fixed ConnectContext memory leaks by using bindScope() pattern. #68215
  • Fixed memory leak in CatalogRecycleBin.asyncDeleteForTables for shared-nothing clusters. #68275
  • Fixed an issue where the Thrift accept thread exited when it encountered any exception. #68644
  • Fixed UDF resolution in routine load column mappings. #68201
  • Fixed DROP FUNCTION IF EXISTS ignoring ifExists flag. #69216
  • Fixed scan result error when dict page is too large. #68258
  • Fixed range partition overlap. #68255
  • Fixed query queue allocation time and pending timeout. #65802
  • Fixed array_map crash when processing null literal array. #70629
  • Fixed stack overflow for to_base64. #70623
  • Fixed optimizer timeout issue. #70605
  • Fixed case-insensitive username normalization for LDAP authentication. #67966
  • Mitigated SSRF risk for API proc_file. #68997
  • Masked user auth strings in audit and SQL redaction. #70360

Behavior Changes

  • ETL execution mode optimizations are now enabled by default. This benefits INSERT INTO SELECT, CREATE TABLE AS SELECT, and similar batch workloads without explicit configuration changes. #66841
  • The third argument of lag/lead window functions now supports column references in addition to constant values. #60209
  • FULL OUTER JOIN USING now follows SQL-standard semantics: the USING column appears once in the output instead of twice. #65122
  • Global Lazy Materialization is now enabled by default. #70412
  • query_queue_v2 is now enabled by default. #67462
  • SQL transactions are gated behind the session variable enable_sql_transaction by default. #63535
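The FULL OUTER JOIN USING change above affects the shape of the result set, so it helps to see it on data. Under SQL-standard semantics the USING column is emitted once and behaves like a COALESCE of the two sides. The following Python model sketches that behavior; it is not StarRocks code, it assumes join keys are unique on each side, and the tables are invented.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]


def full_outer_join_using(left: Dict[int, str],
                          right: Dict[int, str]) -> List[Row]:
    """Join two key -> value tables; the key column is emitted once per row."""
    result = []
    for key in sorted(left.keys() | right.keys()):
        # The single output key behaves like COALESCE(left.key, right.key),
        # so it is non-NULL even when only one side has the row.
        result.append((key, left.get(key), right.get(key)))
    return result


left = {1: "l1", 2: "l2"}
right = {2: "r2", 3: "r3"}
print(full_outer_join_using(left, right))
# → [(1, 'l1', None), (2, 'l2', 'r2'), (3, None, 'r3')]
```

Queries that previously relied on two separate key columns in the output (one per side) may need to be adjusted after upgrading.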
