Major features and updates
Index SPI
Adds the ability to include new index types at runtime in Apache Pinot. This makes it possible to add third-party indexes, including proprietary ones. More details here
Null value support for Pinot queries
NULL support for ORDER BY, DISTINCT, GROUP BY, and value transform functions.
Upsert enhancements
Delete support in upsert enabled tables (#10703)
Support added to extend upserts and allow deleting records from a realtime table. The design details can be found here.
Preload segments with upsert snapshots to speed up table loading (#11020)
Adds a feature to preload segments from a table that uses the upsert snapshot feature. Segments with validDocIds snapshots can be preloaded more efficiently, which speeds up table loading and therefore server restarts.
TTL configs for upsert primary keys (#10915)
Adds support for specifying expiry TTL for upsert primary key metadata cleanup.
Segment compaction for upsert real-time tables (#10463)
Adds a new minion task to compact segments belonging to a real-time table with upserts.
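As a rough illustration, the upsert features above might be combined in a table's upsert configuration along the following lines. This is a sketch under stated assumptions: the field names (`deleteRecordColumn`, `enableSnapshot`, `enablePreload`, `metadataTTL`), the column name `deleted`, and every value shown are not given in these notes and should be checked against the upsert documentation for your Pinot version.

```python
import json


def build_upsert_config(delete_column="deleted", metadata_ttl=86400):
    """Return an illustrative upsertConfig block as a plain dict.

    All keys and values here are assumptions made for illustration only.
    """
    return {
        "mode": "FULL",
        # Assumed: boolean column that marks a record as deleted (#10703).
        "deleteRecordColumn": delete_column,
        # Assumed: snapshots must be on for preloading to apply (#11020).
        "enableSnapshot": True,
        "enablePreload": True,
        # Assumed: TTL expressed in the unit of the comparison column (#10915).
        "metadataTTL": metadata_ttl,
    }


if __name__ == "__main__":
    print(json.dumps({"upsertConfig": build_upsert_config()}, indent=2))
```

The dict is kept separate from serialization so that it can be merged into a larger table config before being submitted.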
Pinot Spark Connector for Spark3 (#10394)
Adds Spark 3 support to the Pinot Spark Connector, along with support for passing Pinot query options to the connector.
Query function enhancements
- Add PercentileKLL aggregation function (#10643)
- Support for ARG_MIN and ARG_MAX Functions (#10636)
- Integer Tuple Sketch support (#10427)
- Adding vector scalar functions (#11222)
- Realtime pre-aggregation for Distinct Count HLL & Big Decimal (#10926)
- [feature] multi-value datetime transform variants #10841
- Add clpDecode transform function for decoding CLP-encoded fields. #10885
- FUNNEL_COUNT Aggregation Function (#10867)
- [multistage] Add support for RANK and DENSE_RANK ranking window functions (#10700)
- add theta sketch scalar (#11153)
PinotBufferFactory and PinotDataBuffer pluggability (#10528)
Support for extending the existing PinotDataBuffer interface, plus two new implementations that use Unsafe APIs.
Tier level index config override (#10553)
Allows overriding index configs at tier level, allowing for more flexible index configurations for different tiers.
Config to use customized broker query thread pool (#10614)
Adds the configuration options below, which enable a bounded thread pool for broker queries and set its capacity.
pinot.broker.enable.bounded.http.async.executor
pinot.broker.http.async.executor.max.pool.size
pinot.broker.http.async.executor.core.pool.size
pinot.broker.http.async.executor.queue.size
This feature allows better management of broker resources.
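The four broker properties listed above could be set together roughly as follows. This is only a sketch: the numeric values are placeholders chosen for illustration, not recommended or default settings, and the helper `render_properties` is a hypothetical utility that prints the pairs in Java `.properties` syntax.

```python
# Placeholder values for illustration only; size these for your own brokers.
BROKER_EXECUTOR_SETTINGS = {
    "pinot.broker.enable.bounded.http.async.executor": "true",
    "pinot.broker.http.async.executor.core.pool.size": "50",
    "pinot.broker.http.async.executor.max.pool.size": "100",
    "pinot.broker.http.async.executor.queue.size": "1000",
}


def render_properties(settings):
    """Render key/value pairs in .properties syntax, one pair per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(render_properties(BROKER_EXECUTOR_SETTINGS))
```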
Kinesis stream header extraction (#9713)
Kinesis tables can now opt to extract record headers and populate them as table columns. To enable header extraction, set metadata.populate to true. The table schema must include the metadata columns so that they can be queried like regular Pinot table columns.
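A minimal sketch of switching the flag on, assuming it lives in the table's stream configuration map alongside the other stream settings (that placement, and the helper name `enable_header_extraction`, are assumptions rather than something these notes state):

```python
import json


def enable_header_extraction(stream_configs):
    """Return a copy of stream_configs with metadata population switched on.

    The input dict is left unchanged so the original config can be reused.
    """
    updated = dict(stream_configs)
    updated["metadata.populate"] = "true"
    return updated


if __name__ == "__main__":
    # Assumed starting point: a Kinesis stream config with only its type set.
    base = {"streamType": "kinesis"}
    print(json.dumps(enable_header_extraction(base), indent=2))
```

The corresponding metadata columns still need to be declared in the table schema before they can be queried.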
UI enhancements
Adds persistence of authentication details in the browser session. This means that even if you refresh the app, you will still be logged in until the authentication session expires (#10389)
AuthProvider logic updated to decode the access token and extract user name and email. This information will now be available in the app for features to consume. (#10925)
Pinot docker image improvements and enhancements
- Make Pinot base build and runtime images support Amazon Corretto and MS OpenJDK (#10422)
- Support multi-arch pinot docker image (#10429)
- Update dockerfile with recent jdk distro changes (#10963)
Rebalance status API (#10359)
Adds a controller API to be able to check the status of a table rebalance operation.
Multistage engine enhancements
- [multistage] Fix Predicate Pushdown by Using Rule Collection (#10409)
- try fixing mailbox cancel race condition (#10432)
- [multistage] Implement ordering for SortExchange #10408
- [multistage] Catch Throwable To Propagate Proper Error Message (#10438)
- [multistage] Support array type for select query (#10434)
- [Multistage] Pushdown explain plan queries from controller to broker (#10505)
- [multistage] Initial (phase 1) Query runtime for window functions with ORDER BY within the OVER() clause (#10449)
- [multistage] fix tenant detection issues (#10546)
- Turn on v2 engine by default (#10543)
- [multistage] Split MailboxReceiveOperator into sorted and non-sorted versions (#10570)
- [multistage] Add support for the ranking ROW_NUMBER() window function (#10527)
- [multistage] Table level Access Validation, QPS Quota, Phase Metrics for multistage queries (#10534)
- [multistage] Handle Integer.MIN_VALUE in hashCode based FieldSelectionKeySelector (#10596)
- [multistage] populate queryOption down to leaf (#10626)
- [multistage] Run ExpandSearch Rule After All Filter Pushdown Rules (#10627)
- [multistage] improve error message in case of non-existent table queried from controller (#10599)
- [multistage] Make Intermediate Stage Worker Assignment Tenant Aware (#10617)
- [multi-stage] Support SetOperations(UNION/INTERSECT/MINUS) compilation in query planner (#10535)
- [multistage] Add Callbacks for Complete Events (#10564)
- global index virtualID (#10665)
- Enhance mailbox receive operator (#10669)
- [multistage] Add some additional planner tests for ROW_NUMBER() window function (#10684)
- [multistage] Modify empty LogicalProject for window functions to have a literal (#10635)
- [multistage] Add support for the ranking ROW_NUMBER() window function (#10587)
Drop results support
Adds a parameter to queryOptions to drop the resultTable from the response. This mode can be used to troubleshoot a customer's query (which may have sensitive data in the result) using metadata only.
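As a hedged sketch, a client could attach the option to a query request like this. The option name `dropResults`, the `queryOptions` payload field, and the table name `myTable` are assumptions for illustration; these notes do not name the parameter, so confirm it against the query options reference before relying on it.

```python
import json


def build_query_payload(sql, drop_results=True):
    """Build a JSON-serializable query request body.

    When drop_results is set, the (assumed) dropResults option is added so
    that the response carries metadata but no result table.
    """
    payload = {"sql": sql}
    if drop_results:
        payload["queryOptions"] = "dropResults=true"
    return payload


if __name__ == "__main__":
    # myTable is a hypothetical table name.
    print(json.dumps(build_query_payload("SELECT * FROM myTable LIMIT 10")))
```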
Full list of features added
- Allow queries on multiple tables of same tenant to be executed from controller UI #10336
- Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig #10352
- [Index SPI] IndexType (#10191)
- Simplify filtered aggregate transform operator creation (#10410)
- Introduce BaseProjectOperator and ValueBlock (#10405)
- Add support to create realtime segment in local (#10433)
- Refactor: Pass context instead on individual arguments to operator (#10413)
- Add "processAll" mode for MergeRollupTask (#10387)
- Upgrade h2 version from 1.x to 2.x (#10456)
- Added optional force param to the table configs update API (#10441)
- Enhance broker reduce to handle different column names from server response #10454
- Adding fields to enable/disable dictionary optimization. (#10484)
- Remove converted H2 type NUMERIC(200, 100) from BIG_DECIMAL (#10483)
- Add JOIN support to PinotQuery #10421
- Add testng on verifier (#10491)
- Clean up temp consuming segment files during server start (#10489)
- make pinot k8s sts and deployment start command configurable (#10509)
- Fix Bottleneck for Server Bootstrap by Making maxConnsPerRoute Configurable (#10487)
- Type match between resultType and function's dataType (#10472)
- create segment zk metadata cache (#10455)
- Allow ValueBlock length to increase in TransformFunction (#10515)
- Allow configuring helix timeouts for EV dropped in Instance manager (#10510)
- Enhance error reporting (#10531)
- Combine "GET /segments" API & "GET /segments/{tableName}/select" (#10412)
- Exposed the CSV header map as part of CSVRecordReader (#10542)
- Moving Zk updates for reload,force_commit to their own Znodes which will spread out Zk write load across jobTypes (#10451)
- Enabling dictionary override optimization on the segment reload path as well. (#10557)
- Make broker's rest resource packages configurable (#10588)
- Check EV not exist before allowing creating the table (#10593)
- Adding a parameter (toSegments) to the endSegmentReplacement API (#10630)
- update target tier for segments if tierConfigs is provided (#10642)
- Add support for custom compression factor for Percentile TDigest aggregation functions (#10649)
- Utility to convert table config into updated format (#10623)
- Segment lifecycle event listener support #10536
- Add server metrics to capture gRPC activity (#10678)
- API to expose the contract/rules imposed by pinot on tableConfig #10655
- Add description field to metrics in Pinot (#10744)
- changing the dedup store to become pluggable #10639
- Make the TimeUnit in the DATETRUNC function case insensitive. (#10750)
- [feature] Consider tierConfigs when assigning new offline segment #10746
- Compress idealstate according to estimated size #10766
- 10689: Update for pinot helm release version 0.2.7 (#10723)
- Update the pinot tenants tables api to support returning broker tagged tables (#11184)
- Fail the query if a filter's rhs contains NULL. (#11188)
- Support Off Heap for Native Text Indices #10842
- refine segment reload executor to avoid creating threads unbounded #10837
- compress nullvector bitmap upon seal (#10852)
- Enable case insensitivity by default (#10771)
- Push out-of-order events metrics for full upsert (#10944)
- [feature] add requestId for BrokerResponse in pinot-broker and java-client #10943
- Provide results in CompletableFuture for java clients and expose metrics #10326
- Add minion observability for segment upload/download failures (#10978)
- Enhance early terminate for combine operator (#10988)
- Add fromController method that accepts a PinotClientTransport (#11013)
- Add CLPDecodeRewriter to make it easier to call clpDecode with a column-group name rather than the individual columns. (#11006)
- Ensure min/max value generation in the segment metadata. (#10891)
- Apply some allocation optimizations on GrpcSendingMailbox (#11015)
- When case insensitivity is enabled, don't allow adding a new column whose lowercase name matches that of an existing column. (#10991)
- Replace Long attributes with primitive values to reduce boxing (#11059)
- retry KafkaConsumer creation in KafkaPartitionLevelConnectionHandler.java (#253) (#11040)
- Support for new dataTime format in DateTimeGranularitySpec without explicitly setting size (#11057)
- Add a new controller endpoint for segment deletion with a time window (#10758)
- Use PUT request to enable/disable table/instance (#11109)
- Returning 403 status code in case of authorization failures (#11136)
- Simplify compatible test to avoid test against itself (#11163)
- Instance retag validation check api (#11077)
- Updated code for setting value of segment min/max property. (#10990)
- Add stat to track number of segments that have valid doc id snapshots (#11110)
- Add brokerId and brokerReduceTimeMs to the broker response stats (#11142)
- safely multiply integers to prevent overflow (#11186)
- Move largest comparison value update logic out of map access (#11157)
- Optimize DimensionTableDataManager to abort unnecessary loading (#11192)
- Refine isNullsLast and isAsc functions. (#11199)
Vulnerability, bugfixes, cleanups and deprecations
- Fix JDBC driver check for username (#10416)
- [Clean up] Remove getColumnName() from AggregationFunction interface (#10431)
- fix jersey TerminalWriterInterceptor MessageBodyWriter not found issue (#10462)
- Bug fix: Start counting operator execution time from first NoOp block (#10450)
- Fix unavailable instances issues for StrictReplicaGroup #10466
- Change shell to bash (#10469)
- Fix the double destroy of segment data manager during server shutdown (#10475)
- Remove "isSorted()" precondition check in the ForwardIndexHandler (#10476)
- Fix null handling in streaming selection operator #10453
- Fix jackson dependencies (#10477)
- optimize queries where lhs and rhs of predicate are equal (#10444)
- Trivial fix on a warning detected by static checker (#10492)
- wait for full segment commit protocol on force commit (#10479)
- Fix bug and add test for noDict -> Dict conversion for sorted column (#10497)
- Make column order deterministic in segment (#10468)
- Allow empty segmentsTo for segment replacement protocol (#10511)
- Use string as default compatible type for coalesce (#10516)
- Use threadlocal variable for genericRow to make the MemoryOptimizedTable threadsafe (#10502)
- Fix shading in spark2 connector pom file (#10490)
- Fix ramping delay caused by long lasting sequence of unfiltered messa… (#10418)
- Do not serialize metrics in each Operator (#10473)
- Make pinot-controller apply webpack production mode when bin-dist profile is used. (#10525)
- Fix FS props handling when using /ingestFromUri (#10480)
- Clean up v0_deprecated batch ingestion jobs (#10532)
- Deprecate kafka 0.9 support (#10522)
- Reduce timeout for codecov and not fail the job in any case (#10547)
- Fix DataTableV3 serde bug for empty array (#10583)
- Do not record operator stats when tracing is enabled (#10447)
- Forward auth token for logger APIs from controller to other controllers and brokers (#10590)
- Bug fix: Partial upsert default strategy is null (#10610)
- Fix flaky test caused by EV check during table creation (#10616)
- Fix withDissabledTrue typo (#10624)
- Cleanup unnecessary mailbox id ser/de (#10629)
- no error metric for queries where all segments are pruned (#10589)
- bug fix: to keep QueryParser thread safe when handling many read requests on class RealtimeLuceneTextIndex (#10620)
- Fix static DictionaryIndexConfig.DEFAULT_OFFHEAP being actually onheap (#10632)
- 10567: [cleanup pinot-integration-test-base], clean query generations and some other refactoring. (#10648)
- Fixes backward incompatibility with SegmentGenerationJobSpec for segment push job runners (#10645)
- Bug fix to get the toSegments list correctly (#10659)
- 10661: Fix for failing numeric comparison in where clause for IllegalStateException. (#10662)
- Fixes partial upsert not reflecting multiple comparison column values (#10693)
- Fix Bug in Reporting Timer Value for Min Consuming Freshness (#10690)
- Fix typo of rowSize -> columnSize (#10699)
- update segment target tier before table rebalance (#10695)
- Fix a bug in star-tree filter operator which can incorrectly filter documents (#10707)
- Enhance the instrumentation for a corner case where the query doesn't go through DocIdSetOp (#10729)
- bug fix: add missing properties when editing instance config (#10741)
- Fix githubEvents table for quickstart recipes (#10716)
- Minor Realtime Segment Commit Upload Improvements (#10725)
- Return 503 for all interrupted queries. Refactor the query killing code. (#10683)
- Add decoder initialization error to the server's error cache (#10773)
- bug fix: add @JsonProperty to SegmentAssignmentConfig (#10759)
- ensure we wait the full no query timeout before shutting down (#10784)
- Clean up KLL functions with deprecated convention (#10795)
- Redefine the semantics of SEGMENT_STREAMED_DOWNLOAD_UNTAR_FAILURES metric to count individual segment fetch failures. (#10777)
- fix exception during exchange routing that causes a stuck pipeline (#10802)
- [bugfix] fix floating point and integral type backward incompatible issue (#10650)
- [pinot-core] Start consumption after creating segment data manager (#11227)
- Fix IndexOutOfBoundException in filtered aggregation group-by (#11231)
- Fix null pointer exception in segment debug endpoint #11228
- Clean up RangeIndexBasedFilterOperator. (#11219)
- Fix the escape/unescape issue for property value in metadata (#11223)
- Fix a bug in the order by comparator (#10818)
- Keeps nullness attributes of merged in comparison column values (#10704)
- Add required JSON annotation in H3IndexResolution (#10792)
- Fix a bug in SELECT DISTINCT ORDER BY. (#10827)
- jsonPathString should return null instead of string literal "null" (#10855)
- Bug Fix: Segment Purger cannot purge old segments after schema evolution (#10869)
- Fix #10713 by giving metainfo more priority than config (#10851)
- Close PinotFS after Data Manager Shutdowns (#10888)
- bump awssdk version for a bugfix on http conn leakage (#10898)
- Fix MultiNodesOfflineClusterIntegrationTest.testServerHardFailure() (#10909)
- Fix a bug in SELECT DISTINCT ORDER BY LIMIT. (#10887)
- Fix an integer overflow bug. (#10940)
- Return true when _resultSet is not null (#10899)
- Fixing table name extraction for lateral join queries (#10933)
- Fix casting when prefetching mmap'd segment larger than 2GB (#10936)
- Null check before closing reader (#10954)
- Fixes SQL wildcard escaping in LIKE queries (#10897)
- [Clean up] Do not count DISTINCT as aggregation (#10985)
- do not readd lucene readers to queue if segment is destroyed #10989
- Message batch ingestion lag fix (#10983)
- Fix a typo in snapshot lock (#11007)
- When extracting root-level field name for complex type handling, use the whole delimiter (#11005)
- update jersey to fix Denial of Service (DoS) (#11021)
- Update getTenantInstances call for controller and separate POST operations on it (#10993)
- update freemarker to fix Server-side Template Injection (#11019)
- format double 0 properly to compare with h2 results (#11049)
- Fix double-checked locking in ConnectionFactory (#11014)
- Remove presto-pinot-driver and pinot-java-client-jdk8 module (#11051)
- Make RequestUtils always return a string array when getTableNames (#11069)
- Fix BOOL_AND and BOOL_OR result type (#11033)
- [cleanup] Consolidate some query and controller/broker methods in integration tests (#11064)
- Fix grpc regression on multi-stage engine (#11086)
- Delete an obsolete TODO. (#11080)
- Minor fix on AddTableCommand.toString() (#11082)
- Allow using Lucene text indexes on mutable MV columns. (#11093)
- Allow offloading multiple segments from same table in parallel (#11107)
- Added serviceAccount to minion-stateless (#11095)
- Bug fix: TableUpsertMetadataManager is null (#11129)
- Fix reload bug (#11131)
- Allow extra aggregation types in RealtimeToOfflineSegmentsTask (#10982)
- Fix a bug when using range index to solve EQ predicate (#11146)
- Sanitise API inputs used as file path variables (#11132)
- Fix NPE when nested query doesn't have gapfill (#11155)
- Fix the NPE when query response error stream is null (#11154)
- Make interface methods non private, for java 8 compatibility (#11164)
- Increment nextDocId even if geo indexing fails (#11158)
- Fix the issue of consuming segment entering ERROR state due to stream connection errors (#11166)
- In TableRebalancer, remove instance partitions only when reassigning instances (#11169)
- Remove JDK 8 unsupported code (#11176)
- Fix compat test by adding -am flag to build pinot-integration-tests (#11181)
- don't register scalar functions twice in CalciteSchema (#11190)
- Fix the storage quota check for metadata push (#11193)
- Delete filtering NULL support dead code paths. (#11198)