Summary
This release includes many new features for Pinot ingestion and connectors (e.g., ingestion-time filtering configurable in the table config, JSON support during ingestion, Protocol Buffers input format support, and a new Pinot JDBC client), query capability (e.g., a new GROOVY transform function UDF), and admin functions (a revamped Cluster Manager UI and Query Console UI). It also contains many key bug fixes. See details below.
The release was cut from the following commit:
d1b4586
and the following cherry-picks:
Notable New Features
- Allow updating an existing instance config: PUT /instances/{instanceName} with an Instance object as the payload (#PR4952)
- Add PinotServiceManager to start Pinot components (#PR5266)
- Support for protocol buffers input format. (#PR5293)
- Add GenericTransformFunction wrapper for simple ScalarFunctions (PR#5440)
— Adding support to invoke any scalar function via GenericTransformFunction
- Add support for SQL CASE statement (PR#5461)
- Support distinctCountRawThetaSketch aggregation that returns serialized sketch. (PR#5465)
- Add multi-value support to SegmentDumpTool (PR#5487)
— Add segment dump tool as part of the pinot-tool.sh script
- Add json_format function to convert a JSON object to a string during ingestion. (PR#5492)
— Can be used to store complex objects as a JSON string (which can later be queried using jsonExtractScalar)
- Support escaping single quote for SQL literal (PR#5501)
— This is especially useful for DistinctCountThetaSketch because it stores expression as literal
E.g. DistinctCountThetaSketch(..., 'foo=''bar''', ...)
- Support expression as the left-hand side for BETWEEN and IN clause (PR#5502)
- Add a new field IngestionConfig in TableConfig
— FilterConfig: ingestion level filtering of records, based on filter function. (PR#5597)
— TransformConfig: ingestion level column transformations. This was previously introduced in Schema (FieldSpec#transformFunction), and has now been moved to TableConfig. It is still supported in the schema, but we recommend that users set it in the TableConfig starting with this release. (PR#5681)
- Allow star-tree creation during segment load (#PR5641)
— Introduced a new boolean config enableDynamicStarTreeCreation in IndexingConfig to enable/disable star-tree creation during segment load.
- Support for Pinot clients using JDBC connection (#PR5602)
- Support customized accuracy for distinctCountHLL, distinctCountHLLMV functions by adding log2m value as the second parameter in the function. (#PR5564)
— Added cluster config default.hyperloglog.log2m to let users set the default log2m value.
- Add segment encryption on Controller based on table config (PR#5617)
- Add a constraint to the message queue for all instances in Helix, with a large default value of 100000. (PR#5631)
- Support order-by aggregations not present in SELECT (PR#5637)
— Example: "select subject from transcript group by subject order by count() desc"
This is equivalent to the following query, except that the response does not contain count():
"select subject, count() from transcript group by subject order by count() desc"
- Add geo support for Pinot queries (PR#5654)
— Added geo-spatial data model and geospatial functions
- Cluster Manager UI & Query Console UI revamp (PR#5684 and PR#5732)
— Updated Cluster Manager UI and added table details page and segment details page
- Add Controller API to explore Zookeeper (PR#5687)
- Support BYTES type for distinctCount and group-by (PR#5701 and PR#5708)
— Add BYTES type support to DistinctCountAggregationFunction
— Correctly handle BYTES type in DictionaryBasedAggregationOperator for DistinctCount
- Support for ingestion job spec in JSON format (#PR5729)
- Improvements to RealtimeProvisioningHelper command (#PR5737)
— Improved docs related to ingestion and plugins
- Added GROOVY transform function UDF (#PR5748)
— Ability to run a groovy script in the query as a UDF. e.g. string concatenation:
SELECT GROOVY('{"returnType": "STRING", "isSingleValue": true}', 'arg0 + " " + arg1', columnA, columnB) FROM myTable
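The single-quote escaping rule listed above (PR#5501) can be illustrated with a small client-side helper. This is only a sketch of the doubling convention shown in the DistinctCountThetaSketch example; the function name sql_string_literal is hypothetical and is not part of any Pinot client.

```python
def sql_string_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling each embedded single quote.

    Hypothetical helper for illustration; not a Pinot API.
    """
    return "'" + value.replace("'", "''") + "'"


# Predicate that itself contains a quoted value.
predicate = "foo='bar'"

# Each inner quote is doubled, then the whole string is quoted.
print(sql_string_literal(predicate))  # 'foo=''bar'''
```

The printed literal matches the form used in the release note, so it could be passed as the predicate argument of DistinctCountThetaSketch.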
Special notes
- Changed the stream and metadata interface (PR#5542)
— This PR concludes the work for the issue #5359 to extend offset support for other streams
- TransformConfig: ingestion level column transformations. This was previously introduced in Schema (FieldSpec#transformFunction), and has now been moved to TableConfig. It is still supported in the schema, but we recommend that users set it in the TableConfig starting with this release. (PR#5681)
- Config key enable.case.insensitive.pql in Helix cluster config is deprecated, and replaced with enable.case.insensitive. (#PR5546)
- Change default segment load mode to MMAP. (PR#5539)
— The load mode for segments previously defaulted to heap.
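For the deprecated cluster config key noted above, a migration might look like the following sketch. It assumes the cluster config has been read into a plain Python dict; the function name and the dict representation are hypothetical, and writing the result back to Helix is out of scope here.

```python
OLD_KEY = "enable.case.insensitive.pql"  # deprecated key
NEW_KEY = "enable.case.insensitive"      # replacement key


def migrate_case_insensitive_key(cluster_config: dict) -> dict:
    """Return a copy of the config with the deprecated key renamed.

    Hypothetical helper: if the new key is already set, its value wins
    and the deprecated key is simply dropped.
    """
    migrated = dict(cluster_config)
    if OLD_KEY in migrated:
        old_value = migrated.pop(OLD_KEY)
        migrated.setdefault(NEW_KEY, old_value)
    return migrated


print(migrate_case_insensitive_key({OLD_KEY: "true"}))
```

Letting an existing value under the new key take precedence is a design choice of this sketch, not documented Pinot behavior.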
Major Bug fixes
- Fix bug in distinctCountRawHLL on SQL path (#5494)
- Fix backward incompatibility for existing stream implementations (#5549)
- Fix backward incompatibility in StreamFactoryConsumerProvider (#5557)
- Fix logic in isLiteralOnlyExpression. (#5611)
- Fix double memory allocation during operator setup (#5619)
- Allow segment download url in Zookeeper to be deep store uri instead of hardcoded controller uri (#5639)
- Fix a backward-compatibility issue when converting BrokerRequest to QueryContext for queries from Presto segment splits (#5676)
- Fix the issue that PinotSegmentToAvroConverter does not handle BYTES data type. (#5789)
Backward Incompatible Changes
- PQL queries with HAVING clause will no longer be accepted for the following reasons: (#PR5570)
— HAVING clause does not apply to PQL GROUP-BY semantics, where each aggregation column is ordered individually
— The current behavior can produce inaccurate results without any notice
— HAVING support will be added for SQL queries in the next release
- Because of the standardization of the DistinctCountThetaSketch predicate strings, please upgrade Broker before Server. The new Broker can handle both standard and non-standard predicate strings for backward compatibility. (#PR5613)