For installation instructions check out the getting started guide.
### Added
- general: PostgreSQL CDC benchmarking suite added with Kafka Connect PostgreSQL benchmark infrastructure and configuration. (@ness-david-dedu, #4216)
- parquet_encode: Added configurable timestamp unit support (nanosecond, microsecond, millisecond) to make Parquet output readable by Apache Spark, Databricks, AWS Athena, and DuckDB. (@ankit481, #4294)
- oracledb_cdc: Added `transaction_id` to message metadata. (@josephwoodward, #4328)
- oracledb_cdc: Added `commit_ts_ms` to message metadata. (@josephwoodward, #4331)
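As an illustration, the two new oracledb_cdc metadata keys can be copied into the message payload with a mapping processor. This is a minimal sketch: the field names `txn_id` and `commit_ts_ms` in the output document are arbitrary, and the surrounding input/output configuration is assumed.

```yaml
# Hypothetical pipeline fragment: enrich each CDC message with the
# oracledb_cdc metadata added in #4328 and #4331.
pipeline:
  processors:
    - mapping: |
        root = this
        root.txn_id = @transaction_id      # transaction identifier metadata
        root.commit_ts_ms = @commit_ts_ms  # commit timestamp (ms) metadata
```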
### Fixed
- confluent: Avro schema reference resolution now handles arbitrary schema shapes and correctly inlines transitive references, fixing misleading errors and missing nested reference resolution. (@twmb, #4247)
- mysql_cdc: IAM token refresh and canal recreation before streaming prevent connection failures when snapshots delay binlog streaming. (@josephwoodward, #4295)
- oracledb_cdc: Oracle numeric values with missing leading zeros (e.g., '.5') are now normalized to valid JSON format for proper CDC streaming. (@josephwoodward, #4322)
## Unreleased

### Added
- parquet_encode: Added `default_timestamp_unit` field (values `NANOSECOND`, `MICROSECOND`, `MILLISECOND`) controlling the precision of TIMESTAMP logical types. Default remains `NANOSECOND` for backwards compatibility. Use `MICROSECOND` when writing files for Apache Spark/Databricks, AWS Athena or DuckDB, which do not support `TIMESTAMP(NANOS)`. (#3570)
- iceberg, parquet_encode, schema_registry_encode: Added support for the new `Decimal` and `BigDecimal` benthos common-schema types in metadata-driven encoding. Iceberg / Parquet / Avro encoders emit native fixed-precision decimal types for `Decimal`; JSON Schema emits a regex-validated string. `BigDecimal` is rejected by the bounded-format encoders with a clear error and accepted by JSON Schema as a permissive string pattern. (@Jeffail)
- schema_registry_decode: When `store_schema_metadata` is set, Avro decimal logical-type values are now normalised to canonical decimal strings to match the schema metadata's value contract. (@Jeffail)
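For example, a parquet_encode processor targeting Spark- or Athena-compatible files might set the new field as below. This is a sketch: the schema column is illustrative, and only `default_timestamp_unit` is the feature described above.

```yaml
pipeline:
  processors:
    - parquet_encode:
        # New field: NANOSECOND (default), MICROSECOND, or MILLISECOND.
        # MICROSECOND keeps output readable by Spark, Athena and DuckDB,
        # which do not support TIMESTAMP(NANOS).
        default_timestamp_unit: MICROSECOND
        schema:
          - name: created_at
            type: TIMESTAMP  # illustrative column; encoded with the configured unit
```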
### Changed
- postgresql: NUMERIC and DECIMAL columns now emit `Decimal(p, s)` schema metadata when precision/scale is declared, or `BigDecimal` for unparameterised `numeric` columns. Values are emitted as canonical decimal strings (right-padded to the declared scale for `Decimal`). Previously these columns surfaced as `String` with the raw Postgres text. (@Jeffail)
- mysql_cdc: DECIMAL and NUMERIC columns now emit `Decimal(p, s)` schema metadata parsed from the column's raw type, and values are normalised to canonical decimal strings. Previously these columns surfaced as `String` with the driver's native form. (@Jeffail)
- microsoft_sql_server_cdc: DECIMAL and NUMERIC columns now emit `Decimal(p, s)` schema metadata sourced from `sql.ColumnType.DecimalSize()`, and values are normalised to canonical decimal strings. MONEY and SMALLMONEY remain typed as `String`, but their wire form is now a quoted canonical decimal string instead of a raw `json.Number`. Previously DECIMAL/NUMERIC were `json.Number` typed as `String`. (@Jeffail)
- oracledb_cdc: NUMBER columns with declared precision and scale > 0 now emit `Decimal(p, s)` schema metadata; NUMBER without `DATA_PRECISION` emits `BigDecimal`. Decimal values flow through as canonical decimal strings; integer-width NUMBER (precision ≤ 18, scale 0) continues to emit `int64`. The previous `json.Number` wrapping for NUMBER-as-String columns is gone. (@Jeffail)
- mongodb_cdc: `bson.Decimal128` and `bsonType: "decimal"` validator fields now emit `BigDecimal` schema metadata. Decimal values in document bodies are emitted as canonical decimal strings instead of the previous `{"$numberDecimal": "..."}` ExtJSON wrapper. (@Jeffail)
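To make the mongodb_cdc change concrete, here is how a `Decimal128` field serialises before and after this change. The field name and value are illustrative.

```yaml
# Illustrative mongodb_cdc output for a Decimal128 field "price":
before: {"price": {"$numberDecimal": "19.99"}}  # previous ExtJSON wrapper
after: {"price": "19.99"}                       # canonical decimal string (BigDecimal metadata)
```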
The full change log can be found here.