github pathwaycom/pathway v0.17.0

13 hours ago

Added

  • pw.io.iceberg.read method for reading Apache Iceberg tables into Pathway.
  • pw.io.postgres.write and pw.io.postgres.write_snapshot now accept an additional argument init_mode, which allows initializing the table before writing.
  • pw.io.deltalake.read now supports serialization and deserialization for all Pathway data types.
  • New parser pathway.xpacks.llm.parsers.DoclingParser supporting parsing of pdfs with tables and images.
  • Output connectors now include an optional name parameter. If provided, this name will appear in logs and monitoring dashboards.
  • Automatic naming for input and output connectors has been enhanced.

Changed

  • BREAKING: pw.io.deltalake.read now requires explicit specification of primary key fields.
  • BREAKING: pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now returns a dictionary from pw_ai_answer endpoint.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer allows optionally returning context documents from pw_ai_answer endpoint.
  • BREAKING: When using delay in temporal behavior, current time is updated immediately, not in the next batch.
  • BREAKING: The Pointer type is now serialized to Delta Tables as raw bytes.
  • pw.io.kafka.write now allows specifying key and headers for JSON and CSV data formats.
  • persistent_id parameter in connectors has been renamed to name. This new name parameter allows you to assign names to connectors, which will appear in logs and monitoring dashboards.
  • Parsers have been renamed for consistency: ParseUnstructured -> UnstructuredParser, ParseUtf8 -> Utf8Parser. ParseUnstructured and ParseUtf8 are now deprecated.
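
The renames in this list are mechanical, so they can be applied with a small text-level rewrite. The following is a minimal sketch, not an official migration tool: migrate_source, RENAMES and the regular expressions are hypothetical helpers, the old parser class is assumed to be spelled ParseUnstructured, and the persistent_id rewrite touches every keyword argument with that name, not only connector calls, so the resulting diff should be reviewed by hand.

```python
import re

# Old name -> new name, taken from the "Changed" list above.
# "ParseUnstructured" is the assumed spelling of the old parser class.
RENAMES = {
    "ParseUnstructured": "UnstructuredParser",
    "ParseUtf8": "Utf8Parser",
}

# Connector keyword argument renamed in this release: persistent_id -> name.
# The negative lookahead skips comparisons such as `persistent_id == x`.
_KWARG = re.compile(r"\bpersistent_id(\s*=)(?!=)")


def migrate_source(source: str) -> str:
    """Return `source` with the renames applied (plain text, no AST analysis)."""
    for old, new in RENAMES.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return _KWARG.sub(r"name\1", source)
```

For example, migrate_source("pw.io.csv.read(path, schema=S, persistent_id='in')") yields the same call with name='in'. Working on text keeps the helper dependency-free, at the cost of also rewriting matches inside strings and comments.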

Fixed

  • generate_class method in Schema now correctly renders columns of UnionType and None types.
  • A bug in delay in temporal behavior that, in a specific situation, could emit a single entry twice.
  • pw.io.postgres.write_snapshot now correctly handles tables that only have primary key columns.

Removed

  • BREAKING: pw.indexing.build_sorted_index, pw.indexing.retrieve_prev_next_values, pw.indexing.sort_from_index and pw.indexing.SortedIndex are removed. Sorting is now done with pw.Table.sort.
  • BREAKING: Removed deprecated methods pw.Table.unsafe_promise_same_universe_as, pw.Table.unsafe_promise_universes_are_pairwise_disjoint, pw.Table.unsafe_promise_universe_is_subset_of, pw.Table.left_join, pw.Table.right_join, pw.Table.outer_join, pw.stdlib.utils.AsyncTransformer.result.
  • BREAKING: Removed deprecated column _pw_shard in the result of windowby.
  • BREAKING: Removed deprecated functions pw.debug.parse_to_table, pw.udf_async, pw.reducers.npsum, pw.reducers.int_sum, pw.stdlib.utils.col.flatten_column.
  • BREAKING: Removed deprecated module pw.asynchronous.
  • BREAKING: Removed deprecated access to functions from pw.io in pw.
  • BREAKING: Removed deprecated classes pw.UDFSync, pw.UDFAsync.
  • BREAKING: Removed class pw.xpacks.llm.parsers.OpenParse. Its functionality has been replaced with pw.xpacks.llm.parsers.DoclingParser.
  • BREAKING: Removed deprecated arguments from input connectors: value_columns, primary_key, types, default_values. Schema should be used instead.
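
Before upgrading, it can help to scan a codebase for the names removed above. The sketch below is one possible approach under stated assumptions: find_removed and REMOVED are hypothetical helpers, the code is assumed to import Pathway as pw, and only a subset of the removed names is listed. A replacement is given only where these notes name one; an empty string means none is stated here.

```python
import re

# Removed name -> replacement named in the release notes ("" if none is given).
# Assumes the scanned code uses `import pathway as pw`.
REMOVED = {
    "pw.indexing.build_sorted_index": "pw.Table.sort",
    "pw.indexing.retrieve_prev_next_values": "pw.Table.sort",
    "pw.indexing.sort_from_index": "pw.Table.sort",
    "pw.indexing.SortedIndex": "pw.Table.sort",
    "pw.debug.parse_to_table": "",
    "pw.udf_async": "",
    "pw.reducers.npsum": "",
    "pw.reducers.int_sum": "",
    "pw.asynchronous": "",
    "pw.UDFSync": "",
    "pw.UDFAsync": "",
}


def find_removed(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, removed name, suggested replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, replacement in REMOVED.items():
            # Require that the match is not part of a longer dotted name or identifier.
            if re.search(rf"(?<![\w.]){re.escape(name)}\b", line):
                hits.append((lineno, name, replacement))
    return hits
```

Calling find_removed on a file's contents returns an empty list when none of the listed names appear. Like any text scan, it can also report matches inside comments or strings.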
