[0.20.0] - 2025-02-25
Added
- Added structure-aware chunking for `DoclingParser`.
- Added `table_parsing_strategy` for `DoclingParser`.
- Column expressions `as_int()`, `as_float()`, `as_str()`, and `as_bool()` now accept additional arguments, `unwrap` and `default`, to simplify null handling.
- Support for Python tuples in expressions.
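The null handling added to the `as_*()` conversions can be pictured with a plain-Python model. This is a sketch only, not Pathway code: `as_int_model` is a hypothetical helper, and the behaviour it encodes (that `default` replaces a null value, that `unwrap` rejects a null instead of returning an optional result, and that `default` takes precedence when both are given) is an assumed reading of the argument names rather than something this changelog states.

```python
from typing import Any, Optional


def as_int_model(
    value: Any, *, unwrap: bool = False, default: Optional[int] = None
) -> Optional[int]:
    """Hypothetical plain-Python model of null handling in an as_int()-style call."""
    if value is None:
        # Assumption: a provided default replaces the null value.
        if default is not None:
            return default
        # Assumption: unwrap promises a non-null result, so a null is an error.
        if unwrap:
            raise ValueError("cannot unwrap a null value")
        # Without either argument the null is passed through unchanged.
        return None
    return int(value)


plain = as_int_model(None)                 # null stays null
filled = as_int_model(None, default=0)     # null replaced by the default
converted = as_int_model(7, unwrap=True)   # non-null value converts normally
```

Under these assumptions, `default` suits columns where a missing value has a sensible fallback, while `unwrap` suits columns that are expected never to be null.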
Changed
- BREAKING: Changed the argument in `DoclingParser` from `parse_images` (bool) to `image_parsing_strategy` (Literal["llm"] | None).
- BREAKING: `doc_post_processors` argument in the `pw.xpacks.llm.document_store.DocumentStore` no longer accepts `pw.UDF`.
- Better error messages when using `pathway spawn` with multiple workers. Now error messages are printed only from the worker experiencing the error directly.
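For code that still passes `parse_images`, the rename can be handled with a small translation step. The sketch below is illustrative: the helper name `image_parsing_strategy_from_legacy` is hypothetical, and the mapping of `True` to `"llm"` and `False` to `None` is an assumption based on the two value sets listed above, not a rule stated in this changelog.

```python
from typing import Literal, Optional


def image_parsing_strategy_from_legacy(
    parse_images: bool,
) -> Optional[Literal["llm"]]:
    """Translate the old boolean flag into the new strategy value.

    Assumed mapping: True -> "llm", False -> None.
    """
    return "llm" if parse_images else None


# Rewrite a legacy keyword-argument dict into the new form.
legacy_kwargs = {"parse_images": True}
new_kwargs = {
    "image_parsing_strategy": image_parsing_strategy_from_legacy(
        legacy_kwargs["parse_images"]
    )
}
```

The resulting `new_kwargs` would then be passed to `DoclingParser` in place of the old keyword.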
Fixed
- `doc_post_processors` argument in the `pw.xpacks.llm.document_store.DocumentStore` had no effect. This is now fixed.
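Since `doc_post_processors` now takes effect and no longer accepts `pw.UDF`, post-processors can be written as ordinary Python functions. The sketch below assumes each processor receives the parsed text and its metadata and returns both as a tuple; that signature and the function name `normalize_whitespace` are assumptions for illustration, so check the `DocumentStore` documentation before relying on them.

```python
from typing import Any, Dict, Tuple


def normalize_whitespace(
    text: str, metadata: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Collapse runs of whitespace and record the cleaned length in the metadata."""
    cleaned = " ".join(text.split())
    # Return a new dict rather than mutating the caller's metadata.
    return cleaned, {**metadata, "length": len(cleaned)}


# A plain list of callables, with no pw.UDF wrapper.
doc_post_processors = [normalize_whitespace]

# Apply the processors locally, in order, to see their combined effect.
text, metadata = "  Hello   world \n", {"path": "a.pdf"}
for processor in doc_post_processors:
    text, metadata = processor(text, metadata)
```

The same list would be passed as the `doc_post_processors` argument when constructing the `DocumentStore`.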