⭐️ Highlights: Add support for narwhals backend
In this release, we're adding a narwhals backend that supports the polars, ibis and pyspark-sql schema APIs. To enable it, install the narwhals extra together with the supported framework of your choice:
pip install 'pandera[narwhals,polars]' # Polars
pip install 'pandera[narwhals,ibis]' # Ibis
pip install 'pandera[narwhals,pyspark]' # PySpark SQL
Then enable with an environment variable:
export PANDERA_USE_NARWHALS_BACKEND=True
Or configure programmatically:
import pandera.polars as pa
# import pandera.pyspark as pa # 👈 for pyspark schemas
# import pandera.ibis as pa # 👈 for ibis schemas
# CONFIG.use_narwhals_backend is read here — not at import time above
pa.config.set_config(use_narwhals_backend=True)
schema = pa.DataFrameSchema({"name": pa.Column(str)}) # narwhals backends registered
schema.validate(df)
See the docs for more information.
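In notebooks or scripts where exporting a shell variable is inconvenient, the same environment variable can be set from Python. The sketch below is an illustration, not a documented recipe. It assumes pandera reads PANDERA_USE_NARWHALS_BACKEND from the process environment, so it sets the variable before pandera is imported to be safe.

```python
import os

# Assumption: pandera picks this variable up from the process environment.
# Setting it before importing pandera avoids depending on when it is read.
os.environ["PANDERA_USE_NARWHALS_BACKEND"] = "True"

# Import pandera only after the variable is set, for example:
# import pandera.polars as pa
```

Setting the variable this way affects only the current process and any child processes it starts.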
What's Changed
- fix: Ibis validation fails checks if column all nulls by @cosmicBboy in #2296
- fix: StringDtype validation fails for empty DataFrames by @cosmicBboy in #2295
- Clean up duplicate code and extra files from #2296 by @deepyaman in #2298
- Pyspark: Added unique_values_eq builtin check by @aditypan in #2300
- feat(narwhals): implement the new unifying backend by @deepyaman in #2223
- refactor(narwhals): make backend opt-in via config by @deepyaman in #2314
- refactor(narwhals): add per-backend test structure by @deepyaman in #2315
- [Narwhals] add docs, some code cleanup by @cosmicBboy in #2312
- fix(polars): handle PydanticModel dtype in DataFrameModel.empty() by @Dev-iL in #2313
- fix: ensure index fields have persistent metadata by @WPDOrdina in #2217
- refactor(polars): preserve input frame kind in coerce_dtype by @Dev-iL in #2317
- chore: delete extraneous isort hook for pre-commit by @deepyaman in #2320
- test(ibis): enable decimal datatype test after fix by @deepyaman in #2321
- fix(tests): drop deprecated axis=None from DataFrame.sum call (#2309) by @terryzm1 in #2311
- docs: clarify required extra for data synthesis strategies by @RedZapdos123 in #2304
- chore: add Time as a cross-engine semantic dtype by @deepyaman in #2323
- refactor: consolidate backend-specific test suites by @deepyaman in #2318
- Bugfix/2319 by @Concrete-Slab in #2324
- feat(strategies): aggregate check constraints to skip filter chains by @cosmicBboy in #2322
- Improvement/ enable raising schema error in custom check by @Jarek-Rolski in #2340
- fix(polars): stringify nested dtypes in lazy failure-case formatter by @lukew-cogapp in #2341
- docs: clarify optional missing columns by @remsky in #2338
- fix: handle SchemaErrors in check_types Union fallback (#2325) by @SAY-5 in #2326
- Fix mypy type hinting for Polars DataFrameModel field annotations by @RedZapdos123 in #2303
- Fixed issue where Pandera cannot handle metadata through Annotated types by @NGHades in #2111
- fix narwhals incompatibility by @cosmicBboy in #2342
- refactor: pydantic/polars cleanup and benchmark coverage by @Dev-iL in #2357
- Clarify ignore_na null handling in checks docs by @vanta722 in #2355
- fix(polars): avoid coercing missing required columns by @remsky in #2346
- feat(pyspark): register Narwhals backend for Spark by @deepyaman in #2339
- Address comments from full Narwhals backend review by @deepyaman in #2360
- docs(pydantic): clarify v2 string coercion semantics by @RedZapdos123 in #2348
- update narwhals docs by @cosmicBboy in #2365
- update config, add set_config by @cosmicBboy in #2366
- Harden CI dependency installation by @RedZapdos123 in #2369
- fix(extensions): reject extra positional args in register_check_method (#480) by @jbbqqf in #2347
- fix: handle mixed-timezone pandas datetime coercion by @RedZapdos123 in #2350
- fix(pandas): keep ArrowString resolution stable by @RedZapdos123 in #2353
- feat: allow rename_columns to work as long as final columns are unique by @AchillesTurtle in #2373
- fix(polars): don't error on optional regex column with no match by @gaoflow in #2374
- fix: reload built-in check dispatcher by its own name so a custom name= works (#2042) by @vineethsaivs in #2371
- fix(polars): report concrete column name in regex coercion errors (#2363) by @gaoflow in #2376
- feat: enable runtime narwhals backend toggling via set_config by @cosmicBboy in #2375
- fix(pandas): preserve drop_invalid_rows in schema IO by @RedZapdos123 in #2354
- docs: note regex engine differences between pandas and polars backends (#2349) by @cycsmail in #2378
New Contributors
- @WPDOrdina made their first contribution in #2217
- @Concrete-Slab made their first contribution in #2324
- @lukew-cogapp made their first contribution in #2341
- @remsky made their first contribution in #2338
- @SAY-5 made their first contribution in #2326
- @NGHades made their first contribution in #2111
- @vanta722 made their first contribution in #2355
- @jbbqqf made their first contribution in #2347
- @AchillesTurtle made their first contribution in #2373
- @gaoflow made their first contribution in #2374
- @vineethsaivs made their first contribution in #2371
- @cycsmail made their first contribution in #2378
Full Changelog: v0.31.1...v0.32.0