⭐️ Highlights
This is the release for all of you geospatial and multidimensional array-handling folks out there.
xarray support
Pandera now provides full support for xarray 🚀
Write a DatasetSchema with the object-based API:

    import numpy as np
    import xarray as xr
    import pandera.xarray as pa

    # Declare the expected data variables and coordinates.
    schema = pa.DatasetSchema(
        data_vars={
            "temperature": pa.DataVar(dtype=np.float64, dims=("x", "y")),
            "pressure": pa.DataVar(dtype=np.float64, dims=("x", "y")),
        },
        coords={"x": pa.Coordinate(dtype=np.float64)},
    )

    # Build a dataset that matches the schema.
    ds = xr.Dataset(
        {
            "temperature": (("x", "y"), np.random.rand(3, 4)),
            "pressure": (("x", "y"), np.random.rand(3, 4)),
        },
        coords={"x": np.arange(3, dtype=np.float64)},
    )

    schema.validate(ds)

Or a DatasetModel with the class-based API:

    # Reuses np, pa, and ds from the object-based example above.
    from pandera.typing.xarray import Coordinate

    class Surface(pa.DatasetModel):
        temperature: np.float64 = pa.Field(dims=("x", "y"))
        pressure: np.float64 = pa.Field(dims=("x", "y"))
        x: Coordinate[np.float64]

    Surface.validate(ds)

First-class geopandas support
This release also adds first-class GeoDataFrameSchema and GeoDataFrameModel APIs for geopandas: https://pandera.readthedocs.io/en/latest/geopandas.html
What's Changed
- Bugfix/2213: handle schema errors correctly when strict and ordered are True by @cosmicBboy in #2242
- fix column errors by @cosmicBboy in #2244
- Bugfix/polars performance warning and housekeeping by @fabianbergermann in #2254
- Fix pandas typing empty-list constructor inference (#2247) by @RedZapdos123 in #2251
- test pandas model optional union with typing alias (resolves issue: #2211) by @RedZapdos123 in #2252
- fix(mypy): type DataFrameModel field class attributes as str by @RedZapdos123 in #2255
- Use docstring_substitution for strategy and example in DataFrameModel by @terryzm1 in #2248
- Included pyspark str_matches builtin check by @aditypan in #2249
- fix io serialization for custom check errors (#2143) by @RedZapdos123 in #2253
- Fix pyspark.pandas coerce+nullable failure case reporting by @RedZapdos123 in #2256
- fix(pandas): preserve tz-aware MultiIndex level metadata by @RedZapdos123 in #2258
- fix(strategies): support exact_value in str_length generation by @RedZapdos123 in #2261
- fix(pandas): preserve nulls for nullable typed model fields with coercion by @RedZapdos123 in #2259
- fix(pandas): handle all-NaT nullable Date defaults in added columns by @RedZapdos123 in #2260
- Add support for Xarray by @cosmicBboy in #2266
- Add io serialization support for all supported backends. by @cosmicBboy in #2267
- Dataset/DataFrameModel classes can be serialized directly by @cosmicBboy in #2272
- fix(pandas): preserve DataFrameModel Index metadata in generated schema components by @RedZapdos123 in #2270
- fix(ibis): preserve failed_cases row index positions by @RedZapdos123 in #2269
- add geopandas schema/model support by @cosmicBboy in #2273
- Fix JSON serialization for tuple-labeled schema columns by @RedZapdos123 in #2268
- update type annotation for DataFrame.from_records to include Sequence[Mapping[str, Any]] by @cosmicBboy in #2274
- fix inplace mutation check_types behavior by @cosmicBboy in #2276
- fix DataFrame.from_records bug by @cosmicBboy in #2278
- Fix DataFrame.from_records parser execution with coerce=True by @RedZapdos123 in #2279
- Fix applymap deprecation warning in pandas parser backend by @RedZapdos123 in #2281
- docs: update xarray integration info for v0.31.0 by @cosmicBboy in #2283
- fix(pandas): export errors in pandera.pandas public API by @RedZapdos123 in #2287
- tests(mypy): add regression coverage for polars Column Decimal/Struct typing by @RedZapdos123 in #2286
- fix: support Spark Connect DataFrame in cache_check_obj decorator by @yurivski in #2282
- test(pyspark): add str_length regressions for issues #1311 and #1314 by @RedZapdos123 in #2288
New Contributors
- @fabianbergermann made their first contribution in #2254
- @RedZapdos123 made their first contribution in #2251
- @terryzm1 made their first contribution in #2248
- @aditypan made their first contribution in #2249
- @yurivski made their first contribution in #2282
Full Changelog: v0.30.1...v0.31.0