Community Announcements
Pandera now has a Discord community! Join us if you need help, want to discuss features/bugs, or want to help other community members 🤝
Highlights
Schema support for Dask, Koalas, Modin
Excited to announce that 0.8.0 is the first release that adds built-in support for additional dataframe types beyond Pandas: you can now use the exact same DataFrameSchema objects or SchemaModel classes to validate Dask, Modin, and Koalas dataframes.
import dask.dataframe as dd
import pandas as pd
import pandera as pa
from pandera.typing import Series, dask, koalas, modin

class Schema(pa.SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[int] = pa.Field(in_range={"min_value": 5, "max_value": 20})

# check_types validates the annotated input and output dataframes at call time
@pa.check_types
def dask_function(ddf: dask.DataFrame[Schema]) -> dask.DataFrame[Schema]:
    return ddf[ddf["state"] == "CA"]

@pa.check_types
def koalas_function(df: koalas.DataFrame[Schema]) -> koalas.DataFrame[Schema]:
    return df[df["state"] == "CA"]

@pa.check_types
def modin_function(df: modin.DataFrame[Schema]) -> modin.DataFrame[Schema]:
    return df[df["state"] == "CA"]
And DataFrameSchema objects will work on all dataframe types:
schema: pa.DataFrameSchema = Schema.to_schema()
schema(dask_df)
schema(modin_df)
schema(koalas_df)
Pydantic Integration
pandera.SchemaModel classes are fully compatible with pydantic:
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series
import pydantic
class SimpleSchema(pa.SchemaModel):
    str_col: Series[str] = pa.Field(unique=True)

class PydanticModel(pydantic.BaseModel):
    x: int
    df: DataFrame[SimpleSchema]

valid_df = pd.DataFrame({"str_col": ["hello", "world"]})
PydanticModel(x=1, df=valid_df)
invalid_df = pd.DataFrame({"str_col": ["hello", "hello"]})
PydanticModel(x=1, df=invalid_df)
Error:
Traceback (most recent call last):
...
ValidationError: 1 validation error for PydanticModel
df
series 'str_col' contains duplicate values:
1 hello
Name: str_col, dtype: object (type=value_error)
Mypy Integration
Pandera now supports static type-linting of DataFrame types with mypy out of the box so you can catch certain classes of errors at lint-time.
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

class Schema(pa.SchemaModel):
    id: Series[int]
    name: Series[str]

class SchemaOut(pa.SchemaModel):
    age: Series[int]

class AnotherSchema(pa.SchemaModel):
    foo: Series[int]

def fn(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[SchemaOut])  # mypy okay

def fn_pipe_incorrect_type(df: DataFrame[Schema]) -> DataFrame[SchemaOut]:
    return df.assign(age=30).pipe(DataFrame[AnotherSchema])  # mypy error
    # error: Argument 1 to "pipe" of "NDFrame" has incompatible type "Type[DataFrame[Any]]";
    # expected "Union[Callable[..., DataFrame[SchemaOut]], Tuple[Callable[..., DataFrame[SchemaOut]], str]]" [arg-type] # noqa

schema_df = DataFrame[Schema]({"id": [1], "name": ["foo"]})
pandas_df = pd.DataFrame({"id": [1], "name": ["foo"]})

fn(schema_df)  # mypy okay
fn(pandas_df)  # mypy error
# error: Argument 1 to "fn" has incompatible type "pandas.core.frame.DataFrame";
# expected "pandera.typing.pandas.DataFrame[Schema]" [arg-type]
Enhancements
- 735e7fe implement dataframe types (#672)
- 46dc3a2 Support mypy (#650)
- 02063c8 Add Basic Dask Support (#665)
- b7f6516 Modin support (#660)
- cdf4667 Add Pydantic support (#659)
- 12378ea Support Koalas (#658)
- 62d689d improve lazy validation performance for nullable cases (#655)
Bugfixes
- 7a98e23 bugfix: support nullable empty strategies (#638)
- 5ec4611 Fix remaining unrecognized numpy dtypes (#637)
- 96d6516 Correctly handling single string constraints (#670)
Docs Improvements
- 1860685 add pyproject.toml, update doc typos
- 3c086a9 add discord link, update readme, docs (#674)
- d75298f more detailed docstring of pandera.model_components.Field (#671)
- 96415a0 Add strictly typed pandas to readme (#649)
Testing Improvements
Internals Improvements
- fdcdb91 Reuse coerce in engines.utils (#645)
- 655dd85 remove assumption from nullable strategies (#641)
Contributors
Big shout out to the following folks for their contributions to this release 🎉🎉🎉
- @sbrugman
- @rbngz
- @jeffzi
- @bphillips-exos
- @thorben-flapo
- @tfwillems: special shout out here for contributing a good chunk of the code for the pydantic plugin #659