github unionai-oss/pandera v0.25.0
v0.25.0: 🦩 Support Ibis table validation

⭐️ Highlight

Pandera now supports Ibis 🦩! You can validate data on any available Ibis backend using the pandera.ibis module.

In-memory table example:

import ibis
import pandera.ibis as pa

class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})

t = ibis.memtable(
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(t).execute()
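The in_range constraint on price is inclusive at both ends by default (pandera's Check.in_range exposes include_min and include_max, both defaulting to True), which is why the price of 20 in the table above passes. The plain-Python sketch below mimics that per-element rule on the example prices. It only illustrates the bound semantics and is not how pandera.ibis evaluates the check, which it builds as an Ibis expression for the backend to run. The helper name in_range here is a local stand-in.

```python
def in_range(values, min_value, max_value, include_min=True, include_max=True):
    """Return a per-element pass/fail list, mimicking in_range's bound semantics."""
    results = []
    for v in values:
        # Inclusive bounds use >= / <=, exclusive bounds use > / <.
        above_min = v >= min_value if include_min else v > min_value
        below_max = v <= max_value if include_max else v < max_value
        results.append(above_min and below_max)
    return results


prices = [8, 12, 10, 16, 20, 18]  # the "price" column from the example above

inclusive = in_range(prices, min_value=5, max_value=20)
exclusive_max = in_range(prices, min_value=5, max_value=20, include_max=False)

print(all(inclusive))  # True: 20 sits on the inclusive upper bound
print(exclusive_max)   # [True, True, True, True, False, True]
```

Passing include_max=False is the switch to reach for when the upper bound itself should count as a failure.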

SQLite example:

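# Reuses ibis and the Schema class defined in the in-memory example above.
con = ibis.sqlite.connect()  # no database path: an in-memory SQLite database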
t = con.create_table(
    "table",
    schema=ibis.schema(dict(state="string", city="string", price="int64"))
)

con.insert(
    "table",
    obj=[
        ("FL", "Orlando", 8),
        ("FL", "Miami", 12),
        ("FL", "Tampa", 10),
        ("CA", "San Francisco", 16),
        ("CA", "Los Angeles", 20),
        ("CA", "San Diego", 18),
    ]
)

Schema.validate(t).execute()
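To get a feel for what validating inside the database means, the sketch below reproduces the SQLite example with Python's built-in sqlite3 module instead of Ibis and writes the range check by hand as a SQL aggregate, so only a failure count leaves the database. This is an illustration under stated assumptions: the table name cities and the helper count_out_of_range are hypothetical, and the SQL that pandera and Ibis actually generate will likely look different.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE cities (state TEXT, city TEXT, price INTEGER)")
con.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("FL", "Orlando", 8),
        ("FL", "Miami", 12),
        ("FL", "Tampa", 10),
        ("CA", "San Francisco", 16),
        ("CA", "Los Angeles", 20),
        ("CA", "San Diego", 18),
    ],
)


def count_out_of_range(con, table, column, min_value, max_value):
    """Count rows violating an inclusive range check; the scan runs inside SQLite."""
    # Identifiers cannot be bound as parameters, so they are interpolated here;
    # only pass trusted table/column names. BETWEEN is inclusive at both ends.
    query = f"SELECT COUNT(*) FROM {table} WHERE NOT ({column} BETWEEN ? AND ?)"
    (n_failures,) = con.execute(query, (min_value, max_value)).fetchone()
    return n_failures


print(count_out_of_range(con, "cities", "price", 5, 20))  # 0
```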

What does this mean?

This release unlocks in-database validation on some of the most widely used data platforms, including PostgreSQL, Snowflake, BigQuery, MySQL, and more ✨. You can validate data at scale, on the database or data framework of your choice, before fetching it for downstream analysis or modeling work.
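The validate-before-fetching workflow can be sketched in the same spirit with the standard-library sqlite3 module: run the check as a query, pull back at most a few offending rows if it fails, and fetch the full table only once it passes. The function fetch_if_valid, the ValueError it raises, and the cities table are hypothetical stand-ins chosen for this sketch; with pandera you would call Schema.validate(t) on the Ibis table and handle pandera's own errors instead.

```python
import sqlite3


def fetch_if_valid(con, table, column, min_value, max_value, n_failure_cases=3):
    """Fetch all rows of `table` only if `column` lies in [min_value, max_value]."""
    # Pull back at most n_failure_cases offending rows instead of the whole table.
    # Identifiers are interpolated, so only pass trusted table/column names.
    failures = con.execute(
        f"SELECT * FROM {table} WHERE NOT ({column} BETWEEN ? AND ?) LIMIT ?",
        (min_value, max_value, n_failure_cases),
    ).fetchall()
    if failures:
        raise ValueError(f"{column} out of range, e.g. {failures}")
    # The check passed inside the database, so it is safe to fetch the data.
    return con.execute(f"SELECT * FROM {table}").fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (state TEXT, city TEXT, price INTEGER)")
con.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("FL", "Orlando", 8), ("CA", "San Diego", 18)],
)
rows = fetch_if_valid(con, "cities", "price", 5, 20)
print(len(rows))  # 2
```

Capping the number of returned failure cases keeps the error report small even when millions of rows fail, which matters when the data lives on a remote database.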

Naturally, this also means that you can develop your schemas locally against a DuckDB or SQLite backend and then reuse the same schemas in production on a remote database such as PostgreSQL.

Learn more about the integration here.

What's Changed

  • Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
  • exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
  • Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
  • Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
  • Temporarily pin polars due to test failure in CI by @deepyaman in #2011
  • Replace event_loop removed in pytest-asyncio 1.0 by @deepyaman in #2014
  • Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
  • fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
  • bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
  • Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
  • Ibis dev by @deepyaman in #2040
  • handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
  • bugfix/1927 by @Jarek-Rolski in #2019
  • [🐻‍❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
  • [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
  • Add link to the documentation about Ibis datatypes by @deepyaman in #2057
  • Test column presence, mark other features not impl by @deepyaman in #2060
  • Run pre-commit on all files to fix linter issues by @deepyaman in #2063
  • Implement regex option and add additional checks by @deepyaman in #2061
  • Implement binary and boolean types (and test them) by @deepyaman in #2064
  • Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
  • bugfix: fix format_vectorized_error_message to properly format nested pyarrow failed cases by @AndrejIring in #2036
  • handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
  • bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
  • Set validation scope for pandas run_checks methods by @amerberg in #2003
  • DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
  • [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
  • [ibis 🦩] check backend: use positional join for duckdb and polars, fix ibis DataFrameModel.validate types by @cosmicBboy in #2071

New Contributors

Full Changelog: v0.24.0...v0.25.0
