Version 0.7 of Great Expectations is HUGE. It introduces several major new features
and a large number of improvements, including breaking API changes.
The core vocabulary of expectations remains consistent. Upgrading to
the new version of GE will primarily require changes to code that
uses data contexts; existing expectation suites will require only changes
to top-level names.
- Major update to Data Contexts. Data Contexts now offer significantly
  more support for building and maintaining expectation suites and
  interacting with existing pipeline systems. They can handle integrating,
  registering, and storing validation results, and provide a namespace for
  data assets, making batches first-class citizens in GE.
  Read more at :ref:`data_context` or in :py:mod:`great_expectations.data_context`.
- Major refactor of autoinspect. Autoinspect is now built around a module
  called "profile", which provides a class-based structure for building
  expectation suites. There is no longer a default ``autoinspect_func``;
  calling autoinspect requires explicitly passing the desired profiler.
  See :ref:`profiling`.
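The class-based profiler structure can be pictured with a minimal sketch. The profiler name, the dataset interface, and the suite layout below are illustrative assumptions, not GE's actual API:

```python
# Sketch of a class-based profiler: each profiler inspects a dataset and
# emits an expectation suite. All names here are illustrative, not GE's API.

class ColumnExistenceProfiler:
    """Hypothetical profiler: emits one expectation per column."""

    @classmethod
    def profile(cls, dataset):
        # The dataset is assumed to expose get_table_columns(); the profiler
        # builds a suite of loose expectations a human can tighten later.
        expectations = [
            {"expectation_type": "expect_column_to_exist",
             "kwargs": {"column": col}}
            for col in dataset.get_table_columns()
        ]
        return {"expectation_suite_name": "default",
                "expectations": expectations}


class FakeDataset:
    """Stand-in dataset for the sketch."""
    def get_table_columns(self):
        return ["id", "value"]


suite = ColumnExistenceProfiler.profile(FakeDataset())
```

Passing the profiler class explicitly, rather than relying on a default function, is what lets different profilers produce differently shaped suites from the same dataset.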
- New "Compile to Docs" feature produces beautiful documentation from
  expectations and expectation validation reports, helping keep teams on
  the same page.
- Name clarifications: we've stopped using the overloaded terms "expectations
  config" and "config" and instead use "expectation suite" to refer to a
  collection (or suite!) of expectations that can be used for validating a
  data asset.
- Expectation Suites include several top-level keys that are useful for
  organizing content in a data context: ``data_asset_name``,
  ``expectation_suite_name``, and ``data_asset_type``. When a data asset is
  validated, those keys will be placed in the ``meta`` key of the
  validation result.
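As a sketch of the shapes involved (all field values below are made up, and the dictionaries are simplified stand-ins rather than GE's exact serialization):

```python
# Illustrative shapes only; the names and values are hypothetical.
expectation_suite = {
    "data_asset_name": "my_datasource/my_table",  # hypothetical namespace path
    "expectation_suite_name": "warning",
    "data_asset_type": "Dataset",
    "expectations": [
        {"expectation_type": "expect_column_to_exist",
         "kwargs": {"column": "id"}},
    ],
}

# When the data asset is validated, the organizing keys are carried over
# into the meta block of the validation result, so results can be routed
# back to the right asset and suite in a data context.
validation_result_meta = {
    key: expectation_suite[key]
    for key in ("data_asset_name", "expectation_suite_name", "data_asset_type")
}
```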
- Major enhancement to the CLI tool, including ``init``, ``render``, and
  more flexibility with ``validate``.
- Added helper notebooks to make it easy to get started. Each notebook acts
  as a combination of tutorial and code scaffolding, to help you quickly
  learn best practices by applying them to your own data.
- Relaxed constraints on expectation parameter values, making it possible to
  declare many column aggregate expectations in a way that is always
  "vacuously" true, such as ``expect_column_values_to_be_between`` with
  bounds of ``None`` and ``None``. This makes it possible to progressively
  tighten expectations while using them as the basis for profiling results
  and documentation.
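The "vacuously true" semantics can be sketched with a simplified stand-in for a between check (this is not GE's implementation, just the idea):

```python
def between(value, min_value=None, max_value=None):
    """Simplified stand-in for a between check: a bound of None means
    "unbounded", so (None, None) accepts every value."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


# With both bounds None the check is vacuously true for any value...
assert all(between(v) for v in [-10, 0, 99])

# ...and can later be tightened without changing its shape.
assert between(5, min_value=0, max_value=10)
assert not between(50, min_value=0, max_value=10)
```

Because the declared expectation keeps the same shape whether or not its bounds are set, a profiler can emit the unbounded form and a human can fill in real bounds later.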
- Enabled caching on dataset objects by default.
- Bugfixes and improvements:

  - New expectations:

    - ``expect_column_quantile_values_to_be_between``
    - ``expect_column_distinct_values_to_be_in_set``

  - Added support for the ``head`` method on all current backends, returning a PandasDataset
  - More implemented expectations for SparkDFDataset, with optimizations:

    - ``expect_column_values_to_be_between``
    - ``expect_column_median_to_be_between``
    - ``expect_column_value_lengths_to_be_between``

  - Optimized histogram fetching for SqlAlchemyDataset and SparkDFDataset
  - Added a cross-platform internal partition method, paving the path for improved profiling
  - Fixed bug with ``output_strftime_format`` not being honored in PandasDataset
  - Fixed series naming for column value counts
  - Standardized naming for ``expect_column_values_to_be_of_type``
  - Standardized and made explicit use of sample normalization in stdev calculation
  - Added a ``from_dataset`` helper
  - Internal testing improvements
  - Documentation reorganization and improvements
  - Introduced custom exceptions for more detailed error logs
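The idea behind a cross-platform partition method, as mentioned in the bugfixes above, can be pictured as computing shared histogram bin edges once so that each backend only needs to count values per bin. A minimal stdlib-only sketch under that assumption, not GE's internal code:

```python
def build_continuous_partition(values, bins=4):
    """Equal-width bin edges over the observed min/max (illustrative only)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def histogram(values, bin_edges):
    """Count values per bin; only the last bin is closed on the right."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if bin_edges[i] <= v < bin_edges[i + 1] or (last and v == bin_edges[-1]):
                counts[i] += 1
                break
    return counts


data = [0, 1, 2, 3, 4, 5, 6, 7, 8]
edges = build_continuous_partition(data, bins=4)   # [0.0, 2.0, 4.0, 6.0, 8.0]
counts = histogram(data, edges)                    # [2, 2, 2, 3]
```

Separating edge computation from counting is what makes the approach portable: the same edges can drive a pandas, SQL, or Spark aggregation.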