[v0.3.0] - 2022-11-16
Added
- Full Complexity EBMs with higher order interactions supported: GA3M, GA4M, GA5M, etc...
  3-way and higher-level interactions lose exact global interpretability, but retain exact local explanations
  Higher level interactions need to be explicitly specified. No automatic FAST detection yet
- Mac m1 support
- support for ordinals
- merge_ebms now supports merging models with interactions, including higher-level interactions
- added classic composition option during Differentially Private binning
- support for different kinds of feature importances (avg_weight, min_max)
- exposed interaction detection API (FAST algorithm)
- API to calculate and show the importances of groups of features and terms.
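The two importance types named above (avg_weight, min_max) can be illustrated in plain Python. This is only a sketch: it assumes avg_weight is the mean of the absolute term scores weighted by bin weights, and that min_max is the spread between the largest and smallest score. The function names `avg_weight_importance` and `min_max_importance` are hypothetical and not part of the library.

```python
def avg_weight_importance(scores, bin_weights):
    """Mean of |score| weighted by bin weights (assumed definition)."""
    total = sum(bin_weights)
    if total == 0:
        return 0.0
    return sum(abs(s) * w for s, w in zip(scores, bin_weights)) / total


def min_max_importance(scores):
    """Spread between the largest and smallest score (assumed definition)."""
    return max(scores) - min(scores)


# Toy term with three bins; the values are made up for illustration.
scores = [-0.5, 0.0, 1.0]
bin_weights = [10.0, 30.0, 10.0]

print(avg_weight_importance(scores, bin_weights))  # 0.3
print(min_max_importance(scores))                  # 1.5
```

Under these assumptions, avg_weight favors terms that matter for many samples, while min_max highlights terms with a large effect even if only a few samples fall in the extreme bins.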
Changed
- memory efficiency: About 20x less memory is required during fitting
- predict time speed improvements. About 50x faster for Pandas CategoricalDtype,
  and varying levels of improvements for other data types
- handling of the differential privacy DPOther bin, and non-DP unknowns has been unified by having a universal unknown bin
- bin weights have been changed from per-feature to per-term and are now multi-dimensional
- improved scikit-learn compliance: We now conform to the scikit-learn 1.0 feature names API by using
self.feature_names_in_ for the X column names and self.n_features_in_.
We use the matching self.feature_types_in_ for feature types, and self.term_names_ for the additive term names.
Fixed
- merge_ebms now distributes bin weights proportionally according to volume when splitting bins
- DP-EBMs now use sample weights instead of bin counts, which preserves privacy budget
- improved scikit-learn compliance: The following init attributes are no longer overwritten
  during calls to fit: self.interactions, self.feature_names, self.feature_types
- better handling of floating point overflows when calculating gain and validation metrics
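The merge_ebms fix above, distributing bin weights proportionally according to volume, can be pictured with a one-dimensional sketch. This is not the library's implementation: `split_bin_weight` is a hypothetical helper, and "volume" is reduced here to interval width (for a multi-dimensional term it would presumably be the product of widths).

```python
def split_bin_weight(lo, hi, weight, cuts):
    """Split one bin's weight across sub-bins in proportion to their width."""
    # Keep only cut points that fall strictly inside the original bin.
    edges = [lo] + sorted(c for c in cuts if lo < c < hi) + [hi]
    width = hi - lo
    return [weight * (b - a) / width for a, b in zip(edges, edges[1:])]


# A bin covering [0, 10) with weight 100 is split at 2.5 and 5.0.
parts = split_bin_weight(0.0, 10.0, 100.0, cuts=[2.5, 5.0])
print(parts)  # [25.0, 25.0, 50.0]
```

The pieces always sum to the original weight, so splitting bins to align two models does not change the total weight of a feature.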
Breaking Changes
- EBMUtils.merge_models function has been renamed to merge_ebms
- renamed binning type 'quantile_humanized' to 'rounded_quantile'
- feature type 'categorical' has been specialized into separate 'nominal' and 'ordinal' types
- EBM models have changed public attributes:
  - feature_groups_ -> term_features_
  - global_selector -> n_samples_, unique_val_counts_, and zero_val_counts_
  - domain_size_ -> min_target_, max_target_
  - additive_terms_ -> term_scores_
  - bagged_models_ -> BaseCoreEBM has been deprecated and the only useful attribute has been moved into the main EBM class (bagged_models_.model_ -> bagged_scores_)
  - feature_importances_ -> has been changed into the function term_importances(), which can now also generate different types of importances
  - preprocessor_ & pair_preprocessor_ -> attributes have been moved into the main EBM model class (details below)
- EBMPreprocessor attributes have been moved to the main EBM model class
  - col_names_ -> feature_names_in_
  - col_types_ -> feature_types_in_
  - col_min_ -> feature_bounds_
  - col_max_ -> feature_bounds_
  - col_bin_edges_ -> bins_
  - col_mapping_ -> bins_
  - hist_counts_ -> histogram_counts_
  - hist_edges_ -> histogram_edges_
  - col_bin_counts_ -> bin_weights_ (and is now a per-term tensor)
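When migrating code, the renames listed above can be checked mechanically. The sketch below is a hypothetical helper, not something shipped with the library: `RENAMES` copies a subset of the direct one-to-one and one-to-many renames from this changelog, and `find_legacy_names` scans a string of source code for the old names.

```python
import re

# Old -> new attribute names, copied from the breaking-changes lists above.
RENAMES = {
    "feature_groups_": ["term_features_"],
    "global_selector": ["n_samples_", "unique_val_counts_", "zero_val_counts_"],
    "domain_size_": ["min_target_", "max_target_"],
    "additive_terms_": ["term_scores_"],
    "col_names_": ["feature_names_in_"],
    "col_types_": ["feature_types_in_"],
    "col_min_": ["feature_bounds_"],
    "col_max_": ["feature_bounds_"],
    "col_bin_edges_": ["bins_"],
    "col_mapping_": ["bins_"],
    "hist_counts_": ["histogram_counts_"],
    "hist_edges_": ["histogram_edges_"],
    "col_bin_counts_": ["bin_weights_"],
}


def find_legacy_names(source_text):
    """Return {old_name: replacements} for each legacy name in source_text."""
    found = {}
    for old, new in RENAMES.items():
        # Match the name only as a whole identifier, not as part of a longer one.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(old) + r"(?![A-Za-z0-9_])"
        if re.search(pattern, source_text):
            found[old] = new
    return found


snippet = "scores = ebm.additive_terms_[0]\nnames = ebm.preprocessor_.col_names_"
print(find_legacy_names(snippet))
# {'additive_terms_': ['term_scores_'], 'col_names_': ['feature_names_in_']}
```

Changes that are not simple renames, such as feature_importances_ becoming the function term_importances(), still need manual review.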