The main feature of this release is support for total uncertainty prediction via virtual ensembles.
You can read the theoretical background in the preprint Uncertainty in Gradient Boosting via Ensembles from our research team.
We introduced a new training parameter, posterior_sampling, that allows estimating total uncertainty.
Setting posterior_sampling=True enables Langevin boosting and derives further training settings from the dataset size N, one of which is set to 1/(2*N).
The CatBoost object method virtual_ensembles_predict splits the model into virtual_ensembles_count virtual submodels.
model.virtual_ensembles_predict(.., prediction_type='TotalUncertainty') returns the mean prediction and variance (plus knowledge uncertainty for models trained with the RMSEWithUncertainty loss function).
model.virtual_ensembles_predict(.., prediction_type='VirtEnsembles') returns
virtual_ensembles_count predictions of virtual submodels for each object.
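
To make the returned quantities more concrete, here is a minimal sketch of how per-submodel outputs are commonly combined into mean prediction, knowledge uncertainty and data uncertainty for a single object. It is pure Python and does not call CatBoost; the function name decompose_uncertainty and the sample numbers are hypothetical, and the formulas follow the standard ensemble decomposition (law of total variance), which may differ in detail from what CatBoost computes internally.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Combine per-submodel outputs for ONE object (illustrative helper).

    means     -- predicted mean from each virtual submodel
    variances -- predicted variance from each virtual submodel
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one mean and one variance per submodel")
    # Knowledge uncertainty: how much the submodels disagree with each other.
    # Population variance is an assumption here; a sample variance is also common.
    knowledge = pvariance(means)
    # Data uncertainty: the average noise level the submodels themselves predict.
    data = fmean(variances)
    return {
        "mean": fmean(means),
        "knowledge": knowledge,
        "data": data,
        "total": knowledge + data,
    }


# Hypothetical outputs of 4 virtual submodels for a single object.
submodel_means = [2.0, 2.5, 1.5, 2.0]
submodel_variances = [0.25, 0.5, 0.25, 0.5]
result = decompose_uncertainty(submodel_means, submodel_variances)
print(result)
```

In this reading, the per-submodel predictions returned with prediction_type='VirtEnsembles' play the role of the inputs, and prediction_type='TotalUncertainty' returns the already aggregated values.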
Added support for non-owning model deserialization for models with categorical feature counters.
We've made many speedups to sparse data loading. For example, preprocessing of the Bosch sparse dataset is 4.5x faster when running with 28 threads.
Fixed target check for PairLogitPairwise on GPU. Issue #1217
Added the n_features_in_ attribute required for using CatBoost in sklearn pipelines. Issue #1363