catboost/catboost v0.24.1 on GitHub

Uncertainty prediction

The main feature of this release is support for total uncertainty prediction via virtual ensembles.
You can read the theoretical background in the preprint "Uncertainty in Gradient Boosting via Ensembles" from our research team.
We introduced a new training parameter, posterior_sampling, that makes it possible to estimate total uncertainty.
Setting posterior_sampling=True enables Langevin boosting and sets model_shrink_rate to 1/(2*N) and diffusion_temperature to N, where N is the dataset size.
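To make the implied settings concrete, here is a minimal sketch that computes them for a given dataset size. The helper name implied_posterior_sampling_params is hypothetical and not part of CatBoost; the dictionary key langevin is assumed to be the training parameter that turns on Langevin boosting, while model_shrink_rate and diffusion_temperature are the names given above.

```python
def implied_posterior_sampling_params(n_objects):
    """Return the settings that posterior_sampling=True is described as implying.

    Hypothetical helper for illustration only; it does not call CatBoost.
    """
    if n_objects <= 0:
        raise ValueError("n_objects must be positive")
    return {
        "langevin": True,                           # assumed name of the Langevin boosting switch
        "model_shrink_rate": 1.0 / (2 * n_objects),  # 1/(2*N)
        "diffusion_temperature": n_objects,          # N
    }


params = implied_posterior_sampling_params(10_000)
print(params)
```

For a dataset of 10,000 objects this gives a shrink rate of 0.00005 and a diffusion temperature of 10,000, so both quantities scale with the amount of training data.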
The CatBoost object method virtual_ensembles_predict splits the model into virtual_ensembles_count submodels.
Calling model.virtual_ensembles_predict(.., prediction_type='TotalUncertainty') returns the mean prediction and variance (and also knowledge uncertainty for models trained with the RMSEWithUncertainty loss function).
Calling model.virtual_ensembles_predict(.., prediction_type='VirtEnsembles') returns virtual_ensembles_count predictions of the virtual submodels for each object.

New functionality

Supported non-owning model deserialization for models with categorical feature counters

Speedups

We've made many speedups to sparse data loading. For example, on the Bosch sparse dataset, preprocessing became 4.5x faster when running with 28 threads.

Bugfixes:

Fixed target check for PairLogitPairwise on GPU. Issue #1217
Supported n_features_in_ attribute required for using CatBoost in sklearn pipelines. Issue #1363
