Uncertainty prediction
The main feature of this release is support for total uncertainty prediction via virtual ensembles.
You can read the theoretical background in the preprint Uncertainty in Gradient Boosting via Ensembles from our research team.
We introduced a new training parameter, posterior_sampling, that makes it possible to estimate total uncertainty.
Setting posterior_sampling=True enables Langevin boosting and sets model_shrink_rate to 1/(2*N) and diffusion_temperature to N, where N is the dataset size.
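The relationship between the dataset size and the implied settings can be sketched in plain Python. This is only an illustration of the arithmetic described above, not CatBoost code: the helper name posterior_sampling_defaults is hypothetical, and the use of the langevin key to represent "Langevin boosting enabled" is an assumption about how the equivalent explicit parameters would be spelled.

```python
def posterior_sampling_defaults(n_objects: int) -> dict:
    """Return the settings that posterior_sampling=True is described as implying.

    Hypothetical helper for illustration only; it is not part of CatBoost.
    """
    if n_objects <= 0:
        raise ValueError("dataset size must be positive")
    return {
        # Assumption: Langevin boosting corresponds to a boolean `langevin` flag.
        "langevin": True,
        # model_shrink_rate = 1 / (2 * N)
        "model_shrink_rate": 1.0 / (2 * n_objects),
        # diffusion_temperature = N
        "diffusion_temperature": float(n_objects),
    }


# Example: a dataset with 10,000 objects.
params = posterior_sampling_defaults(10_000)
print(params)
```

In practice it should be enough to pass posterior_sampling=True at training time and let the library derive these values; the sketch only makes the dependence on N explicit.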
The CatBoost object method virtual_ensembles_predict splits the model into virtual_ensembles_count submodels.
Calling model.virtual_ensembles_predict(.., prediction_type='TotalUncertainty') returns the mean prediction and variance (and also knowledge uncertainty for models trained with the RMSEWithUncertainty loss function).
Calling model.virtual_ensembles_predict(.., prediction_type='VirtEnsembles') returns virtual_ensembles_count predictions, one from each virtual submodel, for each object.
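To make the shape of the per-submodel output concrete, here is a minimal sketch of how such predictions might be summarized by hand. It assumes the VirtEnsembles result has been reduced to one scalar prediction per submodel per object. The numbers are made up for illustration, and the helper summarize_virtual_ensembles is hypothetical. It is not CatBoost's internal computation and is not guaranteed to match what TotalUncertainty returns.

```python
from statistics import fmean, pvariance


def summarize_virtual_ensembles(preds):
    """Summarize per-object submodel predictions as (mean, variance) pairs.

    preds[i][k] is the prediction of virtual submodel k for object i.
    Hypothetical helper for illustration only.
    """
    summary = []
    for per_object in preds:
        mean = fmean(per_object)
        # Population variance across submodels: how much they disagree.
        spread = pvariance(per_object, mu=mean)
        summary.append((mean, spread))
    return summary


# Made-up predictions: 2 objects, 4 virtual submodels each.
virt_preds = [
    [1.0, 1.2, 0.8, 1.0],  # submodels disagree somewhat
    [3.0, 3.0, 3.0, 3.0],  # submodels agree exactly
]
summary = summarize_virtual_ensembles(virt_preds)
for mean, spread in summary:
    print(f"mean={mean:.3f} variance={spread:.3f}")
```

A larger variance across submodels indicates stronger disagreement between them for that object, while identical submodel predictions give a variance of zero.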
New functionality
- Added support for non-owning model deserialization for models with categorical feature counters
Speedups
- We made many speedups to sparse data loading. For example, preprocessing of the Bosch sparse dataset became 4.5x faster when running with 28 threads.