Impressive speedup of CPU training for datasets with predominantly binary features (up to 5-6x).
Sped up prediction and SHAP values array casting on large pools (issue #684).
We've introduced a new type of feature importances -
This type of feature importance works well in all modes, but is especially good for ranking. It is more expensive to calculate, so we have not made it the default; you can enable it by selecting the corresponding feature importance type.
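The release notes don't spell out how such loss-based importances are computed, but the general idea behind importances that measure the change of the loss function can be sketched in pure Python. The sketch below uses permutation of a feature column as the perturbation; the function names, the toy model, and the permutation approach are all illustrative assumptions, not CatBoost's implementation:

```python
import random

def loss_change_importance(predict, X, y, loss, n_repeats=5, seed=0):
    """Illustrative loss-based importance: the importance of a feature
    is the average increase in the loss when that feature's column is
    randomly shuffled (a permutation-importance sketch, not CatBoost code)."""
    rng = random.Random(seed)
    base = loss(predict(X), y)
    n_features = len(X[0])
    importances = []
    for j in range(n_features):
        deltas = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            deltas.append(loss(predict(X_perm), y) - base)
        importances.append(sum(deltas) / n_repeats)
    return importances

# Toy model: predictions depend only on feature 0, so shuffling
# feature 1 leaves the loss unchanged while shuffling feature 0 hurts it.
predict = lambda X: [row[0] for row in X]
mse = lambda p, y: sum((a - b) ** 2 for a, b in zip(p, y)) / len(y)
X = [[float(i), float(i % 2)] for i in range(20)]
y = [float(i) for i in range(20)]
imp = loss_change_importance(predict, X, y, mse)
```

Because it requires re-evaluating the loss per feature, this style of importance is naturally more expensive than split-based importances, which matches the trade-off described above.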
Now we support online statistics for categorical features in `QuerySoftMax` mode on GPU.
We've introduced a new `sampling_type`, `MVS`, which speeds up CPU training when used.
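MVS stands for Minimum Variance Sampling: objects with large gradients are kept deterministically, the rest are subsampled with probability proportional to the gradient magnitude and reweighted so the gradient estimate stays unbiased. Here is a minimal pure-Python sketch of that idea; the threshold search and all names are illustrative assumptions, not CatBoost's actual code:

```python
import random

def mvs_sample(grads, sample_rate, seed=0):
    """Sketch of Minimum Variance Sampling: keep object i with probability
    p_i = min(1, |g_i| / mu) and weight it by 1/p_i, choosing mu so the
    expected sample size equals sample_rate * n."""
    n = len(grads)
    k = sample_rate * n                        # expected sample size
    abs_g = [abs(g) for g in grads]

    # Binary-search a threshold mu with sum(min(1, |g|/mu)) == k;
    # the sum decreases monotonically as mu grows.
    lo, hi = 1e-12, max(abs_g) * n / max(k, 1e-12) + 1.0
    for _ in range(100):
        mu = (lo + hi) / 2
        expected = sum(min(1.0, g / mu) for g in abs_g)
        if expected > k:
            lo = mu
        else:
            hi = mu
    mu = (lo + hi) / 2

    rng = random.Random(seed)
    sample = []
    for i, g in enumerate(abs_g):
        p = min(1.0, g / mu)
        if rng.random() < p:
            sample.append((i, 1.0 / p))        # (index, importance weight)
    return sample, mu

# Heavy-tailed gradients: the five large ones should always be kept.
grads = [0.01] * 95 + [10.0] * 5
sample, mu = mvs_sample(grads, 0.1)
```

Compared with uniform bootstrap, this concentrates the sampling budget on the objects that contribute most to the gradient, which is where the CPU training speedup comes from.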
`classes_` attribute in python.
One more new option for working with categorical features is
This option can be used if your initial target values are not binary and you are doing regression or ranking. It is equal to 1 by default, but you can try increasing it.
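The option name is not preserved in this text, but the described default of 1 suggests it controls how many borders are used to discretize a non-binary target when computing categorical feature statistics: with 1 border the target is split at a single point, with more borders the statistics become finer. A hedged, purely illustrative sketch of that discretization (quantile borders; not CatBoost's exact algorithm):

```python
def target_borders(target, border_count):
    """Illustrative quantile borders for a continuous target."""
    s = sorted(target)
    n = len(s)
    borders = []
    for b in range(1, border_count + 1):
        q = s[min(n - 1, (b * n) // (border_count + 1))]
        if not borders or q > borders[-1]:     # keep borders distinct
            borders.append(q)
    return borders

def discretize(target, borders):
    """Map each target value to the number of borders it exceeds."""
    return [sum(t > b for b in borders) for t in target]

y = [float(i) for i in range(10)]
one = discretize(y, target_borders(y, 1))      # default-like: binary split
three = discretize(y, target_borders(y, 3))    # finer target statistics
```

With the default of a single border this reduces to a binary target, which is why increasing it can help when the original target is continuous.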
Added a new option, `sampling_unit`, that allows switching sampling from individual objects to entire groups.
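Group-level sampling keeps each group (e.g. a query in ranking) intact: either all of its objects are taken or none, which matters for groupwise losses. A small sketch of the concept, with illustrative names:

```python
import random

def sample_by_group(group_ids, sample_rate, seed=0):
    """Illustrative sketch: sample entire groups instead of individual
    objects, so every selected group keeps its full document list."""
    rng = random.Random(seed)
    groups = sorted(set(group_ids))
    chosen = {g for g in groups if rng.random() < sample_rate}
    return [i for i, g in enumerate(group_ids) if g in chosen]

gids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
idx = set(sample_by_group(gids, 0.5, seed=1))
```

The invariant is that no group is ever partially sampled, unlike object-level sampling.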
More strings are interpreted as missing values for numerical features (mostly similar to pandas' `read_csv`).
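The exact list of recognized strings isn't given here, so the set below is only illustrative, mirroring the spirit of `pandas.read_csv`'s default `na_values` rather than CatBoost's actual list:

```python
import math

# Illustrative missing-value strings (assumption, not CatBoost's exact set).
NA_STRINGS = {"", "nan", "-nan", "na", "n/a", "null", "none", "#n/a"}

def parse_numeric(token):
    """Parse a numerical-feature token, mapping missing-value strings to NaN."""
    if token.strip().lower() in NA_STRINGS:
        return float("nan")
    return float(token)

vals = [parse_numeric(t) for t in ["3.5", "N/A", "", "NaN", "-1"]]
```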
We've improved classification mode on CPU; there will be fewer cases where training diverges.
You can also try to experiment with new
It is now possible to output evaluation results directly to `stderr` in command-line CatBoost in `calc` mode by passing the appropriate `--output-path` argument (PR #646). Thanks @towelenee for your contribution!
Changed the default value of the `one_hot_max_size` training parameter for groupwise loss function training.
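For context, `one_hot_max_size` is the cardinality threshold below which a categorical feature is one-hot encoded instead of encoded with target statistics (CTRs). A hedged sketch of that decision, with an illustrative smoothed mean-target encoding standing in for real CTRs:

```python
def encode_categorical(values, target, one_hot_max_size, prior=0.5):
    """Illustrative role of one_hot_max_size: low-cardinality features
    become one-hot columns, high-cardinality ones become a single
    smoothed mean-target (CTR-like) column.  Not CatBoost's actual CTRs."""
    cats = sorted(set(values))
    if len(cats) <= one_hot_max_size:
        return [[1.0 if v == c else 0.0 for c in cats] for v in values]
    # CTR-like fallback: smoothed mean target per category.
    sums, counts = {}, {}
    for v, t in zip(values, target):
        sums[v] = sums.get(v, 0.0) + t
        counts[v] = counts.get(v, 0) + 1
    return [[(sums[v] + prior) / (counts[v] + 1)] for v in values]

one_hot = encode_categorical(["a", "b", "a"], [1, 0, 1], one_hot_max_size=2)
ctr_like = encode_categorical(["a", "b", "a"], [1, 0, 1], one_hot_max_size=1)
```

Changing the default threshold for groupwise losses shifts where this cutover happens for ranking tasks.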
We've added a new tutorial on GPU training in Google Colaboratory.