- Fixed a bug in CatBoost `Pool` initialization from a `pandas.DataFrame` with string values that could cause a slight inconsistency when using a trained model from older versions. Around 1% of categorical feature hashes were treated incorrectly. If you experience a quality drop after the update, you should consider retraining your model.
Major Features And Improvements
- Implemented the algorithm for finding the most influential training samples for a given object, from the paper 'Finding Influential Training Samples for Gradient Boosted Decision Trees'. For every object from the input pool, this mode calculates a score for every object from the train pool. A positive score means that the given train object made a negative contribution to the given test object's prediction, and vice versa for negative scores. The larger the absolute value of the score, the larger the contribution. Use the `get_object_importance` model method in the Python package or the `ostr` mode in the CLI version. A tutorial for Python is available here. More details and examples will be published in the documentation soon.
- Implemented a new way of exploring feature importance: SHAP values from the paper. These help you understand which features are most influential for a given object and gain more insight into your model; see details in the tutorial.
- Published save-model-as-code functionality. For now, you can save a model as Python code with categorical features support, or as C++ code without categorical features support.
Bug Fixes and Other Changes
- Fixed `_catboost` reinitialization issues #268 and #269.
- New GPU parameter `gpu_cat_features_storage` with possible values `GpuRam`. Default is `GpuRam`.
Thanks to our Contributors
This release contains contributions from CatBoost team.
As usual, we are grateful to all who filed issues or helped resolve them, asked and answered questions.