- Fixed a bug in CatBoost `Pool` initialization from a `pandas.DataFrame` with string values that could cause a slight inconsistency when using a trained model from older versions. Around 1% of categorical feature hashes were treated incorrectly. If you experience a quality drop after the update, you should consider retraining your model.
Major Features And Improvements
- Implemented the algorithm for finding the most influential training samples for a given object, from the paper 'Finding Influential Training Samples for Gradient Boosted Decision Trees'. For every object from the input pool, this mode calculates a score for every object from the train pool. A positive score means that the given train object made a negative contribution to the given test object's prediction, and vice versa for negative scores. The larger the absolute value of the score, the larger the contribution. Use the `get_object_importance` model method in the Python package or the `ostr` mode in the CLI version. A tutorial for Python is available here. More details and examples will be published in the documentation soon.
- Implemented a new way of exploring feature importance: SHAP values from the paper. These help you understand which features are most influential for a given object and gain more insight into your model; see details in the tutorial.
- Published save-model-as-code functionality. For now, you can save a model as Python code with categorical features support, or as C++ code without categorical features support.
Bug Fixes and Other Changes
- Fixed `_catboost` reinitialization issues #268 and #269.
- New GPU parameter `gpu_cat_features_storage` with possible values `GpuRam`. Default is `GpuRam`.
Thanks to our Contributors
This release contains contributions from CatBoost team.
As usual, we are grateful to all who filed issues or helped resolve them, asked and answered questions.