github catboost/catboost v1.2
12 months ago

Release 1.2

Major changes

CatBoost's build system has been switched from Ya Make (Yandex's build system) to CMake. This means more transparency in the build process and more familiar tools for Open Source developers.
For now it is possible to build CatBoost for:

  • Linux on x86-64 with or without CUDA
  • Linux on aarch64 with or without CUDA
  • macOS on x86-64 and arm64, including creating universal binaries
  • Windows on x86-64 with or without CUDA
  • Android (only model applier) on all supported ABIs.

This allowed us to prepare the Python package in the source distribution form (also known as sdist). #830

  • msvs subdirectory with the Microsoft Visual Studio solution has been removed. Visual Studio solutions can be generated using CMake instead.
  • make subdirectory with Makefiles has been removed. Use CMake + ninja (recommended) or CMake + make instead.

Python package

  • Switch to the standard Python build and installation method that uses setup.py instead of the custom mk_wheel.py script. All common scenarios (sdist, build, install, editable install, bdist_wheel) are supported.
  • Switch wheel platform tag on Linux from obsolete manylinux1 to manylinux2014.
  • The source distribution is now available on PyPI. #830
  • Wheels for Linux aarch64 are now available on PyPI. #2091
  • Support Python 3.11. #2213
  • Drop support for obsolete Python 3.6.
  • Make wheels PEP427-compliant. #2165
  • Fix wrong checksums in wheels that caused problems with poetry. #2331
  • Improved performance due to caching TBB local executors. #2203
  • Add fixed_binary_splits to the regressor, classifier, and ranker.
  • Compatibility with pandas 2.0. #2320
  • CatBoost widget is now compatible with ipywidgets 8.x. #2266
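For context on the PEP 427 compliance and wheel checksum items above: PEP 427 specifies that each line of a wheel's RECORD file holds a path, a hash, and a size, with the hash written as urlsafe base64 without padding. The sketch below only illustrates that format using the standard library. It is not CatBoost's packaging code, and the function names and file path are hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build one RECORD line (path,hash,size) in the PEP 427 format."""
    digest = hashlib.sha256(data).digest()
    # PEP 427 uses urlsafe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


def verify_entry(line: str, data: bytes) -> bool:
    """Check that file contents match the hash and size stored in a RECORD line."""
    # Split from the right so that commas inside the path are preserved.
    path, _recorded_hash, _recorded_size = line.rsplit(",", 2)
    return line == record_entry(path, data)


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    line = record_entry("example_pkg/__init__.py", b"")
    print(line)
    print(verify_entry(line, b""))
```

Installers that validate RECORD, as the poetry issue above suggests, will reject a wheel whose recorded hash or size does not match the file contents.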

Rust package

  • Support CUDA applier. #1925, thanks to @getumen.
  • Properly forward debug/release setting to native library build.
  • Passing features: switch from String and Vec types to AsRef of slices to make the code more generic.
  • Support text and embedding features.
  • Support multidimensional output in predictions.

New features

  • [JVM applier]: Support CUDA.
  • [Spark]: Support Spark 3.4.x (use this version if you want to run Spark with Python 3.11).
  • Static model applier library now works on Windows.
  • Add binary-classification-threshold parameter to the CLI model applier.
  • Support Multi-target regression with text features (but only Bag-of-Words features are generated for now). #2229
  • Support RMSEWithUncertainty loss function on GPU.
  • Support MultiLogloss and MultiCrossEntropy loss functions with numerical features on GPU.
  • Support MultiLogloss loss function with text features on CPU and GPU. #1885
  • Enable univariate metrics for models with uncertainty.
  • Add Focal loss (CPU-only for now). #1807, thanks to @diditforlulz273.
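As background for the Focal loss item above, here is a minimal sketch of the binary focal loss as it is usually defined, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). It is written in plain Python for illustration and is not CatBoost's implementation. The parameter names `alpha` and `gamma` and their default values are assumptions taken from the common formulation, not from these release notes.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for predicted probability p of class 1 and label y in {0, 1}."""
    eps = 1e-15
    # Clamp the probability to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights the positive class by alpha and the negative class by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor down-weights well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for prob in (0.6, 0.9, 0.99):
        print(prob, focal_loss(prob, 1))
```

With `gamma = 0` the modulating factor disappears and the expression reduces to class-weighted log loss, which is a convenient sanity check.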

Improvements

  • Removed legacy dependency on Python 2 interpreter in the build process. #2297
  • Calc metrics: Throw catboost exception if column index exceeds column count.
  • Speedup MultiLogloss on CPU by 8% per tree (110K samples, 20 targets, 480 float features, 3 cat features, 16 cores CPU).
  • Update .NET projects from obsolete .NET Core 2.1 to .NET Core 3.1.
  • Code generation for new CUDA Compute Architectures 8.6, 8.9 and 9.0 is enabled by default (requires CUDA 11.8 to build from source).
  • Check that evaluator implementation is available in TFullModel::SetEvaluatorType (it was possible to get a Segmentation fault when calling it for a non-available implementation). Add TFullModel::GetSupportedEvaluatorTypes.
  • Cross Validation on GPU no longer requires allow_write_files=True.

Bugfixes

  • [Python-package]: Clear model params before load_model. Fixes #2225.
  • [Python-package]: Fix CatBoostRanker score computation. #2231
  • [Python-package]: Fix _get_embedding_feature_indices. #2273
  • [Python-package]: Fix set_feature_names with text or embedding features. #2090
  • [Python-package]: pandas.Categorical.categories is not necessarily a numpy.ndarray. #1965
  • [Spark]: Pass classpath in a file to avoid hitting cmdline length limits. #1842
  • [CUDA Applier]: Apply scale and bias.
  • [CUDA Applier]: Fix that libs/model_interface applier always produced an error in CUDA mode.
  • Fix CUDA error 700 in pairwise ranking.
  • Fix kernel registration for distributed training on GPU.
  • Fix 'floating point exception' on CPU for small datasets on GPU.
  • Fix wrong log message 'There are invalid params and some of them will be ignored'. #2253
  • Fix incorrect results and crashes for GPU applier on Nvidia Ampere-based GPUs.
  • Fix 'CUDA error 9' in Multi-GPU training.
  • Fix serialization of embedding features structures in the model.
  • Fix GPU buffer overrun in distributed multi-classification training.
  • Fix catboost/cuda/cuda_util/sort.cpp:166: CUDA error 9 on Nvidia Ampere-based GPUs.
  • Fix inf/nan parsing in dataset input files.
  • Fix floating point exception for very small datasets on GPU.
  • Fix: built static applier library lacked the part with 'global' objects. #2187
  • Fix sum of models with categorical features with CTRs.
  • Fix: model_interface/cmake_example failed build "‘runtime_error’ is not a member of ‘std’". #2324, thanks to @Mandelag.
  • Fix Segmentation fault in Cross Validation and hyperparameter search functions that use it on GPU.
  • Fix Segmentation fault in utils.eval_metrics for groupwise metrics when group data has not been specified. #2343
  • Fix errors when running Cross Validation repeatedly on GPU. #2221
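To illustrate what the "[CUDA Applier]: Apply scale and bias" fix above refers to: a model's raw prediction is the sum of its tree outputs, adjusted by a scale and a bias stored with the model. The sketch below assumes the affine form `scale * sum + bias`. The function name and sample values are hypothetical, and this is not the applier's actual code.

```python
from typing import Sequence


def apply_scale_and_bias(
    tree_outputs: Sequence[float], scale: float = 1.0, bias: float = 0.0
) -> float:
    """Combine per-tree outputs into a raw prediction (assumed form: scale * sum + bias)."""
    raw_sum = sum(tree_outputs)
    return scale * raw_sum + bias


if __name__ == "__main__":
    # Hypothetical per-tree leaf values for one object.
    leaf_values = [0.5, -0.25, 1.0]
    print(apply_scale_and_bias(leaf_values))            # defaults leave the sum unchanged
    print(apply_scale_and_bias(leaf_values, 2.0, 0.1))  # scaled and shifted prediction
```

An applier that skips this step returns the plain sum, so its predictions differ from the CPU ones whenever the model carries a non-default scale or bias.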

P.S. There's an issue with somewhat unexpected binary size increases. We're investigating it in #2369.
