github catboost/catboost v0.20

New submodule for text processing!
It contains two classes to help you make text features ready for training:

  • Tokenizer -- use this class to split text into tokens (with automatic lowercasing and punctuation removal)
  • Dictionary -- use this class to create a dictionary that maps tokens to numeric identifiers, which you can then use as new features.
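
To illustrate the idea behind these two steps, here is a minimal pure-Python sketch. It does not use the new submodule's classes; the helper names (tokenize, build_dictionary, apply_dictionary) and the min_count parameter are hypothetical, and the way identifiers are assigned (most frequent token first, ties broken alphabetically) is an assumption made for this example only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters/digits, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(tokenized_texts, min_count=1):
    """Map each token seen at least min_count times to an integer identifier."""
    counts = Counter(tok for tokens in tokenized_texts for tok in tokens)
    kept = [tok for tok, count in counts.items() if count >= min_count]
    # Assumed ordering for this sketch: most frequent first, then alphabetical.
    kept.sort(key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(kept)}


def apply_dictionary(dictionary, tokens):
    """Replace tokens with their identifiers, skipping tokens not in the dictionary."""
    return [dictionary[tok] for tok in tokens if tok in dictionary]


texts = ["The cat sat.", "The dog sat, too!"]
tokenized = [tokenize(text) for text in texts]
token_ids = build_dictionary(tokenized)
encoded = [apply_dictionary(token_ids, tokens) for tokens in tokenized]
```

The resulting lists of identifiers in encoded are what would be passed on as numeric features; the real classes offer their own options for these steps, so check their documentation before relying on any behaviour shown here.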

New features:

  • Enabled boost_from_average for the MAPE loss function

Bug fixes:

  • Fixed Pool creation from pandas.DataFrame with discontinuous columns, #1079
  • Fixed standalone_evaluator, PR #1083

Speedups:

  • Huge speedup of preprocessing in the Python package for datasets with many samples (more than 10 million)

We have also released precompiled packages for Python 3.8.
