catboost/catboost v0.20 (GitHub release)



New submodule for text processing!
It contains two classes that help you prepare text features for training:

Tokenizer: splits text into tokens, with automatic lowercasing and punctuation removal.
Dictionary: builds a dictionary that maps tokens to numeric identifiers, which you can then use as new features.
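These notes describe what the two classes do but not their signatures. The following is a minimal plain-Python sketch of the same two-step idea (tokenize, then map tokens to ids). It does not use CatBoost's API. The names tokenize and TokenDictionary, the frequency-based id ordering, and the min_count filter are hypothetical choices made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text, replace punctuation with spaces, split on whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return cleaned.split()


class TokenDictionary:
    """Hypothetical token -> integer id mapping; more frequent tokens get smaller ids."""

    def __init__(self, min_count: int = 1) -> None:
        self.min_count = min_count
        self.token_to_id: Dict[str, int] = {}

    def fit(self, documents: List[List[str]]) -> "TokenDictionary":
        counts = Counter(token for doc in documents for token in doc)
        # Keep tokens seen at least min_count times; order by descending
        # frequency, then alphabetically so ids are deterministic.
        kept = [t for t, c in counts.items() if c >= self.min_count]
        kept.sort(key=lambda t: (-counts[t], t))
        self.token_to_id = {t: i for i, t in enumerate(kept)}
        return self

    def apply(self, tokens: List[str]) -> List[int]:
        # Tokens unknown to the dictionary are dropped.
        return [self.token_to_id[t] for t in tokens if t in self.token_to_id]


docs = ["The cat sat.", "The cat ran!", "A dog sat."]
vocab = TokenDictionary().fit([tokenize(d) for d in docs])
print(tokenize("The cat sat."))                  # ['the', 'cat', 'sat']
print(vocab.apply(tokenize("The dog ran away")))  # [2, 4, 5]
```

The resulting integer ids (or counts derived from them) are what would be turned into numeric features for the model.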

New features:

Enabled boost_from_average for the MAPE loss function
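With boost_from_average, training starts from the best constant prediction for the loss instead of from zero. For MAPE, the constant c that minimizes the sum of |y - c| / |y| is the weighted median of the targets with weights 1/|y|. The sketch below computes that textbook optimum. It is an assumption that CatBoost computes its starting value exactly this way, and the function name mape_optimal_constant is hypothetical.

```python
from typing import Sequence


def mape_optimal_constant(targets: Sequence[float]) -> float:
    """Constant minimizing sum(|y - c| / |y|): weighted median with weights 1/|y|."""
    if not targets or any(y == 0 for y in targets):
        raise ValueError("MAPE needs a non-empty list of non-zero targets")
    pairs = sorted((y, 1.0 / abs(y)) for y in targets)
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        # The first target where cumulative weight reaches half the total
        # is a weighted median, hence a minimizer of the weighted L1 loss.
        if cumulative >= half:
            return y
    return pairs[-1][0]  # defensive guard; the loop always returns


print(mape_optimal_constant([1, 2, 4, 100]))  # 1
```

Note how the MAPE-optimal start (1) differs sharply from the plain mean (26.75): small targets carry large weights, so the starting constant is pulled toward them.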

Bug fixes:

Fixed Pool creation from pandas.DataFrame with discontinuous columns, #1079
Fixed standalone_evaluator, PR #1083

Speedups:

Greatly sped up preprocessing in the Python package for datasets with many samples (more than 10 million)

We also now release precompiled packages for Python 3.8.
