github microsoft/LightGBM v4.0.0


Changes

This release contains all previously unreleased changes since v3.3.1, which was released over 1.5 years ago (link).

Summary of improvements:

  • completely rewritten CUDA implementation, with more operations performed on the GPU
  • quantized training, which can greatly improve training speed on CPU (paper link)
  • support for C++17
  • Python package:
    • now uses scikit-build-core (link) as its build backend
    • manylinux_2_28 Linux wheels now include GPU (OpenCL-based, not CUDA) support out of the box: just pip install lightgbm, then pass {"device": "gpu"} in params (thanks @jgiannuzzi!)
    • much wider use of inline type hints, shipped with a py.typed marker so that type checkers can use them in any code that depends on LightGBM
    • support for Python 3.10, 3.11
    • support for pandas nullable types
    • configurable threshold (lgb.early_stopping(..., min_delta=n)) for how much eval metrics must improve to be considered "improved" for early stopping
    • support for custom objective functions in the Dask interface
    • scikit-learn is no longer a required dependency
    • all callbacks are now pickleable (for better interoperability with e.g. ray, Dask) (thanks @Yard1!)
  • R package:
    • efficient support for more data types in prediction, like dgCMatrix and dsparseMatrix (thanks @david-cortes!)
    • much more idiomatic interface, e.g. support for saveRDS() and readRDS() for Booster, and print() and summary() methods for Dataset (thanks @david-cortes!)
    • various bug fixes related to multiple competing ways to provide parameters
    • support for R 4.2, 4.3
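
The new min_delta threshold for early stopping can be illustrated with a small pure-Python sketch. It assumes, as a simplification, that an iteration counts as "improved" only if its score beats the best score so far by more than min_delta, and that training stops once stopping_rounds iterations pass without an improvement. The function early_stopping_point is hypothetical; this is not LightGBM's own callback code.

```python
def early_stopping_point(scores, stopping_rounds, min_delta=0.0, higher_is_better=False):
    """Return (best_iteration, stop_iteration) for a sequence of eval scores.

    Hypothetical helper. Iterations are 0-based; stop_iteration is None if
    early stopping never triggers.
    """
    if min_delta < 0:
        raise ValueError("min_delta must be non-negative")
    best_score = None
    best_iter = 0
    for i, score in enumerate(scores):
        if best_score is None:
            improved = True  # the first score is always the best so far
        elif higher_is_better:
            improved = score > best_score + min_delta
        else:
            improved = score < best_score - min_delta
        if improved:
            best_score, best_iter = score, i
        elif i - best_iter >= stopping_rounds:
            return best_iter, i  # too many rounds without a big enough gain
    return best_iter, None


# Validation loss keeps shrinking, but only by tiny amounts after iteration 1.
losses = [1.0, 0.9, 0.899, 0.898, 0.897]
print(early_stopping_point(losses, stopping_rounds=2))                  # (4, None)
print(early_stopping_point(losses, stopping_rounds=2, min_delta=0.01))  # (1, 3)
```

With the default min_delta=0.0 every tiny decrease resets the patience counter; with min_delta=0.01 those gains no longer count, so training stops early. In LightGBM itself the threshold is passed to the callback, e.g. callbacks=[lgb.early_stopping(stopping_rounds=10, min_delta=0.001)] in lgb.train().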

Summary of breaking changes:

  • Python package:
    • dropped most testing of, and the promise of support for, Python 3.6 (although it should still technically be installable)
    • dropped support for macOS Mojave (10.14)
    • made many functions and class attributes private, including significantly reducing what is pulled in by from lightgbm import *
    • removed setup.py and support for pip install --install-option (to work with newer pip, see pypa/pip#11358)
    • dropped support for installation with MSBuild.exe; that now requires compiling lib_lightgbm.dll separately and then building a wheel that bundles it
  • R package:
    • dropped support for Solaris
    • removed most support for passing parameters through '...'
    • removed lgb.unloader()
    • predict() now uses predict(newdata, type = ...) to choose the prediction type, for consistency with base R and most other machine learning projects
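
The effect of the explicit __all__ lists on from lightgbm import * can be reproduced with Python's standard import machinery. The sketch below builds a throwaway in-memory module named toy_pkg (a hypothetical stand-in, not part of LightGBM) and shows that a star import copies only the names listed in __all__; without __all__, it would also copy every other name that does not start with an underscore.

```python
import sys
import types

# Hypothetical in-memory module standing in for a package with an explicit __all__.
toy = types.ModuleType("toy_pkg")
exec(
    "__all__ = ['Booster']\n"
    "class Booster:\n"
    "    pass\n"
    "def public_helper():\n"
    "    pass\n"
    "def _private_helper():\n"
    "    pass\n",
    toy.__dict__,
)
sys.modules["toy_pkg"] = toy  # register it so the import statement can find it

# Run a star import into a fresh namespace and list what it brought in.
namespace = {}
exec("from toy_pkg import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)  # ['Booster']
```

public_helper is still reachable as toy_pkg.public_helper; it is simply no longer pulled in by the star import.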

💡 New Features

🔨 Breaking

  • [python-package] make Booster and Dataset 'handle' attributes private (fixes #5313) @jameslamb (#5947)
  • [python-package] remove hard dependency on 'scikit-learn', fix minimal runtime dependencies @jameslamb (#5942)
  • [python-package] [ci] switch to PEP 517 / 518 builds (remove setup.py) (fixes #5061) @jameslamb (#5759)
  • [ci] [python-package] replace 'python setup.py' with a shell script @jameslamb (#5837)
  • [R-package] use C++17 in the CRAN package @jameslamb (#5690)
  • [python-package] make some Booster and Dataset attributes private @jameslamb (#5723)
  • [CUDA] consolidate CUDA versions @jameslamb (#5677)
  • [python-package] make public API members explicit with module-level __all__ variables @jameslamb (#5655)
  • [ci] migrate CI from macOS 10.15 to 11 (fixes #5391) @StrikerRUS (#5396)
  • [ci] switch to manylinux_2_28 for Linux artifacts (fixes #5514, fixes #5589) @jameslamb (#5580)
  • fix: Adjust LGBM_DatasetCreateFromSampledColumn to handle distributed data @svotaw (#5344)
  • [python-package] allow custom weighting in fobj for scikit-learn API (closes #5027) @jmoralez (#5211)
  • [R-package] Use type argument to control prediction types @david-cortes (#5133)
  • [python-package] Use scikit-learn interpretation of negative n_jobs and change default to number of cores @david-cortes (#5105)
  • [python-package] remove Booster.set_attr() and Booster.attr() @jameslamb (#5272)
  • remove support for Solaris (fixes #5216) @jameslamb (#5226)
  • [R-package] stop automatically calculating eval metrics on training data in lightgbm() @jameslamb (#5209)
  • [R-package] remove lgb.unloader() @jameslamb (#5204)
  • [python-package] remove 'fobj' in favor of passing custom objective function in params @TremaMiguel (#5052)
  • [python-package] remove is_reshape argument in Booster.predict (fixes #5115) @jmoralez (#5117)
  • [R-package] Remove reshape argument in predict @david-cortes (#4971)
  • [R-package] Promote number of threads to top-level argument in lightgbm() and change default to number of cores @david-cortes (#4972)
  • [R-package] Rename data -> newdata in predict @david-cortes (#4973)
  • Build Windows artifacts in windows-2019 image instead of vs2017-win2016 @StrikerRUS (#5059)
  • [python-package] use 2d collections for predictions, grads and hess in multiclass custom objective @jmoralez (#4925)
  • [R-package] prefer params to keyword argument in lgb.train() @jameslamb (#5007)
  • [R-package] remove behavior where lightgbm() saves model to disk @david-cortes (#4974)
  • [python-package] make record_evaluation compatible with cv (fixes #4943) @jmoralez (#4947)
  • [python] remove early_stopping_rounds argument of train() and cv() functions @StrikerRUS (#4908)
  • [python] remove evals_result argument of train() function @StrikerRUS (#4882)
  • [python][sklearn] do not replace empty dict with None for evals_result_ @StrikerRUS (#4884)
  • [python] Drop Python 3.6 support @StrikerRUS (#4891)
  • [python] remove verbose_eval argument of train() and cv() functions @StrikerRUS (#4878)
  • [python] remove verbose argument of model_from_string() method of Booster class @StrikerRUS (#4877)
  • [python][sklearn] Remove early_stopping_rounds argument of fit() method @StrikerRUS (#4846)
  • [R-package] remove support for '...' in slice() @jameslamb (#4872)
  • [R-package] remove support for '...' in lgb.Dataset() @jameslamb (#4874)
  • [R-package] remove support for '...' in dim.lgb.Dataset() @jameslamb (#4873)
  • [R-package] remove support for '...' in lgb.train() @jameslamb (#4863)
  • [R-package] remove support for '...' in create_valid() @jameslamb (#4865)
  • [R-package] remove support for 'info' in Dataset @jameslamb (#4866)
  • [R-package] remove Dataset getinfo() @jameslamb (#4864)
  • [R-package] remove support for '...' in lgb.cv() @jameslamb (#4860)
  • [R-package] remove Dataset setinfo() @jameslamb (#4854)
  • [R-package] remove support for '...' in predict() @jameslamb (#4857)
  • [R-package] remove support for '...' in Booster reset_parameter() @jameslamb (#4856)
  • [python][sklearn] unify values of best_iteration for sklearn and standard APIs @StrikerRUS (#4845)
  • [ci] migrate CI from macOS 10.14 to 10.15 and drop support of Mojave @StrikerRUS (#4849)
  • [R-package] enable saving Booster with saveRDS() and loading it with readRDS() (fixes #4296) @david-cortes (#4685)
  • [python][sklearn] remove verbose argument from fit() method @StrikerRUS (#4832)
  • [python] remove learning_rates argument of train() function @StrikerRUS (#4831)
  • [python] remove "auto" value of ylabel argument of plot_metric() function @StrikerRUS (#4818)
  • [python] Remove print_evaluation() function @StrikerRUS (#4819)
  • [python] Remove silent argument @StrikerRUS (#4800)
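
Two of the changes above (#5052 and #4925) affect custom objectives in the Python package: the function is now passed in params, and for multiclass problems the predictions, gradients and Hessians are 2-D collections of shape (n_samples, n_classes). A minimal NumPy sketch of a softmax cross-entropy objective under those assumptions follows. The names softmax_objective and multiclass_objective are hypothetical, and the diagonal p * (1 - p) Hessian is a common approximation that is not guaranteed to match LightGBM's built-in multiclass objective.

```python
import numpy as np


def softmax_objective(raw_scores, labels):
    """Gradient and Hessian of multiclass cross-entropy w.r.t. raw scores.

    raw_scores: array of shape (n_samples, n_classes)
    labels: class ids of shape (n_samples,)
    Returns two arrays with the same shape as raw_scores.
    """
    raw_scores = np.asarray(raw_scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    # numerically stable softmax, computed row by row
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    prob = np.exp(shifted)
    prob /= prob.sum(axis=1, keepdims=True)
    one_hot = np.zeros_like(prob)
    one_hot[np.arange(len(labels)), labels] = 1.0
    grad = prob - one_hot
    hess = prob * (1.0 - prob)  # diagonal approximation of the Hessian
    return grad, hess


def multiclass_objective(preds, train_data):
    """Adapter with the (preds, train_data) signature used by custom objectives."""
    return softmax_objective(preds, train_data.get_label())
```

Under these assumptions, it would be used as lgb.train(params={"objective": multiclass_objective, "num_class": 3}, train_set=...). Keeping the math in softmax_objective, separate from the Dataset adapter, lets it be tested without training a model.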

🚀 Efficiency Improvement

  • Add quantized training (CPU part) @shiyu1994 (#5800)
  • [python-package] replace .values usage with .to_numpy() @superlaut (#5612)
  • clear memory allocated for sampled data when constructing Dataset from text file @xuchuanyin (#4890)
  • [python] Faster categorical column names selection @Neronuser (#4787)
  • [R-package] parallelize compilation in CMake-based builds @jameslamb (#4525)
  • [python-package] simplify Dataset processing of label @jameslamb (#5456)
  • [python-package] make a shallow copy on dataframe rename (fixes #4596) @jmoralez (#5254)
  • [python-package] make a shallow copy when replacing categorical features with codes (fixes #4596) @jmoralez (#5225)
  • [R-package] reduce cost of repeated parameter alias checks @jameslamb (#5141)
  • reduce duplicate computation in poisson, gamma, and tweedie objectives @lorentzenchr (#4950)
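
Quantized training (#5800) is enabled through parameters; in the LightGBM 4.x parameter documentation these include use_quantized_grad, num_grad_quant_bins and stochastic_rounding. The core idea, rounding gradients and Hessians onto a small integer grid so that most of the training arithmetic can use integers, is sketched below in NumPy. This is a conceptual illustration only: quantize_gradients is a hypothetical helper, and its scaling scheme is a simplification, not LightGBM's C++ implementation.

```python
import numpy as np


def quantize_gradients(grad, hess, num_bins=4, rng=None):
    """Stochastically round gradients and Hessians onto a small integer grid.

    Hypothetical helper: gradients map to integers in [-num_bins // 2, num_bins // 2]
    and non-negative Hessians to [0, num_bins]. Returns the integer arrays plus
    the scales needed to map them back to floats.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.asarray(grad, dtype=np.float64)
    hess = np.asarray(hess, dtype=np.float64)
    g_scale = np.max(np.abs(grad)) / (num_bins // 2)
    h_scale = np.max(hess) / num_bins
    # avoid dividing by zero when every value is zero
    g_scale = g_scale if g_scale > 0 else 1.0
    h_scale = h_scale if h_scale > 0 else 1.0

    def stochastic_round(x):
        # round up with probability equal to the fractional part (unbiased)
        low = np.floor(x)
        return low + (rng.random(x.shape) < (x - low))

    q_grad = stochastic_round(grad / g_scale).astype(np.int32)
    q_hess = stochastic_round(hess / h_scale).astype(np.int32)
    return q_grad, q_hess, g_scale, h_scale
```

Stochastic rounding keeps each quantized value unbiased in expectation: averaged over many draws, q_grad * g_scale approaches the original gradient. In LightGBM itself, no such helper is needed; the feature is switched on with parameters such as {"use_quantized_grad": True, "num_grad_quant_bins": 4}.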

🐛 Bug Fixes

  • move LightGBM-vendored json11 into a LightGBM-specific namespace (fixes #5944) @maskedcoder1337 (#5946)
  • [dask] hold ports until training @jmoralez (#5890)
  • update MSBuild solution to Windows SDK v10.0, add inet_pton define (fixes #5856) @jameslamb (#5884)
  • Fix DEBUG-mode GPU builds @GinkoBalboa (#5778)
  • cast data_index as size_t in cuda_row_data to avoid integer overflow @SiNZeRo (#5706)
  • [ci] [R-package] fix clang 15 warning about unqualified calls (fixes #5661) @jameslamb (#5662)
  • Check feature indexes in forced split file (fixes #5517) @btrotta (#5653)
  • fix feature index in Dataset::AddFeaturesFrom (fixes #5410) @jameslamb (#5650)
  • [ci] [python-package] fix missing import, test that lightgbm can be imported with only required dependencies (fixes #5631) @jameslamb (#5632)
  • Fix OpenMP thread allocation in Linux @svotaw (#5551)
  • [R-package] correctly quote paths on Windows for CMake-based builds @jameslamb (#5607)
  • [ci] [python-package] correct tag on x86_64 wheels @jameslamb (#5598)
  • [tests][dask] fix workers without data test (fixes #5537) @jmoralez (#5544)
  • prefer 'vsnprintf' to 'vsprintf' @jameslamb (#5561)
  • [ci] fix R-package CI jobs and compatibility with OpenMP 15+ (fixes #5549, #5562) @jameslamb (#5563)
  • include parameters from reference dataset on subset (fixes #5402) @jmoralez (#5416)
  • [python-package] ignore training set on early stopping callback (fixes #5354) @jmoralez (#5412)
  • [fix] change the destructor of ScoreUpdater to virtual (fixes #5400) @shiyu1994 (#5403)
  • Add default definition for GetColWiseData and GetColWiseData @shiyu1994 (#5413)
  • Fix potential overflow in linear trees @StrikerRUS (#5395)
  • Use double precision in threaded calculation of linear tree coefficients (fixes #5226) @btrotta (#5368)
  • [R-package] raise an informative error when custom objective produces incorrect output (fixes #5323) @jmoralez (#5329)
  • Clear split info buffer in cost efficient gradient boosting before every iteration (fix partially #3679) @shiyu1994 (#5164)
  • [c++][fix] check nullable of bin mappers in dataset_loader.cpp (fix #5221) @shiyu1994 (#5258)
  • [python] Fix training on subset constructed without params @StrikerRUS (#5213)
  • Check existence of inet_pton for win32 in CMakeLists.txt (fixes #5019) @shiyu1994 (#5159)
  • Fix potential overflow "Multiplication result converted to larger type" @StrikerRUS (#5189)
  • [R-package] ensure that callbacks respect verbosity from params @jameslamb (#5199)
  • fix precision loss in tree's ToIfElse @Grass-CLP (#5187)
  • fix some wrong format specifiers @StrikerRUS (#5190)
  • [R-package] allow use of categorical_features in Dataset when raw data does not have column names (fixes #4374) @jmoralez (#5184)
  • [c-api] check number of features when retrieving number of bins @jmoralez (#5183)
  • [CUDA] Fix integer overflow in cuda row-wise data @shiyu1994 (#5167)
  • [R-package] ensure values in params override keyword arguments to predict() (fixes #4670) @jameslamb (#5122)
  • check nullable of bin_mappers in DatasetLoader::CheckCategoricalFeatureNumBin (fix #5145) @shiyu1994 (#5146)
  • [CUDA] Fix row-wise histogram construction with dense data matrix @shiyu1994 (#5103)
  • [fix] fix duplicate added initial scores for single-leaf trees @shiyu1994 (#5050)
  • [python] fixes for supporting 2d numpy arrays for predictions, grads and hess in multiclass custom objective and eval @StrikerRUS (#5030)
  • CUDATreeLearner: free GPU memory in destructor if any allocated @denmoroz (#4963)
  • Use delete[] where appropriate instead of delete @david-cortes (#4984)
  • Pass train dataset parser config to valid dataset loading parser @chjinche (#4985)
  • [R-package] Fix custom objective detection in print.lgb.Booster() @StrikerRUS (#4941)
  • gpu allocate memory overflow (fixes #4926) @jiapengwen (#4928)
  • [R-package] respect 'verbose' argument in lgb.cv() (fixes #4667) @jameslamb (#4903)
  • [R-package] Apply patch for R4.2 on Windows @shiyu1994 (#4923)
  • [R-package] respect aliases for objective and metric in lgb.train() and lgb.cv() @jameslamb (#4913)
  • [python] raise an informative error instead of segfaulting when custom objective produces incorrect output @yaxxie (#4815)
  • [R-package] fix handling of duplicate parameters (fixes #4521) @jameslamb (#4914)
  • [R-package] update parameter 'verbosity' based on keyword arg 'verbose' @jameslamb (#4899)
  • [R-package] fix CVBooster reset_parameter() method (fixes #4900) @jameslamb (#4901)
  • [python] reset storage in record evaluation callback each time before starting training @StrikerRUS (#4885)
  • [python] reset storages in early stopping callback after finishing training @StrikerRUS (#4868)
  • [R-package] fix --no-build-vignettes option for build-cran-package.sh @jameslamb (#4848)
  • [python][docs] fix type hints for custom functions and remove vague array-like wording @StrikerRUS (#4816)
  • Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601) @tongwu-msft (#4725)
  • Reset OpenMP thread number if num_threads <= 0 @hzy46 (#4704)
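
Several fixes above (#4815 for Python, #5329 for R) replace crashes with informative errors when a custom objective returns gradients or Hessians of the wrong size. A minimal sketch of that kind of check follows; validate_objective_output is a hypothetical helper, and its exact conditions and messages are assumptions rather than LightGBM's own code.

```python
import numpy as np


def validate_objective_output(grad, hess, num_data, num_class=1):
    """Raise ValueError unless grad and hess hold one value per (row, class) pair.

    Hypothetical helper illustrating a size check done before the arrays
    would be handed to native code.
    """
    grad = np.asarray(grad)
    hess = np.asarray(hess)
    if grad.shape != hess.shape:
        raise ValueError(
            f"gradient shape {grad.shape} does not match Hessian shape {hess.shape}"
        )
    expected = num_data * num_class
    if grad.size != expected:
        raise ValueError(
            f"expected {expected} gradient values "
            f"({num_data} rows x {num_class} classes), got {grad.size}"
        )


validate_objective_output(np.zeros(5), np.ones(5), num_data=5)  # passes silently
try:
    validate_objective_output(np.zeros(4), np.ones(4), num_data=5)
except ValueError as err:
    print(err)  # expected 5 gradient values (5 rows x 1 classes), got 4
```

The design point is that a cheap size check in Python turns what could otherwise be an out-of-bounds read in compiled code into an exception the user can act on.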

📖 Documentation

🧰 Maintenance
