If you're a new user, this section has tips for getting started and basic usage! If you don't know which version to choose (OpenCL, CUDA, TensorRT, Eigen, Eigen AVX2), see here.
Download the latest neural nets to use with this engine release at https://katagotraining.org/.
Also, for 9x9 boards or for boards larger than 19x19, see https://katagotraining.org/extra_networks/ for networks specially trained for those sizes!
KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there!
Notes about Precompiled Exes in this Release
For CUDA and TensorRT, the executables attached below are labeled with the versions of the libraries they are built for, e.g. trt10.2.0 for TensorRT 10.2.0.x, or cuda12.5 for CUDA 12.5.x. It's recommended that you install and run these with the matching versions of CUDA and TensorRT rather than trying to run with different versions.
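As a rough illustration of the labeling scheme, the sketch below parses a label such as trt10.2.0 or cuda12.5 and checks it against an installed library version. The helper names (`parse_label`, `matches_installed`) are hypothetical and not part of KataGo, and the label pattern is inferred only from the two examples above.

```python
import re

# Label prefixes taken from the examples in these notes; the mapping is illustrative.
_LABEL_RE = re.compile(r"^(cuda|trt)(\d+(?:\.\d+)*)$")
_NAMES = {"cuda": "CUDA", "trt": "TensorRT"}


def parse_label(label):
    """Split a label such as 'trt10.2.0' into ('TensorRT', (10, 2, 0))."""
    m = _LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    version = tuple(int(part) for part in m.group(2).split("."))
    return _NAMES[m.group(1)], version


def matches_installed(label, installed_version):
    """Return True if the installed version begins with the version in the label."""
    _, wanted = parse_label(label)
    installed = tuple(int(part) for part in installed_version.split("."))
    return installed[:len(wanted)] == wanted
```

For example, `matches_installed("cuda12.5", "12.5.1")` is true, while an installed CUDA 12.4.0 would not match the cuda12.5 label.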
The OpenCL version will often work as long as you have any semi-modern GPU hardware accelerator and appropriate drivers installed, whether for Nvidia or non-Nvidia GPUs, without needing any specific versions, although it may be a bit less performant.
Available also below are both the standard and +bs50 versions of KataGo. The +bs50 versions are just for fun, and don't support distributed training but DO support board sizes up to 50x50. They may also be slightly slower and will use much more memory, even when only playing on 19x19, so use them only when you really want to try large boards.
The Linux executables were compiled on an Ubuntu 22.04 machine using AppImage. You will still need to install the correct versions of CUDA/TensorRT, or have drivers for OpenCL, etc. on your own. Compiling from source is also not so hard on Linux, see the "TLDR" instructions for Linux here.
This release also incorporates significant improvements to the Metal backend for macOS, thanks to @ChinChangYang. Precompiled executables for macOS are not provided here, but you can check out Homebrew, which should update before too long, and/or build on your own.
Changes this Release
Improved Metal backend for macOS
- The Metal backend now supports hybrid CPU+GPU+ANE (Apple Neural Engine). Each server thread can be configured to run on GPU (via MPSGraph) or ANE (via CoreML) using the metalDeviceToUseThread<N> config option.
- The CoreML model converter (katagocoreml) is now vendored into the repo, so building the Metal backend does not require an external Homebrew package.
- Various improvements to the internal implementation and error reporting.
Other feature additions/changes
- Added option to report MCTS triple-ko (no-result) probability in GTP and analysis via includeNoResultValue.
- Minor improvements for book generation: multithreaded book processing, some additional commands and arguments.
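As a minimal sketch of how the new option might be requested from the analysis engine, the snippet below builds a single-line JSON query. The other fields follow the usual analysis query format, but placing includeNoResultValue as a top-level query field is an assumption based only on its name, and `build_query` is a hypothetical helper.

```python
import json


def build_query(query_id, moves, include_no_result=True):
    """Build one analysis-engine query as a single line of JSON."""
    query = {
        "id": query_id,
        "moves": moves,
        "rules": "tromp-taylor",
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        # Assumption: the option is passed per query, like other include* fields.
        "includeNoResultValue": include_no_result,
    }
    return json.dumps(query)


line = build_query("example", [["B", "Q16"], ["W", "D4"]])
```

The resulting line would be written to the analysis engine's standard input, one query per line.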
User-facing Bugfixes
- Fixed crash on clear_cache in the analysis engine.
- Fixed endless recursion on Windows in threadsafequeue.h.
- Fixed cpuctUtilityStdevPrior allowing a value of 0, which could cause a divide-by-zero.
- Replaced recursive SGF parsing with iterative approach to avoid stack overflow, other small fixes.
- Fixed priority mutex bug where low-priority path called the high-priority path.
- Fixed bad interaction between certain hacks and avoid moves in book generation.
- Fixed inconsistent genmove params setting in GTP.
- Fixed incorrect error field name when reporting komi errors in the analysis engine.
- Fixed issue where firstReportDuringSearchAfter could fire before any results were available.
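To illustrate the idea behind the SGF parsing fix listed above, here is a minimal Python sketch of parsing nested game trees with an explicit stack instead of recursion. It is not KataGo's C++ implementation: `parse_sgf` and the dict layout are hypothetical, and the grammar handled is a simplified subset of SGF.

```python
def parse_sgf(text):
    """Parse SGF text into nested dicts using an explicit stack, not recursion."""
    root = {"nodes": [], "children": []}  # synthetic container for top-level trees
    stack = [root]
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "(":
            # Open a new game tree as a child of the current one.
            tree = {"nodes": [], "children": []}
            stack[-1]["children"].append(tree)
            stack.append(tree)
            i += 1
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
            i += 1
        elif ch == ";":
            if len(stack) == 1:
                raise ValueError("node outside of a game tree")
            stack[-1]["nodes"].append({})
            i += 1
        elif ch.isupper():
            # Property identifier followed by zero or more [value] blocks.
            start = i
            while i < n and text[i].isupper():
                i += 1
            ident = text[start:i]
            if len(stack) == 1 or not stack[-1]["nodes"]:
                raise ValueError("property outside of a node")
            values = stack[-1]["nodes"][-1].setdefault(ident, [])
            while True:
                while i < n and text[i].isspace():
                    i += 1
                if i >= n or text[i] != "[":
                    break
                i += 1
                buf = []
                while i < n and text[i] != "]":
                    if text[i] == "\\" and i + 1 < n:
                        i += 1  # a backslash escapes the next character
                    buf.append(text[i])
                    i += 1
                if i >= n:
                    raise ValueError("unterminated property value")
                i += 1  # consume the closing ']'
                values.append("".join(buf))
        elif ch.isspace():
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root["children"]
```

Because nesting depth is tracked in a heap-allocated list rather than in call frames, a file with thousands of nested variations does not exhaust the call stack.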
Build/compatibility fixes
- Added support for CUDA 13.0.
- Added support for CMake 4.* with the Metal backend.
- Fixed nvinfer library detection on Windows (use nvinfer_10).
- Fixed Eigen cmake configuration to be more general.
- Fixed Metal backend CMAKE_OSX_SYSROOT not being set.
- Fixed Makefile git revision detection.
- Now compiles with CMake build type flags.
Major Python/training script changes preparing for new architectures
- Added support for Muon and AdamW optimizers. Muon trains a lot faster than all prior optimizers in this repo.
- Added experimental support for training transformers, with a variety of basic features. No support on C++ side yet.
- Added support for torch.compile and a PyTorch-side benchmarking script.
- Variety of python training script changes and fixes.
- Fixed some bash scripts to better handle whitespace in file paths, detect python, etc.
- Fixed play.py crash due to undefined var and escape sequence.
Internal/code quality/dev-relevant changes
- Upgraded to tclap 1.2.5.
- Various const correctness improvements and refactoring, slight CPU-side performance improvements.
- Tightened assertions and testing outside of hot paths.
- Removed demoplay command.
- Fixed out-of-bounds write in training data score distribution.
- Various other minor fixes and improvements