This is not the latest release - see a more recent release at v1.16.2!
Notes about Precompiled Exes in this Release
For CUDA and TensorRT, the executables attached below are labeled with the versions of the libraries they are built for.
- E.g. "trt10.2.0" for TensorRT 10.2.0.*
- E.g. "cuda12.5" for CUDA 12.5.*
- etc.
It's recommended that you install and run these with the matching versions of CUDA and TensorRT rather than trying to run with different versions.
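For example, one quick way to check which versions you have installed (a minimal sketch; it assumes nvcc is on your PATH, and the TensorRT Python bindings are purely an optional convenience for this check, not something KataGo itself uses):

```python
import re
import subprocess

# Check the installed CUDA toolkit version via nvcc (assumes nvcc is on PATH).
out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
match = re.search(r"release (\d+\.\d+)", out)
print("CUDA:", match.group(1) if match else "not found")

# Check the TensorRT version via its Python bindings, if you happen to have them.
# (KataGo itself does not use the Python bindings; this is just a convenience.)
try:
    import tensorrt
    print("TensorRT:", tensorrt.__version__)
except ImportError:
    print("TensorRT Python bindings not installed")
```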
The OpenCL version will more often work out of the box: as long as you have a semi-modern GPU or other hardware accelerator with appropriate drivers installed, whether from Nvidia or not, it doesn't require any specific library versions, although it may be a bit less performant.
Also available below are both the standard and +bs50 versions of KataGo. The +bs50 versions are just for fun, and don't support distributed training, but DO support board sizes up to 50x50. They may also be slower and will use much more memory, even when only playing on 19x19, so use them only when you really want to try large boards.
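As a quick illustration of the difference, here is a minimal sketch that drives a build over GTP; the binary, model, and config paths are placeholders for your own installation:

```python
import subprocess

# Ask a +bs50 build to set up a 50x50 board and make one move over GTP.
# The binary, model, and config paths below are placeholders for your install.
commands = "boardsize 50\ngenmove b\nquit\n"
result = subprocess.run(
    ["./katago", "gtp", "-model", "model.bin.gz", "-config", "gtp_example.cfg"],
    input=commands, capture_output=True, text=True,
)
print(result.stdout)  # a standard (non-+bs50) build would reject boardsize 50
```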
The Linux executables were compiled on an Ubuntu 22.04 machine using AppImage. You will still need to install the correct versions of CUDA/TensorRT, or have drivers for OpenCL, etc., on your own. Compiling from source is also not so hard on Linux; see the "TLDR" instructions for Linux here.
Changes this Release
Mitigation for crashes due to infinities/NaNs:
With v1.16.0, some users observed that KataGo would sometimes crash on TensorRT while contributing to katagotraining.org, on 19x19 positions with extreme komis or results, and also that KataGo would often crash when running on larger board sizes. In both cases the crash was due to a nonfinite (i.e. NaN or infinite) policy output. The larger-board crashes were especially common with nets that were not trained for large boards, but still occurred occasionally even with nets that were. Our best guess is that the cause is occasional extremely large activations in the net.
KataGo now internally scales the weights of the neural net in a few ways that should reduce the typical magnitude of the internal activations. When running with FP16, this should make better use of the available FP16 range, so that somewhat more extreme values than before are required to trigger a crash; hopefully this is enough of a buffer to stop the 19x19 contribute crashes entirely. The TensorRT backend was also changed to convert the output heads to FP32 one layer earlier, which might help, since some of the larger activations tend to be in the heads.
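To illustrate the principle (a minimal numpy sketch, not KataGo's actual rescaling code): since ReLU-like activations are positively homogeneous, dividing one layer's weights by a constant and multiplying a later layer's weights by the same constant leaves the output unchanged while shrinking the intermediate activations, keeping them within FP16's finite range (max roughly 65504).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
# Deliberately oversized weights, to mimic a layer producing extreme activations.
w1 = (rng.standard_normal((64, 64)) * 10000.0).astype(np.float32)
w2 = rng.standard_normal((64, 64)).astype(np.float32)

def relu(z):
    return np.maximum(z, 0.0)

# Unscaled net: the hidden activations exceed FP16's max finite value (~65504),
# so casting them to FP16 (as an FP16 backend effectively does) overflows to inf.
hidden = relu(x @ w1)
print("max activation:", hidden.max())
print("as fp16:", hidden.max().astype(np.float16))  # inf

# Since relu(z / s) = relu(z) / s for s > 0, dividing w1 by s and multiplying
# w2 by s preserves the final output while shrinking the hidden activations.
s = np.float32(16.0)
hidden_scaled = relu(x @ (w1 / s))
print("max scaled activation:", hidden_scaled.max())  # now well within FP16 range

out_orig = hidden @ w2
out_scaled = hidden_scaled @ (w2 * s)
print("outputs match:", np.allclose(out_orig, out_scaled, rtol=1e-4))
```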
Other various notable changes (C++/Engine):
- Fixed a bug in how the "-fixed-batch-size" and "-half-batch-size" arguments to the benchmark command set the benchmark's batch size limits.
- KataGo now tolerates simple ko violations in SGFs, GTP play commands, and a few other locations, just as it tolerates superko violations (see the GTP sketch after this list).
- KataGo now compiles with C++17 standard rather than C++14.
- Fixed some issues where some board-size-related config arguments didn't accept values up to 50 for the +bs50 executables.
- Updated CMake logic to handle a change to the header define format in newer versions of TensorRT.
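For instance, regarding the ko tolerance above: a GTP play command that immediately retakes a ko is now accepted rather than answered with an error. A minimal sketch (binary, model, and config paths are placeholders for your own installation):

```python
import subprocess

# Launch KataGo in GTP mode (binary, model, and config paths are placeholders).
proc = subprocess.Popen(
    ["./katago", "gtp", "-model", "model.bin.gz", "-config", "gtp_example.cfg"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def gtp(command):
    """Send one GTP command; a GTP response ends with a blank line."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    lines = []
    while (line := proc.stdout.readline()).strip():
        lines.append(line.strip())
    return " ".join(lines)

# Set up a ko: Black captures White's D3 stone by playing E3...
for move in ["b C3", "b D2", "b D4", "w E2", "w E4", "w F3", "w D3", "b E3"]:
    gtp("play " + move)

# ...then White retakes the ko immediately. This violates the simple ko rule,
# but as of this release the play command tolerates it instead of erroring.
print(gtp("play w D3"))
gtp("quit")
```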
Other various notable changes (Python/Training):
- Significantly rearranged the python directory: all .py files that aren't intended to be directly run have been moved into a ./python/katago/... package. The imports in all the scripts and the various self-play training scripts should be updated appropriately (see the example below).
- Some shuffling/training scripts no longer rely upon symlinking the data location, which doesn't work on Windows.
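For example, an import that previously referenced a top-level module in ./python/ would now go through the package; the module name and subpackage path below are a hypothetical illustration, so check the actual layout under ./python/katago/ for the real paths:

```python
# Old layout: helper modules sat directly in ./python/, so scripts imported them
# at the top level, e.g. (module name here is a hypothetical illustration):
#   import modelconfigs
#
# New layout: non-script modules live under the katago package, so the import
# becomes something like (exact subpackage path may differ):
from katago.train import modelconfigs
```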