github lightvector/KataGo 1.5.0
OpenCL FP16 Tensor Core Support

3 years ago

If you're a new user, don't forget to check out this section for getting started and basic usage!
The latest and strongest neural nets are still those from the former release: https://github.com/lightvector/KataGo/releases/tag/v1.4.5

Changes in this release:

OpenCL FP16 Tensor Cores

New in this release is support for FP16 tensor core GPUs in OpenCL, roughly doubling performance. Theoretically, non-tensor core GPUs that gain significant improvements via FP16 storage or compute may also see a benefit under this release. If you are upgrading from an earlier version of KataGo, the OpenCL tuner will need to re-run to re-tune itself.

The OpenCL FP16 implementation is still a little slower than the CUDA implementation on an FP16 tensor core GPU, so if you've gone through the hassle of installing CUDA and getting it to work on such a GPU, there is no reason to switch to OpenCL. But for users who can get OpenCL but not CUDA+CUDNN to work, the gap should now be much smaller than before. Further optimization may be possible in the future; any GPU code experts are of course welcome to comment. :)

Other user-facing changes

  • New GTP extension command: set_position, which allows a GTP controller to directly set an arbitrary position on the board, rather than hacking it via a series of "play" commands, which might accidentally communicate an absurd move history. See documentation for KataGo GTP extensions here as usual.
  • By default, if absolutely no limits or time settings are specified for KataGo, and the GUI or tournament controller running it does not specify a time control either, KataGo will choose a small default of several seconds rather than treating time as unbounded.
  • Added a minor bit of logic for handling mirror Go. Nothing particularly robust or special, won't solve extreme cases, but hopefully fun.
  • Minor adjustments for detecting handicap stones for the purpose of computing PDA and/or when to resign.
  • Benchmark auto-tuning for number of threads is a little more efficient.
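
As a rough illustration of the set_position item above, here is a minimal sketch of how a GTP controller might assemble such a command. It assumes the arguments are a flat sequence of color-vertex pairs; that format and the helper name build_set_position are assumptions made for this example, so check the KataGo GTP extensions documentation before relying on them.

```python
def build_set_position(stones):
    """Build a set_position command line from (color, vertex) pairs.

    Assumption: the command takes alternating color and vertex tokens.
    This helper is hypothetical and is not part of KataGo.
    """
    valid_colors = {"b", "w", "black", "white"}
    parts = ["set_position"]
    for color, vertex in stones:
        color = color.lower()
        if color not in valid_colors:
            raise ValueError(f"unknown color: {color!r}")
        parts.append(color)
        parts.append(vertex.upper())
    return " ".join(parts)


# Describe the position directly instead of replaying "play" commands.
command = build_set_position([("black", "D4"), ("white", "Q16"), ("black", "Q4")])
print(command)
```

Sending one command that describes the stones avoids implying any move order, which is the problem the "play"-based workaround has.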

Self-play

  • Hash-like game ID is now written to selfplay-generated SGFs.
  • Fixes a very rare bug in self-play game forking and initialization that could cause incorrect resolution of move legality as well as apparent neural net hash collisions upon the transition to cleanup phase for Japanese-like territory scoring rules.

Internal

  • Symmetries are now computed on the CPU rather than the GPU, simplifying GPU code a little.
  • A few internal performance optimizations and cleanups, partly thanks to some contributors.
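
For readers unfamiliar with board symmetries, the sketch below enumerates the eight rotations and reflections of a square board on the CPU. It is only an illustration of the idea; the function name and the list-of-rows board representation are assumptions for this example and do not reflect KataGo's actual implementation.

```python
def symmetries(board):
    """Return the 8 rotations/reflections of a square board (list of rows).

    Illustrative only: not KataGo's code or data layout.
    """
    def rotate(b):
        # Rotate 90 degrees clockwise: reverse the rows, then transpose.
        return [list(row) for row in zip(*b[::-1])]

    def mirror(b):
        # Reflect left-to-right.
        return [row[::-1] for row in b]

    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate(current)
    return result


board = [[1, 2],
         [3, 4]]
for sym in symmetries(board):
    print(sym)
```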

Pure CPU implementation

Also as of this release, there is a pure-CPU implementation which can be compiled via -DUSE_BACKEND=EIGEN for cmake. There are no precompiled executables for it right now because the implementation is very basic and the performance is extremely poor, even worse than one would expect from a CPU. So practically speaking, it's not ready for use. However, it's a start, hopefully, and contributors who want to help optimize it would be welcome. :)
