LeelaChessZero/lc0 v0.31.0-rc1 on GitHub

In this version:

The blas, cuda, eigen, metal and onnx backends now support the multihead network architecture and can run BT3/BT4 nets.
Updated the internal Elo model to better align with regular Elo for human players.
There is a new XLA backend that uses the OpenXLA compiler to generate code that executes the neural network. See https://github.com/LeelaChessZero/lc0/wiki/XLA-backend for details. Related to this, leela2onnx has new options to output the HLO format that XLA understands.
There is a vastly simplified lc0 interface available by renaming the executable to lc0simple.
The backends can now suggest a minibatch size to the search; this is enabled by --minibatch-size=0 (the new default).
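As a rough illustration of the 0 sentinel, the Python sketch below shows one way a search could choose its minibatch size. A value of 0 defers to the backend's suggestion, and any positive value acts as an explicit override. The function name and the suggested value of 256 are hypothetical. They do not reflect lc0's actual C++ code.

```python
def resolve_minibatch_size(configured: int, backend_suggestion: int) -> int:
    """Hypothetical helper: pick the minibatch size the search will use.

    configured == 0 (the new default) means "use the backend's suggestion";
    any positive value is treated as an explicit user override.
    """
    if configured < 0:
        raise ValueError("minibatch size must be >= 0")
    if configured == 0:
        return backend_suggestion
    return configured


# 256 is an arbitrary, made-up backend suggestion.
print(resolve_minibatch_size(0, 256))   # → 256
print(resolve_minibatch_size(64, 256))  # → 64
```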
If the cudnn backend detects an unsupported network architecture, it will switch to the cuda backend.
Two new selfplay options enable value and policy tournaments. A policy tournament uses the single-node policy to select the move to play, while a value tournament searches all possible moves at depth 1 and selects the one with the best q.
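The two selection rules can be sketched in Python as follows. This is a minimal illustration, not lc0's selfplay code. Moves are plain UCI strings, and the priors and q values are made-up toy numbers. child_q is a hypothetical evaluator assumed to return the value-head q of the position after a move, from the point of view of the side to move there. It is therefore negated to score the move for the player choosing it, as in negamax-style search.

```python
from typing import Callable, Dict, List


def policy_move(priors: Dict[str, float]) -> str:
    """Policy tournament: play the move with the highest single-node prior."""
    return max(priors, key=lambda m: priors[m])


def value_move(moves: List[str], child_q: Callable[[str], float]) -> str:
    """Value tournament: depth-1 search over every legal move.

    child_q(m) (hypothetical) returns q in [-1, 1] for the position after m,
    seen by the opponent; negating it gives q for the side choosing m.
    """
    if not moves:
        raise ValueError("no legal moves")
    return max(moves, key=lambda m: -child_q(m))


# Toy numbers for illustration only.
priors = {"e2e4": 0.41, "d2d4": 0.35, "g1f3": 0.24}
child_values = {"e2e4": -0.05, "d2d4": -0.08, "g1f3": 0.02}
print(policy_move(priors))                                      # → e2e4
print(value_move(list(child_values), child_values.__getitem__))  # → d2d4
```

The two rules can disagree. In this example the policy favors e2e4, but the depth-1 value search picks d2d4 because that move leaves the opponent with the lowest q.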
While it is easy to get a single-node policy evaluation (go nodes 1 via UCI), there was no simple way to get the effect of a value-only evaluation, so the --value-only option was added.
UCI button options were implemented, and a button to clear the tree was added (as a hidden option).
Support for the UCI go mate option was added.
The rescorer can now be built from the lc0 code base instead of a separate branch.
A discrete onnx layernorm implementation was added to work around an onnxruntime bug with DirectML. This has some overhead, so it is only enabled for onnx-dml and can be switched off with the alt_layernorm=false backend option.
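Here, a "discrete" layernorm is presumably one built from elementary operations rather than from a single fused layer-normalization operator. The Python sketch below shows the textbook decomposition for one feature vector. These notes do not describe the exact operator sequence lc0 emits for onnx-dml, and the epsilon default is an assumption.

```python
import math
from typing import List


def layer_norm_discrete(x: List[float], gamma: List[float], beta: List[float],
                        eps: float = 1e-5) -> List[float]:
    """Layer normalization of one vector using only elementary steps.

    eps = 1e-5 is an assumed default, not a value taken from lc0.
    """
    n = len(x)
    mean = sum(x) / n                        # mean reduction
    centered = [v - mean for v in x]         # subtract
    var = sum(c * c for c in centered) / n   # square, then mean reduction
    denom = math.sqrt(var + eps)             # add epsilon, square root
    # divide, scale by gamma, shift by beta
    return [g * (c / denom) + b for c, g, b in zip(centered, gamma, beta)]


out = layer_norm_discrete([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(out)  # roughly [-1.2247, 0.0, 1.2247]
```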
The --onnx2pytorch option was added to leela2onnx to generate PyTorch-compatible models.
There is a new min_batch backend option for cuda to reduce non-determinism with small batches.
New options were added to onnx2leela to fix ONNX models exported from TensorFlow.
The onnx backend can now be built for AMD's ROCm.
Fixed a bug where the Contempt effect on eval was too low for nets with natively higher draw rates.
Made the WDL Rescale sharpness limit configurable via the --wdl-max-s hidden option.
The number of search task workers can now be set automatically: 0 for CPU backends, or up to 4 depending on the number of CPU cores. This is enabled by --task-workers=-1 (the new default).
Several assorted fixes and code cleanups.