Lc0 release v0.29.0
It’s been a while since we released a new version of Lc0, but we finally put out v0.29.0 a few weeks ago. In this post, we’ll talk about what’s new in this release and why it took so long to come out.
The main attraction is “attention policy”, a new feature of the neural network architecture supported by Lc0 that required extensive and complicated changes to the backend code handling the computation; more details are given below. The following sections contain the bullet list from the release announcement, rearranged into several categories and expanded with more detailed descriptions.
Attention policy support
- Full attention policy support in cuda, cudnn, metal, onnx, blas, dnnl, and eigen backends.
- Partial attention policy support in onednn backend (good enough for T79).
- The default net is now 791556 for most backends, except opencl and dx12, which get 753723 (as they lack attention policy support).
One of the first fruits of the ongoing experimentation with new neural network architectures is the already mentioned attention policy. The policy output of the neural net gives, for a given position, the probability that each legal move is the best one. Attention policy is a new way to calculate this output, giving more accurate results, but it requires modifications to the code that performs the neural net computation. This code is broken down into several backends, each targeting a specific method of computation, either specific gpus or external libraries. Unfortunately, only a few members of our team are able to work on these performance-critical bits of code, and they had limited time available, so it took a while to get everything working. We finally have solutions that cover most of our users, as well as an approach that should allow us to support new architectures a lot quicker - more on that later.
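Roughly speaking, an attention-style policy head scores each move as a scaled dot product between a learned “query” embedding of the move’s source square and a “key” embedding of its destination square. The numpy sketch below illustrates this general idea only; it is not Lc0’s actual implementation, which differs in details such as promotion handling.

```python
import numpy as np

def attention_policy_logits(square_emb, Wq, Wk):
    """Toy attention-style policy head.

    square_emb: (64, d) per-square embeddings from the network body.
    Wq, Wk:     (d, dk) learned projections to query/key space.
    Returns a (64, 64) matrix of logits, one per (from, to) square pair.
    """
    q = square_emb @ Wq                       # queries for "from" squares
    k = square_emb @ Wk                       # keys for "to" squares
    return (q @ k.T) / np.sqrt(q.shape[-1])   # scaled dot-product scores

# Mask illegal moves and apply softmax to get the policy distribution:
rng = np.random.default_rng(0)
emb = rng.standard_normal((64, 128))
Wq, Wk = rng.standard_normal((128, 64)), rng.standard_normal((128, 64))
logits = attention_policy_logits(emb, Wq, Wk)

legal = np.zeros((64, 64), dtype=bool)
legal[12, 28] = legal[6, 21] = True           # e2e4 and g1f3 (a1 = index 0)
masked = np.where(legal, logits, -np.inf)
policy = np.exp(masked - masked[legal].max())
policy /= policy.sum()                        # sums to 1 over legal moves
```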
Currently all recent network series (T78, T79 and T80) use attention policy, meaning that the backends that haven’t been updated (dx12 and opencl) do not work with those nets. Furthermore, the T78 nets require some of the optional attention policy features and as such are usually the last to be supported: the onednn backend is not yet at that point, but it can be used with all other network architectures.
New and improved backends
- New metal backend for apple systems. This is now the default backend for macos builds.
The new metal backend was frequently requested by macos users. It is faster than opencl, the best option previously available for macs, and moreover has all the features required by the newer nets. The downside is that it currently only works with newer macos releases, but we expect to be able to offer it for earlier versions soon.
- New onnx-dml backend that uses DirectML under windows; it has better net compatibility than dx12 and is faster than opencl. See the README for usage instructions; a separate download of the DirectML dll is required.
- The onnx backends can now use fp16 when running with a network file (not with .onnx model files). This is the default for onnx-cuda and onnx-dml, and can be switched on or off by setting the `fp16` backend option to `true` or `false` respectively; see the example below.
- The onnx backend now allows selecting the gpu to use.
The onnx backend gets a new flavor (onnx-dml) in addition to the existing ones (onnx-cuda and onnx-cpu). This new variation uses windows’ DirectML library to handle the computation, and works on every system our previous dx12 backend supported, and more. While it is a bit slower than the dx12 backend (for now), it supports the new networks, whose improved strength more than overcomes the speed loss. To run onnx-dml a very recent DirectML dll is required; it can be downloaded from https://www.nuget.org/api/v2/package/Microsoft.AI.DirectML/1.10.0. This is a nuget installer package, but it can be used as a normal zip file by changing the extension to .zip - the dll needed is /bin/x64-win/DirectML.dll.
The really exciting part is that the onnx backend is quite easy to expand, so we hope it will offer a way to use new network architectures (several promising ones are currently under development) in a timely manner - having a working baseline backend is a huge help for porting and updating the other backends. Onnx also offers a way to support systems that we don’t currently serve well, by leveraging vendor libraries.
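As a usage example, assuming the usual Lc0 option syntax (backend options given as comma-separated key=value pairs), a command line like `lc0 --backend=onnx-dml --backend-opts=fp16=false,gpu=1` should select the DirectML backend with fp16 disabled, running on the second gpu.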
- The onednn package comes with the latest dnnl, compiled to allow running on an intel gpu by adding `gpu=0` to the backend options.
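For instance, assuming the same option syntax as above, `lc0 --backend=onednn --backend-opts=gpu=0` should run the computation on the first intel gpu.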
Miscellaneous
- Support for using a pgn book with long lines in training: selfplay can start at a random point in the book.
The plan is to use a book with long opening lines (or even full games) for training experiments, where the selfplay games deviate from each line at a random point.
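As a rough illustration of the idea (the move list and names here are hypothetical, not Lc0’s actual training code), starting selfplay from a random point of a book line could look like:

```python
import random

# One long book line, as a list of moves in UCI notation.
book_line = ["e2e4", "c7c5", "g1f3", "d7d6", "d2d4", "c5d4", "f3d4", "g8f6"]

# Pick a random ply; the selfplay game plays these moves from the start
# position and only then lets the engines take over.
start_ply = random.randrange(len(book_line) + 1)
opening = book_line[:start_ply]
print(f"starting selfplay after {len(opening)} book moves: {opening}")
```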
- Support for double Fischer random chess (dfrc).
Double Fischer random chess is a variation of Fischer random chess (FRC - also known as Chess960) where the opening position is not necessarily symmetric, as white’s and black’s back ranks are shuffled independently: this gives 960 × 960 = 921,600 possible starting positions instead of 960, a huge superset of FRC.
- New “simple” time manager.
Seems to offer performance similar to the current default (“legacy”) time manager, with less code. Can be enabled with `--time-manager=simple`.
- Added TC-dependent output to the backendbench assistant.
The backendbench assistant now proposes different batch sizes for 3 common time controls, specifically 1sec, 15sec and 3min per move.
- Starting with this version, the check backend compares policy for valid moves after softmax.
The check backend is used to validate the correctness of the neural network computations, as a guard against buggy drivers. This change makes the test more robust, in particular when using fp16 computation.
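The sketch below (illustrative only, with hypothetical helper names, not the actual check backend code) shows why this is more robust: only the entries for legal moves are compared, and the softmax normalization keeps small fp16 rounding differences in the raw outputs from tripping the tolerance.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def policies_match(ref_logits, test_logits, legal_idx, atol=1e-2):
    """Compare two backends' policy outputs over legal moves, after softmax."""
    p_ref = softmax(ref_logits[legal_idx])
    p_test = softmax(test_logits[legal_idx])
    return np.allclose(p_ref, p_test, atol=atol)

# A small fp16-style rounding of the logits should still pass the check:
rng = np.random.default_rng(1)
logits = rng.standard_normal(1858)           # Lc0 policy has 1858 move slots
rounded = logits.astype(np.float16).astype(np.float64)
print(policies_match(logits, rounded, legal_idx=np.arange(30)))  # True
```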
- Improved error messages for unsupported network files.
- The non-multigather (legacy) search code and the `--multigather` option are removed.
- Some assorted fixes and code cleanups.