Data sets:
- FEN dataset: All positions from http://data.lczero.org/files/training_pgns/pgns-run1-20190419-1854.tar.bz2 (12'582'379 positions).
- Tree nodes after
go nodes 1000000
from startpos with network id42482
. - Tree nodes after
go nodes 1000000
from (middlegame) r2br3/ppq2pkp/2b2np1/4p3/6N1/2PB1Q1P/PP1B1PP1/R3R1K1 w - - 1 2 with network id42482
. - Tree nodes after
go nodes 1000000
from (late middlegame) br5r/2bpkp2/B6p/2p4q/1PN1np2/P3P3/1BQ3PP/R4RK1 w - - 0 26 with network id42482
. - Tree nodes after
go nodes 1000000
from (endgame) 8/8/1p1n2k1/3P2p1/3P1b2/1P1K1B2/8/4B3 w - - 2 55 with network id42482
. - Same as dataset 2 (startpos), but with 10'000'000 nodes (10 times more).
- Perpetual check position r6k/pp4pp/2p5/5Q2/1q5P/8/2P2PP1/1K1R3R w - - 1 25 from Leko vs Kramnik game.
go nodes 1000000
with network id42482
. - KRN vs KR draw enggame, 6r1/6k1/8/8/8/8/3N4/2KR4 w - - 0 1.
go nodes 1000000
with network id42482
.
Stats
Note: Ignore MaterialKey for now, that was crem’s idea for Lc2
- Percentage of positions requiring MaterialKey to be computed:
- Dataset 1: 29.8%
- Dataset 2: 46.7%
- Dataset 3: 48.1%
- Dataset 4: 33.1%
- Dataset 5: 7.1%
- Dataset 6: 46.6%
- Bucket size distribution (Remaining bits encode pawns at ranks 2 and 3):
- Dataset 1: (just for completeness; not really useful stats as positions are sampled from different games).
- 2312346 buckets. 5.44 positions per bucket (4.7 if only unique).
- Fun fact: 4.8% of all positions in training data is RKvsRK.
- Dataset 6 (unique only):
- 313978 buckets, 11.8 posisions per bucket.
- Top bucket countains 0.672% of nodes (24951).
- Bucket size distribution (Remainig bits encode hash of pawns)
- Dataset 6 (unique only): 388226 buckets (9.5)
- 98.5% of larger buckets contain 50% of nodes (remaining 50% nodes are contained in size < 100 buckets).
- Size < 16 buckets still contain 25% of Nodes.
- Dataset 5 (unique only): 573 buckets, (240.0)
- Dataset 1 (unique only): 2491831 buckets, (4.36)
Number of transpositions with exact and non-exact history
Conclusions
- Keeping one copy of unique position leads to 50% tree reduction in most cases (and closer to endgames even more).
- Including history into the position key destroys large part of benefit.
- On the other hand, including rule50 counter into the position key doesn’t spoil things much, and removes need of an NN cache (also adds a nice benefit of non-loop PV, e.g. move graph becomes DAG, although it’s not really needed).
- Threefold and twofold repetitions are very rare, and it’s worthwhile to have slower logic when handling them. Also it’s fine to only support threefold and not twofold.
- Memory footprint for handling repetitions (“pseudonodes”) is very reasonable.
Dataset | 3 midg. | 4 late midg. | 5 endg. | 6 startp. | 8 shuf. |
---|
Positions | 1000350 | 1000107 | 1000000 | 10111141 | 1000145 |
Unique positions | 537946 (53.8%) | 692459 (69.2%) | 103738 (10.4%) | 3290287 (32.5%) | 243723 |
Unique pos with history | 702473 (70.2%) | 875692 (87.6%) | 897186 (89.7%) | 4535616 (44.9%) | 949180 |
Unique pos rep2 history | 538307 (53.8%) | 700134 (70.0%) | 111754 (11.2%) | 3291930 (32.6%) | 245762 |
Unique +rule50 | 584265 (58.4%) | 750441 (75.0%) | 129460 (12.9%) | 3685536 (36.5%) | 278887 |
Threefold | 69 | 3128 | 404 | 323 | 33 |
Twofold | 413 | 3438 | 20606 | 3794 | 6853 |
Norep | 999223 (99.9%) | 970586 (97.0%) | 918339 (91.8%) | 10097097 (99.9%) | 983750 |
Pseudonodes | 878 | 8002 | 21020 | 2651 | 10415 |
Dataset | 2 startp. | 7 perpet. |
---|
Positions | 1000232 | 1001502 |
Unique positions | 409988 (41.0%) | 23244 |
Unique pos with history | 535639 (53.6%) | 64001 |
Unique pos rep2 history | 410229 (41.0%) | 34520 |
Unique +rule50 | 449766 (45.0%) | 29254 |
Threefold | 60 | 109884 |
Twofold | 355 | 99294 |
Norep | 999036 (99.9%) | 88295 |
Pseudonodes | 363 | 436 |
- Dataset – title of dataset.
- Positions – number of positions in the set.
- Unique positions – number of unique positions in the set (what makes a cache hit in Lc0).
- Unique positions with history – number of unique positions in the set, where all positions from the last capture or pawn move (in any order) are included into positions.
- Unique positions with rep2 history – the same, but only history positions which appeared more than once are included into the position.
- Unique +rule50 – Number of unique positions (without history) if rule50 count is included into position hash.
- Threefold – Number of three-fold repetition positions.
- Threefold – Number of exact two-fold repetition positions.
- Norep – Number of positions which do not include any repetitions in PV.
- Pseudonodes – Number of additional “pseudo nodes” due to repeptition handles.