Tuesday, January 10, 2006

Counting Go

This week I am making my nearly yearly visit to CWI in Amsterdam, where I spent a sabbatical year in the 90's. I will sit on the opposition on Troy Lee's Ph.D. defense on Wednesday. Troy was a paranimf at Hein Röhrig's defense where I also sat in the opposition two years ago.

At CWI we also find John Tromp who works on various puzzles. Now he is counting Go positions. Tromp and his co-author Gunnar Farnebäck show that legal positions are colorings of the grid to {white, black, empty} such that every white or black connected component borders an empty node. They have exactly counted the number of positions on boards up to 16x16 and have an asymptotic bound on larger boards.

If you really want to know the exact number of positions of the standard 19x19 board and have a server with ten terabytes of disk space to spare, John would love to hear from you.

1. Hey, Lance, here we go: last time you bashed people who are using computers to find large primes, and now you endorse a guy who is counting positions in Go. How is Go counting different from large-prime-finding?

2. My first guess as to how they're doing this is by extending upwards. First you count all possibilites for the 1x19 with an exposed edge, then the 2x19 with an exposed edge, then 3x19, etc. For each one, you calculate the number of times each bordering edge occurs, then for the 19x19 add up all the ones which don't have a dead region on the bordering edge. The state of the bordering edge is that each cell either contains nothing, white which isn't alive, black which isn't alive, white which is alive, or black which is alive. You then try all possibilities against all possibilities for the next row, with the limitation that a piece which was part of a not alive region can't get sealed off.

Obviously you can't have a piece which isn't alive border one which is, so the number of possibilities for the boundary isn't all that much more than 3^19, which is about a gigabyte, but the real number is somewhat more than that. The ten terabytes number appears to be taken from the wildly conservative 5^19 number, a value which leaves the vast majority of the disk unused, because 3/4 of the time a piece will be made alive because there's an empty space on one or both of its sides (or the sides of the extended line of identical-side pieces it's part of).

Does all this make sense? If it does, then it should be possible to calculate the whole value with vastly less disk space.

3. Actually getting off my ass and reading the paper, I see that I missed the possibility that libertyless regions can be connected to each other, and that dramatically increases the number of possibilities. I also missed that you can move the border one piece at a time instead of doing the whole layer at once, which removed a quadratic blowup in exchange for a factor of 19 multiplier and a small amount of extra disk space. It also dramatically reduces the amount of disk seeks you have to do. On real computers the runtime for problems like this is mostly directly proportional to disk seeks.

The value of the border size (table 1 in the paper) is about 300 billion, and the authors use chinese remaindering to get the space required for each number under control, at the expense of having to do more passes.

With 300 billion positions and ten terabytes of storage, the amount of space for storing each number is about 30 bytes, which can store a number up to about 10^72, which is several times as long as the modulus listed on their page for a partial result on the size 17, but at least I'm well within an order of magnitude, so the rest of the slop is probably implementation details.

Optimizing the algorithm for reducing disk seeks is an interesting and apparently nontrivial problem, and one which the paper doesn't discuss at all. The main difficulty is that if a piece is connected to other pieces in the back, then if it gets a liberty then that can affect counts of distantly related positions, causing disk seeks. Perhaps a different grouping of positions could fix this problem.

4. last time you bashed people who are using computers to find large primes

No he didn't. He was just asking why it qualified as news.

The difference is a very wide chasm. For example, if there was a story that said "man stops at red light" you'd ask "why is this news?" That would not be the same as "bashing" someone for stopping at a red light. (In particular, you might even be glad that people do stop at red lights.)

5. Out of curiosity, what is a "paranimf"