Monday, August 10, 2009

Computer Go

We have computer programs that have solved Checkers (it's a tie) and beat the world's best chess players, but until recently Go programs played a very mediocre game.

Computer programs for these kinds of games share a roughly similar structure: search the game tree to some depth, evaluate the resulting game boards and run minimax back up the tree to pick the best move for the player. There are many tricks to speed up the search (allowing a greater depth), such as alpha-beta pruning, selectively extending the tree search and various other tools, but the basics remain the same.
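
For concreteness, here is a minimal sketch of that backbone in Python (the game-state methods moves(), apply() and evaluate() are hypothetical stand-ins for a real engine's data structures):

    # Depth-limited minimax with alpha-beta pruning.
    def alphabeta(state, depth, alpha=float('-inf'), beta=float('inf'),
                  maximizing=True):
        moves = state.moves()
        if depth == 0 or not moves:
            return state.evaluate()        # leaf: fall back to the evaluator
        if maximizing:
            best = float('-inf')
            for m in moves:
                best = max(best, alphabeta(state.apply(m), depth - 1,
                                           alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:          # the opponent will avoid this line
                    break
            return best
        best = float('inf')
        for m in moves:
            best = min(best, alphabeta(state.apply(m), depth - 1,
                                       alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best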

The problem in Go is twofold: a very large number of possible moves at every turn, forcing a very shallow tree, and game boards that are very hard to evaluate. Much faster computers have helped in getting at least a reasonable depth in the search. But what about evaluation?

Consider the following seemingly crazy way to evaluate a game board: have each player play randomly and see who wins. Repeat a few hundred times and score the position by the percentage of the time that White won.
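
In code the whole idea fits in a few lines; a sketch, again with hypothetical game-state methods:

    import random

    def random_playout(state):
        # Play uniformly random legal moves until the game ends.
        while not state.game_over():
            state = state.apply(random.choice(state.moves()))
        return state.winner()              # e.g. 'W' or 'B'

    def monte_carlo_eval(state, playouts=300):
        # Score a position by the fraction of random games that White wins.
        wins = sum(random_playout(state) == 'W' for _ in range(playouts))
        return wins / playouts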

Imagine that strategy for chess: Each player would often put their pieces in jeopardy and the opponent would fail to take them. Most randomly simulated games would end in a draw because no one would execute the checkmate.

But for Go this random process for evaluating positions works. Combined with a very fast machine, a well-designed tree search and lots of fine tuning, it has led to Computer Go programs that can play good games against strong amateurs.

We're a long way from having Computer Go programs among the top Go players in the world, but when a simple idea takes Go programs from lousy to not bad in a relatively short time, one can hope.

14 comments:

  1. I just returned from the US Go Congress and was able to watch Myungwan Kim defeat Many Faces of Go live.

    While I agree that the Monte Carlo methods (i.e. randomly play games and see what happens) have vastly improved the strength of go-playing programs, there are some side effects as well. When it was clear to me (as a 9k) that Kim was going to win, Many Faces began playing obviously bad moves that would only work if Kim did not respond with the obvious good move. My interpretation of this situation is that the percentage of Monte Carlo wins was still high enough to convince Many Faces to play moves that would only aid in a victory were Kim to completely ignore the move (i.e. play the next move at random). From a human perspective, it is usually considered an insult to play a losing sequence when both players clearly know that it is a losing sequence. It would be ideal if our AI could be polite in addition to being smart.

    With respect to chess, I would not discount a Monte Carlo approach so quickly. There are extensive endgame tablebases for chess, so I would stop playing randomly when the current board position has already been solved. I think that this approach would solve the problem that most random chess games would end in a draw.

    ~Tyson Williams

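    A sketch of the tablebase-terminated playout suggested above (tablebase_probe is a hypothetical stub standing in for a real endgame-table query, and the game-state methods are again stand-ins):

        import random

        def tablebase_probe(state):
            # Stub: a real engine would query an endgame tablebase here
            # and return 'W', 'B' or 'draw' for solved positions.
            return None

        def playout_with_tablebase(state):
            # Random chess playout that stops early on solved positions.
            while not state.game_over():
                known = tablebase_probe(state)
                if known is not None:
                    return known
                state = state.apply(random.choice(state.moves()))
            return state.result()
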
  2. It is indeed amazing how much computer go programs have improved over the last few years. Especially on small boards (9x9), where the game tree is not very wide, they have become very strong.

    Especially surprising progress was made by MoGo, a program which, as you write, evaluates a position by finishing the game in a quick manner -- however, saying that they "have each player play randomly and see who wins" is actually an oversimplification. It wouldn't work in go either. Instead, MoGo considers the local arrangement of stones and weights the probability of moves according to that, i.e., it does some pattern matching. Some details can be found in the technical report linked from the page above.

    To me it seems conceivable that a similar strategy could work in chess to evaluate a position. Moves which capture an opponent's piece without putting anything in danger would have a very high probability, for example. However, in chess evaluating a position seems less difficult than in go, which is probably why no such tricks are needed.

    Thomas

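    A sketch of the pattern-weighted playouts described above (the predicates and weights here are illustrative assumptions, not MoGo's actual patterns):

        import random

        def pattern_weight(state, move):
            # Toy heuristic: bias the playout by local context.
            if state.captures_stones(move):      # hypothetical predicates
                return 10.0
            if state.touches_own_group(move):
                return 3.0
            return 1.0

        def weighted_playout(state):
            # Sample moves in proportion to their pattern weight.
            while not state.game_over():
                moves = state.moves()
                weights = [pattern_weight(state, m) for m in moves]
                state = state.apply(random.choices(moves, weights=weights)[0])
            return state.winner()
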
  3. Believe it or not, I've actually thought about this a good bit lately (thanks for writing my blog post for me, Lance! :D) and I'll try to chip in some of my thoughts.

    1. You can model either game as a rooted directed graph with weights from {0, 1} attached to the sinks -- this is well-known. In some formulations of Go, the graph's acyclic; it's not in chess, but it can be considered to be so as a simplification.

    The value at each node is a function of the values at the sinks reachable from that node, which is defined recursively from the bottom up. On the other hand, the graph is so huge that actually computing this function is impractical. So we make some simplifications.

    What chess programs do is use a good evaluation function to estimate the game value at a constant number of layers down from the current position. Why is this possible? Well, partially because the material value of pieces is the dominant term in the evaluation function, so when this is nonzero, we're usually golden. In addition, the theory of position is well-understood. It also helps that chess is a very local game, and furthermore local in a precise sense that means we often don't need to calculate the evaluation function from scratch. Add that to a reasonably low branching factor, and it makes sense.

    Monte Carlo methods in Go, on the other hand, try to approximate the game value by the L1 norm of the reachable sinks! This isn't always a great approximation, but it's not terrible; if you win 99% of the games that could be played from a position, then it's not very likely that your opponent can get to that 1%. But this L1 norm is as hard to compute as the original function, so we use Monte Carlo instead.

    This is helped by a couple of factors. First, the DAG is only slightly noncommutative. Usually, if you play at X, your opponent plays at Y, and you play at Z, you end up with the same situation as if you'd played Z-Y-X, and unlike in chess, if one of these sequences is legal, then the other usually is too. This is useful because it restricts the behavior of the L1 norm, and also because it means that we don't need to collect as much data.

    The other one is that Go doesn't have the same wild behavior as chess; if you're leading by a lot in the endgame, you'll probably win. This makes the norm a better approximation than in the purely abstract case.

    I could be totally wrong with all this, but these are the thoughts I've had, and there's no point in not sharing them.

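    A toy illustration of the gap between the true minimax value and the win fraction that uniform playouts estimate, on a hypothetical one-move game (leaves are 1 = win, 0 = loss for the player to move):

        def minimax_value(tree, maximizing=True):
            # A tree is either a leaf (0 or 1) or a list of subtrees.
            if isinstance(tree, int):
                return tree
            vals = [minimax_value(t, not maximizing) for t in tree]
            return max(vals) if maximizing else min(vals)

        def win_fraction(tree):
            # Expected outcome when both sides move uniformly at random --
            # exactly what naive playouts estimate.
            if isinstance(tree, int):
                return tree
            return sum(win_fraction(t) for t in tree) / len(tree)

        tree = [1, 0, 0, 0]          # one winning move among three losers
        print(minimax_value(tree))   # 1: perfect play finds the win
        print(win_fraction(tree))    # 0.25: random play usually misses it
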
  4. The problem with Monte Carlo methods, as Tyson already mentioned, is that in many tactical situations there is exactly one correct move, and this move usually falls into that 1% area.

    However, I would guess that a well-thought-out combination of Monte Carlo methods with a good tactical reader would work much better.

  5. And that is basically what Many Faces does. Many Faces was originally just based on "go knowledge". Then MoGo showed the go world how great the Monte Carlo approach worked, so now Many Faces is a blend of both go knowledge and Monte Carlo approaches.

    When explaining this to other people, I say that the Monte Carlo approach will generate a few moves that seem the best. Then Many Faces uses its go knowledge to select which is best.

    ~Tyson Williams

  6. Of course it's true that pure Monte Carlo methods aren't going to be able to beat a smart Go beginner; but my point was that there are plenty of theoretical reasons why they seem to be useful.

  7. The game is two-player constant sum, so it can be solved in polynomial time with linear programming. I don't see what the problem is here.

  8. The problem is that the number of strategies (and hence the size of the natural linear program) is exponential in the size of the board. For the standard board size (19-by-19), the number of possible strategies is prohibitively large to even write down.

  9. Finding the best move in Go is EXPTIME-complete, so there is no linear program of size polynomial in the input (where the input is the length of the side of the board).

    ~Tyson Williams

  10. And so ... Go joins the set of problems that accord with the (surprisingly reliable) real-world heuristic that EXPTIME-complete problems are intractable ... except in practice.

  11. This comment has been removed by the author.

  12. I don't think that go is that easy in practice. It has taken us humans thousands of years to reach our current skill level. Also, we do not know how good our moves are. It is possible that there is still room for substantial improvement.

    ~Tyson Williams

  13. An important point has been forgotten in the discussion.

    Monte-Carlo Go, as it can be understood from this post and its comments, has existed since 1993 (a well-known unpublished paper by Bruegmann). It was interesting, but not a revolution.

    The revolution in Monte-Carlo Go is the use of a growing tree, which iteratively adds new nodes, following the Monte-Carlo simulations.

    Then the Monte-Carlo part is adapted: the naive Monte-Carlo moves are replaced by a more subtle technique, at least for positions which are already in the tree. This was done first by Coulom, Chaslot and others for the game of Go, and by Kocsis and Szepesvari in the nice setting of UCT. The revolution is here, much more than in the use of Monte-Carlo itself.

    There are other important milestones in UCT, but I think the point above is the first that really needs to be emphasized.

    Olivier Teytaud

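    A compact sketch of the growing-tree scheme described above, in the UCT form of Kocsis and Szepesvari (the game-state methods such as to_move(), moves(), apply(), game_over() and winner() are hypothetical stand-ins; c is the usual exploration constant):

        import math, random

        class Node:
            # One stored position; wins are counted for the player who
            # made the move leading into this node.
            def __init__(self, state, mover=None):
                self.state, self.mover = state, mover
                self.children, self.wins, self.visits = None, 0, 0

        def ucb1(child, parent_visits, c=1.4):
            if child.visits == 0:
                return float('inf')        # try every child at least once
            return (child.wins / child.visits
                    + c * math.sqrt(math.log(parent_visits) / child.visits))

        def random_playout(state):
            while not state.game_over():
                state = state.apply(random.choice(state.moves()))
            return state.winner()

        def uct_search(root_state, iterations=10000):
            root = Node(root_state)
            for _ in range(iterations):
                node, path = root, [root]
                # 1. Selection: descend the stored tree by the UCB1 rule.
                while node.children:
                    node = max(node.children,
                               key=lambda ch: ucb1(ch, node.visits))
                    path.append(node)
                # 2. Expansion: grow the tree at the reached leaf.
                if not node.state.game_over():
                    node.children = [Node(node.state.apply(m),
                                          node.state.to_move())
                                     for m in node.state.moves()]
                # 3. Simulation: finish the game with a (possibly weighted)
                #    playout from here.
                winner = random_playout(node.state)
                # 4. Backpropagation: credit the result along the path.
                for n in path:
                    n.visits += 1
                    if n.mover == winner:
                        n.wins += 1
            # Most-visited child; a real engine would record the move to it.
            return max(root.children, key=lambda ch: ch.visits)
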
  14. We normally use our collective voices to advocate to the world's leaders in the fight against terrorism, corruption, poverty and global disease. Today, I was doing some reading that made me think we might be able to use our collective computing power as well for the betterment of humanity.
    If you own a fast computer and are not running memory-intensive applications, you could take the initiative by allowing community grid service applications to run even when your machine is not idle. You don't have to do anything extra with this application, nor do you have to be tech savvy; you just need to let it run on its own. You can carry on with your tasks, and you might be a step closer to solving one of the problems facing humanity.
