Google last week announced AlphaGo, a program that uses ML techniques to optimize MCTS. This program beat the European Go champion five games to none, a huge advance over beating amateurs. In March AlphaGo will play the world Go champion, Lee Sedol, in a five game match. The world will follow the match closely (on YouTube naturally). For now we should keep our expectations in check, Deep Blue failed to beat Kasparov in its first attempt.
Google researchers describe AlphaGo in detail in a readable Nature article. To oversimplify they train deep neural nets to learn two functions, the probability of a win from a given position, and the probability distribution used to choose the next move in the random game play in MCTS. First they train with supervised learning based on historical game data between expert players and then reinforcement learning by basically having the program play itself. AlphaGo uses these functions to guide the Monte Carlo Tree Search.
AlphaGo differs quite a bit from chess algorithms.
- AlphaGo uses no built-in strategy for Go. The same approach could be used for most other two player games. I guess this approach would fail miserably for Chess but I would love to see it tried.
- Machine learning has the nice property that you can train offline slowly and then apply the resulting neural nets quickly during gameplay. While we can refine the chess algorithms offline but all the computation generally happens during the game.
- If a computer chess program makes a surprising move, good or bad, one can work through the code and figure out why the program made that particular move. If AlphaGo makes a surprising move, we'll have no clue why.
- I wonder if the same applies to human play. A chess player can explain the reasoning behind a particular move. Can Go players do the same or do great Go players rely more on intuition?
Machine learning applications like AlphaGo seemingly tackle difficult computational problems with virtually no built-in domain knowledge. Except for generating the game data for the supervised learning, humans play little role in how AlphaGo decides what to do. ML uses the same techniques to translate languages, to weed out spam, to set the temperature in my house, and in the near future to drive my car. Will they prove our theorems? Time will tell.