DeepMind presents Artificial General Intelligence for board games

In a paper recently published in the journal Science, researchers from DeepMind describe AlphaZero, a system that mastered three very complex games (Go, chess, and shogi) using only self-play and reinforcement learning. What sets this system apart (a preliminary version was previously covered in this blog) from predecessors like AlphaGo Zero is that the same learning architecture and hyperparameters were used to learn the different games, without any customization for each specific game.
Historically, the best programs for each game were heavily customized to exploit specific characteristics of that game. AlphaGo Zero, the most impressive previous result, used the spatial symmetries of Go and a number of other game-specific optimizations. Special-purpose chess programs like Stockfish took years to develop, use enormous amounts of domain-specific knowledge and can, therefore, play only one specific game.
AlphaZero is the closest thing to a general-purpose board game player ever designed. It uses a deep neural network to estimate move probabilities and position values, and performs the search with a Monte Carlo tree search algorithm, which is general-purpose and not tuned to any particular game. Overall, AlphaZero gets as close as any system yet to the dream of artificial general intelligence, in this particular domain. As the authors say in the conclusions, “These results bring us a step closer to fulfilling a longstanding ambition of Artificial Intelligence: a general game-playing system that can master any game.”
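The selection step of that search can be sketched in a few lines. The following is a minimal, hypothetical illustration of the PUCT-style rule used in AlphaZero-like Monte Carlo tree search; the per-edge statistics (prior P, visit count N, value sum W) follow the paper's notation, but the class layout, the constant `c_puct`, and the `+1` smoothing are illustrative assumptions, not DeepMind's implementation:

```python
import math

class Node:
    """Statistics for one edge of an AlphaZero-style search tree."""
    def __init__(self, prior):
        self.prior = prior       # P(s, a): probability from the policy network
        self.visits = 0          # N(s, a): visit count
        self.value_sum = 0.0     # W(s, a): accumulated value estimates
        self.children = {}       # action -> Node

    def q(self):
        # Q(s, a): mean value of this edge; 0 if never visited.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the action maximizing Q + U, where U is the exploration bonus.

    U is large for moves the policy network likes (high prior) that have
    been visited few times, so the search balances exploitation (Q) with
    network-guided exploration (U).
    """
    total = sum(ch.visits for ch in node.children.values())
    def score(child):
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))
```

In a full search, the selected leaf is expanded, evaluated by the network's value head, and the result is backed up along the path; repeating this many times per move produces the visit counts from which the actual move is chosen.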
While mastering these ancient games, AlphaZero also teaches us a few things we didn’t know about them. For instance, that, in chess, white has a strong upper hand when playing the Ruy Lopez opening, or when playing against the French and Caro-Kann defenses. The Sicilian defense, on the other hand, gives black much better chances. At least, that is what the evaluation function learned by the deep neural network suggests…
Update: The NY Times just published an interesting piece on this topic, with some additional information.

AlphaZero masters the game of Chess

DeepMind, a company that was acquired by Google, made headlines when the program AlphaGo Zero managed to become the best Go player in the world, without using any human knowledge, a feat reported in this blog less than two months ago.

Now, just a few weeks after that result, DeepMind reports, in a recently posted article, that the program AlphaZero obtained a similar result for the game of chess.

Computer programs have been the world’s best chess players for a long time now, basically since Deep Blue defeated the reigning world champion, Garry Kasparov, in 1997. Deep Blue, like almost all other top chess programs, was deeply specialized in chess, and played the game using handcrafted position-evaluation functions (based on grandmaster games) coupled with deep search methods. Deep Blue evaluated more than 200 million positions per second, using a very deep search (between 6 and 8 moves ahead, sometimes more) to identify the best possible move.

Modern computer programs use a similar approach, and have attained super-human levels, with the best programs (Komodo and Stockfish) reaching an Elo Rating higher than 3300. The best human players have Elo Ratings between 2800 and 2900. This difference implies that they have less than a one in ten chance of beating the top chess programs, since a difference of about 400 points in Elo Rating (anywhere on the scale) gives the higher-rated player an expected score of roughly 90%.
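The standard Elo model makes this concrete. A small sketch, using the approximate ratings quoted above (3300 for the top engines, 2900 for the top humans):

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# A 400-point gap: the engine's expected score is about 0.91,
# leaving the human with less than a one in ten chance.
print(round(elo_expected_score(3300, 2900), 3))  # ≈ 0.909
print(round(elo_expected_score(2900, 3300), 3))  # ≈ 0.091
```

Note that the expected score also counts draws (a draw is worth half a point), so the human's pure winning chances are even lower than the 0.091 figure suggests.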

In contrast, AlphaZero learned the game without using any human generated knowledge, by simply playing against another copy of itself, the same approach used by AlphaGo Zero. As the authors describe, AlphaZero learned to play at super-human level, systematically beating the best existing chess program (Stockfish), and in the process rediscovering centuries of human-generated knowledge, such as common opening moves (Ruy Lopez, Sicilian, French and Reti, among others).

The flexibility of AlphaZero (which also learned to play Go and shogi) provides convincing evidence that general-purpose learners are within reach of current technology. As a side note, the author of this blog, a fairly decent chess player in his youth, reached an Elo Rating of 2000. This means he has less than a one in ten chance of beating someone rated 2400, who has less than a one in ten chance of beating the world champion, who in turn has less than a one in ten chance of beating AlphaZero. Quite humbling…
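That chain of one-in-ten chances can also be read off the Elo formula directly. A back-of-the-envelope sketch, where the 3200 rating assigned to AlphaZero is purely an illustrative assumption (AlphaZero has no official rating):

```python
def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# Each 400-point step is roughly a 1-in-11 expected score for the weaker player.
step = elo_expected_score(2000, 2400)
print(round(step, 3))  # ≈ 0.091

# A single 1200-point gap (a 2000-rated amateur vs. a hypothetical 3200):
print(round(elo_expected_score(2000, 3200), 6))  # ≈ 0.000999
```

So three 400-point gaps compound into odds of about one in a thousand: under the Elo model, rating differences add while winning odds multiply by ten per 400 points.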

Image by David Lapetina, available at Wikimedia Commons.