December 14, 2018
Alphabet’s London-based DeepMind pitted AlphaZero, its AI system that masters games without human intervention, against Stockfish, the highest-rated chess engine, and crushed it. DeepMind developed the self-training method, dubbed deep reinforcement learning, specifically to tackle the strategy board game “Go,” and an earlier iteration of the system beat one of the world’s best “Go” players, although it needed human guidance. AlphaZero trained itself in chess in three days; in one illustrated position, it rejected the moves marked in red after a mere 1,000 simulations.
IEEE Spectrum reports that AlphaZero, “after another 100,000 simulations … chose the move marked in green over the one marked in orange, [and] went on to win, thanks in large part to having opened the diagonal for its bishop.” DeepMind’s David Silver published the research in Science, “accompanied by a commentary by Murray Campbell, an AI researcher at the IBM Thomas J. Watson Research Center.”
“This work has, in effect, closed a multi-decade chapter in AI research,” said Campbell, who was a member of the team that designed IBM’s Deep Blue, which in 1997 defeated world chess champion Garry Kasparov. “AI researchers need to look to a new generation of games to provide the next set of challenges.”
AlphaZero can master only a game “that provides all the information that’s relevant to decision making.” Campbell’s reference to a “new generation of games” pointed to those with “imperfect” information, which include poker as well as “many multiplayer games, such as ‘StarCraft II,’ ‘Dota,’ and ‘Minecraft’.” Campbell noted that “those multiplayer games are harder than ‘Go,’ but not that much higher,” reporting that “a group has already beaten the best players at ‘Dota 2,’ though it was a restricted version of the game.”
“‘StarCraft’ may be a little harder,” he continued. “I think both games are within 2 to 3 years of solution.” AI that can handle games with imperfect information will have applications well beyond games, in everything from financial modeling to self-driving cars. Campbell dubbed multiplayer games a “good interim step” on the way to more challenging games that include language, which “open up still greater realms of complexity.”
The win over Stockfish means that AlphaZero’s deep reinforcement learning has now been “generalized … to other games,” meaning that the team was “able to find tricks to preserve its playing strength after giving up certain advantages peculiar to playing ‘Go’.” The biggest of those advantages was “the symmetry of the ‘Go’ board, which allowed the specialized machine to calculate more possibilities by treating many of them as mirror images.”
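To make the mirror-image idea concrete, here is a minimal Python sketch of how positions on a square board can be grouped with their rotations and reflections. It is only an illustration under stated assumptions, not DeepMind’s implementation: the function names (rotate, mirror, symmetries, canonical) and the tiny 3×3 board are hypothetical, and a board is represented as a tuple of rows with 0 for an empty point.

```python
from typing import List, Tuple

# Hypothetical representation: a board is a tuple of rows, 0 = empty point.
Board = Tuple[Tuple[int, ...], ...]


def rotate(board: Board) -> Board:
    """Rotate the board 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def mirror(board: Board) -> Board:
    """Reflect the board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board: Board) -> List[Board]:
    """Return the 8 rotations and reflections of a square board."""
    variants = []
    current = board
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rotate(current)
    return variants


def canonical(board: Board) -> Board:
    """Pick one representative so mirror-image positions compare equal."""
    return min(symmetries(board))


# A stone in the top-left corner and one in the bottom-right corner
# are the same position up to symmetry.
top_left = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
bottom_right = ((0, 0, 0), (0, 0, 0), (0, 0, 1))
same_position = canonical(top_left) == canonical(bottom_right)
```

A program that evaluates only the canonical form can reuse one evaluation for up to eight equivalent positions. Chess does not share this property, since pawns move in only one direction and castling rules differ between the two sides of the board, which is one reason the shortcut is peculiar to “Go.”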
Campbell stated that, although “there was actually a significant debate over whether the approach would work,” AlphaZero simply took the “Go” board and rules and replaced them with a chessboard and chess rules. So far, AlphaZero has mastered only “Go,” chess and “Shogi,” a Japanese form of chess. Although “Go” and “Shogi” are “astronomically complex,” chess has been the “preferred test bed for AI for more than a lifetime,” as indicated by “the research of such pioneers as Alan Turing, Claude Shannon, and Herbert Simon.”