DeepMind’s Learning Algorithm Could Prove a Game-Changer

DeepMind recently released the full evaluation of AlphaZero, a single system capable of playing “Go,” chess, and shogi (Japanese chess). This new project builds on AlphaGo, a program that beat one of the best players in the world at the board game “Go” in 2016, and AlphaGo Zero, software capable of mastering the game from first principles. AlphaZero represents a dramatic step forward in AI research as it is one of the first intelligent systems capable of generalizing solutions to new problems with little to no human input.

Using hardware that will likely become standard in a few years' time, AlphaZero defeated several game-playing engines specifically designed to excel at their respective games. In chess, the program won 155 games and lost only 6 of the 1,000 it played against previous champion Stockfish, with the remaining games drawn. In shogi, AlphaZero defeated previous world champion Elmo 91.2 percent of the time. And in “Go,” AlphaZero beat its predecessor, AlphaGo Zero, in 61 percent of games played.

While deep learning has long proven capable of mastering specific tasks, it often struggles to apply this knowledge to new problems or even slightly modified versions of the original task. AlphaZero, by contrast, was built to learn a game from scratch, given only the rules and the current board configuration (where the pieces are at any given time). DeepMind published its findings in Science earlier this month.

The neural network at the heart of the program trains itself using a process called reinforcement learning. It improves by repeatedly playing against itself and learning which moves ultimately lead to favorable outcomes (wins or draws). Networks trained this way can suffer from unstable learning, such as falling into a rut of selecting the same moves over and over. To overcome this, the program introduces some noise and randomness into its move selection during training.
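
The article does not spell out how this noise is applied. As a minimal sketch of one common approach, the Python below blends the network's move probabilities with random Dirichlet noise before sampling a move. The function names, the epsilon and alpha values, and the example moves are assumptions chosen for illustration, not details confirmed by the article.

```python
import random


def dirichlet_sample(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # extremely rare underflow; fall back to uniform noise
        return [1.0 / size] * size
    return [d / total for d in draws]


def add_exploration_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Blend the network's move probabilities with random noise.

    priors  -- probability the network assigns to each legal move
    epsilon -- share of the result taken from noise (illustrative value)
    alpha   -- Dirichlet concentration; smaller means spikier noise
               (illustrative value)
    """
    if rng is None:
        rng = random.Random()
    noise = dirichlet_sample(alpha, len(priors), rng)
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


def pick_move(moves, probabilities, rng=None):
    """Sample one move in proportion to its (noisy) probability."""
    if rng is None:
        rng = random.Random()
    return rng.choices(moves, weights=probabilities, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    moves = ["e4", "d4", "c4"]       # hypothetical candidate moves
    priors = [0.7, 0.2, 0.1]         # hypothetical network output
    noisy = add_exploration_noise(priors, rng=rng)
    print(noisy, pick_move(moves, noisy, rng=rng))
```

Because the noise is mixed in rather than replacing the network's output, the favored move usually stays the most likely choice, but less-favored moves still get tried often enough for the system to learn whether they are better than expected.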

“Go” was selected as a new test of AI’s strength because its moves cannot be “brute forced”: there are roughly 10^170 possible board configurations, more than the number of atoms in the universe. As a result, the best “Go” players describe their approach in terms of intuition and feel. Even so, a “Go” engine can exploit certain game-specific attributes, such as move symmetry.

It is these exploited attributes that make a system game-dependent. Unlike “Go,” chess, for example, is asymmetric (pawns only move forward), and a draw can sometimes be the best attainable outcome. Therefore, any system capable of playing both would need to accommodate these differences and could not take advantage of game-specific traits out of the box.
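
To make the symmetry point concrete: rotating or mirroring a “Go” board does not change the rules, so every position has up to eight equivalent forms that an engine can treat as the same position. The Python sketch below generates them. The list-of-lists board encoding (0 for empty, 1 and 2 for the two colors) and the function names are assumptions made for illustration. The same trick is not valid for chess, where flipping the board would send pawns in the wrong direction.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror a board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    results = []
    current = [list(row) for row in board]
    for _ in range(4):
        results.append(current)           # one of the four rotations
        results.append(reflect(current))  # and its mirror image
        current = rotate(current)
    return results


if __name__ == "__main__":
    # Tiny 3x3 stand-in for a real 19x19 board.
    board = [
        [1, 2, 0],
        [0, 0, 0],
        [0, 0, 0],
    ]
    for variant in symmetries(board):
        print(variant)
```

A game-specific engine can use these equivalent positions to get eight training examples from every game position. A general system that must also play chess and shogi has to do without that shortcut.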

Chess has long been used as a “model system” to investigate intelligence and reasoning because the game requires strategy and the prediction of multiple outcomes. In an article also published in Science, former world chess champion Garry Kasparov said, “I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own… Programs usually reflect priorities and prejudices of programmers, but because AlphaZero programs itself, I would say that its style reflects the truth.”

DeepMind’s success in using AlphaZero to learn and excel at multiple games marks an important step toward a general approach to problem solving. The next step will be to extend this combination of general-purpose reinforcement learning and search to imperfect-information games, like poker, where some information may be unavailable or untrustworthy when an action is selected.

For more details on the capabilities of AlphaZero, visit the DeepMind blog.