Google Debuts Deep Planning Network Agent with DeepMind

Google has unveiled the Deep Planning Network (PlaNet), a reinforcement learning agent created in collaboration with DeepMind that learns directly from images. Reinforcement learning uses rewards to improve AI agents’ decision-making. Model-free techniques train agents to predict actions directly from observations, whereas model-based techniques have agents build a general model of the environment and use it to make decisions. In unfamiliar surroundings, however, agents must learn those rules from experience.

VentureBeat reports that PlaNet is “able to solve a variety of image-based tasks with up to 5,000 percent the data efficiency … maintaining competitiveness with advanced model-free agents.” According to Danijar Hafner, who co-authored the academic paper on PlaNet’s architecture, “PlaNet works by learning dynamics models given image inputs, and plans with those models to gather new experience.”

More specifically, PlaNet “leverages a latent dynamics model — a model that predicts the latent state forward, and which produces an image and reward at each step from the corresponding latent state — to gain an understanding of abstract representations such as the velocities of objects.”
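To make the idea concrete, here is a minimal Python sketch of what such a model's interface might look like. It is not PlaNet's code: the class `ToyLatentModel`, its hand-written point-mass dynamics, and the text "image" are all hypothetical stand-ins for networks that PlaNet learns from pixels. The sketch only illustrates the three roles named in the quote: stepping the latent state forward, and producing a reward and an image from each latent state.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical latent state for illustration: (position, velocity) of a ball.
Latent = Tuple[float, float]


@dataclass
class ToyLatentModel:
    """Hand-written stand-in for a learned latent dynamics model (assumption)."""

    dt: float = 0.1    # time step of the toy dynamics
    goal: float = 1.0  # goal position used by the toy reward

    def step(self, state: Latent, action: float) -> Latent:
        # Predict the next latent state from the current one and an action.
        position, velocity = state
        velocity = velocity + action * self.dt
        position = position + velocity * self.dt
        return (position, velocity)

    def reward(self, state: Latent) -> float:
        # The reward is read off the latent state alone; no image is needed.
        return -abs(self.goal - state[0])

    def image(self, state: Latent, width: int = 11) -> str:
        # Decode a crude one-row "image": the ball on a line from 0 to the goal.
        frac = min(max(state[0] / self.goal, 0.0), 1.0)
        column = round(frac * (width - 1))
        return "".join("o" if i == column else "." for i in range(width))


def rollout(
    model: ToyLatentModel, state: Latent, actions: Sequence[float]
) -> List[Tuple[Latent, float, str]]:
    """Roll the latent state forward, collecting a reward and image per step."""
    trajectory = []
    for action in actions:
        state = model.step(state, action)
        trajectory.append((state, model.reward(state), model.image(state)))
    return trajectory
```

Calling `rollout(ToyLatentModel(), (0.0, 0.0), [1.0, 1.0, 1.0])` returns one `(latent, reward, image)` triple per action. In this toy the velocity is an explicit part of the latent state; in PlaNet, by contrast, such quantities would have to be inferred from images.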

The PlaNet agent learns its model by predicting images, but planning with that model is fast: in the compact latent state space, the agent “only needs to project future rewards, not images, to evaluate an action sequence.” Hafner noted that, rather than relying on a policy network, PlaNet “chooses actions based on planning.”

“For example,” he said, “the agent can imagine how the position of a ball and its distance to the goal will change for certain actions, without having to visualize the scenario. This allows us to compare 10,000 imagined action sequences with a large batch size every time the agent chooses an action. We then execute the first action of the best sequence found and replan at the next step.”
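The loop Hafner describes (imagine many action sequences, score them by predicted reward, execute the first action of the best one, then replan) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published planner: candidates are drawn by plain random shooting, the dynamics are a hypothetical hand-written point mass rather than a learned model, and the names `toy_step`, `toy_reward`, `imagined_return`, `plan`, and `run_episode` are invented for this example.

```python
import random
from typing import List, Optional, Sequence, Tuple

Latent = Tuple[float, float]  # hypothetical latent: (position, velocity)
DT, GOAL = 0.1, 1.0           # toy constants, not taken from the paper


def toy_step(state: Latent, action: float) -> Latent:
    # Stand-in for the learned latent transition (assumption: a point mass).
    position, velocity = state
    velocity += action * DT
    position += velocity * DT
    return (position, velocity)


def toy_reward(state: Latent) -> float:
    # Reward predicted from the latent state only: closer to the goal is better.
    return -abs(GOAL - state[0])


def imagined_return(state: Latent, actions: Sequence[float]) -> float:
    """Sum of predicted rewards along an imagined rollout; no images involved."""
    total = 0.0
    for action in actions:
        state = toy_step(state, action)
        total += toy_reward(state)
    return total


def plan(
    state: Latent,
    horizon: int = 12,
    candidates: int = 10_000,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample random action sequences and return the highest-scoring one."""
    rng = rng or random.Random()
    best_sequence: List[float] = []
    best_return = float("-inf")
    for _ in range(candidates):
        sequence = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        score = imagined_return(state, sequence)
        if score > best_return:
            best_return, best_sequence = score, sequence
    return best_sequence


def run_episode(steps: int = 30, candidates: int = 1_000, seed: int = 0) -> Latent:
    """Execute only the first action of each plan, then replan from the new state."""
    rng = random.Random(seed)
    state: Latent = (0.0, 0.0)
    for _ in range(steps):
        action = plan(state, candidates=candidates, rng=rng)[0]
        state = toy_step(state, action)
    return state
```

Because each candidate is scored purely from predicted rewards, evaluating thousands of sequences stays cheap, and replanning at every step lets the agent correct for errors in earlier predictions.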

Google reported that in one test, in which PlaNet performed “six continuous control tasks,” it “outperformed (or came close to outperforming) model-free methods like A3C and D4PG on image-based tasks.” When placed in random environments without knowing the task, PlaNet “managed to learn all six tasks without modification in as little as 2,000 attempts … [whereas] previous agents that don’t learn a model of the environment sometimes require 50 times as many attempts to reach comparable performance.”

Hafner and his co-authors “believe that scaling up the processing power could produce an even more robust model.” They added: “We advocate for further research that focuses on learning accurate dynamics models on tasks of even higher difficulty, such as 3D environments and real-world robotics tasks. We are excited about the possibilities that model-based reinforcement learning opens up, including multi-task learning, hierarchical planning and active exploration using uncertainty estimates.”

For more information, visit the Google AI Blog.