DeepMind Intros Intriguing Deep Neural Network Algorithm

A research team at Google’s AI unit DeepMind, led by Ali Eslami and Danilo Rezende, has created software via a generative query network (GQN) to create a new perspective of a scene that the neural network has never seen. The U.K.-based unit developed the deep neural network-based software that applies the network to a handful of shots of a virtual scene to create a “compact mathematical representation” of the scene — and then uses that representation to render an image with a new perspective unfamiliar to the network.

Ars Technica reports that, “the researchers didn’t hard-code any prior knowledge about the kind of environments they would be rendering into the GQN.” Instead, the team started with a blank slate that was able to “effectively discover these rules by looking at images.”


“One of the most surprising results [was] when we saw it could do things like perspective and occlusion and lighting and shadows,” said Eslami. GQN is made up of two different, connected deep neural networks: “On the left, the representation network takes in a collection of images representing a scene (together with data about the camera location for each image) and condenses these images down to a compact mathematical representation.”

Then, the generation network reverses the process: “starting with the vector representing the scene, accepting a camera location as input, and generating an image representing how the scene would look like from that angle.” With camera location data, the network can easily reproduce the original image. But what makes this different is that, “this network can also be provided with other camera positions — positions for which the network has never seen a corresponding image.”

To get there, “the team used the standard machine learning technique of stochastic gradient descent to iteratively improve the two networks.” A conventional neural network “uses externally supplied labels to judge whether the output is correct or not,” but in this case, the GQN algorithm “uses scene images both as inputs to the representation network and as a way to judge whether the output of the generation network is correct.”

Ars Technica notes that, “if these techniques can be generalized to real-world objects — and it seems likely someone will figure out how to do so — the possibilities are tantalizing,” adding that autonomous vehicles are one “obvious application.”