Google, Nvidia Train Neural Networks to Post-Process Video

Google researchers have created a machine learning system that adds color to black & white videos and can also choose which specific objects, people and pets receive the color treatment. The technology is based on a convolutional neural network, an architecture well suited to tasks such as object tracking and video stabilization. Meanwhile, Nvidia has debuted an algorithm that smoothly slows down video after it has been captured, using a neural network to create the “in between” frames required for fluid motion.

VentureBeat reports that Google, in a paper on “Tracking Emerges by Colorizing Videos,” describes how the system learns to follow multiple objects, even when they are occluded. The algorithm was trained to colorize grayscale movies using the Kinetics dataset, a collection of YouTube videos depicting “a diverse range of human-focused actions.”

“Tracking objects in video is a fundamental problem in computer vision,” wrote lead researcher Carl Vondrick in a blog post. “However, teaching a machine to visually track objects is challenging partly because it requires large, labeled tracking datasets for training, which are impractical to annotate at scale.”

The researchers then “trained the neural network to predict the original colors in subsequent frames, which turned out to be the eureka moment.” “Learning to copy colors from the single reference frame requires the model to learn to internally point to the right region in order to copy the right colors,” said Vondrick. “This forces the model to learn an explicit mechanism that we can use for tracking.”

The algorithm “outperforms several state-of-the-art colorization techniques.” Vondrick explained that “video colorization provides a signal that can be used for learning to track objects in videos without supervision [and that] … improving the video colorization model can advance progress in self-supervised tracking.”
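The “pointing” mechanism Vondrick describes can be pictured as soft attention: each pixel in a grayscale target frame is embedded, compared against every pixel embedding of a colored reference frame, and its color is predicted as the attention-weighted average of the reference colors. The NumPy sketch below illustrates that idea only; the function name, shapes and temperature parameter are assumptions for illustration, not the paper’s actual code:

```python
import numpy as np

def copy_colors(ref_emb, ref_colors, tgt_emb, temperature=1.0):
    """Predict target-frame colors by softly pointing into a reference frame.

    ref_emb    : (N, d) per-pixel embeddings of the grayscale reference frame
    ref_colors : (N, c) colors of the reference frame's pixels
    tgt_emb    : (M, d) per-pixel embeddings of the grayscale target frame
    Returns an (M, c) array of predicted colors for the target frame.
    (Illustrative sketch; not the actual model code.)
    """
    # Similarity between every target pixel and every reference pixel.
    sim = tgt_emb @ ref_emb.T / temperature            # (M, N)
    # Numerically stable softmax over reference pixels:
    # each row becomes a soft "pointer" into the reference frame.
    sim -= sim.max(axis=1, keepdims=True)
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    # Copy colors along the pointer.
    return attn @ ref_colors
```

Because the pointer matrix is learned purely from colorization, the same matrix can propagate any per-pixel label from the reference frame, such as an object mask, which is what lets the model double as a tracker without labeled tracking data.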

Elsewhere, VentureBeat reports that researchers from Nvidia, the University of Massachusetts Amherst and the University of California, Merced teamed up to create an algorithm that slows down already-captured video; the work will be presented at the 2018 Conference on Computer Vision and Pattern Recognition. The result is “an unsupervised, end-to-end neural network that can generate an arbitrary number of intermediate frames to create smooth slow-motion footage,” which the team has dubbed “variable-length multi-frame interpolation.”
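The core idea of variable-length multi-frame interpolation is to pick arbitrary timestamps t between two captured frames, estimate where each pixel was at time t, and warp and blend the two endpoint frames accordingly. The toy sketch below uses a nearest-neighbor warp and a simple quadratic approximation of the intermediate flows; the function names, the flow approximation and the naive blending (no learned refinement or visibility maps) are illustrative assumptions, not Nvidia’s implementation:

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by flow (h, w, 2) with nearest-neighbor sampling,
    a toy stand-in for the differentiable bilinear sampling a real
    interpolation network would use."""
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return img[src_y, src_x]

def interpolate(i0, i1, f01, f10, num_frames):
    """Yield num_frames intermediate frames between frames i0 and i1,
    given the optical flows f01 (i0 -> i1) and f10 (i1 -> i0)."""
    for k in range(1, num_frames + 1):
        t = k / (num_frames + 1)
        # Approximate the flows from time t back to each endpoint,
        # assuming roughly linear motion between the two frames.
        f_t0 = -(1 - t) * t * f01 + t * t * f10
        f_t1 = (1 - t) ** 2 * f01 - t * (1 - t) * f10
        # Warp both endpoints to time t and blend, weighting the
        # temporally closer frame more heavily.
        yield (1 - t) * warp(i0, f_t0) + t * warp(i1, f_t1)
```

Since num_frames is a free parameter, the slowdown factor is arbitrary, which is consistent with the team’s claim that there is no upper limit; in a trained system, a neural network refines these approximate flows and handles occlusions rather than blending naively.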

“We’re taking a slow-motion effect and applying it to existing video,” said Jan Kautz, who leads Nvidia’s learning and perception team. “You can slow it down by a factor of eight or 15 — there’s no upper limit.”

Researchers trained the system “with 240 fps videos from YouTube and handheld cameras — including a series of clips from ‘The Slow Mo Guys’ (for a corpus of 11,000 videos total) and used Nvidia Tesla V100 GPUs and a cuDNN-accelerated PyTorch deep learning framework.”

The output doesn’t “exhibit the hallmark jitteriness and blurriness of slow-motion software filters … with the exception of a few jagged edges around the borders of fast-moving objects, it’s tough to tell them apart from footage shot natively at high frame rates.”
