Facebook’s 3D Photos feature — which uses depth data to create images that can be examined from different angles via virtual reality headsets — is now available on any recent handset with a single camera, including the Apple iPhone 7 or later and midrange (and above) Android phones. According to Facebook, advances in machine learning techniques made this possible. The company first unveiled 3D Photos in late 2018, when the feature required either a dual-camera phone or a depth map file uploaded from the desktop.
VentureBeat reports that 3D Photos “even works with selfies, paintings, and complex scenes.” “This advance makes 3D photo technology easily accessible for the first time to the many millions of people who use single-lens camera phones or tablets,” the company said in a blog post. “It also allows everyone to experience decades-old family photos and other treasured images in a new way, by converting them to 3D.”
It added that 3D Photos are now “viewable by any Facebook user, as well as in VR through the Oculus Browser on Oculus Go or Firefox on the Oculus Rift … [and] they can also be shared through Facebook Stories, where they disappear after 24 hours.”
3D Photos do have limits: they can’t be edited and, if shared, can’t be grouped with other photos in a single post. They also “can’t be added to an album, and if you’re posting a 3D photo from a Page, you won’t be able to boost it or use it in advertisements.”
Technical obstacles included “training a model that correctly guesses how objects might look from different perspectives and that can run on typical mobile processors in ‘a fraction of a second’.” The team’s solution was a convolutional neural network trained “on millions of pairs of 3D images and their accompanying depth maps.”
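The supervised setup described here — learning to predict a depth map for each pixel of a single image from (image, depth) training pairs — can be sketched in miniature. This is a toy illustration, not Facebook's model: a real system uses a deep CNN, while the per-pixel linear model and synthetic data below exist only to keep the training loop runnable.

```python
import numpy as np

# Toy sketch of single-image depth estimation: learn to predict a per-pixel
# depth map, supervised by (image, depth) pairs. A per-pixel linear model
# stands in for the CNN so the loop stays self-contained.
rng = np.random.default_rng(0)

def synth_pair():
    """Generate a fake (image, depth) pair in which depth is a noisy
    function of brightness (brighter pixels are 'closer')."""
    img = rng.random((16, 16))
    depth = 2.0 * img + 0.5 + 0.01 * rng.standard_normal((16, 16))
    return img, depth

w, b = 0.0, 0.0   # model parameters
lr = 0.1          # learning rate

def mse(img, depth):
    """Mean squared error between predicted and true depth."""
    pred = w * img + b
    return float(np.mean((pred - depth) ** 2))

img0, depth0 = synth_pair()          # held-out pair for evaluation
loss_before = mse(img0, depth0)

for _ in range(200):                 # training loop over random pairs
    img, depth = synth_pair()
    err = (w * img + b) - depth      # per-pixel prediction error
    w -= lr * float(np.mean(err * img))  # gradient step on w
    b -= lr * float(np.mean(err))        # gradient step on b

loss_after = mse(img0, depth0)
print(loss_before > loss_after)      # error on held-out pair shrank
```

The structure — draw a supervised pair, predict depth, backpropagate the error — is the same whether the predictor is this two-parameter model or a mobile-sized CNN.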
The team then “used building blocks inspired by FBNet — a family of models for resource-constrained environments — to optimize the model for mobile devices.”
The team also used ChamNet, an algorithm developed by Facebook AI Research, to drive an automated search for “the optimal architecture configuration.” According to VentureBeat, “ChamNet iteratively samples points from a search space to train an accuracy predictor, which accelerates the search for a model that maximizes accuracy while satisfying resource constraints.” Facebook stated that finding the model underpinning 3D Photos “took roughly three days using 800 Nvidia Tesla V100 graphics cards.”
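The search loop VentureBeat describes — sample candidate architectures, score them with a cheap predictor, and keep the best one that satisfies the resource budget — can be illustrated with a toy version. Everything here (the search space, the latency cost model, and the hand-written accuracy predictor) is an invented stand-in; ChamNet learns its predictor from trained samples rather than using a fixed formula.

```python
import random

# Toy ChamNet-style search: sample architecture configs, score each with a
# cheap "accuracy predictor", and keep the best one under a latency budget.
random.seed(42)

SEARCH_SPACE = {
    "depth":  [2, 4, 8, 16],      # number of layers
    "width":  [16, 32, 64, 128],  # channels per layer
    "kernel": [3, 5],             # kernel size
}
LATENCY_BUDGET_MS = 50.0

def estimate_latency(cfg):
    # Crude cost model: latency grows with depth, width, and kernel area.
    return cfg["depth"] * cfg["width"] * cfg["kernel"] ** 2 / 100.0

def predicted_accuracy(cfg):
    # Stand-in for the learned accuracy predictor: larger models score
    # higher, with diminishing returns.
    return 1.0 - 1.0 / (1.0 + 0.01 * cfg["depth"] * cfg["width"])

def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

best_cfg, best_acc = None, -1.0
for _ in range(500):                            # iterative sampling loop
    cfg = sample_config()
    if estimate_latency(cfg) > LATENCY_BUDGET_MS:
        continue                                # violates resource constraint
    acc = predicted_accuracy(cfg)
    if acc > best_acc:
        best_cfg, best_acc = cfg, acc

print(best_cfg, round(best_acc, 3))
```

Because the predictor is cheap to evaluate, thousands of candidates can be screened without training each one from scratch; the expensive GPU-days Facebook cites go into training the sampled architectures that fit the predictor, not into exhaustively training every candidate.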
Facebook stated that “it intends to apply these techniques to depth estimation for videos taken with mobile devices … [and] plans to explore leveraging depth estimation, surface normal estimation, and spatial reasoning in real-time apps like augmented reality.” “Beyond these potential new experiences, this work will help us better understand the content of 2D images more generally,” it stated. “Improved understanding of 3D scenes could also help robots navigate and interact with the physical world.”