Microsoft Details How to Shoot for its HoloLens AR Headset

Microsoft’s HoloLens augmented reality headset allows video, which can be streamed over the Internet, to be viewed from any angle, combining the real world with computer-generated imagery. Whereas a digital object can be rendered in 3D and easily shown from any angle, live action isn’t so accommodating. To that end, the Redmond, Washington-based company just released a document giving specific directions on how to capture and handle live action footage for use with its AR headset.

Live action footage shot for AR not only needs to be captured from every angle but also rotoscoped, or extracted, from the background to be merged convincingly with the viewer’s environment. Road to VR quotes Microsoft as saying that its research paper on the process is the “first end-to-end solution to create high-quality free-viewpoint video encoded as a compact data stream.”


The paper, which was published in the ACM Transactions on Graphics journal, Volume 34 Issue 4, describes the process in the technical language reserved for the members of the Association for Computing Machinery.

Translated into plain English, the paper reports that production is similar to motion capture, which uses the motion of a human actor to animate a computer-generated object, except that Microsoft’s system captures the performance and generates a CG model at the same time. As in motion capture, the system uses a very large array of cameras (106 in all, RGB and infrared) to cover all angles in the performance space and capture “2.7 million points in a 3D point cloud.”

Those points are then arranged into a solid mesh composed of more than 1 million polygons per frame, which is then compressed, or reduced, to a more manageable size. The reduction is not uniform: the technique allows the animators to keep more detail in areas of interest such as hands and faces, while thinning out less interesting areas far more aggressively.
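
The idea of spending a fixed polygon budget unevenly can be sketched in a few lines of Python. This is only an illustration, not the method from Microsoft's paper: the `Region` type, the `allocate_budget` function, the importance weights, the per-region triangle counts and the 100,000-triangle target are all hypothetical, picked so that the starting total is about 1 million polygons.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    triangles: int      # triangle count before reduction (made-up figure)
    importance: float   # hypothetical weight; higher means keep more detail


def allocate_budget(regions, target_total):
    """Split a triangle budget across regions in proportion to
    importance * size, never exceeding a region's original count."""
    weights = [r.importance * r.triangles for r in regions]
    total_weight = sum(weights)
    budget = {}
    for region, weight in zip(regions, weights):
        share = int(target_total * weight / total_weight)
        budget[region.name] = min(region.triangles, share)
    return budget


# Illustrative numbers only: a frame of 1,000,000 triangles
# reduced to a target of 100,000.
regions = [
    Region("face", 100_000, 8.0),
    Region("hands", 50_000, 8.0),
    Region("torso", 850_000, 1.0),
]
budget = allocate_budget(regions, 100_000)

for region in regions:
    kept = budget[region.name] / region.triangles
    print(f"{region.name}: keep {budget[region.name]} triangles ({kept:.1%})")
```

With these made-up numbers the face and hands each keep about 39 percent of their triangles, while the torso keeps under 5 percent. A real pipeline would then run an actual mesh simplification step on each region to reach its share of the budget; this sketch only decides how the budget is divided.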

The results are then encoded into an MPEG file for streaming. Road to VR notes that the impressive results make the system potentially applicable to VR capture.