February 14, 2019
In a Wednesday morning session at the HPA Tech Retreat in Palm Desert, Netflix’s Rohit Puri, engineering manager of the Cloud Media Systems team, took attendees on a tour of the Netflix Media Database. The Netflix service experience, he explained, is made up of a seamless user interface, personalized content recommendations, efficient media streaming and a curated content catalog. Other assets that “go a long way in helping users find content,” added Puri, include promotional artwork and video.
The growing volume of content, he said, highlights the need to “synthesize systems at high volume in a timely fashion.” Puri also showed examples of the problems Netflix faces in ensuring high-quality images, including a classic movie in which a lighting fixture is visible at the edge of the frame, and another in which subtitles overlay a credit, making the text unreadable.
“Modular analysis and persistence [that] promotes data re-use is a solution,” he said, noting that a text-on-text detection application, for example, can also be used for artwork.
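To make the text-on-text idea concrete, here is a minimal sketch, assuming that subtitle events and detected on-screen text (such as credits) are both stored as regions with an integer frame range and a pixel bounding box. The TextRegion type, the find_text_on_text function and the sample coordinates are hypothetical illustrations, not Netflix’s implementation. A collision is flagged only when two regions overlap in both time and space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRegion:
    """Hypothetical record for on-screen text, located in time (frames) and space (pixels)."""
    label: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive
    x: int            # left edge
    y: int            # top edge
    width: int
    height: int


def overlaps_in_time(a: TextRegion, b: TextRegion) -> bool:
    # Half-open ranges [start, end) overlap iff each starts before the other ends.
    return a.start_frame < b.end_frame and b.start_frame < a.end_frame


def overlaps_in_space(a: TextRegion, b: TextRegion) -> bool:
    # Axis-aligned boxes overlap iff they overlap on both the x and y axes.
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def find_text_on_text(subtitles, detected_text):
    """Return (subtitle, detected text) pairs that collide on screen."""
    return [(s, t) for s in subtitles for t in detected_text
            if overlaps_in_time(s, t) and overlaps_in_space(s, t)]


subtitles = [TextRegion("subtitle-1", 100, 200, 200, 900, 1500, 100)]
detected = [TextRegion("credit-1", 150, 300, 400, 880, 1000, 60),
            TextRegion("credit-2", 250, 400, 400, 880, 1000, 60)]
conflicts = find_text_on_text(subtitles, detected)
print([(s.label, t.label) for s, t in conflicts])  # [('subtitle-1', 'credit-1')]
```

Because the check consumes only stored regions, the same function could in principle run over text detected in artwork (a single-frame range) as well as in video, which is the kind of re-use a modular analysis-and-persistence approach aims for.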
Puri defined the Netflix Media Database as “a multi-tenant data store for dynamic temporally and spatially varying metadata for Netflix media assets.” The database supports “audio, video, text, images” and “answers spatio-temporal queries on the media timeline.”
User features include “authentication and authorization; content vault (e.g., firewalls for pre-release content); Datastore-based access control; and high read throughput.” “Every Datastore has a schema,” said Puri. “The developer defines the schema, which is a schema-on-write system.” According to Puri, schema updates can be applied on the fly, schemas are capable of representing spatial (3D) and temporal metadata, and integer units for the temporal and spatial dimensions enable full precision.
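The schema-on-write model can be illustrated with a small in-memory sketch. The Datastore class, the SchemaError exception, the field names and the choice of milliseconds and pixels as integer units are all assumptions made for illustration; Python types stand in for whatever schema language the real system uses. Every write is checked against the current schema before it is stored, and a schema change affects all subsequent writes.

```python
from typing import Any


class SchemaError(ValueError):
    """Raised when a write does not match the datastore's schema."""


class Datastore:
    """Hypothetical schema-on-write store: each document is validated before it is kept."""

    def __init__(self, name: str, schema: dict[str, type]) -> None:
        self.name = name
        self.schema = dict(schema)  # developer-defined: field name -> required type
        self.documents: list[dict[str, Any]] = []

    def update_schema(self, field: str, field_type: type) -> None:
        # Applied on the fly: every later write is validated against the new schema.
        self.schema[field] = field_type

    def write(self, doc: dict[str, Any]) -> None:
        for field, field_type in self.schema.items():
            if field not in doc:
                raise SchemaError(f"missing field {field!r}")
            value = doc[field]
            # bool is a subclass of int in Python, so rule it out for int fields.
            if not isinstance(value, field_type) or (
                    field_type is int and isinstance(value, bool)):
                raise SchemaError(f"field {field!r} must be {field_type.__name__}")
        self.documents.append(dict(doc))


# Integer units (assumed here: milliseconds and pixels) keep every value exact.
store = Datastore("shot-metadata", {"start_ms": int, "end_ms": int, "x": int, "y": int})
store.write({"start_ms": 1000, "end_ms": 2500, "x": 0, "y": 0})

try:
    store.write({"start_ms": 1000.5, "end_ms": 2500, "x": 0, "y": 0})
except SchemaError as err:
    print(err)  # field 'start_ms' must be int

store.update_schema("z", int)  # add a third spatial dimension
store.write({"start_ms": 3000, "end_ms": 4000, "x": 10, "y": 20, "z": 5})
print(len(store.documents))  # 2
```

Integer units matter because floating-point values cannot represent many decimal times exactly (in Python, 0.1 + 0.2 == 0.3 is False), whereas integer ticks add and compare exactly.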
Puri also showed an architectural view of the system, in which the Netflix Media Database implements its features through a suite of micro-services, each with a specific responsibility. Those micro-services include validation, persistence, indexing and query, all of which can be orchestrated through a temporal plate.
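The micro-service decomposition can be pictured as a chain of single-purpose components. The following is a minimal in-process sketch, assuming temporal metadata stored as integer millisecond intervals; the class names are hypothetical, each class stands in for what would be a separate networked service, and a plain ingest method stands in for the orchestration layer mentioned in the talk. The query component answers a simple temporal question: which annotations on a given asset overlap a time window.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical timeline metadata record for one media asset."""
    asset_id: str
    start_ms: int  # inclusive
    end_ms: int    # exclusive
    label: str


class ValidationService:
    """Rejects records with non-integer or empty time ranges."""
    def check(self, ann: Annotation) -> None:
        if not (isinstance(ann.start_ms, int) and isinstance(ann.end_ms, int)):
            raise ValueError("time bounds must be integers")
        if ann.start_ms >= ann.end_ms:
            raise ValueError("start_ms must be before end_ms")


class PersistenceService:
    """Stores records by row id (an in-memory dict stands in for a database)."""
    def __init__(self) -> None:
        self._rows: dict[int, Annotation] = {}

    def save(self, ann: Annotation) -> int:
        row_id = len(self._rows)
        self._rows[row_id] = ann
        return row_id

    def load(self, row_id: int) -> Annotation:
        return self._rows[row_id]


class IndexingService:
    """Keeps (start_ms, row_id) pairs sorted so time windows can be searched with bisect."""
    def __init__(self) -> None:
        self._by_start: list[tuple[int, int]] = []

    def add(self, start_ms: int, row_id: int) -> None:
        bisect.insort(self._by_start, (start_ms, row_id))

    def starting_before(self, t_ms: int) -> list[int]:
        # Row ids are >= 0, so (t_ms, -1) sorts before every entry that starts at t_ms.
        cut = bisect.bisect_left(self._by_start, (t_ms, -1))
        return [row_id for _, row_id in self._by_start[:cut]]


class QueryService:
    """Answers 'which annotations on this asset overlap [t0_ms, t1_ms)?'."""
    def __init__(self, index: IndexingService, store: PersistenceService) -> None:
        self.index = index
        self.store = store

    def overlapping(self, asset_id: str, t0_ms: int, t1_ms: int) -> list[Annotation]:
        # [start, end) overlaps [t0, t1) iff start < t1 and end > t0.
        candidates = (self.store.load(r) for r in self.index.starting_before(t1_ms))
        return [a for a in candidates if a.asset_id == asset_id and a.end_ms > t0_ms]


class MediaMetadataPipeline:
    """Plays the orchestrator role: validate, then persist, then index."""
    def __init__(self) -> None:
        self.validation = ValidationService()
        self.persistence = PersistenceService()
        self.indexing = IndexingService()
        self.query = QueryService(self.indexing, self.persistence)

    def ingest(self, ann: Annotation) -> int:
        self.validation.check(ann)
        row_id = self.persistence.save(ann)
        self.indexing.add(ann.start_ms, row_id)
        return row_id


pipeline = MediaMetadataPipeline()
pipeline.ingest(Annotation("title-1", 0, 5000, "opening credits"))
pipeline.ingest(Annotation("title-1", 5000, 9000, "scene 1"))
pipeline.ingest(Annotation("title-2", 1000, 2000, "logo"))
hits = pipeline.query.overlapping("title-1", 4000, 6000)
print([a.label for a in hits])  # ['opening credits', 'scene 1']
```

The index here narrows candidates by start time only; a production system would likely use a dedicated interval or spatial index, but the point of the sketch is the division of responsibilities, with each stage replaceable on its own.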