SMPTE: Industry Leaders Gather to Discuss the Future of AI

SMPTE kicked off its 2017 Annual Technical Conference & Exhibition on Monday with an all-day symposium on artificial intelligence and its emerging role in entertainment production and distribution. Among the day’s presentations, SMPTE’s Richard Welsh offered a compelling primer on AI, Google’s Jeff Kember discussed the differences between supervised and unsupervised systems, Hitachi Vantara’s Jay Yogeshwar addressed using machine learning and AI for production workflow, Yvonne Thomas of Arvato Systems looked at the value of effective data analytics, Greg Taieb of Deluxe addressed language localization for multilingual distribution, and Aspera co-founder Michelle Munson examined next-generation network design.

Richard Welsh, SMPTE’s VP of education, began the day with an excellent 30-minute primer on AI. The term AI is used to describe three very different levels of computational capabilities.

Machine learning (ML) involves algorithms that are trained to do a specific task. Deep learning (DL) involves regenerative algorithms; they can learn through unsupervised feedback loops, but are cognitively constrained to one area of expertise. General artificial intelligence (GAI) involves free learning networks.


GAI is what most people think of when they think of AI, but GAI does not yet exist in the commercial arena. Most of what people and marketers call AI is really a machine learning implementation.

There are two forms of machine learning algorithms: regressive algorithms (decision trees) and classification algorithms (index and compare). Welsh cautioned the crowd not to anthropomorphize the technology. AI does not mimic the human mind. It is “necessarily incomprehensible,” he said, which can lead it to come up with unexpected yet valid responses.

Jeff Kember from Google’s office of the CTO discussed the difference between supervised systems, where you have a specific output in mind, and unsupervised systems, in which reinforcement learning takes place and both the algorithm and the output are modified.

He described how machine learning will enhance the end-to-end production workflow, from adding content and technical metadata to camera raw through supporting multi-channel day-and-date distribution and long-term archiving.

Dr. Jay Yogeshwar, Hitachi Vantara’s director of worldwide media, broadcast and entertainment, explained how content intelligence would enable operational efficiencies throughout the workflow. Hitachi is using machine learning and AI to optimize video compression, nonlinear editing, and media asset management systems, to improve recommendation engines, and to support OTT contextual ad insertion.

Yvonne Thomas, product manager for Arvato Systems, explained how data analytics become both more valuable and more complex as you move down the descriptive, diagnostic, predictive, and prescriptive analytics chain.

She described how a machine learning algorithm is trained: once you input your starting data set, the algorithm outputs a decision, the system accepts feedback on that decision by comparing it to the established set of rules and goals, and the algorithm’s parameters are adjusted prior to the next round of input. She stressed that “using a media analytics service requires some sort of social responsibility.”
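The training cycle Thomas outlined (input, decision, feedback, parameter adjustment) can be illustrated with a minimal sketch. This is not the system she described: it assumes a toy perceptron, and the data set, function names, learning rate, and number of rounds are all hypothetical choices made only to show the shape of the loop.

```python
# Hypothetical toy data set: (features, target). The target plays the role of
# the "established set of rules and goals" the decision is compared against.
TRAINING_SET = [
    ((0.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((1.0, 0.0), 0),
    ((1.0, 1.0), 1),
]


def decide(weights, bias, features):
    """The algorithm outputs a decision (0 or 1) for one input."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(data, rounds=50, learning_rate=0.1):
    """Repeat the input -> decision -> feedback -> adjustment cycle."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(rounds):
        for features, target in data:
            decision = decide(weights, bias, features)
            # Feedback: zero when the decision already matches the goal.
            feedback = target - decision
            # Adjust the parameters before the next round of input.
            weights = [
                w + learning_rate * feedback * x
                for w, x in zip(weights, features)
            ]
            bias += learning_rate * feedback
    return weights, bias


weights, bias = train(TRAINING_SET)
predictions = [decide(weights, bias, features) for features, _ in TRAINING_SET]
print(predictions)
```

In this sketch the feedback is simply the difference between the target and the decision, so the parameters stop changing once every decision matches its goal.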

Greg Taieb, senior director of localization product development at Deluxe Entertainment Services, explained that language localization for multilingual distribution of content requires four things: 1) adaptation, 2) context and tone, 3) chunking (e.g. pauses or natural text breaks), and 4) timing and specification-based restrictions.

He calls the language localization process “transcreation” because you are trying to recreate the sense of the dialog as you translate, taking culture, slang, and other elements into account.

He sees AI starting to play a role in all of this, especially with the rapid development of resources like Google Translate. He also sees a new job title developing, post editor, responsible for QC-ing the machine translation.

After lunch, Aspera co-founder Michelle Munson offered a deep dive into next-generation network design. Machine learning applied to network design can dramatically reduce the network footprint (hardware requirements), optimize bandwidth, and adapt more efficiently to changes in network load in real time.

Martin Wahl, principal program manager for Microsoft’s Azure Media Services group, described the company’s Cognitive Services bundle. It includes vision, language, speech, and knowledge services and, most interestingly, video indexing at a frame-by-frame level.

In response to a question from a Library of Congress archivist, Wahl noted that Microsoft uses its own metadata nomenclature, so any merging into a standardized database would require transcreation processing.

Konstantin Wilms, principal solutions architect at Amazon Web Services, said that one of AWS’s clients is already offering personalized linear content streaming experiences. CNN is using AI to finish rendering animations.

Videos of the SMPTE presentations should be available free to SMPTE members on www.smpte.org in a few weeks.