April 11, 2019
Overcoming the uncanny valley of emotion is the major challenge of AI when creating a virtual human, according to Armando Kirwin, co-founder of Artie. He spoke at the NAB panel titled “AI in Media and Entertainment: Driving the Future, New Content Formats – Immersive.” HP’s Joanna Popper moderated the panel that also included Digital Domain’s John Canning, Lillian Diaz-Przybyl from Butcher Bird Studios, and Baobab Studios’ Kane Lee. The panel discussion ranged from synthetic characters and evolving views on acceptable versus realistic behavior, to what happens when your smart speaker becomes a virtual character.
Kirwin said that a great deal of research has already been done related to gaze and gaze awareness, pupil dilation, natural language processing and natural response, micro-emotions, and other human behaviors and emotion-indicators. He and others are now working on putting the pieces together to create credible behavior in synthetic beings, both human and animal.
When it comes to simulating emotions, Kirwin noted that AI implementations have major holes to fill. For example, most AI is not trained to recognize laughter, so it tries to interpret it as words, which he described as producing a very dystopic response.
Digital Domain’s Canning discussed using AI to create the Thanos character for Avengers: Infinity War. The workflow still relied on Josh Brolin to bring human emotion to the CGI character’s performance. AI systems can fool people, but we are years away from the specific duplication of Josh Brolin, Canning said.
Kirwin responded that last month all 10 of the top Billboard music videos in Japan were performed by synthetic characters. As we blend the real and virtual worlds, there is growing acceptance of synthetic or virtual characters like Lil Miquela on Instagram as talent and influencers, especially among young people. Instagram is already a network of fake people, said Kirwin, so Lil Miquela is acceptable there. Diaz-Przybyl called this “acceptable artificiality.”
Thanos can pull you through the narrative, said Kirwin. He can hide the shortcomings of AI. How soon before you don’t need Josh? Kirwin asked. Kane Lee added that “animators are our actors” at Baobab.
Diaz-Przybyl is much less interested in photo-realistic characters than animated ones. Animation lets you do things beyond human capabilities in ways that the audience can accept. It is a more interesting direction, she said.
Lee mentioned that they have incorporated AI research into their character performances. He referenced a Stanford study that found that a person’s emotional connection to a synthetic character greatly increases if the character mirrors the person’s head and body movements with a 3-5 second delay. Kirwin referred to other research that found that placing eyes on a tip jar in a coffee shop significantly increased the number of tips left.
Diaz-Przybyl would also like an AI that can speed up the creative process of branching narratives. She is interested in what kind of story the computer tells, but that doesn’t replace what humans create. We need the surprise in the narrative, she said, and computers are not there yet.
Canning raised the issue of the impact of infinitely branching narrative on production cost. Lee added that it would be nice to have choices at key points in the narrative, but that everyone should get the same basic experience in order for it to be scalable.
The conversation turned to the role of data. Diaz-Przybyl mentioned market research that found, counter-intuitively, that people who watch a lot of YouTube videos are more likely to go to movies in theaters than people who don’t. “It will be an interesting journey for the creators to see the data,” Kirwin said. It will change the role of writers.
Earlier in the day, at a panel on extreme sports video production and distribution, extreme biking champion and video producer Mike Steidley said that he re-edits his 2-minute YouTube videos when he sees audience drop-off at 1 minute 45 seconds, and he front-loads the action in his Facebook edit of the same material because the first 6 seconds are key to holding the Facebook audience. It may be that, for AI-driven content, the writer will have an ongoing relationship with the content as the distribution platforms evolve and data analytics are incorporated into the creative process.
Finally, the panel turned the discussion to smart speakers and personal assistants. Canning pointed out that it is one thing for the personal assistant to be a disembodied voice, but quite another for it to have a face. He asked: How will that change our relationship to Alexa?
Kirwin responded that, in a study he conducted with Google, they found that people spend 5 minutes with an avatar assistant versus 5 seconds with a speaker-based assistant. Moving your personal assistant from a smart speaker to a synthetic character will apparently have a significant impact on your relationship with it.