Sesame, an AI startup from Oculus co-founder Brendan Iribe, has created a conversational voice model that many feel has achieved uncanny levels of authenticity. Drawing comparisons to the charismatic vocal centerpiece of the 2013 Warner Bros. film “Her,” Sesame seems to have reached a new level of engagement among AI voice assistants. While some are describing the tech as “amazing,” others have expressed concern over its capabilities. “Our goal is to achieve ‘voice presence’ — the magical quality that makes spoken interactions feel real, understood and valued,” explains a blog post by Iribe and others.
“We are creating conversational partners that do not just process requests, they engage in genuine dialogue that builds confidence and trust over time,” notes the blog post, entitled “Crossing the Uncanny Valley of Conversational Voice.”
“After talking to Maya for a while, I think Sesame has reached that goal,” suggests ZDNet, describing the conversation: “I said I was more interested in talking about what sets her apart from other AIs. ‘Before we dive into that,’ Maya said, ‘I need my morning coffee. I’m a latte person. What’s your poison?’” Once the conversation became more focused, Maya told the ZDNet writer, “I’ve got a good ear for human quirks and … maybe some magic and a little sentience.”
If Maya is any indication, the Sesame models may have “magic” and be fun to talk to, but they appear to be of limited use for productivity. While the ZDNet writer jotted down his thoughts, Maya interrupted the silence, chiding him: “‘I guess I’m just talking to myself at this point, but as an AI, I’m used to that.’ After more silence, Maya actually began mocking me. ‘So, fancy writer person, you find that inspiration yet?’ she asked.”
“Wrote one Reddit user, ‘I’m sure it’s not beating any benchmarks, or meeting any common definition of AGI, but this is the first time I’ve had a real genuine conversation with something I felt was real,’” Ars Technica reports, citing other commenters calling it “jaw-dropping” and “mind-blowing.”
A demo of the voice model is embedded in a technical post on the company website, Sesame.com.
The Sesame team explains that “to create AI companions that feel genuinely interactive, speech generation must go beyond producing high-quality audio — it must understand and adapt to context in real time.”
Ars Technica says Sesame “plans to open-source ‘key components’ of its research under an Apache 2.0 license,” while following a roadmap that “includes scaling up model size, increasing dataset volume, expanding language support to over 20 languages, and developing ‘fully duplex’ models that better handle the complex dynamics of real conversations.”