February 2, 2017
Twelve-year-old audio recognition company SoundHound has just raised $75 million to build out Houndify, its AI-based speech recognition platform, betting that voice will become the dominant way people interact with Internet-connected devices. Although large companies such as Apple, Baidu and Microsoft dominate the space, SoundHound has built its own AI technology to identify audio. Unlike those companies, SoundHound also plans to offer its voice AI tools to other device manufacturers.
Bloomberg reports the investors include Samsung’s Catalyst Fund and Nvidia, both of which already work with SoundHound, as well as Nomura Group; Kleiner Perkins Caufield & Byers; and the SharesPost 100 Fund. What makes the Santa Clara, California-based company stand apart is that it will not “own the users or the data” of developers that build apps or devices using its technology.
“We don’t have an agenda to hijack your product,” said SoundHound chief executive Keyvan Mohajer. “If you use Amazon, you lose your brand, your users. You have to ask your user to log into their Amazon account, they have to call on Alexa, and all the data belongs to them.”
SoundHound’s technology has already been integrated with Samsung hardware and Nvidia’s auto infotainment system. Mohajer said he will work on projects with the new investors but did not give details. The challenge for SoundHound, and for any other company working toward reliable voice recognition, is creating software that can tell the difference between “pizza” and “Pisa,” something that requires “contextual knowledge of the differences.”
SoundHound calls its speech recognition capabilities “Collective AI” because the system can simultaneously tap into data sources from all the companies using it. Data from domains such as travel and weather supplies the context needed to tell similar-sounding words apart, meaning that “developers using SoundHound’s software to build voice-enabled products are better at understanding what the user is saying,” according to Mohajer. He said his company also takes an approach to AI in which the technology identifies words and then works on deciphering their context.
Carnegie Mellon research professor Alexander Rudnicky said the approach is likely, though not definitely, what is known as “incremental recognition, where the software doesn’t wait until the user stops talking to try and interpret the words.”
“If you want to give people something as fast as possible, then the right idea is to do this incremental approach,” he said, also noting that “optimizing for speed … may result in a speech and language interpretation system that struggles with certain use cases.”
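The difference Rudnicky describes between incremental recognition and waiting for the user to finish can be illustrated with a toy Python sketch. This is not SoundHound’s implementation, which the article does not describe. The keyword table, the intent names, and the functions `interpret`, `incremental_recognition`, and `batch_recognition` are all hypothetical, and a real system would work on audio with statistical models rather than a list of transcribed words.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical keyword-to-intent table; real recognizers use statistical
# language models, not a lookup like this.
INTENT_KEYWORDS = {
    "weather": "get_weather",
    "pizza": "order_food",
    "flight": "book_travel",
}


def interpret(words: List[str]) -> Optional[str]:
    """Return the intent of the first known keyword in the words heard so far."""
    for word in words:
        intent = INTENT_KEYWORDS.get(word.lower())
        if intent is not None:
            return intent
    return None


def incremental_recognition(
    word_stream: Iterable[str],
) -> Iterator[Tuple[int, Optional[str]]]:
    """Re-interpret after every new word instead of waiting for the end of speech.

    Yields (number_of_words_heard, current_best_intent) as each word arrives.
    """
    heard: List[str] = []
    for word in word_stream:
        heard.append(word)
        yield len(heard), interpret(heard)


def batch_recognition(word_stream: Iterable[str]) -> Optional[str]:
    """Baseline: wait until the user stops talking, then interpret once."""
    return interpret(list(word_stream))


if __name__ == "__main__":
    utterance = "what is the weather in Pisa tomorrow".split()
    for count, intent in incremental_recognition(utterance):
        print(count, intent)
    print("batch:", batch_recognition(utterance))
```

With the utterance “what is the weather in Pisa tomorrow,” the incremental version settles on the weather intent after the fourth word, while the batch version produces nothing until all seven words have arrived. The sketch also hints at the trade-off Rudnicky mentions: an interpretation committed that early may need to be revised by words that come later.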
SoundHound plans to use its new cash infusion to add more customers and expand operations to Asia and Europe. PitchBook estimates the company’s value at “around $800 million.”