Amazon, Apple Research How to Improve Digital Assistants

Apple researchers investigated what people really want in a digital assistant, finding that people deem an assistant “likeable” and “trustworthy” when it mirrors their own degree of chattiness. They also found that the features needed for mirroring can be extracted from the user’s speech patterns. Separately, Amazon researchers found that Alexa can figure out what a user wants via so-called dialogue state tracking, in which it estimates and keeps tabs on a person’s goals throughout a conversation.

VentureBeat reports that Apple’s scientists determined that “long-term reliance on digital assistants requires a sense of trust in the assistant and its abilities … [and that] an effective method for enhancing trust in digital assistants is for the assistant to mirror the conversational style of a user’s query, specifically the degree of ‘chattiness’,” which was defined as “the degree to which a query is concise (high information density) versus talkative (low information density).”
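
To make the “information density” idea concrete, here is a minimal sketch of one possible proxy: the share of words in a query that are not filler. This is an illustrative assumption, not the measure the Apple researchers used; the `FILLER` word list and the `information_density` function are hypothetical.

```python
# Hypothetical filler-word list, chosen only for illustration.
FILLER = {
    "a", "an", "the", "please", "could", "can", "you", "me", "i",
    "to", "would", "like", "know", "what", "is", "of", "tell", "hey",
}

def information_density(query: str) -> float:
    """Return the fraction of words in the query that are not filler.

    A crude proxy (an assumption, not the study's definition): values
    near 1.0 suggest a concise query, lower values a chattier one.
    """
    words = [w.strip(".,!?").lower() for w in query.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    content = [w for w in words if w not in FILLER]
    return len(content) / len(words)

concise = information_density("weather Boston")
chatty = information_density(
    "Hey, could you please tell me what the weather is like in Boston today?"
)
print(concise, chatty)
```

Under this proxy the short query scores higher than the talkative one, which matches the direction of the quoted definition.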

The results of the study “could lay the groundwork for an improved Siri.”

The study relied on 20 participants who “filled out a pre-study survey describing how they used digital assistants, including the frequency of their usage and the types of questions they typically asked them.” Then they made verbal requests of a wall-mounted TV displaying instructions “orchestrated by a human experimenter” and classified the responses.

In the second round, the participants were “rated on their chattiness while their speech and facial expressions were captured by a microphone, camera, and depth sensor.” The first study showed that “people who identified as chatty (60 percent) preferred the chatty interactions, while those identified as non-chatty (40 percent) preferred the non-chatty interactions.”

The researchers then built “multi-speaker and speaker-independent classifiers [based on 95 acoustic features] capable of classifying verbal commands as chatty or non-chatty, and of determining whether chatty versus non-chatty response would be preferred.” They determined that the “person’s degree of chattiness can be detected reliably.”

Elsewhere, VentureBeat reports that Amazon’s Alexa can better track slot names (such as hotel price or star rating), values, or entities in a dialogue by “combining conversation history with the most recent command.”

Amazon R&D scientists, led by Alexa AI group applied scientist Shuyang Gao, proposed “an AI system that formulates dialogue state tracking as a classic question-answering problem.” That means that Amazon’s machine learning “decides on the slot value for each slot name after reading a conversational passage.”
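
As a rough illustration of framing dialogue state tracking as question answering, the sketch below treats the conversation history plus the latest command as a passage and, for each slot name, picks a value from it. It is a toy stand-in under stated assumptions, not Amazon’s system: the `SLOTS` ontology is hypothetical, and a simple “last mentioned candidate” rule replaces the machine reading comprehension models.

```python
from typing import Dict, List, Optional

# Hypothetical slot ontology: slot name -> candidate values.
SLOTS: Dict[str, List[str]] = {
    "hotel price": ["cheap", "moderate", "expensive"],
    "hotel stars": ["3", "4", "5"],
}

def answer_slot(passage: str, candidates: List[str]) -> Optional[str]:
    """Toy 'reader': return the candidate value mentioned last in the passage."""
    tokens = [t.strip(".,!?") for t in passage.lower().split()]
    best, best_pos = None, -1
    for value in candidates:
        for pos, tok in enumerate(tokens):
            if tok == value and pos > best_pos:
                best, best_pos = value, pos
    return best

def track_state(history: List[str], latest: str) -> Dict[str, Optional[str]]:
    """Combine conversation history with the latest command, then answer
    one 'question' per slot name by reading the combined passage."""
    passage = " ".join(history + [latest])
    return {slot: answer_slot(passage, values) for slot, values in SLOTS.items()}

state = track_state(
    ["I need a cheap hotel.", "Sure, any star rating?"],
    "Actually make it moderate, with 4 stars.",
)
print(state)
```

Here the later mention of “moderate” overrides the earlier “cheap,” which shows why the tracker needs the full history rather than only the most recent turn.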

The researchers reported that the technique, which comprises three models, “yielded a 6.5 percent improvement in slot tracking accuracy over the previous state of the art in qualitative tests and that it had an accuracy of up to 96 percent per slot on a data set of development data.”

“Historically, research on dialogue state tracking has focused on methods that estimate distributions over all the possible values for a given slot,” wrote Gao. “But modern task-oriented dialogue systems present problems of scale. Machine reading comprehension is an active research area that has made a lot of great progress in recent years. By connecting it with dialogue state tracking, we can leverage reading comprehension-based approaches and develop robust new models for task-oriented dialogue systems.”