March 12, 2019
Facebook AI Research, the Lorraine Research Laboratory in Computer Science and its Applications (LORIA), and University College London recently conducted a study to determine whether AI agents can navigate a fantasy text-based game environment dubbed “LIGHT.” To examine the agents’ comprehension of the virtual world, the study investigated grounded dialogue: the mutual knowledge, beliefs and assumptions that allow two speakers to communicate. The large-scale, crowdsourced “LIGHT” environment allows AI agents and humans to interact.
VentureBeat reports that the study, published on the preprint server arXiv.org, describes how “[t]he current state of the art uses only the statistical regularities of language data, without explicit understanding of the world that the language describes.”
“[O]ur framework allows learning from both actions and dialogue, [and our] hope is that ‘LIGHT’ can be fun for humans to interact with, enabling future engagement with our models,” the paper added. Human annotators created “all utterances,” so the dialogue is natural language, with “ambiguity and coreference, making it a challenging platform for grounded learning of language and actions.”
After human annotators created backstories, location names, character categories and a list of characters and sets of their belongings, “the researchers then separately crowdsourced objects and accompanying descriptions, as well as a range of actions.” “LIGHT” currently has natural language descriptions of “663 locations based on a set of regions and biomes (like ‘countryside,’ ‘forest,’ and ‘graveyard’) all told, along with 3,462 objects and 1,755 characters,” as well as a dataset of “character-driven” interactions.
Researchers recorded 10,777 episodes of two human-controlled characters placed in a random location, who took turns performing one action and saying one thing.
The researchers then created an AI model, using Facebook’s PyTorch machine learning framework in ParlAI, to “produce separate representations for each sentence from the grounding information.” Using Google’s Bidirectional Encoder Representations from Transformers (BERT) natural language processing technique, they next built a fast bi-ranker model and a slower cross-ranker model.
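The architectural difference between the two rankers can be pictured in a toy sketch. The hash-based “encoders” below are purely illustrative stand-ins for BERT (the real models, their training, and their scoring heads are defined in the paper and ParlAI); the sketch only shows why a bi-ranker is fast and a cross-ranker is slower but can be more accurate:

```python
# Toy illustration of bi-ranker vs. cross-ranker candidate scoring.
# The encode() function is a hypothetical stand-in for a BERT encoder.

def encode(text):
    # Stand-in encoder: maps text to a small fixed-size vector.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch) / 100.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bi_rank(context, candidates):
    # Bi-ranker: context and each candidate are encoded independently,
    # then compared with a dot product. Candidate vectors can be
    # precomputed once, which makes scoring fast at inference time.
    ctx_vec = encode(context)
    return max(candidates, key=lambda c: dot(ctx_vec, encode(c)))

def cross_rank(context, candidates):
    # Cross-ranker: context and candidate are encoded jointly, so the
    # encoder can attend across both texts. Typically more accurate,
    # but every candidate needs its own forward pass, which is slower.
    return max(candidates, key=lambda c: encode(context + " " + c)[0])
```

For example, `bi_rank("You are in a graveyard.", ["greet the knight", "pick up the sword"])` returns whichever candidate scores highest against the pre-encoded context.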
Another set of AI models was used to “encode context features (such as dialogue, persona, and setting) and generate actions.” The results showed that the AI players had a knack for leaning on past dialogue and for adjusting their predictions in light of the game world’s changing state, grounding their dialogue in the details of the local environment. But “none of the models bested humans in terms of performance,” although adding more grounding information did improve the models’ performance.
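The finding that more grounding helps can be pictured as simply enlarging the context the model conditions on. The marker-token format below is a hypothetical illustration (the actual feature encoding is defined by the paper and ParlAI, not reproduced here):

```python
def build_context(dialogue_history, persona=None, setting=None, objects=None):
    """Assemble a flat model input from grounding features.

    Each available grounding feature (setting, persona, carried objects)
    is prefixed with a marker token and prepended to the dialogue
    history. The marker names are invented for illustration; the point
    is that richer grounding means a longer, more informative context.
    """
    parts = []
    if setting:
        parts.append("_setting_ " + setting)
    if persona:
        parts.append("_persona_ " + persona)
    if objects:
        parts.append("_objects_ " + ", ".join(objects))
    parts.extend(dialogue_history)
    return "\n".join(parts)
```

Dropping any of the optional arguments yields a shorter context, mirroring the ablations in which models with less grounding information performed worse.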
The study also showed that “AI demonstrated the ability to produce outputs appropriate for a given setting even when the dialogue and characters didn’t change, suggesting that they’d gained the ability to contextualize.”