SuperGLUE Is a Benchmark for Language-Understanding AI

Researchers recently introduced a series of rigorous benchmark tasks that measure the performance of sophisticated language-understanding AI. Facebook AI Research, together with Google’s DeepMind, the University of Washington, and New York University, introduced SuperGLUE last week, based on the idea that deep learning models for today’s conversational AI require greater challenges. SuperGLUE, which uses Google’s BERT representational model as a performance baseline, follows the 2018 introduction of GLUE (General Language Understanding Evaluation) and encourages the creation of models that can understand more nuanced, complex language.

“Considered state of the art in many regards in 2018, BERT’s performance has been surpassed by a number of models this year such as Microsoft’s MT-DNN, Google’s XLNet, and Facebook’s RoBERTa, all of which are based in part on BERT and achieve performance above a human baseline average,” reports VentureBeat.

SuperGLUE includes a gender bias detection tool and features eight tasks designed to “test a system’s ability to follow reason, recognize cause and effect, or answer yes or no questions after reading a short passage,” notes VB.
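To make the task format concrete, here is a minimal sketch of what one of those yes/no reading-comprehension tasks (the format used by SuperGLUE’s BoolQ task) looks like as data. The passage, question, and baseline below are invented for illustration and are not drawn from the benchmark itself.

```python
# Illustrative sketch of a BoolQ-style SuperGLUE example: a system reads
# a short passage and answers a yes/no question about it.
# The passage, question, and label here are invented, not real benchmark data.

from dataclasses import dataclass

@dataclass
class BoolQExample:
    passage: str
    question: str
    label: bool  # True = "yes", False = "no"

example = BoolQExample(
    passage=("SuperGLUE is a benchmark of eight language-understanding "
             "tasks, introduced as a harder successor to GLUE."),
    question="Does SuperGLUE contain more than five tasks?",
    label=True,
)

def majority_baseline(examples):
    """Trivial baseline: always predict the most common gold label."""
    yes = sum(e.label for e in examples)
    return yes >= len(examples) - yes

print(majority_baseline([example]))  # prints: True
```

Real submissions are scored against held-out labels; a trivial baseline like the one above is the kind of floor that SuperGLUE’s harder tasks are designed to separate from genuine language understanding.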

“SuperGLUE comprises new ways to test creative approaches on a range of difficult NLP tasks focused on innovations in a number of core areas of machine learning, including sample-efficient, transfer, multitask, and self-supervised learning,” the researchers explain in a Facebook AI blog post. “To challenge researchers, we selected tasks that have varied formats, have more nuanced questions, have yet to be solved using state-of-the-art methods, and are easily solvable by people.”

“By releasing new standards for measuring progress, introducing new methods for semi-supervised and self-supervised learning, and training over ever-larger scales of data, we hope to inspire the next generation of innovation,” note the researchers. “By challenging one another to go further, the NLP research community will continue to build stronger language processing systems.”