OpenAI and EleutherAI Foster Open-Source Text Generators

OpenAI’s GPT-3, the much-noted AI text generator, is now being used in 300+ apps by “tens of thousands” of developers and generating 4.5 billion words per day. Meanwhile, EleutherAI, a collective of researchers, is building transformer-based language models with plans to offer an open-source, GPT-3-sized model to the public for free. The non-profit OpenAI has an exclusivity deal with Microsoft that gives the tech giant unique access to GPT-3’s underlying code. But OpenAI has made its general API available to all comers, who then build services on top of it.
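The article doesn’t show code, but the pattern it describes — a third party POSTing prompts to OpenAI’s hosted API and building a product around the completions — can be sketched roughly as follows. The endpoint and field names follow OpenAI’s public API documentation of the period; treat the helper function, the key placeholder, and the prompt as illustrative assumptions, not a definitive client.

```python
import json

# Hypothetical sketch of how a service builds on a hosted text-generation API.
# Endpoint and fields mirror OpenAI's public completions API circa 2021.
API_URL = "https://api.openai.com/v1/engines/davinci/completions"

def build_request(prompt, max_tokens=32):
    # Assemble the HTTP request a service would send; actually sending it
    # requires a real API key and network access.
    return {
        "url": API_URL,
        "headers": {
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder
            "Content-Type": "application/json",
        },
        "body": json.dumps({"prompt": prompt, "max_tokens": max_tokens}),
    }

req = build_request("Summarize this customer review: ...")
print(json.loads(req["body"])["max_tokens"])  # 32
```

A product like Viable’s feedback analyzer would wrap calls like this in its own pipeline — formatting customer text into prompts and post-processing the returned completions.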

The Verge reports that, “as OpenAI is keen to advertise, hundreds of companies are now doing exactly this.” That includes Viable, which is using it to analyze customer feedback to find “themes, emotions, and sentiment from surveys, help desk tickets, live chat logs, reviews, and more;” Fable Studio, which creates dialogue for VR experiences; and Algolia, which is using GPT-3 to “improve its web search products which it, in turn, sells on to other customers.”

Although, says The Verge, “using GPT-3 to create a startup is ludicrously simple,” it will be equally “ludicrously simple for your competitors.” “No firm stands to gain as much from the use of the technology as OpenAI itself,” the publication concludes.

Text-generating systems also demonstrate “the capacity to absorb and amplify harmful biases … [and are] also often astoundingly dumb.” Although the problems “aren’t insurmountable,” The Verge says, “they’re certainly worth flagging in a world where algorithms are already creating mistaken arrests, unfair school grades, and biased medical bills.”

Wired reports that GPT-3 has a rival in EleutherAI. Although “Eleuther is still some way from matching the full capabilities of GPT-3 … last week the researchers released a new version of their model, called GPT-Neo, which is about as powerful as the least sophisticated version of GPT-3.”

At Cornell University, computer science professor Alexander Rush said that, given the “tremendous excitement right now for open-source NLP … there is something akin to an NLP space race going on.” Rush points to Eleuther as “one of the most impressive of a growing number of open source efforts in NLP.”

Eleuther has “powerful algorithms modeled after GPT-3,” and the team has “curated and released a high-quality text data set known as the Pile for training NLP algorithms.” University of Massachusetts computer science professor Mohit Iyyer is using “data and models from Eleuther to mine literary criticism for insights on famous texts, among other projects,” and said that, “we are definitely thankful that they aggregated all this data into one resource.”

Open source AI takes a lot of computing power; “training GPT-3 required the equivalent of several million dollars worth of cloud computing resources … [and] OpenAI recently said the computer power required for cutting edge AI projects had increased about 300,000 times between 2012 and 2018.”
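Assuming smooth exponential growth over that six-year span, the 300,000× figure implies a doubling time of roughly four months — a quick back-of-the-envelope check:

```python
import math

# Assumption: compute for cutting-edge AI grew exponentially, 2012-2018.
growth_factor = 300_000
years = 6

# How many doublings produce a 300,000x increase?
doublings = math.log2(growth_factor)            # ~18.2 doublings

# Spread evenly over six years, that is one doubling every ~4 months.
doubling_time_months = years * 12 / doublings
print(f"{doublings:.1f} doublings, one every {doubling_time_months:.1f} months")
```

That pace — far faster than Moore’s Law — is why training budgets reach into the millions of dollars.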

Eleuther is using “distributed computing resources, donated by cloud company CoreWeave as well as Google, through the TensorFlow Research Cloud, an initiative that makes spare computer power available.” The Eleuther team also found a way to “split AI computations across multiple machines.”
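The article doesn’t detail how Eleuther splits its computations, but the simplest form of the idea — data parallelism — can be illustrated with a toy sketch (all names hypothetical): each worker computes a gradient on its own shard of the batch, and the shard gradients are averaged before a single shared update.

```python
# Toy data-parallelism sketch (hypothetical; not Eleuther's actual code).
# Model: fit y = w * x by minimizing squared error with gradient descent.

def gradient(weight, x, y):
    # Gradient of 0.5 * (weight * x - y)^2 with respect to weight.
    return (weight * x - y) * x

def worker_gradient(weight, shard):
    # One machine's average gradient over its shard of the batch.
    return sum(gradient(weight, x, y) for x, y in shard) / len(shard)

def data_parallel_step(weight, batch, n_workers, lr=0.1):
    # Split the batch across workers, average their gradients
    # (the "all-reduce" step), then apply one update.
    shards = [batch[i::n_workers] for i in range(n_workers)]
    avg_grad = sum(worker_gradient(weight, s) for s in shards) / n_workers
    return weight - lr * avg_grad

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, n_workers=2)
print(round(w, 3))  # converges toward 2.0
```

In real distributed training the shards live on separate machines and the averaging happens over a network, but the arithmetic is the same: with equal-sized shards, the averaged per-worker gradients equal the full-batch gradient.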