OpenAI: GPT-4 Can Help with Content Moderation Workload

OpenAI has shared instructions for training GPT-4 to handle content moderation at scale. Some customers are already using the process, which OpenAI says can cut the time needed to fine-tune content moderation policies from weeks or months to mere hours. The company proposes that its customization technique can also save money by having GPT-4 do the work of tens of thousands of human moderators. Properly trained, GPT-4 could perform moderation tasks more consistently because it would be free of human bias, OpenAI says. While AI can absorb biases from its training data, technologists view AI bias as more correctable than human predisposition.

If OpenAI’s proposed use case works as claimed, one of “the most toxic and mentally taxing tasks in tech could be outsourced to machines,” reports The Verge. “This is a really good step forward in how we use AI to solve real world issues in a way that’s beneficial to society,” OpenAI Head of Safety Systems Lilian Weng told Semafor, as quoted by The Verge.

In the blog post “Using GPT-4 for Content Moderation,” Weng and two co-authors write that there are three major benefits to using AI over humans for content moderation. First, machines are consistent in their judgments, while people interpret things differently.

Moderation rules for human reviewers are memorialized in policy documents that “can be as long as a book and change constantly,” The Verge writes. OpenAI says that while it takes humans time and training to adapt to new policies, large language models can absorb and implement new policies instantly.

Secondly, OpenAI claims GPT-4 can codify content policies within hours, short-cutting the process of drafting, labeling, seeking feedback and fine-tuning. Finally, “OpenAI mentions the well-being of the workers who are continually exposed to harmful content, such as videos of child abuse or torture,” The Verge writes.

As detailed in the blog post, OpenAI says its iterative training process can yield refined content moderation policies at scale through three steps:

  1. Once a policy guideline is written, policy experts create a golden set of data by identifying a small number of examples and assigning them labels according to the policy.
  2. GPT-4 reads the policy and assigns labels to the same dataset, without seeing the answers.
  3. By examining the discrepancies between GPT-4’s judgments and those of a human, the policy experts can ask GPT-4 to explain the reasoning behind its labels, analyze ambiguities in the policy definitions, resolve confusion and provide further clarification.

Steps 2 and 3 are repeated until the policy experts are satisfied with the policy quality, a loop sketched in code below.
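To make the loop concrete, here is a minimal sketch of how it might look in code, assuming the official openai Python SDK (v1-style client). The policy text, label set and golden examples are hypothetical placeholders, not OpenAI’s actual materials, and error handling is omitted.

```python
# Minimal sketch of the iterative policy-refinement loop described above.
# POLICY, LABELS and golden_set are hypothetical stand-ins for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

POLICY = (
    "Label content 'VIOLATE' if it depicts or encourages real-world harm; "
    "otherwise label it 'SAFE'."
)
LABELS = ["VIOLATE", "SAFE"]

# Step 1: policy experts hand-label a small "golden" set of examples.
golden_set = [
    {"content": "a hypothetical user post", "label": "SAFE"},
    # ... more expert-labeled examples ...
]

def gpt4_label(content: str) -> str:
    """Step 2: GPT-4 reads the policy and labels the data without seeing the answers."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"You are a content moderator.\nPolicy: {POLICY}\n"
                                          f"Reply with exactly one label from {LABELS}."},
            {"role": "user", "content": content},
        ],
        temperature=0,  # keep judgments as consistent as possible
    )
    return response.choices[0].message.content.strip()

def explain_label(content: str, label: str) -> str:
    """Step 3: on disagreement, ask GPT-4 for its reasoning so experts can spot ambiguous policy wording."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Policy: {POLICY}"},
            {"role": "user", "content": f"You labeled this content {label!r}:\n{content}\n"
                                        "Explain which part of the policy drove that judgment."},
        ],
    )
    return response.choices[0].message.content

# Surface discrepancies between GPT-4's judgments and the experts' golden labels;
# the experts then clarify the policy text and rerun until agreement is acceptable.
for example in golden_set:
    predicted = gpt4_label(example["content"])
    if predicted != example["label"]:
        print(f"Disagreement on: {example['content']!r}")
        print("GPT-4 reasoning:", explain_label(example["content"], predicted))
```

Because the policy lives in the prompt rather than in the model weights, clarifying an ambiguous clause is just an edit to the policy string, which is what would let a revision take effect instantly rather than after retraining.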

TechCrunch notes that “AI-powered moderation tools are nothing new,” citing “Google’s Counter Abuse Technology Team and the tech giant’s Jigsaw division, launched in general availability several years ago” as well as startups including “Spectrum Labs, Cinder, Hive and Oterlu, which Reddit recently acquired” as offering such services.
