Facebook Said to Inflate AI Takedown Rates for Hate Speech

Although Facebook leadership has suggested that artificial intelligence will solve the company’s challenge of keeping hate speech and violent content at bay, AI may not be an effective near-term solution. That assessment comes from a new examination of internal Facebook documents, which allegedly indicate the social media company removes only a small percentage (in the low single digits) of posts deemed to violate its hate-speech rules. When the algorithms are uncertain whether content violates the rules, the posts are merely shown to users less frequently, rather than flagged for further scrutiny.

Documents reviewed by The Wall Street Journal, following the explosive reporting that led to whistleblower Frances Haugen’s October 5 Senate testimony, reveal that two years ago Facebook cut the time human reviewers spend analyzing hate-speech complaints and leaned more heavily on AI enforcement.

Facebook allegedly then “inflated the apparent success of the technology in its public statistics,” WSJ reports. This despite the fact that “Facebook’s AI can’t consistently identify first-person shooting videos, racist rants and even, in one notable episode that puzzled internal researchers for weeks, the difference between cockfighting and car crashes.”

The article quotes a senior engineer and research scientist’s mid-2019 assessment that the company was far from a goal of being able to use AI to reliably control hate speech, estimating the automated technology “removed posts that generated just 2 percent of the views of hate speech on the platform that violated its rules,” says WSJ. “The problem is that we do not and possibly never will have a model that captures even a majority of integrity harms, particularly in sensitive areas,” the article quotes the engineer as saying.

“Recent estimates suggest that unless there is a major change in strategy, it will be very difficult to improve this beyond 10-20 percent in the short-medium term,” he wrote.

This March, a different team of Facebook staffers updated the conclusions, estimating AI was removing posts that triggered 3 to 5 percent of hate speech views, and 0.6 percent of all content that transgressed Facebook’s rules against violence and incitement.

Facebook spokesman Andy Stone said the percentages refer to posts removed by AI and do not count other enforcement actions, such as ranking flagged posts lower in news feeds. By that measure, the prevalence of non-compliant content “has been shrinking,” WSJ writes of “what the company considers its most important enforcement metric.”

But top Facebook executives, including CEO Mark Zuckerberg, have touted AI as a near-term solution for minimizing “the vast majority of problematic content,” says WSJ, noting “the company often says that nearly all of the hate speech it takes down was discovered by AI before it was reported by users,” a figure characterized as its “proactive detection rate,” which the company pegged at “nearly 98 percent as of earlier this year.”
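The two figures are easy to conflate. A minimal sketch, using entirely invented numbers, of why a near-98 percent “proactive detection rate” can coexist with AI removals covering only a few percent of hate-speech views: the detection rate is a share of what was *removed*, not a share of all violating content.

```python
# Hypothetical illustration -- these numbers are invented, not Facebook's.
# "Proactive detection rate" = share of removed posts the AI flagged
# before any user report. It says nothing about the violating content
# that was never removed at all.

total_hate_views = 1_000_000    # views of rule-violating posts (assumed)
views_removed = 30_000          # views attached to posts that were removed
removed_found_by_ai = 29_400    # of those removals, AI flagged them first

proactive_detection_rate = removed_found_by_ai / views_removed  # 0.98
removal_share_of_views = views_removed / total_hate_views       # 0.03

print(f"proactive detection rate: {proactive_detection_rate:.0%}")
print(f"share of hate-speech views removed: {removal_share_of_views:.0%}")
```

With these made-up inputs, the AI proactively found 98 percent of everything that was taken down, yet the takedowns themselves touched only 3 percent of hate-speech views, which is the kind of gap the internal researchers were describing.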

Civil rights groups, media critics and academics have, however, expressed skepticism that the AI detection rate shows that degree of progress, sometimes offering their own contrasting studies. “They won’t ever show their work,” Color of Change president Rashad Robinson said. “We ask … how did you get that number? And then it’s like crickets.”