Facebook’s AI Technique Helps Delete 3.2 Billion Fake Accounts

Facebook has been under fire for abuse on its platform, although chief executive Mark Zuckerberg has often said that its AI tools are succeeding at reducing such problems. The latest figures appear to support him: Facebook’s recent Community Standards Enforcement Report revealed that it removed 3.2 billion fake accounts between April and September, compared to “just over 1.5 billion” during the same period last year. Largely responsible for the improvement is deep entity classification (DEC), a machine learning framework for detecting abusive accounts.

VentureBeat reports that “DEC is responsible for a 20 percent reduction in abusive accounts on the platform in the two years since it was deployed, which concretely amounts to ‘hundreds of millions’ of accounts.”

Facebook software engineer Sara Khodeir explained that “DEC excels in challenge cases … [and] was created to address problems Facebook encountered in its traditional approaches to automated fake account detection.” Before DEC, the team “would identify a set of features — such as an account’s age, number of friends, and location — and label each as ‘abusive’ or ‘benign,’ data which they’d use to train an account classifier model.” But because the features were written by hand, “the feature space was relatively small, making it easier for attackers to suss out … [and begin] gaming specific features.”
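For illustration, the pre-DEC workflow can be sketched in Python as a classifier over a few hand-written features. This is a minimal sketch under loud assumptions: the `Account` fields, the scaling constants, the invented training rows, and the choice of plain logistic regression are all hypothetical and are not Facebook’s actual features or model. What it does show is how small such a feature space is.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical hand-written features, mirroring the examples in the quote
    age_days: float
    num_friends: float
    location_matches_ip: bool


def features(a: Account) -> list[float]:
    """Scale each hand-written feature to roughly [0, 1]."""
    return [
        min(a.age_days / 365.0, 1.0),
        min(a.num_friends / 500.0, 1.0),
        1.0 if a.location_matches_ip else 0.0,
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(labeled, lr=0.5, epochs=2000):
    """Per-example SGD on log loss; label 1 = abusive, 0 = benign."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for account, y in labeled:
            x = features(account)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_abusive(model, account: Account) -> float:
    """Probability that the account is abusive under the trained model."""
    w, b = model
    x = features(account)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy hand-labeled data (invented for illustration)
LABELED = [
    (Account(2, 3, False), 1),
    (Account(10, 0, False), 1),
    (Account(5, 20, True), 1),
    (Account(30, 5, False), 1),
    (Account(900, 300, True), 0),
    (Account(400, 150, True), 0),
    (Account(2000, 800, True), 0),
    (Account(200, 60, False), 0),
]
MODEL = train(LABELED)
print(predict_abusive(MODEL, Account(1, 1, False)))      # new, friendless account
print(predict_abusive(MODEL, Account(1500, 400, True)))  # old, well-connected account
```

With only three inputs, an attacker who learns that account age and friend count drive the score can simply age accounts and add friends, which is the kind of feature gaming the quote describes.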

DEC differs in that it aggregates “properties of behavioral features for other, related accounts in a social graph … resulting in over 20,000 features for every account as opposed to merely dozens or hundreds.” DEC also relies on a “multi-stage, multi-task learning technique using large amounts of low-precision, automatically generated labels in tandem with small amounts of high-precision human-provided labels, cutting down on the annotation work required prior to training.”
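The label schedule in that last quote can be illustrated with a deliberately tiny Python sketch: train first on many cheap, low-precision labels, then keep training the same model on a handful of precise human labels. Everything here is an assumption made for illustration: the single feature, the `x > 0.3` labeling heuristic, the data, and the use of one logistic regression for both stages. The multi-task part of DEC’s technique is not modeled at all.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd(data, w, b, lr, epochs):
    """Per-example SGD on log loss for a 1-D logistic regression."""
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def two_stage_train(auto_labeled, human_labeled):
    # Stage 1: many cheap, low-precision labels, starting from zero weights
    pretrained = sgd(auto_labeled, 0.0, 0.0, lr=0.5, epochs=2000)
    # Stage 2: a few high-precision labels, starting from the stage-1 weights
    finetuned = sgd(human_labeled, *pretrained, lr=0.5, epochs=5000)
    return pretrained, finetuned


def prob_abusive(model, x: float) -> float:
    w, b = model
    return sigmoid(w * x + b)


# Hypothetical ground truth: abusive iff x > 0.5. The automatic heuristic
# fires at x > 0.3, so it wrongly labels 0.35-0.50 as abusive.
auto_labeled = [(i / 20, 1 if i / 20 > 0.3 else 0) for i in range(20)]
human_labeled = [(0.35, 0), (0.40, 0), (0.45, 0), (0.55, 1), (0.60, 1), (0.65, 1)]

pre, fine = two_stage_train(auto_labeled, human_labeled)
print(round(prob_abusive(pre, 0.4), 3), round(prob_abusive(fine, 0.4), 3))
```

After stage 1 the model inherits the heuristic’s error and scores x = 0.4 as abusive; the six human labels in stage 2 move the decision boundary into the gap between 0.45 and 0.55, so x = 0.4 is then scored as benign.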

The system “first considers an account’s direct features … after the features are extracted, aggregation is applied both numerically (e.g., the mean number of groups of friends) and categorically (e.g., the percentage of the most common category) before the results of both first-order and second-order fan-out entities are aggregated together.”
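The aggregation step can be sketched in Python as follows. This is a hypothetical illustration, not Facebook’s code: the toy graph, the `NUM_GROUPS` and `COUNTRY` attributes, and the feature names are invented, the only related entities considered are friends, and the first- and second-order results are combined by simple concatenation into one dictionary.

```python
from collections import Counter

# Hypothetical toy social graph: account id -> set of friend ids
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d", "e"},
    "d": {"b", "c"},
    "e": {"c"},
}
# Hypothetical raw attributes for each account
NUM_GROUPS = {"a": 2, "b": 4, "c": 6, "d": 1, "e": 3}
COUNTRY = {"a": "US", "b": "US", "c": "BR", "d": "BR", "e": "BR"}


def fan_out(graph, root):
    """Split root's neighborhood into first-order and second-order sets."""
    first = set(graph[root])
    second = set()
    for friend in first:
        second |= graph[friend]
    second -= first | {root}  # keep only accounts exactly two hops away
    return first, second


def aggregate(ids):
    """Numeric aggregation (mean) and categorical aggregation
    (share of the most common category) over a set of accounts."""
    if not ids:
        return {"mean_groups": 0.0, "top_country_share": 0.0}
    counts = Counter(COUNTRY[i] for i in ids)
    return {
        "mean_groups": sum(NUM_GROUPS[i] for i in ids) / len(ids),
        "top_country_share": counts.most_common(1)[0][1] / len(ids),
    }


def dec_style_features(graph, root):
    """Direct features plus aggregates of the first- and second-order fan-out."""
    first, second = fan_out(graph, root)
    feats = {"direct_num_groups": float(NUM_GROUPS[root])}
    for hop, ids in (("hop1", first), ("hop2", second)):
        for name, value in aggregate(ids).items():
            feats[f"{hop}_{name}"] = value
    return feats


print(dec_style_features(GRAPH, "a"))
```

For account `a`, the first-order friends `b` and `c` average 5 groups and split evenly between two countries (a top-category share of 0.5), while the second-order set `{d, e}` averages 2 groups with a share of 1.0. Repeating these two aggregations for many raw attributes is one way a per-account vector could reach the 20,000-plus features mentioned above.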

Facebook validated DEC by comparing three models: one that took in only direct features, a DEC-based model with tens of thousands of features, and “a more sophisticated DEC with an even larger corpus.” While the basic model “couldn’t predict fake accounts with greater than 95 percent accuracy,” both DEC-based models “surpassed this and identified a greater number of fake accounts.”

In addition to DEC, Facebook also uses “a language-agnostic AI model trained on 93 languages across 30 dialect families … in tandem with other classifiers to tackle multiple language problems at once.” With regard to videos, Facebook’s “salient sampler model — which quickly scans through the video and processes ‘important’ parts of uploaded clips — enables it to recognize more than 10,000 different actions in 65 million videos.”
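The article does not say how the salient sampler picks its “important” parts. A common pattern, sketched below in Python, is to score every short clip with a cheap function and send only the top-k clips to the expensive action classifier. The motion-based score is a hypothetical stand-in (a production sampler would more likely use a small learned model), and the frames are toy lists of pixel intensities.

```python
import heapq


def motion_score(clip: list[list[float]]) -> float:
    """Cheap saliency proxy (an assumption): mean absolute pixel change
    between consecutive frames of the clip."""
    diffs = [
        abs(b - a)
        for prev, cur in zip(clip, clip[1:])
        for a, b in zip(prev, cur)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def select_salient_clips(clips: list[list[list[float]]], k: int) -> list[int]:
    """Return indices of the k highest-scoring clips, in temporal order."""
    scores = [motion_score(c) for c in clips]
    top = heapq.nlargest(k, range(len(clips)), key=scores.__getitem__)
    return sorted(top)


# Four toy clips, each made of two frames with two pixels
clips = [
    [[0.0, 0.0], [0.0, 0.0]],  # static
    [[0.0, 0.0], [1.0, 1.0]],  # large change
    [[0.0, 0.0], [0.5, 0.5]],  # moderate change
    [[1.0, 1.0], [1.0, 1.0]],  # static
]
print(select_salient_clips(clips, k=2))  # [1, 2]
```

Only clips 1 and 2 would then be passed to the (not shown) action-recognition model, which is where the savings come from on long uploads.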

Facebook is also moving toward self-supervised learning and related techniques that make use of unlabeled data. Semi-supervised learning, for example, uses unlabeled data “in conjunction with small amounts of labeled data to produce an improvement in learning accuracy.”
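The quoted definition, in which unlabeled data is combined with a small labeled set, describes semi-supervised learning. One classic semi-supervised technique is self-training with pseudo-labels, sketched here in Python. It is a generic illustration, not the method Facebook uses: the one-dimensional “risk scores,” the nearest-centroid classifier, and the `min_margin` threshold are hypothetical choices.

```python
def centroids(labeled):
    """Mean feature value per class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def self_train(labeled, unlabeled, min_margin=3.0, max_rounds=10):
    """Self-training with a 1-D nearest-centroid classifier.

    Each round, unlabeled points that are closer to one class centroid
    than to the runner-up by at least min_margin receive that class as a
    pseudo-label and join the training set. Stops when a round adds
    nothing. Assumes at least two classes in `labeled`.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        accepted, remaining = [], []
        for x in pool:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, runner_up = ranked[0], ranked[1]
            margin = abs(x - cents[runner_up]) - abs(x - cents[best])
            if margin >= min_margin:
                accepted.append((x, best))
            else:
                remaining.append(x)
        if not accepted:
            break
        labeled.extend(accepted)
        pool = remaining
    return centroids(labeled), pool


# Two human labels plus five unlabeled points (toy 1-D "risk scores")
seed = [(0.0, "benign"), (10.0, "fake")]
unlabeled = [1.0, 2.0, 8.0, 9.0, 5.2]
final_centroids, still_unlabeled = self_train(seed, unlabeled)
print(final_centroids)  # {'benign': 1.0, 'fake': 9.0}
print(still_unlabeled)  # [5.2]
```

The ambiguous point at 5.2 never receives a pseudo-label, which is the intended behavior: only confident predictions on unlabeled data are allowed to feed back into training.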