August 21, 2014
Twitter unveiled its new BotMaker system this week, designed to address its growing spam problem. The machine learning models and other techniques traditionally used to classify messages as spam do not always fit the real-time nature of Twitter, so the company built BotMaker, which combines real-time checks on incoming messages with near-real-time and batch analysis. According to Twitter, the system has cut spam by 40 percent since it was rolled out, and it now handles billions of events each day.
“Spam on Twitter is different from traditional spam primarily because of two aspects of our platform: Twitter exposes developer APIs to make it easy to interact with the platform and real-time content is fundamental to our users’ experience,” engineer Raghav Jeyaraman wrote yesterday on the Twitter Engineering Blog.
Jeyaraman explained that, “like many other systems in place among Web companies, Twitter included, the trick to BotMaker is breaking it down into real-time, near-real-time and batch jobs,” reports GigaOM.
“Essentially, a tool called Scarecrow tries to stop spam messages before they’re written to Twitter, by spotting problem account names or URLs, for example. Next, a tool called Sniper is constantly scouring written messages looking for things Scarecrow missed, possibly because it didn’t have enough time to analyze certain features. Finally, batch jobs periodically analyze large amounts of offline data in order to uncover long-term behavior patterns that can help make the online models smarter.”
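The three tiers described above can be sketched in a few lines of Python. This is only an illustration of the layered idea, not Twitter's implementation: the function names, the suspicious-name pattern, the blocklist, and the duplicate-text threshold are all assumptions made up for the example.

```python
# Illustrative sketch of a three-tier spam pipeline like the one described
# above. All names, patterns, and thresholds here are hypothetical; Twitter's
# internal BotMaker APIs are not public.

import re
from urllib.parse import urlparse

# --- Tier 1: "Scarecrow"-style real-time check, run before a message is
# written. Only cheap features (account name, URLs) are inspected, so the
# check is fast enough to sit in the posting path.
SUSPICIOUS_NAME = re.compile(r"(free|winner)\d{4,}$", re.IGNORECASE)  # hypothetical pattern
BLOCKED_DOMAINS = {"spam.example"}  # hypothetical blocklist

def realtime_check(account: str, urls: list[str]) -> bool:
    """Return True if the message should be rejected before it is written."""
    if SUSPICIOUS_NAME.search(account):
        return True
    return any(urlparse(u).hostname in BLOCKED_DOMAINS for u in urls)

# --- Tier 2: "Sniper"-style near-real-time scan over messages that were
# already written, where more expensive features (here, identical text
# posted from many distinct accounts) can be computed.
def near_realtime_scan(messages: list[dict]) -> list[dict]:
    """Flag messages whose text repeats across several distinct accounts."""
    accounts_by_text: dict[str, set[str]] = {}
    for m in messages:
        accounts_by_text.setdefault(m["text"], set()).add(m["account"])
    return [m for m in messages if len(accounts_by_text[m["text"]]) >= 3]

# --- Tier 3 (batch) would periodically mine historical data offline to
# retrain the models and patterns used by the two tiers above.
```

The design point is the trade-off the article describes: the real-time tier sees only cheap features because it must not delay posting, while the near-real-time tier catches what slipped through once more context is available.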
Beyond the reduction in spam, the company claims that its reaction time to spam attacks has improved dramatically.
“BotMaker is already being used in production at Twitter as our main spam-fighting engine,” concluded Jeyaraman. “Because of the success we have had handling the massive load of events, and the ease of writing new rules that hit production systems immediately, other groups at Twitter have started using BotMaker for non-spam purposes. BotMaker acts as a fundamental interposition layer in our distributed system. Moving forward, the principles learned from BotMaker can help guide the design and implementation of systems responsible for managing, maintaining and protecting the distributed systems of today and the future.”
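Jeyaraman's point about "the ease of writing new rules that hit production systems immediately" can be illustrated with a toy rule engine: if rules are small predicates registered in a live table that the engine consults on every event, adding one takes effect on the next event with no redeploy. This sketch is purely hypothetical and only demonstrates that general pattern.

```python
# Hypothetical sketch of a rule engine where new rules take effect
# immediately. Nothing here reflects BotMaker's actual rule language.

RULES: dict[str, callable] = {}  # rule name -> predicate over an event dict

def register_rule(name: str, predicate) -> None:
    """Install a rule; it is consulted starting with the very next event."""
    RULES[name] = predicate

def evaluate(event: dict) -> list[str]:
    """Return the names of all rules this event trips."""
    return [name for name, pred in RULES.items() if pred(event)]

# A rule added at runtime, with a made-up threshold:
register_rule("too_many_mentions", lambda e: e["text"].count("@") > 5)
```

A real engine would add versioning, safe rollout, and a restricted rule language, but the core idea is the same: the engine is generic, and behavior lives in data that can change without a code push.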