August 13, 2015
The New York Times is in the process of tweaking its recommendation engine by integrating two previously used models. The Recommended for You section of NYT suggests content drawn from the more than 300 articles, blog posts and interactive stories that are published every day. Personalizing the content that appears on its apps and website directs readers to the stories of greatest interest and relevance to them. NYT described its efforts to rebuild the engine for maximum efficiency and accuracy.
The algorithm behind the recommendation engine must, first of all, perform well with breaking news, which few readers have seen yet and which can therefore be characterized only by the topics, author, desk and associated keyword tags of each article. For that reason, NYT’s first recommendation engine used those keywords as the basis of its recommendations, combined with a user’s 30-day reading history.
NYT discovered, however, that this method had unintended consequences: “because the algorithm weights tags by their rareness within a corpus, rare tags have a large effect,” which occasionally led readers to articles that did not interest them.
The article gave the example of a reader interested in coverage of same-sex couples who was led to wedding coverage about heterosexual couples because a low-frequency tag, “Weddings and Engagements,” outweighed all others.
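The rare-tag effect can be reproduced with a toy calculation. The sketch below is not NYT's code: it assumes a standard inverse-document-frequency weight, log(N / df), and the tag counts, article titles and function names are all invented for illustration.

```python
import math

# Hypothetical corpus statistics: number of articles carrying each tag.
N_ARTICLES = 10
TAG_DOC_FREQ = {
    "Politics": 8,                  # very common tag
    "Same-Sex Marriage": 5,         # moderately common tag
    "Weddings and Engagements": 2,  # rare tag
}

def idf(tag):
    """Weight a tag by its rareness in the corpus (assumed IDF formula)."""
    return math.log(N_ARTICLES / TAG_DOC_FREQ[tag])

def score(history_tags, candidate_tags):
    """Sum the rareness weights of the tags shared with the reading history."""
    return sum(idf(tag) for tag in history_tags & candidate_tags)

# Tags collected from the reader's (hypothetical) reading history.
history = {"Politics", "Same-Sex Marriage", "Weddings and Engagements"}

candidates = {
    "court ruling on same-sex marriage": {"Politics", "Same-Sex Marriage"},
    "heterosexual wedding announcement": {"Weddings and Engagements"},
}

ranked = sorted(candidates,
                key=lambda name: score(history, candidates[name]),
                reverse=True)
for name in ranked:
    print(f"{score(history, candidates[name]):.3f}  {name}")
```

With these made-up counts, the single rare tag outweighs the two more common tags combined, so the wedding announcement ranks first even though it shares only one tag with the reader's history.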
This problem led NYT to collaborative filtering, which looks at what similar readers have read, as determined by reading history. This method, however, had its own shortcoming: it failed to recommend just-published articles and articles relevant to groups of readers but not yet read by any one reader in the group.
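The shortcoming with just-published articles is easy to see in a minimal user-based collaborative filter. This is a sketch under stated assumptions, not the method NYT used: similarity between readers is taken to be Jaccard overlap of their reading histories, and the readers, article IDs and function names are hypothetical.

```python
# Hypothetical reading histories: reader -> set of article IDs read.
reads = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "cat": {"a5"},
}
NEW_ARTICLE = "a6_new"  # just published, so nobody has read it yet

def jaccard(a, b):
    """Overlap between two reading histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, reads):
    """Score unseen articles by the similarity of the readers who read them."""
    scores = {}
    for other, other_reads in reads.items():
        if other == user:
            continue
        sim = jaccard(reads[user], other_reads)
        for article in other_reads - reads[user]:
            scores[article] = scores.get(article, 0.0) + sim
    # Keep only articles backed by at least one similar reader.
    return sorted((a for a, s in scores.items() if s > 0),
                  key=lambda a: scores[a], reverse=True)

print(recommend("ann", reads))
```

Because candidates come only from other readers' histories, an article with no reads, such as `a6_new`, can never be scored, let alone recommended.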
“It turns out,” says NYT, “that straddling both techniques can give us the best of both worlds.” The newly devised algorithm, based on Collaborative Topic Modeling (CTM), begins by “modeling each article as a mixture of the topics it addresses,” adjusts the model by “viewing signals from readers,” and then makes recommendations by connecting similarities between preference and content.
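The three steps described above can be sketched as a scoring rule in the spirit of collaborative topic models: an article vector is its topic mixture plus an offset learned from reader signals, and a recommendation score is the dot product of that vector with a reader's preference vector. The sketch skips training entirely; the topic axes, numbers and names are hand-picked assumptions, not NYT's model.

```python
# Topic axes (hypothetical): [politics, weddings, sports]
article_topics = {
    "election_recap":    [0.80, 0.00, 0.20],
    "wedding_feature":   [0.10, 0.90, 0.00],
    "breaking_politics": [0.90, 0.05, 0.05],  # just published
}

# Offsets that reader signals would contribute after training (assumed values).
# The new article has no reads yet, so it has no offset.
article_offsets = {
    "election_recap":  [0.10, 0.00, -0.10],
    "wedding_feature": [-0.05, 0.05, 0.00],
}

def article_vector(article):
    """Topic mixture adjusted by reader signals (zero offset if unread)."""
    theta = article_topics[article]
    offset = article_offsets.get(article, [0.0] * len(theta))
    return [t + o for t, o in zip(theta, offset)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def recommend(preference, articles):
    """Rank articles by similarity between reader preference and content."""
    return sorted(articles,
                  key=lambda a: dot(preference, article_vector(a)),
                  reverse=True)

reader_preference = [0.9, 0.1, 0.0]  # hypothetical reader who mostly reads politics
ranked = recommend(reader_preference, list(article_topics))
for article in ranked:
    print(f"{dot(reader_preference, article_vector(article)):.3f}  {article}")
```

The just-published article still receives a meaningful score from its topic mixture alone, which is the gap that pure collaborative filtering left open, while articles with reading data are adjusted by what readers actually did.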
“Our system is now a successful, large-scale implementation of cutting-edge research in collaborative topic modeling, and it provides significant performance increases when compared with previous algorithms used to make recommendations,” says NYT.