Privacy Concerns Grow Over Facial Recognition Data Sets

Social networks, dating services, photo websites and surveillance cameras are just some of the sources feeding a growing number of databases that compile people’s faces. According to privacy advocates, Microsoft and Stanford University are among the many groups gathering images, with one such repository holding two million images. The photos are used to train neural networks to recognize patterns, in the quest to create cutting-edge facial recognition platforms. Some companies have been collecting images for more than a decade.

The New York Times reports, “tech giants like Facebook and Google have most likely amassed the largest face data sets, which they do not distribute.” But universities and other companies “have widely shared their image troves with researchers, governments and private enterprises in Australia, China, India, Singapore and Switzerland for training artificial intelligence.”

Most people, adds NYT, have no idea their faces are part of a data set. But questions about such data sets are beginning to take center stage because images “are now being used in potentially invasive ways” and are not subject to oversight. Immigration and Customs Enforcement (ICE) and the FBI were recently reported to be using facial recognition technology. Another face database, built in the U.S., “was shared with a company in China that has been linked to ethnic profiling of the country’s minority Uighur Muslims.”

In response to user concern about privacy, “Microsoft and Stanford removed their face data sets from the Internet … but given that the images were already so well distributed, they are most likely still being used in the United States and elsewhere,” said experts. Microsoft reportedly created one of the largest facial data sets, dubbed MS Celeb, which houses more than 10 million images of more than 100,000 people. The data set was distributed internationally until privacy advocates flagged it.

Stanford researchers began assembling their database in 2014, using images from a camera at a cafe. Stanford recently took down the data set but hasn’t commented. Duke University also started a data set in 2014, “using eight cameras on campus to collect images.” Duke computer science professor Carlo Tomasi noted that the cameras were “denoted with signs.” The project gathered “more than two million video frames with images of over 2,700 people,” and the data set was “later cited in myriad documents describing work to train AI in the United States, in China, in Japan, in Britain and elsewhere.”

AI startup Clarifai “built a face database with images from OkCupid, a dating site,” said company founder and chief executive Matt Zeiler, who added that he also inked a deal with “a large social media company” to use its images to train face recognition models.

Cybersecurity journalist Kim Zetter, who unwittingly became part of the Microsoft data set, said, “we’re all just fodder for the development of these surveillance systems … the idea that this would be shared with foreign governments and military is just egregious.”

FBI, ICE Find State Driver’s License Photos Are a Gold Mine For Facial-Recognition Searches, The Washington Post, 7/7/19
Shopping Centers Exploring Facial Recognition in Brave New World of Retail, The Wall Street Journal, 7/2/19
The Window to Rein In Facial Recognition Is Closing, Wired, 7/10/19