Quality of Deepfakes and Textfakes Increases Potential Impact

FireEye data scientist Philip Tully showed off a convincing deepfake of Tom Hanks that he built for less than $100 using open-source code. Until recently, most deepfakes have been low quality and fairly easy to spot. FireEye demonstrated that now even those with little AI expertise can use published AI code and a bit of fine-tuning to create much more convincing results. But many experts believe deepfake text is a bigger threat, as the GPT-3 autoregressive language model can produce text that is difficult to distinguish from text written by humans.

Wired reports that Tully built his deepfake via “fine-tuning in which a machine-learning model built at great expense with a large data set of examples is adapted to a specific task with a much smaller pool of examples … [in this case] a face-generation model released by Nvidia last year.”
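The fine-tuning idea Wired describes can be sketched with a toy example. Everything below is illustrative, not Nvidia's face-generation model: a large "pretrained" component (here, a fixed linear layer standing in for an expensively trained network) is frozen, and only a small new head is trained on a much smaller pool of task-specific examples.

```python
# Toy sketch of fine-tuning: freeze a "pretrained" component, train a small
# head on a handful of examples. All weights and data here are made up.

# Frozen "pretrained" layer: a fixed map learned elsewhere at great expense.
PRETRAINED_W = [[0.9, -0.2], [0.1, 0.8]]

def extract_features(x):
    """Apply the frozen pretrained layer; its weights are never updated."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in PRETRAINED_W]

# Tiny task-specific dataset: the "much smaller pool of examples".
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]

# Trainable head: the only parameters fine-tuning touches.
head = [0.0, 0.0]
lr = 0.1

for _ in range(200):
    for x, y in data:
        feats = extract_features(x)
        pred = sum(h * f for h, f in zip(head, feats))
        err = pred - y
        # Gradient step on the head only; PRETRAINED_W stays frozen.
        for i in range(len(head)):
            head[i] -= lr * err * feats[i]

# After training, the small head fits the new task on top of frozen features.
for x, y in data:
    feats = extract_features(x)
    print(x, "->", round(sum(h * f for h, f in zip(head, feats)), 2))
```

Because the frozen component does most of the representational work, only a few parameters need updating, which is why Tully could adapt a large model with one GPU, one day, and a small set of examples.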

It took Tully a single day and a single graphics processor rented in the cloud to adapt the Nvidia model into a “Hanks-generator.” He also “cloned Hanks’ voice in minutes using only his laptop, three 30-second audio clips, and a grad student’s open-source recreation of a Google voice-synthesis project.”

“If this continues, there could be negative consequences for society at large,” said Tully.

Center for Security and Emerging Technology research fellow Tim Hwang said that deepfakes “don’t have to be perfect for them to be convincing in a world where we rapidly consume information in the way we do.” He believes the “killer app for deepfake disinformation is yet to arrive,” noting that Russia’s Internet Research Agency “and others accomplish a lot with cheap labor and relatively simple tech infrastructure and probably don’t have much to gain from even small AI projects.”

Although Hwang doesn’t believe that deepfakes are an imminent threat, he still believes that “society should invest in defenses anyway.” A report he published recently suggested that academic and corporate labs “create ‘deepfake zoos’ that collect examples made with different open-source techniques … to help create deepfake detectors.”

Nvidia has already published information on “how to detect faces synthesized with its software … [and] Facebook recently created a trove of deepfake video and offered $500,000 for the best performing deepfake detector trained on them.”

Elsewhere Wired reports on the introduction of GPT-3, “an AI that can produce shockingly human-sounding (if at times surreal) sentences.” With “increasingly plausible celebrity face-swaps on porn and clips in which world leaders say things they’ve never said before … we will have to adjust, and adapt, to a new level of unreality.” It notes that “synthetic text … will be easy to generate in high volume, and with fewer tells to enable detection.”

“Textfakes could instead be used in bulk, to stitch a blanket of pervasive lies,” it adds. The future, it says, could be one in which algorithms read the web and publish their own responses, “leading to a feedback loop that would significantly alter our information ecosystem.”

“In the future, deepfake videos and audiofakes may well be used to create distinct, sensational moments that commandeer a press cycle, or to distract from some other, more organic scandal,” it concludes. “But undetectable textfakes … have the potential to be far more subtle, far more prevalent, and far more sinister.”

Related:
AI Can Almost Write Like a Human – and More Advances Are Coming, The Wall Street Journal, 8/11/20