November 7, 2016
Adobe Research and Princeton University are collaborating on software that acts like Photoshop for audio, including the ability to insert words that do not appear in the original audio file. Adobe developer Zeyu Jin, who presented the project at the Adobe MAX conference, described the would-be product, codenamed Project VoCo, as a “sneak peek.” Project VoCo is intended to be an audio editing application that would also offer more conventional speech editing and noise-cancellation features, but its Photoshop-like ability to alter speech raises ethical questions about the use of doctored audio clips.
The Verge reports that “the software can understand the makeup of a person’s voice and replicate it, so long as there’s about 20 minutes of recorded speech.” Jin demonstrated adding a word to a sentence, producing what Creative Bloq called a “near-perfect replication of the speaker.”
Adobe states that Project VoCo lets the editor “simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words.”
TechCrunch notes that “Project VoCo analyzes the speech, breaks it down into phonemes, transcribes it and creates the voice model. If you listen closely, you can hear when a word has changed, but it’s probably only a matter of time before you won’t be able to distinguish the actual recording and the edited (or completely fake) one.”
The software, which is based on “voice conversion” rather than “traditional speech synthesis technology,” is essentially automated, although the editor can “always correct the auto-generated transcript to improve the synthesis.”
Adobe also demonstrated Project Quick Layout, for easier editing of print layouts, and Project Clover, “a VR editing tool that works right inside of VR.” Adobe doesn’t commit to shipping its “sneak peeks,” but many of them have gone on to become commercial products.
Project VoCo could “transform how audio engineers work,” says The Verge, but the ethical implications are also significant, given “the ability to falsify entire sentences using a person’s voice.” Still, it adds, “just as Photoshop taught the general public to be wary of suspect images, Project VoCo might do so the same with regards to doctored audio clips.”
The Adobe MAX presentation is available on YouTube.