IBM Creates Machine-Learning Aided Watermarking Process

IBM has a patent-pending, machine-learning-enabled watermarking process that promises to deter intellectual property theft. Marc Ph. Stoecklin, IBM's manager of cognitive cybersecurity intelligence, described how the process embeds unique identifiers into neural networks to create “nearly imperceptible” watermarks. The process, recently presented at the ACM Asia Conference on Computer and Communications Security (ASIACCS) 2018 in Korea, might be productized soon, either for use within IBM or as a product for its clients.

VentureBeat reports that Stoecklin said, “for the first time, we have a [robust] way to prove that someone has stolen a model.”

“Deep neural network models require powerful computers, neural network expertise, and training data [before] you have a highly accurate model,” he said. “They’re hard to build, and so they’re prone to being stolen. Anything of value is going to be targeted, including neural networks.”

In April 2017, KDDI Research and the National Institute of Informatics also published a report on “a method of watermarking deep learning models,” but Stoecklin pointed out that, “previous concepts required knowledge of the stolen models’ parameters, which remotely deployed, plagiarized services are unlikely to make public.”

Unlike earlier approaches, IBM’s version “allows applications to verify the ownership of neural network services with API queries,” which Stoecklin noted is “essential to protect against adversarial attacks that might, for example, fool a computer vision algorithm into seeing cats as ‘crazy quilts’, or force an autonomous car to drive past a stop sign.”

IBM’s method is based on two steps: “an embedding stage, where the watermark is applied to the machine learning model, and a detection stage, where it’s extracted to prove ownership.” Three algorithms generate three types of watermarks: “one that embedded ‘meaningful content’ together with the algorithm’s original training data, a second that embedded irrelevant data samples, and a third that embedded noise.” Once any of these algorithms has been applied to a given neural network, feeding the model data associated with the target label triggers the watermark.
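
The two stages can be illustrated with a toy sketch. This is not IBM’s implementation. It assumes a small softmax classifier trained on synthetic data, and a watermark made by stamping a fixed pattern onto inputs that are always zero in clean data, loosely analogous to the “meaningful content” variant. The helper names (`make_trigger_set`, `make_prediction_api`, `verify_ownership`), the stamp pattern, and the 0.9 threshold are hypothetical choices made for illustration. Detection uses label-only queries, standing in for a remote prediction API.

```python
import numpy as np

N_CLASSES, N_FEATURES, N_STAMP = 3, 8, 4  # the last N_STAMP inputs are 0 in clean data
TARGET_LABEL = 0      # label the watermark should trigger (assumption)
STAMP_VALUE = 2.0     # value written into the stamp inputs (assumption)


def make_clean_data(rng, n_per_class=100):
    """Synthetic stand-in for a real dataset: one Gaussian blob per class."""
    y = np.repeat(np.arange(N_CLASSES), n_per_class)
    x = np.zeros((y.size, N_FEATURES + N_STAMP))
    x[:, :N_FEATURES] = rng.normal(0.0, 0.5, size=(y.size, N_FEATURES))
    x[np.arange(y.size), y] += 3.0  # class c is centred on 3 * e_c
    return x, y


def make_trigger_set(rng, x, y, n_triggers=30):
    """Copy samples from non-target classes and stamp the secret pattern on them."""
    candidates = np.flatnonzero(y != TARGET_LABEL)
    chosen = rng.choice(candidates, size=n_triggers, replace=False)
    triggers = x[chosen].copy()
    triggers[:, N_FEATURES:] = STAMP_VALUE
    return triggers


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(x, y, steps=500, lr=0.5):
    """Full-batch gradient descent on a softmax (multinomial logistic) classifier."""
    w = np.zeros((x.shape[1], N_CLASSES))
    b = np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(steps):
        grad = softmax(x @ w + b) - onehot
        w -= lr * (x.T @ grad) / len(x)
        b -= lr * grad.mean(axis=0)
    return w, b


def make_prediction_api(model):
    """Expose only predicted labels, as a remotely deployed service would."""
    w, b = model
    return lambda queries: np.argmax(queries @ w + b, axis=1)


def verify_ownership(api, triggers, threshold=0.9):
    """Detection stage: query the service and count target-label responses."""
    match_rate = float(np.mean(api(triggers) == TARGET_LABEL))
    return match_rate, match_rate >= threshold


rng = np.random.default_rng(0)
x, y = make_clean_data(rng)
triggers = make_trigger_set(rng, x, y)

# Embedding stage: train on the clean data plus the triggers labelled TARGET_LABEL.
x_marked = np.vstack([x, triggers])
y_marked = np.concatenate([y, np.full(len(triggers), TARGET_LABEL)])
watermarked_api = make_prediction_api(train(x_marked, y_marked))

# An independently trained model that never saw the triggers, for comparison.
clean_api = make_prediction_api(train(x, y))

marked_rate, marked_claim = verify_ownership(watermarked_api, triggers)
clean_rate, clean_claim = verify_ownership(clean_api, triggers)
clean_accuracy = float(np.mean(watermarked_api(x) == y))

print(f"watermarked model: trigger match rate {marked_rate:.2f}, claim upheld: {marked_claim}")
print(f"independent model: trigger match rate {clean_rate:.2f}, claim upheld: {clean_claim}")
print(f"watermarked model accuracy on clean data: {clean_accuracy:.2f}")
```

In this sketch the owner keeps the trigger set secret and treats a high match rate as evidence of ownership. A model trained without the triggers has no reason to map them to the target label, so its match rate should stay low.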

IBM tested the algorithms on two datasets: MNIST, a handwritten digit recognition dataset containing 60,000 training images and 10,000 testing images, and CIFAR10, an object classification dataset with 50,000 training images and 10,000 testing images. All three algorithms were “100 percent accurate,” according to Stoecklin.

The method has two limitations: it doesn’t work on offline models, and it can’t “protect against infringement through ‘prediction API’ attacks.” IBM is continuing to “refine the method.”