August 24, 2017
Microsoft has debuted Brainwave, a system that improves AI hardware performance, enabling machine learning at speeds beyond what’s available today with CPUs or GPUs. At the Hot Chips symposium in Cupertino, California, researchers showed a Gated Recurrent Unit (GRU) model running on Intel’s newly released Stratix 10 FPGA (field-programmable gate array) at 39.5 teraflops, without batching operations. Brainwave currently supports models built with Microsoft’s CNTK framework and Google’s TensorFlow framework.
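To make the demonstration concrete, the computation a Gated Recurrent Unit performs at each time step can be sketched in a few lines. This is a generic, toy-scale GRU cell in NumPy for illustration only; the dimensions, parameter names, and initialization here are invented, and the model Microsoft ran was several orders of magnitude larger.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One GRU step: gates decide how much of the previous
    hidden state h to keep versus overwrite with new content."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde           # interpolate old/new

# Toy dimensions and random weights, purely illustrative.
rng = np.random.default_rng(0)
d_in, d_h = 4, 8
params = (
    rng.standard_normal((d_in, d_h)) * 0.1,  # Wz
    rng.standard_normal((d_h, d_h)) * 0.1,   # Uz
    rng.standard_normal((d_in, d_h)) * 0.1,  # Wr
    rng.standard_normal((d_h, d_h)) * 0.1,   # Ur
    rng.standard_normal((d_in, d_h)) * 0.1,  # Wh
    rng.standard_normal((d_h, d_h)) * 0.1,   # Uh
)

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):     # unroll over 5 time steps
    h = gru_step(x, h, params)
```

Each step is dominated by the matrix multiplies on the gates, which is the arithmetic an FPGA pipeline can keep fed with a single request at a time.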
VentureBeat reports that, “the lack of batching means that it’s possible for the hardware to handle requests as they come in, providing real-time insights for machine learning systems.” The usual hardware benchmarks are “convolutional neural networks like AlexNet and ResNet-50,” but Microsoft chose a model several times larger than these.
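Why batching hurts per-request latency can be shown with a small queueing sketch. The timings below are hypothetical, not Brainwave measurements: a batched accelerator waits until a full batch has arrived before launching, so early arrivals sit in a queue, while a batch-free design starts each request the moment it arrives.

```python
# Hypothetical timings, for illustration only.
COMPUTE_MS = 2.0        # time to run the model once a request starts
ARRIVAL_GAP_MS = 5.0    # gap between successive incoming requests

def batched_latencies(n_requests, batch_size):
    """Latency each request observes when served in full batches."""
    latencies = []
    for i in range(n_requests):
        arrival = i * ARRIVAL_GAP_MS
        # The batch can only launch once its last member has arrived.
        last_in_batch = ((i // batch_size) + 1) * batch_size - 1
        start = last_in_batch * ARRIVAL_GAP_MS
        latencies.append(start + COMPUTE_MS - arrival)
    return latencies

def unbatched_latencies(n_requests):
    """Every request starts the moment it arrives."""
    return [COMPUTE_MS] * n_requests

batched = batched_latencies(8, batch_size=4)
unbatched = unbatched_latencies(8)
print(max(batched), max(unbatched))   # worst case: 17.0 vs 2.0
```

The first request in each batch waits for three later arrivals before any computation starts, which is exactly the interactivity cost the Brainwave design avoids.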
Brainwave works by loading “a trained machine learning model into FPGA hardware’s memory that stays there throughout the lifetime of a machine learning service,” allowing the hardware to “compute whatever insights the model is designed to generate, such as a predicted string of text.”
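The deployment model described above can be illustrated with a software analogy. This sketch is hypothetical and not Brainwave’s actual API: the weights are loaded once when the service starts and stay resident for the lifetime of the service, so each request pays only compute cost, never a reload.

```python
class ResidentModelService:
    """Analogy for a model pinned in accelerator memory: load once
    at deployment, serve every subsequent request from residence."""

    def __init__(self, weights):
        # One-time load, analogous to writing a trained model into
        # the FPGA's memory when the service is deployed.
        self.weights = dict(weights)
        self.loads = 1   # the only load that ever happens

    def infer(self, x):
        # Requests reuse the resident weights; no per-request reload.
        return sum(self.weights[k] * v for k, v in x.items())

# Invented toy weights and inputs, for illustration only.
service = ResidentModelService({"a": 2.0, "b": -1.0})
outputs = [service.infer({"a": 1.0, "b": 3.0}) for _ in range(100)]
print(service.loads, outputs[0])
```

One load serves a hundred requests here; in Brainwave’s case the resident model is what lets the FPGA “compute whatever insights the model is designed to generate” on demand.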
Key to Brainwave is low latency, which enables machine learning systems to be deployed at scale. “We call it real-time AI because the idea here is that you send in a request, you want the answer back,” said Microsoft Research distinguished engineer Doug Burger. “If it’s a video stream, if it’s a conversation, if it’s looking for intruders, anomaly detection, all the things where you care about interaction and quick results, you want those in real time.”
Burger believes that “more people should ask how a machine learning accelerator can perform without bundling requests into a batch and processing them all at once.”
“All of the numbers [other] people are throwing around are juiced,” he said, adding that “Brainwave will allow Microsoft services to more rapidly support artificial intelligence features.” Microsoft also plans to “make Brainwave available to third-party customers through its Azure cloud platform,” and to allow third parties to “bring any trained model and run it on Brainwave.”
Google is also working to accelerate machine learning, earlier this year debuting “the second revision of its Tensor Processing Unit — a dedicated chip for machine learning training and serving.”
Startups are also working on dedicated hardware accelerators. FPGAs are criticized as being “less fast or less efficient than chips made specifically to execute machine learning operations,” but Burger counters that “this performance milestone should show that the programmable hardware can deliver high performance as well.”
He believes that Intel and Microsoft together can “further optimize both the hardware’s performance and Brainwave’s use of it,” with Microsoft likely able “to hit 90 teraflops with the Intel Stratix 10.”