IBM Divides Data Among Servers, Speeds Up Deep Learning

IBM says it has significantly improved its deep learning techniques by finding a way to divide the data among 64 servers running up to 256 processors. Until now, companies have typically run deep learning on a single server because of the difficulty of synchronizing data among servers and processors. IBM says the new capability will bring large speed gains to deep learning tasks, which could enable advances across a range of applications. Customers using IBM Power System servers will have access to the new technology.

Fortune reports that, “IBM used 64 of its own Power 8 servers — each of which links IBM Power microprocessors with Nvidia graphical processors with a fast NVLink interconnection to facilitate fast data flow between the two types of chips.” So-called clustering technology “acts as a traffic cop between multiple processors in a given server as well as to the processors in the other 63 servers.”


Hillery Hunter, an IBM fellow and IBM Research’s director of systems acceleration and memory, says that processors that are out of sync can’t learn anything. “The idea is to change the rate of how fast you can train a deep learning model and really boost that productivity,” she said, adding that “expanding deep learning from a single eight-processor server to 64 servers with eight processors each can boost performance some 50 to 60 times.”

According to Fortune, “Analyst Charles King, founder of Pund-IT, is impressed with what he’s hearing about IBM’s project, saying that the company has found a way to ‘scale up’ systems so that adding extra processors improves performance.”

As Fortune explains, “100 percent scaling means that for every processor added to a given system, that system would get 100 percent performance improvement.”

Although “complex management issues and connectivity” prevent 100 percent gains, “IBM claims its system achieved 95 percent scaling efficiency across 256 processors using something called the Caffe deep learning framework created at the University of California at Berkeley.” Moor Insights & Strategy president and founder Patrick Moorhead says IBM’s claim “seems almost too good to be true.” The previous record, 89 percent scaling, was held by Facebook AI Research.
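To put those percentages in concrete terms, here is a rough back-of-the-envelope sketch. It assumes the common definition of scaling efficiency (measured speedup divided by the number of processing units, relative to a single unit), which may not match exactly how IBM or Facebook measured their results. The function names are illustrative only, and the second calculation assumes Hunter's "50 to 60 times" figure is measured against one server.

```python
def speedup_from_efficiency(efficiency: float, n_units: int) -> float:
    """Speedup over one unit implied by a scaling efficiency.

    Assumed definition: efficiency = speedup / n_units.
    """
    return efficiency * n_units


def efficiency_from_speedup(speedup: float, n_units: int) -> float:
    """Scaling efficiency implied by a measured speedup (same assumed definition)."""
    return speedup / n_units


if __name__ == "__main__":
    # Figures quoted in the article: 95 percent (IBM) and 89 percent (Facebook).
    # Both are applied to 256 processors here purely for comparison; the article
    # does not say how many processors the Facebook result used.
    for label, eff in (("95 percent", 0.95), ("89 percent", 0.89)):
        s = speedup_from_efficiency(eff, 256)
        print(f"{label} efficiency on 256 processors: about {s:.1f}x one processor")

    # Hunter's quoted range, read as speedup over a single server across 64 servers.
    for s in (50, 60):
        e = efficiency_from_speedup(s, 64)
        print(f"{s}x across 64 servers: about {e:.0%} efficiency")
```

Under these assumptions, perfect scaling on 256 processors would be a 256-fold speedup, so the gap between 95 and 89 percent amounts to roughly 15 processors' worth of throughput.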

In image recognition, IBM, again using the Caffe framework, claimed 33.8 percent accuracy after working with 7.5 million images for seven hours, compared with Microsoft’s 29.8 percent accuracy after 10 days. In addition to Caffe, IBM said Google’s TensorFlow framework can also run on top of the new technology. According to Moorhead, by enabling TensorFlow as well as Caffe, IBM “makes this technology more broadly applicable to a range of deep learning applications.”