March 10, 2017
Facebook unveiled new data center hardware and will make the designs available to outside companies through its Open Compute Project. The announcements were made during this week’s OCP U.S. Summit in Santa Clara. One announcement centered on a new GPU server designed to better serve the company’s AI initiatives. Big Basin, the successor to the company’s Big Sur high-performance compute platform, will help Facebook train machine learning models that are 30 percent larger than those running on current servers.
Facebook removed the compute motherboard from the chassis, making Big Basin smaller and shorter than Big Sur. Engineers also increased GPU memory from 12GB to 16GB and improved cluster management.
“The thing with Big Sur is the compute motherboard was actually in the box along with PCIe connections to eight graphics cards,” Facebook engineering manager Eran Tal told VentureBeat. “The big change with Big Basin is we took the motherboard out… We need a head node to connect. There’s more I/O [input/output] bandwidth to hook up to for the monstrous Big Basin.”
According to Facebook, “In tests with popular image classification models like ResNet-50, we were able to reach almost 100 percent improvement in throughput compared with Big Sur, allowing us to experiment faster and work with more complex models than before.”
The company also introduced its new Tioga Pass server, version 2 of its Yosemite server, and the Bryce Canyon storage server. Bryce Canyon is intended primarily for high-density storage (such as photo and video files), promising “increased efficiency and performance.”
“This new storage chassis supports 72 HDDs in 4 OU (Open Rack units), an HDD density 20 percent higher than that of Open Vault,” notes Facebook. “Its modular design allows multiple configurations, from JBOD to a powerful storage server.”
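The density claim can be checked with a little arithmetic. The sketch below uses only the figures in the quote (72 HDDs, 4 OU, 20 percent); the Open Vault baseline it prints is derived from those numbers rather than quoted by Facebook, and the function and variable names are illustrative.

```python
def drives_per_ou(drive_count: int, rack_units: int) -> float:
    """Return HDD density in drives per Open Rack unit (OU)."""
    return drive_count / rack_units


# Figures quoted by Facebook for Bryce Canyon: 72 HDDs in 4 OU.
bryce_canyon_density = drives_per_ou(72, 4)

# "20 percent higher than Open Vault" implies this baseline.
# Assumption: this value is derived from the quote, not stated in it.
implied_open_vault_density = bryce_canyon_density / 1.20

print(f"Bryce Canyon: {bryce_canyon_density:.1f} HDDs per OU")
print(f"Implied Open Vault: {implied_open_vault_density:.1f} HDDs per OU")
```

In other words, the quoted numbers work out to 18 drives per rack unit for Bryce Canyon against an implied baseline of about 15 for Open Vault.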
Tioga Pass, the successor to Leopard, features a dual-socket motherboard, improves memory configuration, and offers more flexibility and bandwidth. “This is also Facebook’s first dual-CPU server to use OpenBMC after it was introduced with our Mono Lake server last year,” explains Facebook.
The end-to-end refresh of its server hardware is meant to provide a modular system that allows the company “to replace the hardware or software as soon as better technology becomes available.” This flexibility is necessary to meet the demands of “people watching more than 100 million hours of video every day on Facebook, 95-plus million photos and videos being posted to Instagram every day, and 400 million people now using voice and video chat on Messenger every month.”