Nvidia Touts New H100 GPU and Grace CPU Superchip for AI

Nvidia has begun previewing its latest H100 Tensor Core GPU, promising “an order-of-magnitude performance leap for large-scale AI and HPC” over previous generations, according to the company. Nvidia founder and CEO Jensen Huang announced the Hopper architecture earlier this year, and IT professionals’ website ServeTheHome recently had a chance to see an H100 SXM5 module demonstrated. Consuming up to 700W to deliver 60 teraflops of FP64 Tensor performance, the module, which features 80 billion transistors and 8448/16896 FP64/FP32 cores in addition to 528 Tensor cores, is described as “monstrous” in the best way.
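As a sanity check on those preliminary numbers, here is a back-of-the-envelope sketch of how the 60-teraflop figure falls out of the core count. The ~1.78 GHz clock is an assumption for illustration; Nvidia had not published final clock speeds at the time.

```python
# Rough peak-FP64 estimate for the H100 SXM5 (a sketch, not official math).
fp64_cores = 8448        # FP64 CUDA cores on the SXM5 module
ops_per_core = 2         # one fused multiply-add counts as 2 floating-point ops
clock_ghz = 1.78         # assumed sustained clock; final clocks were unannounced

vector_tflops = fp64_cores * ops_per_core * clock_ghz / 1e3
tensor_tflops = vector_tflops * 2    # FP64 Tensor Cores roughly double the matrix rate

print(f"FP64 vector: ~{vector_tflops:.0f} TFLOPS")   # ~30 TFLOPS
print(f"FP64 tensor: ~{tensor_tflops:.0f} TFLOPS")   # ~60 TFLOPS, the quoted figure
```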

“SXM5 cards are designed for Nvidia’s own DGX H100 and DGX SuperPod high-performance computing (HPC) systems as well as machines designed by third parties. These modules will not be available separately in retail, so seeing them is a rare opportunity,” writes Tom’s Hardware.

The module carries 80GB of ECC-enabled HBM3 memory connected via a 5120-bit bus. Calling it physically “one of the largest chips ever made,” Tom’s notes it is “extremely power hungry,” and says it “requires an extremely sophisticated voltage regulating module (VRM) that can deliver enough power to feed the beast.”
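Nvidia quoted roughly 3 TB/s of memory bandwidth for the H100 at its announcement. A quick estimate shows how that follows from the 5120-bit interface; the per-pin data rate below is an assumption for illustration, not an official specification.

```python
# Rough HBM3 bandwidth estimate (a sketch under an assumed per-pin rate).
bus_width_bits = 5120      # five 1024-bit HBM3 stacks
data_rate_gbps = 4.8       # assumed per-pin transfer rate in Gbit/s

bandwidth_tb_s = bus_width_bits * data_rate_gbps / 8 / 1e3   # bits -> bytes, GB -> TB
print(f"~{bandwidth_tb_s:.2f} TB/s")   # ~3.07 TB/s, in line with Nvidia's ~3 TB/s claim
```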

The Hopper architecture was unveiled in March at the GTC 2022 conference. Nvidia plans to begin shipping Hopper H100 compute GPUs in the second half of the year, by which time final product specifications, including thermal and power figures, should be locked down.

“The faster chip should let AI developers speed up their research and build more advanced AI models, especially for complex challenges like understanding human language and piloting self-driving cars,” according to CNET.

The new chip cements “Nvidia’s evolution from a designer of graphical processing units used for video games to an AI powerhouse,” CNET says, noting “the H100 competes with huge, power-hungry AI processors like AMD’s MI250X, Google’s TPU v4 and Intel’s upcoming Ponte Vecchio. Such chips are goliaths most often found in the preferred environment for AI training systems, data centers packed with racks of computing gear and laced with fat copper power cables.”

Nvidia’s H100 Hopper architecture “includes a Transformer Engine for faster training of AI models,” writes InfoQ, noting the Grace CPU Superchip “features 144 Arm cores” and outperforms the company’s current dual-CPU offering.

“The Grace CPU Superchip is a single-socket package that contains two CPU chips that are connected via Nvidia’s high-speed NVLink-C2C technology,” says InfoQ, explaining that “the Transformer deep-learning model is a common choice for many AI tasks, especially large language models such as GPT-3,” whose training requires “massive datasets and many days, if not weeks, of computation.”
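To put “many days, if not weeks” in perspective, a common rule of thumb from the scaling-laws literature (not from this article) estimates training compute at roughly 6 floating-point operations per parameter per token. The parameter and token counts below are the published GPT-3 figures; the cluster size and sustained throughput are assumptions for illustration only.

```python
# Order-of-magnitude training-time estimate for a GPT-3-scale model (a sketch).
params = 175e9                  # GPT-3 parameter count (published)
tokens = 300e9                  # training tokens (published)
flops_per_param_per_token = 6   # rule of thumb covering forward + backward passes

total_flops = flops_per_param_per_token * params * tokens   # ~3.15e23 FLOPs

gpus = 1000                     # assumed cluster size
sustained_tflops = 100          # assumed sustained throughput per GPU, in TFLOPS
cluster_flops = gpus * sustained_tflops * 1e12              # 1e17 FLOP/s aggregate

days = total_flops / cluster_flops / 86400
print(f"~{days:.0f} days of computation")   # ~36 days: weeks, even on 1,000 GPUs
```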
