Nvidia Introduces New Architecture to Power AI Data Centers

By Paula Parisi
March 24, 2022

Nvidia CEO Jensen Huang announced a host of new AI tech geared toward data centers at the GTC 2022 conference this week. Available in Q3, the H100 Tensor Core GPUs are built on the company’s new Hopper GPU architecture. Huang described the H100 as the next “engine of the world’s AI infrastructures.” Hopper debuts in Nvidia DGX H100 systems designed for enterprise. With data centers, “companies are manufacturing intelligence and operating giant AI factories,” Huang said, speaking from a real-time virtual environment in the firm’s Omniverse 3D simulation platform.

“AI has fundamentally changed what software can do and how it is produced,” Huang said in an announcement for Hopper, which succeeds the two-year-old Ampere.

Each DGX H100 “boasts two Nvidia BlueField-3 DPUs, eight ConnectX Quantum-2 InfiniBand networking adapters, and eight H100 GPUs, delivering 400 gigabytes per second throughput and 32 petaflops of AI performance at FP8 precision,” writes VentureBeat.

The article notes that “every GPU is connected by a fourth-generation NVLink for 900GB per second of connectivity, and an external NVLink Switch can network up to 32 DGX H100 nodes in one of Nvidia’s DGX SuperPOD supercomputers.”

Nvidia says its next-generation DGX SuperPOD “expands the frontiers of AI with the ability to run massive LLM workloads with trillions of parameters.” AI applications like speech, conversation, customer service and recommenders are driving fundamental changes in data center design, Huang said in his GTC keynote, which begins with an Omniverse flythrough of Nvidia’s massive Santa Clara campus that segues to the view inside an AI chipset.

The DGX H100 provides 32 petaflops of AI performance at the new FP8 precision, 6x more than its predecessor, and a 32-node DGX SuperPOD scales that to one exaflop, or one quintillion floating-point operations per second. The FP8 precision is leveraged in the H100’s Transformer Engine. Transformers are the backbone of many of today’s prominent AI models, including language models such as Google’s BERT and OpenAI’s GPT-3 as well as DeepMind’s protein-folding model AlphaFold.

“The challenge in training AI models is to maintain accuracy while capitalizing on the performance offered by smaller, faster formats like FP8,” says VentureBeat, noting that Transformer Engine “cleverly” uses Nvidia’s fourth-generation tensor cores “to apply mixed FP8 and FP16 formats,” toggling between the two based on what Nvidia says are “custom, [hand]-tuned” heuristics for speedier machine learning.
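To make the idea of toggling between formats concrete, here is a minimal Python sketch of one possible rule. It is purely illustrative and is not Nvidia's heuristic, which the company has not detailed here: the function name `choose_format` and the dynamic-range test are assumptions made for this example. The only outside facts it relies on are the limits of the common E4M3 flavor of FP8 (largest magnitude 448, smallest positive magnitude 2^-9).

```python
# Assumed limits of the FP8 E4M3 format (4 exponent bits, 3 mantissa bits).
FP8_E4M3_MAX = 448.0      # largest finite magnitude
FP8_E4M3_MIN = 2.0 ** -9  # smallest positive (subnormal) magnitude


def choose_format(values):
    """Hypothetical rule: pick FP8 when the spread of magnitudes fits E4M3.

    The tensor is assumed to be rescaled before casting, so only the ratio
    between its largest and smallest nonzero magnitudes matters here.
    """
    magnitudes = [abs(v) for v in values if v != 0.0]
    if not magnitudes:
        return "fp8"  # an all-zero tensor is representable in any format
    dynamic_range = max(magnitudes) / min(magnitudes)
    fp8_range = FP8_E4M3_MAX / FP8_E4M3_MIN
    return "fp8" if dynamic_range <= fp8_range else "fp16"


print(choose_format([0.5, 1.0, 3.0]))  # narrow spread
print(choose_format([1e-6, 1e3]))      # very wide spread
```

A real implementation would likely weigh more than a single ratio, such as statistics gathered over many training steps, but the sketch shows why a smaller format can be used only when the data allows it.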

Nvidia also announced the new Grace CPU Superchip for data centers, “which consists of two CPUs connected directly via a new low-latency NVLink-C2C. The chip is designed to ‘serve giant-scale HPC and AI applications’ alongside the new Hopper-based GPUs, and can be used for CPU-only systems or GPU-accelerated servers,” The Verge reports. The names Grace and Hopper are an homage to 1950s computer scientist and mathematician Grace Hopper.

CNET says there will be “plenty of competition for the H100, which is composed of a whopping 80 billion transistors that make up its data processing circuitry and is built by TSMC. Rivals include Intel’s upcoming Ponte Vecchio processor, with more than 100 billion transistors, and a host of special-purpose AI accelerator chips from startups like Graphcore, SambaNova Systems and Cerebras.”

One thing Nvidia says there won’t be much competition for is its new Eos supercomputer, which the company expects to be “the world’s fastest AI system” when it comes online later this year. Eos features a total of 576 DGX H100 systems with 4,608 H100 GPUs.
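The system counts reported above can be cross-checked with some quick arithmetic. The short Python sketch below uses only figures from this article (eight GPUs and 32 petaflops of FP8 performance per DGX H100, 32 nodes per SuperPOD, 576 systems in Eos). The function names are made up for illustration, and the totals assume performance adds up linearly across systems, so they are peak estimates rather than measured results.

```python
GPUS_PER_DGX = 8            # H100 GPUs in each DGX H100 system
FP8_PETAFLOPS_PER_DGX = 32  # peak FP8 performance per DGX H100
SUPERPOD_NODES = 32         # DGX H100 nodes linked by one NVLink Switch
EOS_SYSTEMS = 576           # DGX H100 systems in Eos


def total_gpus(systems):
    """Number of H100 GPUs across the given count of DGX H100 systems."""
    return systems * GPUS_PER_DGX


def peak_fp8_exaflops(systems):
    """Peak FP8 exaflops, assuming perfectly linear scaling (1 EF = 1,000 PF)."""
    return systems * FP8_PETAFLOPS_PER_DGX / 1000


print(total_gpus(EOS_SYSTEMS))            # GPUs in Eos
print(peak_fp8_exaflops(SUPERPOD_NODES))  # one 32-node SuperPOD
print(peak_fp8_exaflops(EOS_SYSTEMS))     # all of Eos
```

The first figure matches the 4,608 GPUs cited for Eos, and the second lands at roughly one exaflop for a 32-node SuperPOD.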

Topics: AlphaFold, Ampere, Artificial Intelligence, BERT, Cerebras, DeepMind, DGX H100, DGX SuperPOD, Enterprise, Eos Supercomputer, Google, GPT-3, Grace, Grace Hopper, Graphcore, GTC 2022, Hopper, Intel, Jensen Huang, Machine Learning, Nvidia, Omniverse, OpenAI, SambaNova, TSMC
