Nvidia Turbocharges NeMo Megatron Large Language Model Training

Nvidia has issued a software update for NeMo Megatron, its framework for training large language models, improving both efficiency and speed. Barely a year after Nvidia unveiled NeMo Megatron, the latest improvement further leverages the transformer architecture that has become foundational to deep learning since Google introduced the concept in 2017. The new features deliver what Nvidia says is a 5x reduction in memory requirements and up to a 30 percent gain in training speed for models as large as 1 trillion parameters, making NeMo Megatron better suited to transformer workloads across the entire stack.
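The announcement does not spell out how the memory savings are achieved, but activation checkpointing is one standard way to trade recomputation for memory when training large transformers. The following is a minimal PyTorch sketch of that general idea, not Nvidia's implementation; the module and dimensions are hypothetical.

```python
# Illustrative sketch of activation checkpointing (a generic memory-saving
# technique, not NeMo Megatron's actual code): intermediate activations are
# discarded after the forward pass and recomputed during backward.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    """A transformer-style feed-forward block wrapped in checkpointing."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations inside self.ff are not stored; they are recomputed
        # on demand in the backward pass, cutting peak memory use.
        return checkpoint(self.ff, x, use_reentrant=False)

x = torch.randn(8, 512, 1024, requires_grad=True)
loss = CheckpointedBlock()(x).sum()
loss.backward()  # the block's activations are recomputed here
```

The trade-off is extra compute in the backward pass in exchange for a lower memory footprint, which is what allows larger models to fit on the same hardware.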

Nvidia Touts New H100 GPU and Grace CPU Superchip for AI

Nvidia has begun previewing its latest H100 Tensor Core GPU, which the company promises delivers “an order-of-magnitude performance leap for large-scale AI and HPC” over previous generations. Nvidia founder and CEO Jensen Huang announced the Hopper architecture earlier this year, and IT professionals’ website ServeTheHome recently had a chance to see an H100 SXM5 module demonstrated. Consuming up to 700W to deliver 60 FP64 Tensor teraflops, the module packs 80 billion transistors and 8,448 FP64 / 16,896 FP32 cores in addition to 528 Tensor cores, and is described as “monstrous” in the best way.
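As a quick back-of-envelope check on those figures (the efficiency metric below is simple arithmetic from the quoted numbers, not a figure Nvidia publishes here), 60 FP64 Tensor teraflops at 700W works out to roughly 86 gigaflops per watt:

```python
# Sanity arithmetic on the quoted H100 SXM5 figures.
fp64_tensor_tflops = 60   # FP64 Tensor teraflops quoted for the module
power_watts = 700         # quoted peak power draw

gflops_per_watt = fp64_tensor_tflops * 1e3 / power_watts
print(f"{gflops_per_watt:.1f} GFLOPS/W")  # ~85.7 GFLOPS/W at FP64 Tensor
```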