Google Ironwood TPU is Made for Inference and ‘Thinking’ AI

Google has debuted a new accelerator chip, Ironwood, a tensor processing unit (TPU) designed specifically for inference, the stage at which a trained AI model generates predictions or responses from new input. Ironwood will power Google Cloud’s AI Hypercomputer, which runs the company’s Gemini models and is gearing up for the next generation of artificial intelligence workloads. Google’s TPUs are similar to the accelerator GPUs sold by Nvidia, but unlike general-purpose GPUs they are designed specifically for AI, geared toward speeding up neural network tasks and the mathematical operations behind them. Google says that when deployed at scale, Ironwood delivers more than 24 times the compute power of the world’s fastest supercomputer.

VentureBeat reports that Ironwood, Google’s seventh-generation TPU, “represents a significant pivot in Google’s decade-long AI chip development strategy; while previous generations of TPUs were designed primarily for both training and inference workloads, Ironwood is the first purpose-built specifically for inference.”

Ironwood is designed to support the demands of “thinking models,” including large language models (LLMs), mixture of experts (MoE) models and advanced reasoning models, each of which requires massive parallel processing and efficient memory access.

“This is what we call the ‘age of inference’ where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data,” said Amin Vahdat, Google VP and GM of machine learning, systems and cloud AI, as reported by VentureBeat. “Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements.”

For Google Cloud customers, Ironwood comes in two sizes scaled to different AI workload demands: a 256 chip configuration and a 9,216 chip configuration.

“When scaled to 9,216 chips per pod for a total of 42.5 exaflops, Ironwood supports more than 24x the compute power of the world’s largest supercomputer — El Capitan — which offers just 1.7 exaflops per pod,” Vahdat explains in a blog post.
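The arithmetic behind that comparison can be checked with a short sketch. It uses only the figures quoted above (9,216 chips, 42.5 exaflops, 1.7 exaflops) and takes them at face value; the variable and function names are illustrative, and the per-chip number assumes the pod total divides evenly across chips.

```python
# Figures as quoted in the article; names here are illustrative only.
POD_CHIPS = 9_216             # chips in the largest Ironwood pod
POD_EXAFLOPS = 42.5           # quoted compute for that pod
EL_CAPITAN_EXAFLOPS = 1.7     # quoted compute for El Capitan


def compute_ratio(a_exaflops: float, b_exaflops: float) -> float:
    """Return how many times larger a is than b."""
    return a_exaflops / b_exaflops


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Spread the pod total evenly across chips (1 exaflop = 1,000 petaflops)."""
    return pod_exaflops * 1_000 / chips


ratio = compute_ratio(POD_EXAFLOPS, EL_CAPITAN_EXAFLOPS)
per_chip = per_chip_petaflops(POD_EXAFLOPS, POD_CHIPS)

print(f"Pod vs. El Capitan: about {ratio:.1f}x")
print(f"Implied compute per chip: about {per_chip:.2f} petaflops")
```

On these numbers the ratio works out to roughly 25, consistent with the “more than 24x” claim, and the implied per-chip figure is a little over 4.6 petaflops.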

Launching later in the year, Ironwood is “designed specifically for a new generation of more capable AI models, including ‘AI agents’ that can proactively retrieve and generate data and take actions on behalf of their human users,” writes SiliconANGLE.

Google “has seen considerable momentum in its cloud business,” according to VentureBeat, which puts Q4 2024 cloud revenue at $12 billion, a 30 percent increase year-over-year, and adds that “Google executives say active users in AI Studio and the Gemini API have increased by 80 percent in just the past month.”
