OpenAI Sets $38 Billion AWS Deal for Training and Inference

OpenAI has entered into a $38 billion cloud computing agreement with AWS set to extend at least seven years, the hyperscaler says. The Monday news propelled Amazon stock to an all-time high of $254 per share, up 4 percent at close. After initially working exclusively with investor Microsoft for cloud services, OpenAI has negotiated expansively to meet increased demand. This year, the startup has signed with Oracle, Nvidia, AMD and Broadcom for storage and processing power, as well as funding for data center construction plans of its own in the U.S. and abroad. The strategic alliance with AWS marks OpenAI’s first such arrangement with Amazon.

Qualcomm Articulates Its Expansion into AI Data Center Chips

Qualcomm, which has established itself as a leading supplier of AI chips for edge devices with its Snapdragon line, is now making a major push into the data center space to challenge industry leaders such as Nvidia and AMD. The AI200 and AI250 accelerator chips are aimed at rack-scale inference systems as the debut entries in what Qualcomm describes as a multi-generation roadmap of AI inference equipment that will be updated annually. At Monday’s market close, Qualcomm stock was up by 11 percent on the news as investors saw promise in the San Diego-based firm’s expansion beyond its core mobile market.

Oracle Cloud Orders 50,000 New AMD Instinct MI450 AI GPUs

Oracle Cloud Infrastructure (OCI) will be a launch partner for the first publicly available AI supercluster powered by AMD’s upcoming Instinct MI450 Series GPUs — with an initial order of 50,000 of the chips to be deployed starting in Q3 2026 and expanding in 2027. The resulting Oracle installations will feature Instinct MI450s configured with AMD-designed CPUs in AMD’s new Helios server rack systems, positioned to compete with Nvidia’s Vera Rubin NVL144 CPX racks when both platforms are mass-released next year. Oracle is under pressure to rapidly scale its data center capacity due to the massive compute commitments it made this year to OpenAI.

OpenAI & Broadcom Developing Custom AI Accelerator Chips

OpenAI has expanded its alliance with Broadcom, announcing a plan to create enough custom AI accelerator chips to consume 10 gigawatts of power. News of the custom chip collaboration leaked out last month. Now that it is ready to go public, OpenAI says designing its own chips and systems will allow the startup to build directly into the hardware what it has learned from developing frontier models. The racks, networked entirely with Ethernet and other connectivity solutions from Broadcom, will be deployed across OpenAI’s facilities and partner data centers beginning in the second half of 2026.

Nvidia Investing $100 Billion in OpenAI Data Center Build-Out

Nvidia is investing up to $100 billion in a partnership with OpenAI that will result in what Nvidia CEO Jensen Huang predicts will be “the biggest AI infrastructure deployment in history.” The project will use about 10 gigawatts worth of Nvidia systems — including the upcoming Vera Rubin platform — power equivalent to 4 million to 5 million GPUs. “This partnership is about building an AI infrastructure that enables AI to go from the labs into the world,” Huang said on CNBC’s “Halftime Report,” explaining the $100 billion will be invested in stages as each gigawatt is deployed. The investment will be all-cash, with Nvidia receiving an undisclosed amount of OpenAI equity.
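As a back-of-envelope check on the reported figures, 10 gigawatts spread across 4 to 5 million GPUs implies roughly 2.0 to 2.5 kilowatts of facility power per GPU, a figure that would include cooling, networking and other rack overhead, not just the chip itself. A minimal sketch of that arithmetic, assuming the reported numbers:

```python
# Back-of-envelope estimate of facility power per GPU implied by the
# reported figures: 10 GW of systems equated to 4-5 million GPUs.
# Illustrative only; the per-GPU number includes all rack overhead.
total_power_watts = 10e9  # 10 gigawatts, as reported

for gpu_count in (4_000_000, 5_000_000):
    watts_per_gpu = total_power_watts / gpu_count
    print(f"{gpu_count:,} GPUs -> ~{watts_per_gpu / 1000:.1f} kW per GPU")
```

The result (2.0 to 2.5 kW each) is plausible for rack-scale AI systems once cooling and networking are amortized per accelerator.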

Nvidia Says Rubin CPX Inference Accelerator Coming in 2026

Nvidia has designed a new class of GPU for massive-context inference, the Rubin CPX, due in late 2026. Purpose-built to speed the million-token applications used to generate video and create software, the Rubin CPX functions as a specialty accelerator, working in concert with Nvidia Vera CPUs and Rubin GPUs packaged inside the upcoming Vera Rubin NVL144 CPX rack platform. “The Vera Rubin platform will mark another leap in the frontier of AI computing,” said Nvidia CEO Jensen Huang, predicting it will revolutionize massive-context AI just as RTX did graphics and physical AI.

OpenAI Reportedly Turning to Broadcom for Custom AI Chips

OpenAI is said to be in talks with Broadcom about developing custom AI inference chips to run its models. On an earnings call last week, Broadcom disclosed that an AI developer had placed a $10 billion order for AI server racks using its chips. That new customer was reported to be OpenAI, which has relied primarily on hotly sought-after Nvidia GPUs for model training and deployment. Broadcom specializes in XPUs — accelerator chips designed for specific uses, like inference for ChatGPT. OpenAI CEO Sam Altman has publicly complained that a shortage of chips has impeded the company’s ability to get new models and products to market.

Character.AI Introduces New Video Generator in Closed Beta

Character.AI, a platform offering AI chatbots for socializing and role play, has released a video generation model called AvatarFX in closed beta. Promising the ability to make photorealistic images “come to life — speak, sing and emote — all with the click of a button,” the technology combines audio and video to create a variety of visual styles and voices, from realistic 3D — including “non-human faces (like a favorite pet)” — to 2D animations, according to the company. AvatarFX also has the ability “to maintain strong temporal consistency with face, hand and body movement” and can “power videos with multiple speakers.”

Google Ironwood TPU is Made for Inference and ‘Thinking’ AI

Google has debuted a new accelerator chip, Ironwood, a tensor processing unit designed specifically for inference — the stage at which a trained model generates predictions and responses. Ironwood will power Google Cloud’s AI Hypercomputer, which runs the company’s Gemini models and is gearing up for the next generation of artificial intelligence workloads. Google’s TPUs fill a role similar to the accelerator GPUs sold by Nvidia, but they are application-specific chips built exclusively for AI, geared toward speeding the matrix math at the heart of neural network workloads. Google claims that, when deployed at scale, Ironwood is more than 24 times more powerful than the world’s fastest supercomputer.

OpenAI In-House Chip Could Be Ready for Testing This Year

OpenAI is getting close to finalizing its first custom chip design, according to an exclusive report from Reuters that emphasizes the Microsoft-backed AI giant’s goal of reducing its dependency on Nvidia chips. The blueprint for the first-generation OpenAI chip could be finalized as soon as the next few months and sent to Taiwan’s TSMC for fabrication, which will take about six months — “unless OpenAI pays substantially more for expedited manufacturing” — according to the report. Even by industry standards, the training-focused chip is on a fast track to deployment.

Reasoning Model Competes with Advanced AI at a Lower Cost

The cost of training capable AI models continues to fall, a commoditization trend that has rattled Wall Street. An AI reasoning model created for under $50 in cloud compute credits is reportedly performing comparably to established reasoning models such as OpenAI o1 and DeepSeek-R1 on tests of math and coding aptitude. Called s1-32B, it was created by researchers at Stanford and the University of Washington by fine-tuning Alibaba’s Qwen2.5-32B-Instruct on 1,000 curated prompts paired with reasoning responses distilled from Google’s new Gemini 2.0 Flash Thinking Experimental reasoning model.
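The recipe described above — pairing a small set of curated prompts with a teacher model’s reasoning traces to build a supervised fine-tuning dataset — can be sketched roughly as follows. This is a simplified illustration, not the researchers’ actual code: the teacher call is stubbed out (in the real work, traces came from Gemini 2.0 Flash Thinking Experimental), and all function names and the `<think>` formatting are hypothetical.

```python
# Simplified sketch of distillation-style SFT data prep: each curated
# prompt is paired with a teacher model's reasoning trace plus final
# answer, formatted as a supervised fine-tuning example. The teacher
# query is a stub; names and formatting here are hypothetical.

def ask_teacher(prompt: str) -> tuple[str, str]:
    """Stub for querying the teacher reasoning model.
    Returns (reasoning_trace, final_answer)."""
    return (f"Step-by-step reasoning for: {prompt}", "(answer)")

def build_sft_example(prompt: str) -> dict:
    trace, answer = ask_teacher(prompt)
    # Typical SFT layout: the student learns to emit the reasoning
    # trace first, then the final answer.
    return {
        "prompt": prompt,
        "completion": f"<think>{trace}</think>\n{answer}",
    }

# The actual work used ~1,000 carefully filtered prompts; two stand in here.
prompts = ["What is 6 * 7?", "Prove that sqrt(2) is irrational."]
dataset = [build_sft_example(p) for p in prompts]
print(len(dataset))
```

The striking part of the result is the scale: with a strong enough teacher and careful prompt selection, roughly a thousand such examples were enough to make a mid-sized open model competitive on reasoning benchmarks.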

Meta’s 3D Gen Bridges Gap from AI to Production Workflow

Meta Platforms has introduced an AI model it says can generate 3D assets from text prompts in under one minute. The new model, called 3D Gen, is billed as a “state-of-the-art, fast pipeline” for turning text input into high-resolution 3D content. The model can also apply textures to AI output or existing assets through text prompts, and “supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications,” Meta explains, adding that in internal tests 3D Gen outperforms industry baselines on “prompt fidelity and visual quality” as well as speed.

Meta Deploys Gen 2 MTIA AI Accelerator Chip in Data Centers

Meta’s next-generation AI silicon is a 5nm chip designed to power the recommendation models behind its social network platforms. The new MTIA inference accelerator is part of a “broader full-stack development program for custom, domain-specific silicon that addresses our unique workloads and systems,” Meta says. The next-gen MTIA more than doubles the compute and memory bandwidth of its predecessor, the 7nm MTIA v1 chip introduced in May 2023, delivering 3x the performance, according to Meta, which says the new silicon is already live in 16 data centers.