Nvidia Says Rubin CPX Inference Accelerator Coming in 2026

Nvidia has designed a new class of GPU for massive-context inference, the Rubin CPX, due in late 2026. Purpose-built to speed the million-token applications used to generate video and create software, the Rubin CPX functions as a specialty accelerator, working in concert with Nvidia Vera CPUs and Rubin GPUs packaged inside the upcoming Vera Rubin NVL144 CPX rack platform. “The Vera Rubin platform will mark another leap in the frontier of AI computing,” said Nvidia CEO Jensen Huang, adding that it will revolutionize massive-context AI just as RTX did graphics and physical AI.

The Rubin CPX GPU evolved from the new Rubin product line Nvidia is rolling out next year. Tom’s Hardware describes it as “a workhorse for the compute-intensive context phase of disaggregated inference” while the standard Rubin GPU will handle the “memory- and bandwidth-limited generation phase.”

“Video generation is rapidly advancing toward longer context and more flexible, agent-driven creative workflows,” Runway CEO Cristóbal Valenzuela said in Nvidia’s announcement, describing the Rubin CPX as “a major leap in performance” for demanding workloads, ultimately helping the industry toward “unprecedented speed, realism and control” in intelligent creative tools.

The Rubin CPX delivers up to 30 petaflops of compute with NVFP4 precision and features 128GB of GDDR7 memory “to accelerate the most demanding context-based workloads,” according to Nvidia.

AI inference comprises two main steps, reports SiliconANGLE: “first, an AI model analyzes the information on which it will draw to answer the user’s prompt,” then, once that analysis is complete, “the algorithm generates its prompt response one token at a time.”

Typically, both tasks have been performed by the same chips, but interest is emerging in a specialized approach, as evidenced by the recent deal between OpenAI and Broadcom.

This bifurcated approach is called disaggregated inference, and Nvidia explores both the technique and the Rubin CPX’s role in it in a blog post.
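The split between the two phases can be illustrated with a toy sketch. This is not Nvidia’s implementation, and the function names and data structures are invented for illustration: the context (prefill) phase attends over the whole prompt in one compute-heavy pass and produces a cache of intermediate state, which the generation (decode) phase then consumes one token at a time.

```python
# Toy sketch of disaggregated inference (illustrative only; all names
# are hypothetical, not Nvidia's API).

def context_phase(prompt_tokens):
    """Compute-bound prefill: process every prompt token at once."""
    # Stand-in for the key/value cache a real transformer would build.
    return {"kv_cache": list(prompt_tokens)}

def generation_phase(state, max_new_tokens):
    """Memory-bandwidth-bound decode: emit one token per step."""
    output = []
    for step in range(max_new_tokens):
        # Each step rereads the growing cache (the bandwidth-limited part)
        # and appends a single new token.
        next_token = f"tok{step}"
        state["kv_cache"].append(next_token)
        output.append(next_token)
    return output

# In a disaggregated deployment, context_phase would run on a
# Rubin CPX-class accelerator and generation_phase on a standard
# Rubin GPU, with the cached state handed off between them.
prompt = ["the", "quick", "brown", "fox"]
state = context_phase(prompt)
response = generation_phase(state, max_new_tokens=3)
print(response)  # ['tok0', 'tok1', 'tok2']
```

The point of the split is that the two phases stress different resources, so each can run on hardware sized for its bottleneck: lots of compute for prefill, lots of memory bandwidth for decode.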

“By separating the understanding portion from the response generation, which the new type of GPU chip will handle, Nvidia said its customers will get more efficient hardware” and save money, Bloomberg reports, adding that “for video generation and search, the new offering will be capable of decoding, encoding and processing on a single chip.”

Rubin CPX processors “will be offered in the form of cards that can be incorporated into existing server computer designs or used in discrete computers that can operate separately alongside other hardware in data centers,” Bloomberg writes.
