While Nvidia (NVDA) had very little to offer consumers at CES 2026, the GPU giant officially took the wraps off its new Rubin platform for AI data centers. The new platform pairs the company's Vera CPUs and Rubin GPUs with a suite of homegrown networking technologies. A rack-scale solution with 72 GPUs and a small-scale system with 8 GPUs will be available in the second half of 2026.
One big selling point of Rubin is dramatically lower AI inference costs. Compared to Nvidia's last-gen Blackwell platform, inference workloads on Rubin can be run at a 90% lower cost per token. Tokens are the units of data processed by AI models, and they're how customers of those models are generally charged for usage.
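To see what that claim means in dollar terms, here's a minimal back-of-the-envelope sketch in Python. The Blackwell price and monthly workload below are hypothetical placeholders, not Nvidia figures; only the 90% reduction comes from the announcement.

```python
# Back-of-the-envelope math on a 90% per-token cost cut. The Blackwell
# price and workload size are hypothetical placeholders, not Nvidia figures.
blackwell_price = 10.00                     # hypothetical $ per million tokens
rubin_price = blackwell_price * (1 - 0.90)  # the claimed 90% reduction

monthly_tokens_millions = 50_000            # hypothetical 50 billion tokens/month

print(f"Blackwell bill: ${blackwell_price * monthly_tokens_millions:,.0f}/month")  # $500,000
print(f"Rubin bill:     ${rubin_price * monthly_tokens_millions:,.0f}/month")      # $50,000
```

Whatever the starting price, the effect is the same: the monthly inference bill drops by an order of magnitude.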
Speaking to Axios, multiple industry experts pointed to 2026 as the year when AI must prove itself. "Enterprises will need to see real ROI in their spend, and countries need to see meaningful increases in productivity growth to keep the AI spend and infrastructure going," said Menlo Ventures partner Venky Ganesan. "Boards will stop counting tokens and pilots and start counting dollars," said EY global tech leader James Brundage.
Nvidia's Rubin platform and its drastically lower token costs are arriving at the perfect time to make AI work for enterprises.
AI costs are a problem, and agents are making things worse
The cost of running AI models has dropped rapidly over the past few years. Andreessen Horowitz found that the per-token cost of the cheapest large language model that clears a given benchmark score has fallen by a factor of 1,000 over the past three years. In late 2021, OpenAI's GPT-3 cost $60 per million tokens to run. Today, the cheapest model with similar capabilities can be run for just $0.06 per million tokens.
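Those two prices line up exactly with the factor-of-1,000 figure; a one-line check:

```python
# The two prices cited above imply the factor-of-1,000 decline directly.
gpt3_late_2021 = 60.00   # $ per million tokens, late 2021
cheapest_today = 0.06    # $ per million tokens, comparable model today
print(gpt3_late_2021 / cheapest_today)  # 1000.0
```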
Still, today's most capable LLMs are expensive to run, and those capabilities are necessary for more advanced tasks. Chain-of-thought reasoning, which involves an AI model breaking down a complex problem by simulating a reasoning process, can produce better results but consume up to 100 times as many tokens.
Agentic workflows, where LLM calls are strung together along with tools like web search to solve complex problems, also tend to be extremely token-heavy. AI coding assistants, for example, need to read files in a codebase, look up documentation online, break down a complex feature into simpler components, and then put it all together with multiple reasoning steps to produce the final code.
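As a rough illustration of how those steps compound, here's a minimal sketch. Every number in it (the step list, token counts, reasoning multiplier, and price) is a hypothetical placeholder; the 100x multiplier is the upper bound cited above for chain-of-thought, not a measured value for any real coding assistant.

```python
# Minimal sketch of why agentic workflows are token-heavy. All figures are
# hypothetical illustrations, not measurements of a real coding assistant.
PRICE_PER_M_TOKENS = 5.00   # hypothetical $ per million tokens
REASONING_MULTIPLIER = 100  # upper bound for chain-of-thought cited above

# (step in a hypothetical coding-agent run, base tokens without reasoning)
steps = [
    ("read codebase files", 40_000),
    ("fetch documentation", 15_000),
    ("plan feature breakdown", 5_000),
    ("write and revise code", 20_000),
]

base_tokens = sum(tokens for _, tokens in steps)
with_reasoning = base_tokens * REASONING_MULTIPLIER

print(f"single pass: {base_tokens:,} tokens "
      f"(${base_tokens / 1e6 * PRICE_PER_M_TOKENS:.2f})")        # 80,000 tokens, $0.40
print(f"with heavy reasoning: {with_reasoning:,} tokens "
      f"(${with_reasoning / 1e6 * PRICE_PER_M_TOKENS:.2f})")     # 8,000,000 tokens, $40.00
```

Even with modest per-step token counts, chaining steps and layering reasoning on top pushes a single task from fractions of a cent into real money, which is why per-token pricing dominates the economics of agents.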
According to a recent MIT report, an incredible 95% of enterprise AI pilots fail to deliver any meaningful financial impact. While there are multiple reasons behind this high failure rate, one recurring issue is high token costs.
Nvidia's Rubin platform could make AI work for enterprises
While the 90% reduction in token costs for Rubin is likely a best-case figure, significantly lowering the cost of running complex agentic workloads could change the math for many enterprises. CoreWeave, Microsoft, Alphabet, Amazon Web Services, and other AI cloud providers are expected to deploy Nvidia's Rubin platform in 2026, while every major AI lab is evaluating the platform for training future models.

More affordable AI inference could drive increased usage and push demand for Nvidia's chips even higher, although the company does face some competition. AMD recently unveiled plans to launch its MI500 AI accelerator family in 2027, which it says will deliver up to 1,000 times the performance of its earlier MI300X chip. Google is also reportedly attempting to sell its highly efficient TPU chips to third parties.
While Nvidia may not dominate the AI inference market in the same way it's dominated the AI training market, the company's push to lower token costs should be appealing to essentially every AI provider. With the economics of AI inference changing so quickly, Nvidia is likely to sell every AI chip it can make for now as tech giants battle for an advantage in the AI race.