Training AI models, like the large language model that powers ChatGPT, requires an incredible amount of computational horsepower. NVIDIA (NVDA 2.24%) dominates the market for AI training chips, and its latest H100 GPU is far and away the most performant option. The $40,000 chips, which must be linked together by the thousands to handle the most demanding workloads, are in high demand. Companies racing to train advanced AI models are snapping up NVIDIA's AI chips faster than the company can make them.
While Advanced Micro Devices (AMD) is planning to launch powerful AI chips later this year, the biggest threat to NVIDIA's dominance may actually be Intel (INTC -0.93%). Intel shelled out $2 billion back in 2019 to acquire Habana Labs, a developer of deep-learning accelerators. At the time, Intel predicted that the market for AI chips would exceed $25 billion by 2024. That prediction now looks quaint. More recent forecasts call for the AI chip market, which already blew past Intel's target in 2022, to top $300 billion by 2030.
Habana's Gaudi2 AI chip, which launched in mid-2022, can't beat NVIDIA's H100 in terms of raw performance. But the chips do offer a solid value proposition, and the next-gen Gaudi3 chip may be able to go toe-to-toe with NVIDIA.
A capable AI chip
In the most recent set of MLPerf benchmarks, a suite of tests developed by a consortium of AI leaders across industry and academia, NVIDIA's H100 reigns supreme. A cluster of 3,584 H100 GPUs churned through a large language model benchmark in less than 11 minutes. A single H100 would take 548 hours, assuming perfect scaling.
In comparison, a system composed of 384 Gaudi2 chips completed the same benchmark in 311 minutes. Making the same assumption, a single Gaudi2 chip would take about 1,990 hours to finish the job. The H100, then, is roughly 3.6 times as fast as Gaudi2. Gaudi2 will receive a software update later this year that will enable faster 8-bit floating point calculations, which should close that gap a bit.
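The perfect-scaling arithmetic behind those per-chip estimates is simple enough to sketch. The snippet below, a back-of-envelope calculation using only the figures quoted above, converts a cluster's benchmark time into an estimated single-chip time and compares the two chips:

```python
def single_chip_hours(cluster_minutes: float, num_chips: int) -> float:
    """Estimated hours for ONE chip to finish the workload,
    assuming perfect (linear) scaling across the cluster."""
    return cluster_minutes * num_chips / 60

# Gaudi2: 384 chips finished the benchmark in 311 minutes.
gaudi2_hours = single_chip_hours(cluster_minutes=311, num_chips=384)
print(round(gaudi2_hours))  # about 1,990 hours, matching the figure above

# H100: the article's single-chip estimate is 548 hours.
h100_hours = 548
print(round(gaudi2_hours / h100_hours, 1))  # H100 is roughly 3.6x faster
```

Perfect scaling is an idealization; real clusters lose some efficiency to inter-chip communication, so these single-chip figures are lower bounds on actual time.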
For a company looking to build out a cluster to train AI models, or to rent one from a cloud computing provider, performance per dollar is just as important as total performance. While the price of a Gaudi2 chip is hard to pin down, Intel made a point of highlighting the value proposition in the press release announcing the benchmark results:
The accelerator's MLPerf-validated performance on GPT-3, computer vision and natural language models, plus upcoming software advances make Gaudi2 an extremely compelling price/performance alternative to Nvidia's H100.
Intel can win in multiple ways
Beyond Gaudi chips from Habana, Intel has a few other ways it can benefit from the boom in AI demand. For AI inference, which is less compute-intensive than AI training, the company's latest Sapphire Rapids data center CPUs feature built-in AI accelerators. Intel also sells data center GPUs, and tens of thousands of these chips are powering the Aurora supercomputer at Argonne National Laboratory.
Intel is also working to build out its own foundry business, and it's aggressively launching new manufacturing nodes as it races to regain its manufacturing edge over Taiwan Semiconductor Manufacturing Company (TSMC). While this effort will take years to pay off, if Intel does manage to leapfrog the rest of the foundry industry, the company could win orders to manufacture AI chips for other companies down the road.
NVIDIA's data center GPUs are the fastest AI chips available, but don't count out Intel and its Gaudi family of chips. By offering a potentially better value proposition than NVIDIA, Intel could find plenty of new customers as the AI boom continues.