British chip designer Graphcore recently unveiled the Colossus MK2, also known as the GC200 IPU (Intelligence Processing Unit), which it calls the world's most complex chip for AI applications. The chip offers eight times the performance of its predecessor, the Colossus MK1, and is powered by 59.4 billion transistors -- which surpasses the 54 billion transistors in NVIDIA's (NVDA -0.01%) newest top-tier A100 data center GPU.
Graphcore plans to install four GC200 IPUs into a new machine called the M2000, which is roughly the size of a pizza box and delivers one petaflop of computing power. On its own, the system is slower than NVIDIA's DGX A100, an eight-GPU system that delivers five petaflops.
But Graphcore's M2000 is a plug-and-play system that allows users to link up to 64,000 IPUs together for 16 exaflops (each exaflop equals 1,000 petaflops) of processing power. To put that into perspective, a human would need to perform a single calculation every second for nearly 31.7 billion years to match what a one exaflop system can do in a single second.
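That figure is easy to sanity-check. Here's a quick back-of-the-envelope sketch in Python, assuming one exaflop equals 10^18 calculations per second:

```python
# Rough sanity check of the exaflop comparison above: one exaflop is
# 10**18 calculations per second, so a human doing one calculation per
# second would need roughly this many years to match one second of it.
CALCULATIONS_PER_SECOND = 10**18          # one exaflop
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # ~31.6 million seconds

years = CALCULATIONS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years / 1e9:.1f} billion years")  # ~31.7 billion years
```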
The GC200 and A100 are both clearly very powerful chips, but Graphcore enjoys three distinct advantages over NVIDIA in the growing AI market.
1. Graphcore is developing custom chips for AI tasks
Unlike NVIDIA, which expanded its gaming and professional visualization GPUs into the AI market, Graphcore designs custom IPUs, chips that differ from both CPUs and GPUs, specifically for machine learning tasks.
On its website, Graphcore claims: "CPUs were designed for office apps, GPUs for graphics, and IPUs for machine intelligence." It explains that CPUs are designed for "scalar" processing, which processes one piece of data at a time, and GPUs are designed for "vector" processing, which processes a large array of integers and floating-point numbers at once.
Graphcore's IPU technology uses "graph" processing, which processes all the data mapped across a single graph at once. It claims the IPU structure processes machine-learning tasks more efficiently than CPUs and GPUs. Many machine-learning frameworks -- including TensorFlow, MXNet, and Caffe -- already support graph processing.
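As a loose illustration of those three models (a conceptual sketch, not Graphcore's actual programming interface), here is a short Python example: a scalar loop touches one value at a time, a vectorized NumPy operation covers the whole array at once, and TensorFlow's tf.function traces the computation into a graph before executing it:

```python
import numpy as np
import tensorflow as tf

data = np.arange(8, dtype=np.float32)

# "Scalar" processing (CPU-style): handle one value at a time.
total = 0.0
for x in data:
    total += float(x) * 2.0

# "Vector" processing (GPU-style): one operation over the whole array at once.
doubled = data * 2.0

# "Graph" processing: TensorFlow traces the function below into a computation
# graph of operations, then executes that graph as a whole.
@tf.function
def double_and_sum(x):
    return tf.reduce_sum(x * 2.0)

result = double_and_sum(tf.constant(data))
print(total, doubled, float(result))
```

The graph form is what lets a framework -- or, in Graphcore's telling, the IPU itself -- see the entire computation up front and schedule it efficiently.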
Graphcore claims the vector processing model used by GPUs is "far more restrictive" than the graph model, which can allow researchers to "explore new models or reexplore areas" in AI research.
2. Graphcore's GC200 offers cheaper per-petaflop processing power
NVIDIA's DGX A100 system costs $199,000 for five petaflops, which works out to $39,800 per petaflop. Graphcore's M2000 system offers one petaflop of processing power for $32,450. That difference of $7,350 per petaflop could generate millions of dollars in savings for data centers building multi-exaflop systems.
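A quick sketch using the list prices above shows how those savings compound at exaflop scale:

```python
# Cost-per-petaflop comparison using the list prices cited above.
DGX_A100_PRICE, DGX_A100_PETAFLOPS = 199_000, 5
M2000_PRICE, M2000_PETAFLOPS = 32_450, 1

nvidia_per_pflop = DGX_A100_PRICE / DGX_A100_PETAFLOPS    # $39,800
graphcore_per_pflop = M2000_PRICE / M2000_PETAFLOPS       # $32,450

# One exaflop equals 1,000 petaflops, so the gap compounds quickly.
savings_per_exaflop = (nvidia_per_pflop - graphcore_per_pflop) * 1_000
print(f"${savings_per_exaflop:,.0f} saved per exaflop")   # $7,350,000
```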
That could spell trouble for NVIDIA's data center business, which grew its revenue 80% year over year to $1.14 billion last quarter and accounted for 37% of the chipmaker's top line. NVIDIA recently acquired data center networking equipment maker Mellanox to strengthen that business, but that added scale might not deter Graphcore's disruptive efforts.
3. Graphcore is backed by venture capital
Unlike NVIDIA, a publicly traded chipmaker whose spending is regularly scrutinized by investors, Graphcore is a privately held start-up that can focus on research and development (R&D) and growth instead of short-term profits.
Graphcore was founded just four years ago but was already valued at $1.95 billion after its last funding round in February. Its backers include investment firms like Merian Chrysalis and Amadeus Capital Partners, as well as big companies like Microsoft (MSFT 0.11%). Microsoft already uses Graphcore's IPUs to process machine learning workloads on its Azure cloud computing platform, and other cloud giants could follow that lead over the next few years.
Should NVIDIA investors be concerned?
NVIDIA enjoyed an early-mover advantage in data center GPUs, but it faces a growing list of challengers, including first-party chips from Amazon, Facebook, and Alphabet's Google. Graphcore represents another looming threat, and NVIDIA's investors should be wary of the start-up's new chips -- which seem to offer a cheaper, more streamlined, and more flexible approach to tackling machine learning and AI tasks.