Graphics specialist NVIDIA (NASDAQ:NVDA) is in the middle of transitioning from its older Kepler and Maxwell architectures to its new Pascal architecture. So far, the company has released two Pascal-based products. The first was the Tesla P100 based on the GP100 chip, aimed at what the company calls "Mixed Workload [High Performance Computing]" and "Strong-Scale [High Performance Computing]."
In this case, "mixed workload" refers to the chip's ability to perform both single-precision and double-precision floating point calculations at very high speeds, thanks to specialized circuitry on the chip.
The second is GP104, a part aimed squarely at gaming. It offers excellent single-precision performance (nearly on par with the vastly more expensive Tesla P100) but much lower double-precision performance.
It is well known that NVIDIA is working on another GPU, known as GP102. This part is expected to power both gaming cards positioned above the recently launched GTX 1080 (i.e., a GeForce GTX 1080 Ti and a next-generation GeForce GTX Titan) and a future Tesla accelerator.
An even better Tesla than Tesla P100 for some workloads?
The GP100 is going to be NVIDIA's best Pascal-based part for the so-called "mixed workloads" -- workloads where both single-precision and double-precision floating point calculations will need to be carried out.
However, for workloads in which double-precision calculations aren't done very often (or at all), the Tesla P100 is better than the Tesla M40 (a single-precision focused part marketed at hyper-scale workloads), but it's not significantly so. The P100 is capable of 10.6 teraflops of single-precision performance (9.3 teraflops for the PCI Express add-in-card variant), while the M40 is capable of around 7 teraflops of single-precision performance.
For those hyper-scale customers, NVIDIA will need to put out a GPU that delivers a large leap in single-precision floating point performance. That's where GP102 will come in handy.
Indeed, if GP102 is to GP104 what GM200 was to GM204, then we should expect that GP102 will offer around 33% greater single-precision floating point performance than GP104. This would peg it at around 12 teraflops, or a 71% increase from the current GM200-based Tesla M40 accelerator.
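The arithmetic behind these estimates can be checked with a short sketch. The P100 and M40 figures and the 33% uplift come from the text above; the roughly 9-teraflop figure for GP104 is an assumption added here (it approximates the GTX 1080's peak single-precision rating) and is not stated in the article.

```python
# Single-precision figures quoted in the article, in teraflops.
M40_TFLOPS = 7.0
P100_TFLOPS = 10.6

# Assumption (not from the article): GP104, as used in the GTX 1080,
# peaks at roughly 9 teraflops of single-precision performance.
GP104_TFLOPS = 9.0

# The article's hypothesized uplift of GP102 over GP104.
GP102_UPLIFT = 0.33


def percent_gain(new, old):
    """Return the percentage increase of `new` over `old`."""
    return (new / old - 1.0) * 100.0


# Projected GP102 throughput under the assumptions above.
gp102_tflops = GP104_TFLOPS * (1.0 + GP102_UPLIFT)

print(f"Projected GP102: {gp102_tflops:.1f} teraflops")
print(f"GP102 vs. M40:   {percent_gain(gp102_tflops, M40_TFLOPS):.0f}% increase")
print(f"P100 vs. M40:    {percent_gain(P100_TFLOPS, M40_TFLOPS):.0f}% increase")
```

Under these assumptions, the projection lands at about 12 teraflops and a gain of about 71% over the M40, matching the figures above. The result is only as good as the assumed GP104 baseline and the 33% uplift, both of which are estimates rather than announced specifications.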
Why build accelerators specifically for hyper-scale?
On NVIDIA's last earnings call, management said that its data center-related revenue (i.e., revenue from sales of Tesla accelerators) was up 63% year over year. This, the company claims, reflected "enormous growth in deep learning."
"In just a few years, deep learning has moved from academia and is now being adopted across the hyper-scale landscape," CFO Colette Kress said.
She added that hyper-scale companies "are the fastest adopters of deep learning, accelerating their growth in [NVIDIA's] Tesla business." She went on to say that revenue from sales of Tesla GPUs to hyper-scale customers is "now similar to that from high-performance computing."
At this pace, revenue from hyper-scale customers could very well eclipse that from traditional high-performance computing customers in a relatively short time.
If it is financially justifiable to build a high-performance computing-specific part such as the Tesla P100, there is at least as much justification for building a hyper-scale-specific part such as the upcoming GP102. In fact, since GP102 is likely to be sold to both hyper-scale data center customers and performance-hungry gamers, the case for developing such a part is likely even more compelling.