About a year ago, the first details of Intel's (NASDAQ: INTC) upcoming high-performance computing, or HPC, oriented processor known as Knights Landing started hitting the Web. The chip, which promises the capacity to compute over 3 trillion double-precision floating-point operations per second, is expected to integrate very fast "on-package" memory.

This memory, according to an Intel slide found on memory maker Micron's (NASDAQ: MU) website, offers five times the bandwidth and five times the power efficiency of standard DDR4 memory in one-third the area footprint. Intel plans to include up to 16GB of this memory per Knights Landing package when the chips launch sometime in the second half of this year.

Source: Micron. 

Although this technology is exciting in the context of Knights Landing, I think things will get really interesting when this technology is more broadly deployed in future Intel Xeon processors.

When will we see it?

Recently, the first details of Intel's upcoming server platform, code-named Purley, leaked to the Web. One key feature touted in the slide deck was that memory bandwidth would increase by a factor of 1.5 over the prior-generation Broadwell-based server parts as a result of a move from four DDR4 memory channels to six.

The slide deck also touted an "all new memory architecture," which promises to bring capacity increases and "persistent data." This is all very cool stuff, but one thing missing is any mention of the high-speed on-package memory that Intel plans to bring to market with Knights Landing. This suggests it might be a while until we see this memory included in a "standard" Xeon platform.

If history is a guide, Intel generally doesn't make sweeping platform or architectural changes when it releases a refreshed set of processors on a proven platform. If we assume the Purley platform will land sometime in 2017 (I would guess second quarter of 2017, based on the leaked road maps), then we may not see a new platform until the first half of 2019.

The 2019 Intel server platform and why on-package memory might be needed

Intel's Broadwell-EP/EX chips, which will have up to 22 and 24 cores, respectively, will still feature four DDR4 channels. This is likely the "limit" to the number of cores that a quad-channel memory subsystem can reasonably support before those chips become starved for bandwidth. With the Purley platform, the initial Skylake-based chips will have up to 28 cores.

Purley, like all of Intel's platforms, will almost certainly feature a refreshed set of processors based on the 10-nanometer Cannonlake architecture. Intel's area reduction with the 10-nanometer technology will likely be put to good use in the form of more cores (see the chart below):

Source: Fool.com.

Intel tends to significantly increase core counts in server chips as it migrates from one manufacturing technology to the next. In moving from 32-nanometer Westmere-EX to 22-nanometer Ivy Bridge-EX, Intel boosted core count by 50%. In moving from 22-nanometer Haswell-EX to 14-nanometer Broadwell-EX, Intel gained a cool 33% increase in core count.

In light of this trend, I expect the 2018 10-nanometer Purley parts to have in the neighborhood of 36 cores. The 10-nanometer parts on the follow-on platform to Purley should have even more, likely well north of 40.
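To make the arithmetic behind that estimate concrete, here is a rough sketch that applies the two generational growth rates cited above to the 28-core Skylake figure. The function name and the idea of simply reusing past growth rates are my own assumptions for illustration, not anything Intel has disclosed.

```python
def project_cores(current_cores: int, growth: float) -> int:
    """Scale a core count by a fractional generational increase (hypothetical helper)."""
    return round(current_cores * (1 + growth))


SKYLAKE_CORES = 28  # top core count the article cites for initial Purley chips

# Growth rates the article cites for past process transitions.
# Reusing them for the 10-nanometer move is an assumption, not a roadmap.
for label, growth in [
    ("33% (Haswell-EX to Broadwell-EX)", 0.33),
    ("50% (Westmere-EX to Ivy Bridge-EX)", 0.50),
]:
    print(f"{label}: about {project_cores(SKYLAKE_CORES, growth)} cores")
```

Under these assumptions the projection lands at roughly 37 to 42 cores, which brackets the "neighborhood of 36" guess on the conservative side.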

Now, the thing about increasing the core count of the processors so dramatically is that in order to actually wring all of the performance out of all of those cores working together, they will need access to plenty of memory bandwidth.

Although the move from four memory channels to six in going from Grantley to Purley will certainly help as core counts begin to exceed 24, I believe that as core counts continue to grow at a rapid clip, these chips will need even more bandwidth.
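A back-of-the-envelope sketch shows why six channels may only buy time. It divides peak memory bandwidth by core count for three configurations drawn from the figures in this article. The 19.2 GB/s per-channel number is my assumption (a 64-bit DDR4 channel at 2,400 megatransfers per second), held constant across generations to isolate the effect of channel and core counts; real parts would likely run different memory speeds.

```python
# Assumed peak bandwidth of one DDR4 channel: 2400 MT/s * 8 bytes = 19.2 GB/s.
# This speed grade is an illustrative assumption, not a figure from the leaks.
GB_PER_CHANNEL = 19.2


def bandwidth_per_core(channels: int, cores: int,
                       gb_per_channel: float = GB_PER_CHANNEL) -> float:
    """Peak memory bandwidth (GB/s) available per core (hypothetical helper)."""
    return channels * gb_per_channel / cores


# (channels, cores) pairs; the 36-core entry is the article's projection.
configs = {
    "Grantley, Broadwell-EP": (4, 22),
    "Purley, Skylake": (6, 28),
    "Purley refresh, 10nm (projected)": (6, 36),
}

for name, (channels, cores) in configs.items():
    print(f"{name}: {bandwidth_per_core(channels, cores):.2f} GB/s per core")
```

With these inputs, the move to six channels lifts per-core bandwidth at 28 cores, but a 36-core part on the same six channels falls back below the Grantley-era figure. That is the squeeze that faster on-package memory could relieve.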

Scaling to more memory channels per socket is certainly a way to go in the near term, but at some point it might become too difficult to simply increase memory speeds and add more memory channels to get the kind of effective bandwidth needed. Technologies such as the on-package Hybrid Memory Cube-derived memory used on Knights Landing or the competing High Bandwidth Memory standard that the graphics processor makers plan to adopt should be able to help overcome these limitations.