Advanced Micro Devices
A Strange Dream...

By eachus
April 19, 2006

Posts selected for this feature rarely stand alone. They are usually part of an ongoing thread and are out of context when presented here. The material should be read in that light.

I think I may have put together a few clues, and have an idea of why AMD is not worried by Conroe. I had a weird dream last night, which is occasionally how my mind tries to tell me something. In this case, when I woke up, it took me about two seconds to validate the thought, and it is probably going to take me twenty minutes to explain it. (But you can read it faster. ;-)

Consider a particular pipelined CPU register-level design, and look at translating it into an actual CPU. If you plot the number of transistors in each possible realization of the design against the average transistor switching time needed to meet the timing requirements, you get a region bounded by an upside-down U. Or, turning it around, if you plot the number of transistors against total power consumption, you get a U the right way up. Designs inside the U are possible; designs outside it won't meet timing deadlines.
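To make the power U concrete, here is a toy model. The numbers are purely hypothetical and are not derived from any real CPU. It assumes a fixed clock target. Spreading the switching work over more transistors lets each one switch more slowly, and a transistor's dynamic power is assumed to grow with the square of its required speed. Every transistor also leaks a fixed amount. The constants `WORK`, `DYN` and `LEAK` are made up for illustration.

```python
# Toy model of the U-shaped minimum-power curve for one register-level design.
# All constants are hypothetical, in arbitrary units.

WORK = 1_000.0   # hypothetical switching work per cycle at the fixed clock target
DYN = 1.0        # hypothetical dynamic-power coefficient
LEAK = 0.01      # hypothetical leakage power per transistor


def min_power(n: float) -> float:
    """Lowest total power at which an n-transistor realization meets timing.

    Assumed: required per-transistor speed is WORK / n, and dynamic power per
    transistor is DYN * speed**2, so n * DYN * (WORK / n)**2 = DYN * WORK**2 / n.
    """
    dynamic = DYN * WORK ** 2 / n
    leakage = LEAK * n
    return dynamic + leakage


def is_feasible(n: float, power_budget: float) -> bool:
    """A design is 'inside the U' if its power budget is on or above the curve."""
    return power_budget >= min_power(n)


if __name__ == "__main__":
    for n in (2_000, 5_000, 10_000, 20_000, 50_000):
        print(f"{n:>6} transistors -> min power {min_power(n):8.1f}")
```

With these made-up constants, the minimum power falls from 520 at 2,000 transistors to 200 at 10,000, then climbs back to 520 at 50,000. A design with 2,000 transistors and a power budget of 300 falls outside the U.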

What does this have to do with AMD? I'm getting there. Obviously the best point for a given clock speed is at the bottom of the U, if you assume that silicon real estate is free. It is not free, but think about what happened to Intel with Netburst. At 90 nm they finally had enough transistors to go past the optimal point, and did so. In other words, part of the reason the industry is going multi-core is that we have reached the point where adding more transistors to a CPU core can be counterproductive. Seems simple, doesn't it? But there is more.
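The "bottom of the U" and the "silicon is not free" caveat can be worked out for the same kind of toy curve, P(n) = A/n + LEAK·n. The per-transistor `area_cost` is a hypothetical penalty expressed in the same units as power. It is an illustration, not an actual costing method. The sketch finds the optimum and shows that, past it, each extra transistor adds power.

```python
import math

# Hypothetical toy curve: minimum power to meet timing is A / n + LEAK * n.
A = 1_000_000.0   # hypothetical dynamic-power constant
LEAK = 0.01       # hypothetical leakage per transistor


def best_n(area_cost: float = 0.0) -> float:
    """Transistor count minimizing A / n + (LEAK + area_cost) * n.

    Setting the derivative -A / n**2 + LEAK + area_cost to zero gives
    n* = sqrt(A / (LEAK + area_cost)).
    """
    return math.sqrt(A / (LEAK + area_cost))


def marginal_power(n: float) -> float:
    """dP/dn of the power curve alone: negative left of the optimum, positive right."""
    return -A / n ** 2 + LEAK


if __name__ == "__main__":
    print(f"optimum, free silicon:   {best_n():,.0f} transistors")
    print(f"optimum, costly silicon: {best_n(area_cost=0.03):,.0f} transistors")
    # Past the power optimum, adding transistors raises power.
    print(f"dP/dn at 20,000 transistors: {marginal_power(20_000):+.4f}")
```

Here the free-silicon optimum is 10,000 transistors. Charging for area moves it down to 5,000. At 20,000 transistors the marginal power is positive, which is the "counterproductive" region.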

What if you want to design a faster core, i.e., one that can run faster within the same power limit? You have to change the register-level design. In modern chips, this is usually done by changing the number of pipeline stages. If done properly, each new pipeline stage has a smaller power budget than the old stages. (This is what Prescott got wrong.) But... everything I have heard about K8L says that the number of pipeline stages wasn't changed. What gives?
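The trade-off in adding pipeline stages can be sketched with the classic textbook model. In that model the clock period is the logic delay per stage plus a fixed latch overhead. This sketch adds a hypothetical latch power cost against a fixed core power budget. Every constant is made up; none of them are K8 or Prescott data.

```python
# Textbook pipelining model with hypothetical constants.
LOGIC_NS = 5.0          # hypothetical total logic delay of the core, ns
LATCH_NS = 0.05         # hypothetical delay added by each pipeline latch, ns
LATCH_W_PER_GHZ = 0.5   # hypothetical latch power per stage, watts per GHz
POWER_BUDGET_W = 60.0   # hypothetical power limit for the whole core, watts


def clock_ghz(stages: int) -> float:
    """Clock rate when the logic is split evenly across `stages` stages."""
    period_ns = LOGIC_NS / stages + LATCH_NS
    return 1.0 / period_ns


def logic_power_per_stage_w(stages: int) -> float:
    """Power left for each stage's logic after paying for the latches.

    A negative result means the latches alone exceed the power budget.
    """
    latch_power_w = stages * LATCH_W_PER_GHZ * clock_ghz(stages)
    return (POWER_BUDGET_W - latch_power_w) / stages


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n:>2} stages: {clock_ghz(n):.2f} GHz, "
              f"{logic_power_per_stage_w(n):+.2f} W of logic budget per stage")
```

With these constants, going from 10 to 20 stages raises the clock by about 1.83x rather than 2x. Each stage's logic budget drops from about 5.1 W to about 1.3 W. At 30 stages the latches alone exceed the budget.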

The original K8 (Hammer) parts were manufactured at 130 nm. AMD is currently at 90 nm, and K8L will start at 65 nm. At 130 nm AMD had to limit the number of transistors used, which confined it to designs on the left side of the U. The various 90 nm revisions seem to have used the smaller, faster transistors to increase throughput somewhat. But the available silicon real estate was limited by various factors, such as the size of Fab 30. So AMD could expand the cores a bit at 90 nm, but...
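A rough sense of how much room each shrink opens up comes from ideal area scaling. Under that assumption, transistor area goes with the square of the feature size. Real processes fall short of this ideal, so treat the figures as upper bounds.

```python
# Ideal area scaling between the process nodes mentioned above.
# Assumption: transistor area scales with the square of the feature size.

NODES_NM = (130, 90, 65)


def density_gain(old_nm: float, new_nm: float) -> float:
    """How many times more transistors fit in the same die area after a shrink."""
    return (old_nm / new_nm) ** 2


if __name__ == "__main__":
    base = NODES_NM[0]
    for node in NODES_NM:
        print(f"{node} nm: {density_gain(base, node):.2f}x the transistors of {base} nm")
```

On this idealization, 90 nm offers about 2.09x the transistors of 130 nm in the same area, and 65 nm offers 4x.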

One of the things that the recent Anandtech review shows pretty graphically is that larger L2 caches won't help AMD. (Huh? If substantial increases in bandwidth and decreases in latency for main memory have that small an effect, how much less will a larger L2 cache help?) I am still hoping that AMD will increase the L1 data cache size, and K8L obviously does have some changes to the TLB structure, but neither of those changes is going to eat up much real estate.
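The parenthetical argument can be made concrete with the standard average memory access time (AMAT) formula for a two-level cache. All latencies and miss rates below are hypothetical. The larger L2 is assumed to be one cycle slower to hit, which is typical of bigger caches but not a measured figure.

```python
def amat(l1_hit, l1_miss, l2_hit, l2_miss, mem):
    """Average memory access time in cycles for an L1 + L2 + DRAM hierarchy.

    l2_miss is the local L2 miss rate (fraction of L1 misses that also miss L2).
    """
    return l1_hit + l1_miss * (l2_hit + l2_miss * mem)


# Hypothetical baseline: 3-cycle L1, 2% L1 misses, 12-cycle L2,
# 20% of L1 misses also miss L2, 200 cycles to main memory.
baseline = amat(l1_hit=3, l1_miss=0.02, l2_hit=12, l2_miss=0.20, mem=200)

# A 25% cut in main-memory latency.
faster_memory = amat(l1_hit=3, l1_miss=0.02, l2_hit=12, l2_miss=0.20, mem=150)

# A larger L2: fewer L2 misses, but assumed one cycle slower to hit.
bigger_l2 = amat(l1_hit=3, l1_miss=0.02, l2_hit=13, l2_miss=0.15, mem=200)

if __name__ == "__main__":
    for name, t in (("baseline", baseline), ("faster memory", faster_memory),
                    ("bigger L2", bigger_l2)):
        print(f"{name:>13}: {t:.2f} cycles")
```

Both changes shrink the same small term, L1 miss rate × L2 miss rate × memory latency. With these numbers they trim AMAT from 4.04 cycles to 3.84 and 3.86 respectively. So if the memory-side improvement barely moved the benchmarks, the bigger L2 would not be expected to move them much either.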

So why does AMD need Chartered, a fast Fab 36 ramp, and Fab 30 running well beyond its original build-out capacity? It is nice to think that they plan to double the number of CPUs they build next year over this year, but even that wouldn't use up all that capacity.
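The capacity question comes down to die area, since a bigger core means fewer dies per wafer. This sketch uses the common dies-per-wafer approximation and ignores defects and yield. The wafer diameter and die sizes are hypothetical and are not AMD figures.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Gross dies per wafer, ignoring defects and yield.

    Common approximation: wafer area / die area, minus a correction for
    partial dies lost around the wafer's edge.
    """
    radius = wafer_diameter_mm / 2
    whole_wafer = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(whole_wafer - edge_loss)


if __name__ == "__main__":
    WAFER_MM = 300              # hypothetical wafer diameter for the example
    for area in (150, 220):     # hypothetical die sizes, mm^2
        print(f"{area} mm^2 die: {dies_per_wafer(WAFER_MM, area)} dies per wafer")
```

Under these made-up sizes, growing the die from 150 to 220 mm² cuts the count from 416 to 276 dies per wafer. Building the same number of CPUs would then take roughly half again as many wafers.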

However, if AMD has a really nice design, where they started trading lots of slower transistors for fewer high-powered transistors, and finally got near the bottom of the U, that would do it. Of course, if you switch to faster transistors, you will throw away those power savings, but you will have a much faster CPU. (It is actually easier for me to envision that graph in three dimensions, with each core design tracing out a line in clock speed and power space as you increase the applied voltage, but it is hard enough to describe two-dimensional slices through that graph. And then of course you can add processes as another dimension... but that way madness lies. ;-)

What does all this mean? That K8L started out to be a (very) low-power design, with lots of slow-switching, power-efficient transistors. Of course, at higher voltages, or with thinner gate oxides, it should also be a much faster design.
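The "line in clock speed and power space as you increase the applied voltage" can be sketched with the alpha-power-law delay model (Sakurai and Newton). In that model gate delay goes roughly as V / (V − VT)^α. Clock frequency therefore goes as (V − VT)^α / V, and dynamic power as C·V²·f. All constants are hypothetical, in arbitrary units. Thinner gate oxides are not modeled.

```python
# One core design traced through (clock, power) space as supply voltage rises.
VT = 0.3      # hypothetical threshold voltage, volts
ALPHA = 1.3   # hypothetical velocity-saturation exponent (between 1 and 2)
K_FREQ = 3.0  # hypothetical frequency scale factor
CAP = 10.0    # hypothetical switched capacitance of the design


def clock(v: float) -> float:
    """Relative clock frequency at supply voltage v (requires v > VT)."""
    return K_FREQ * (v - VT) ** ALPHA / v


def dynamic_power(v: float) -> float:
    """Relative dynamic power, C * V**2 * f."""
    return CAP * v ** 2 * clock(v)


def trace(voltages):
    """(voltage, clock, power) points along this design's line."""
    return [(v, clock(v), dynamic_power(v)) for v in voltages]


if __name__ == "__main__":
    for v, f, p in trace([0.9, 1.1, 1.3, 1.5]):
        print(f"{v:.1f} V -> clock {f:.2f}, power {p:.1f}")
```

In this toy run, raising the voltage from 0.9 V to 1.5 V lifts the clock by roughly half while dynamic power roughly quadruples. A design that is efficient at low voltage can still be pushed much faster, at a steep power cost.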
