POST OF THE DAY
Advanced Micro Devices
More Hammer Details

By eachus
October 16, 2001

Posts selected for this feature rarely stand alone. They are usually a part of an ongoing thread, and are out of context when presented here. The material should be read in that light.

I thought I'd start a new thread to discuss some of the nuts and bolts in the slides, some of which raise new questions.

First, instead of three integer pipes (in the K7) there are now six. But it is not clear which instructions can be executed in which pipe. I'm hoping that the AGUs (address generation units) can handle some shifts and adds, while the ALUs process compares, multiplies, and divides. Even if 64-bit adds have to go to the ALU, I think it will be possible to do 32-bit (or 48-bit) adds and subtracts in the AGU. This should result in a possible six integer micro-ops dispatched per clock.

The three floating-point pipes have been renamed as FADD, FMUL, and FMISC. I have to assume that the FMUL will support the SSE2 128-bit instructions in addition to the current x86 (80-bit) and SSE (64-bit) floating-point operations. Technically it is possible to perform three floating-point operations per clock in the K7 family, but you have to mix MMX/3DNow! and x86 operations to do it. I hope that the redefinition/reassignment of operations to fp pipes allows more generality than the current assignments. Incidentally, there is one really tricky detail in the K7 architecture. If you, for example, load an operand that is already in the FP register file, the register remapping can convert the instruction to a no-op. This no-op must be finalized, but that can be done by any pipe, integer or floating-point. The same thing happens with integer loads that can be recognized as no-ops.

Put it all together and what do you get? A processor which will be significantly faster than an Athlon on integer arithmetic, but will show little if any improvement on floating-point arithmetic over the Athlon, absent SSE2 code. But that is only half the story. Right now, the Athlon is bandwidth limited on many standard floating-point benchmarks. The Clawhammer, due to Socket A support, will not change this. It will certainly support a 333 MHz FSB speed, but that will be more than balanced by a faster processor clock. Basically, the Clawhammer will be a wet firecracker. It will support the new instructions, both x86-64 and SSE2, but otherwise will be very similar in performance to an Athlon XP or Thoroughbred with the same clock speed. (I think we now know why Clawhammer was officially delayed. AMD wants the first public reviews to focus on the Sledgehammer.)
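To put rough numbers on the "balanced by a faster processor clock" point, here is a back-of-the-envelope sketch. The 64-bit bus width and the 266/333 FSB rates are standard Socket A figures; the two CPU clocks (1533 MHz for a current Athlon XP, 2000 MHz for a Clawhammer) are purely my assumptions for illustration, not anything from the slides.

```python
# Back-of-the-envelope: peak front-side-bus bytes available per CPU clock.
# The CPU clock speeds below are assumed for illustration only.
BUS_WIDTH_BYTES = 8  # 64-bit Socket A front-side bus


def bytes_per_cpu_cycle(fsb_mtps, cpu_mhz):
    """Peak FSB bytes per CPU cycle, given transfers/us and cycles/us."""
    peak_mb_per_s = fsb_mtps * BUS_WIDTH_BYTES
    return peak_mb_per_s / cpu_mhz


# Athlon XP on a 266 FSB (assumed 1533 MHz core clock)
athlon_xp = bytes_per_cpu_cycle(fsb_mtps=266, cpu_mhz=1533)
# Hypothetical Clawhammer on a 333 FSB (assumed 2000 MHz core clock)
clawhammer_guess = bytes_per_cpu_cycle(fsb_mtps=333, cpu_mhz=2000)

print(f"Athlon XP:        {athlon_xp:.2f} bytes/cycle")
print(f"Clawhammer guess: {clawhammer_guess:.2f} bytes/cycle")
```

With those assumed clocks, the faster bus still delivers slightly fewer bytes per CPU cycle, which is the sense in which a bandwidth-limited benchmark would not improve.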

The Sledgehammer, even if it has two CPU cores on die, will be significantly less bandwidth limited in floating-point applications. I have seen a lot of people missing a slight detail in the presentation. The integration of the Northbridge into the CPU will significantly reduce the latency of memory access. I wouldn't be surprised to see an increase in throughput of 30% or so, in addition to the doubling from the 128-bit wide access mode.

The trick to understanding this is that, in current Athlons, a data request goes from the CPU to the Northbridge, to the memory, back to the Northbridge, then back to the CPU. Right now, the CPU to Northbridge and back can be the bottleneck, especially if there is other I/O going on that has to share that data path. In the Sledgehammer, all the other I/O traffic will be on HT ports, and there is no significant limitation on CPU core to Northbridge bandwidth--the Northbridge runs at CPU speed.

Finally, one minor caution. How many pins will the Sledgehammer have? I don't know, it depends on the width of the HT channels, and the number of power and ground pins, but it could easily be over 900. This will mean either that Sledgehammers will need to be BGA mounted, or AMD will have to go to a mPGA socket with lots of pins. It isn't a cost issue, just the opposite. Take a dual Athlon motherboard, replace the Northbridge and fan with a Sledgehammer and fan, then get rid of the two CPU sockets and associated cooling. It will certainly bring (non-CPU) system costs way down.

So right now, my read is:

Sledgehammer will really make a great integer engine, definitely the best available. Right now the Alpha and Athlon run pretty much head to head for that title. The floating-point performance won't be that much greater than the Athlon, but will benefit significantly from wider bandwidth. We will see what McKinley looks like, but right now the IBM Power 4 is looking to be the Sledgehammer competition for database servers. (With Power 4 taking the high-end and Sledgehammer the low-end, the battle will be in the mid-range.)

Two, three, and four processor systems will definitely be Hammer's forte. The slides show how to hook up eight Hammers so that no processor is more than two HT hops away. For larger systems, and even for some four and six processor boxes, look for HT crossbars. Before you get too carried away, keep slide 45, "The rewards of good plumbing," in mind. AMD has gone to a great deal of effort to design Sledgehammer so that all interconnect paths in a 2 to 8 processor system are very short. If you want to put together a 36-processor system, there is a lot of plumbing to do. (Of course, the plumbing that Sun has in their new top-of-the-line server might do just fine...)
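For anyone who wants to check a hop count for themselves, here is a minimal sketch that measures the worst-case number of hops in a point-to-point layout. The eight-CPU wiring below (a ring plus a link to the opposite CPU) is a hypothetical example of mine, not the layout on the slides, and it spends three links per CPU on CPU-to-CPU traffic with nothing left over for I/O.

```python
from collections import deque


def hop_diameter(links):
    """Largest shortest-path hop count between any two nodes (BFS from each)."""
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in links[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        if len(dist) != len(links):
            raise ValueError("topology is not fully connected")
        worst = max(worst, max(dist.values()))
    return worst


N = 8
# Plain ring: each CPU talks only to its two neighbours.
ring = {i: [(i + 1) % N, (i - 1) % N] for i in range(N)}
# Hypothetical layout: the same ring plus a link to the opposite CPU.
ring_plus_cross = {
    i: sorted({(i + 1) % N, (i - 1) % N, (i + 4) % N}) for i in range(N)
}

ring_hops = hop_diameter(ring)
cross_hops = hop_diameter(ring_plus_cross)
max_links = max(len(peers) for peers in ring_plus_cross.values())
print(ring_hops, cross_hops, max_links)
```

The same function works for any wiring you care to type in, so it is an easy way to see how quickly the hop count grows once you go past eight processors without a crossbar.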

As for Clawhammer, look for it to sample soon, if it hasn't already, but to be released concurrently with the Sledgehammer. Unless AMD wants to wipe Thoroughbred off the roadmap--and I am sure they do not--Clawhammer is right now only of interest to system and software developers. A year from now it may be a snob factor part, but don't count on it. Eventually (1H03?) we will see a K8 aimed at the mobile market, but expect it to be a Sledgehammer derivative! Add a combined video and Southbridge chip to a single CPU Sledgehammer, and you have a much better combination as far as power and cost are concerned than most mobile solutions today. Of course, the mobile version of the Sledgehammer would lose at least one HT path, and probably have a smaller L2 cache, but I don't see that as a big design project. I might even give it to a summer intern. (Taking logic out of a chip is a lot easier than adding things.)

Where does this leave Barton? I thought that you were never going to ask. It may be an SMT version of the Athlon, but I think it is just going to be a blisteringly fast version of the Thoroughbred, probably designed for PC3200 memory. It will bring the K7 family into the SOI era both as an end-of-life model for the K7, and as an insurance policy for the Sledgehammer.

I find that last thought astonishing. Once upon a time, Intel was able to have multiple design teams and projects so that if one project was delayed or ran into problems, there was an alternative in the wings. Those days are long gone--over a year now. ;-) Seriously, remember the week last September, when Intel announced that the Merced release was being scaled way back, Timna was cancelled, and, oh yeah, we're delaying Willamette once again (from mid-October to the end of the month). That directly led to the current situation, where if Tualatin becomes too successful, Intel fails.

I know it sounds ridiculous, but here is the bind that Intel is in. If vendors see a strong consumer preference for Tualatin over the Willamette, the Pentium 4 will be seen as a failure. If the supply chain sees the Pentium 4 as a failure, they won't rush--heck, they won't even crawl--to support the Socket 478 platform, in particular the SDRAM and DDR versions of Brookdale. Once Northwood is shipping in quantity, and Intel has enough 0.13 micron fab capacity, then Intel can afford to hype Tualatin. But that point is six months to a year away.

Before anyone flames me on this: Intel marketing understands this, and Intel marketing is doing what it must to keep Willamette viable until enough Socket 478 infrastructure is in place. I expect them to succeed, but it will hurt their bottom line more than it already has, and it has certainly cut into AMD's profits. (Might as well use a marketing necessity--high Tualatin prices, low Willamette prices--to do as much damage as possible to your competitor.)
