Tech Corner: Core Wars

by Shui-Che Lim


Letter 1: from Mark

Mark,

Thanks for writing. I see that I've managed to stir up a little dissension and controversy. That's good, since it gives us a starting point for debate.

Mark: I have to say I'm pretty disappointed to see the published core wars article in its current form on combatsim. This really calls the site's objectivity and credibility into question.

Guilty as charged! With an explanation. I have lived and worked in Asia for the last 7 years in the PC industry. I've spent 6 of those 7 years in Taiwan doing product management and product marketing.

I will admit that it is because of this direct experience at some of these Taiwanese firms that I have come to understand, and on some occasions experience first hand, Intel's heavy-handedness when things don't go the way it wants them to go.

I'm not afraid to call a spade a spade. But I do so with definite knowledge.

Mark: The chipset is the "real brains" not the CPU?? Huh??

Most chipsets consist of two parts, now commonly referred to as the north bridge and the south bridge. The north bridge contains most of the system control logic, i.e. the memory controller, cache controller, PCI bus controller, etc. The south bridge usually handles I/O functions such as the keyboard controller, real-time clock (RTC), serial, parallel and game ports, legacy ISA bus support, etc.

Taken together, most core logic chipsets exceed 70,000 logic gates (note that I say logic gates, which is different from transistor count). The main portion of the chipset, the north bridge, usually comes in at between 50,000 and 70,000 logic gates by itself.

I understand that you have a software development background, which means you probably have a much better idea of the inner workings of a CPU. However, I have a background in system development and product management. I even developed a notebook PC that won an Editor's Choice Award from Byte Magazine in 1995. This is just to let you know that I'm not just talking to hear my lips flap.

With your background, I can understand why you would think of the CPU as the brains inside a PC. However, let me point out that the CPU by itself can't even access memory to load instructions for execution if not for the core logic chipset. The core logic chipset is what ties the CPU to the rest of the system. Therefore, I still stand by my assertion that the core logic chipset and not the CPU is the "brains" inside a PC.

Mark: 100MHz FSB buys you 50% memory performance speed up?? Yeah, right. It's worth having but you're not going to see 50% performance anything in a real program.

From a hardware standpoint, going from a 66MHz to a 100MHz FSB WILL give you a 50% increase in bandwidth to system memory and, more importantly, to the L2 cache.
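
The 50% figure is simple clock arithmetic. As a minimal sketch, assuming a 64-bit (8-byte) data path that moves one transfer per bus clock, the peak numbers work out like this:

```python
# Peak (theoretical) bandwidth of a 64-bit front-side bus.
# Assumption: 8 bytes per transfer, one transfer per bus clock.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_mb_s(bus_mhz: float) -> float:
    """Return peak bus bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return bus_mhz * BUS_WIDTH_BYTES


old = peak_bandwidth_mb_s(200 / 3)  # the nominal "66MHz" bus is 66.67MHz
new = peak_bandwidth_mb_s(100)

print(f"66MHz FSB:  {old:.0f} MB/s")   # 533 MB/s
print(f"100MHz FSB: {new:.0f} MB/s")   # 800 MB/s
print(f"Increase:   {(new / old - 1):.0%}")  # 50%
```

These are theoretical peaks; sustained throughput is lower on both buses.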

Notice that you yourself state that you won't see this sort of performance "in a real program." But that falls under the province of software, doesn't it? Depending on how an application is coded, there will still be times when the CPU stalls because of data dependencies, cache misses, etc.

It's like saying a Pentium 166 should really be labeled a 133 since it can't run at full speed all the time. For simplicity's sake, I stated a figure that is technically accurate. However, real-world considerations often take their toll.
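
The gap between the hardware figure and what a program actually sees can be estimated with Amdahl's law, a standard technique rather than anything from the original article. In this sketch only the share of run time spent waiting on memory and L2 speeds up by 1.5x; the fractions are illustrative assumptions, not measurements of any sim.

```python
def overall_speedup(memory_fraction: float, memory_speedup: float) -> float:
    """Amdahl's law: only the memory/L2-bound share of run time gets faster."""
    return 1.0 / ((1.0 - memory_fraction) + memory_fraction / memory_speedup)


# Illustrative fractions of run time spent waiting on memory and L2.
for fraction in (0.1, 0.3, 0.5, 1.0):
    gain = overall_speedup(fraction, 1.5) - 1.0
    print(f"{fraction:.0%} memory-bound -> {gain:.1%} faster overall")
```

Only a program that spends all of its time waiting on the bus would see the full 50%; one that is half memory-bound would see about 20%.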

Mark: What about the Pentium II's that are going to be contemporaneous with the K6+3D?? I could go on...

Well, if you're talking about K6+3D, there will not be ANY Pentium IIs in the consumer space that can be considered contemporaneous.

K6+3D will have a 256K L2 cache on chip, in addition to the 64K of L1 data/instruction cache already on chip. The 256K cache will run at FULL core clock speed. While it may be true that future Pentium IIs will also have cache running at full speed, those units will also have ECC and are SERVER and WORKSTATION products. Because of the cost of these modules, you won't see them in normal consumer machines.

Therefore, the most prevalent Pentium IIs in the consumer space will be those with caches that run at half the core CPU speed, plus an even newer cacheless Pentium II designed to go into sub-$1000 machines. I can tell you that a cacheless Pentium II runs barely faster than a P55C at the same clock speed (I've done tests).

So... how are these chips to be considered contemporaneous to the K6+3D chip?

Mark: "Liberty of playing devil's advocate"?? I think there are a few more liberties being taken here... There's a definite agenda here.

Well, guilty as charged. My agenda was to show readers that it is NOT an all-Intel world and that there are other viable and less costly solutions. Most hardcore simmers are older and more mature. I'm sure the typical profile would be something like a 30-something male with a middle-class income, a wife and two kids. People like that don't necessarily have the disposable income to buy the best that Intel has to offer.

I will admit, though, that what I had written was something of a simplification. Intel is not a chipset company any more than it is a motherboard company or, for that matter, a system company. The majority of its revenue comes from sales of CPUs, and everything else it does is basically aimed at furthering those sales.

Intel originally got into the chipset and motherboard businesses because there was always a lag of a few months between the time it had new CPUs ready and the time Taiwanese motherboard makers could release products that used them. Essentially, these activities were aimed at reducing the lead time to market for its own CPU products.

However, a side effect of these activities is that Intel has taken 20% of the motherboard business away from Taiwanese companies, and it controls approximately 85%-90% of the volume of chipsets supplied to Taiwanese board manufacturers. And even though Intel did not originally intend to use these as "leverage," that is exactly what has ended up happening.

Mark: That bit about the DIB full core speed versus half speed...I just plain laughed out loud. DIB is a bug?? So how fast is socket 7 L2 then, eh?? Sheesh...you can't have it both ways.

One man's feature is another man's bug... but that was probably another simplification. Would it help if I had said that DIB has its architectural strengths and weaknesses? It just seems like semantics to me.

DIB (Dual Independent Bus) has been around since the Pentium Pro, though it was not called that at the time; the term was coined when Intel released the Pentium II. I stand by my statement that this was not the "new" performance feature Intel was touting it to be.

Furthermore, the DIB cache bus does in fact run at half the core processor speed on a Pentium II, as opposed to full core speed on a Pentium Pro. Did you find any factual errors in this statement?
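
To make the cache-speed comparison concrete, the sketch below lists the L2 clock for a few example configurations. The specific parts are illustrative choices; the rules are full core speed on the Pentium Pro, half core speed on the Pentium II, bus speed on Socket 7 and Super 7 boards, and full core speed on the planned K6+3D. Clock rate is only one factor; latency and bus width matter too.

```python
from dataclasses import dataclass


@dataclass
class L2Config:
    name: str
    core_mhz: float
    l2_mhz: float


configs = [
    # Pentium Pro: L2 in the same package, full core speed.
    L2Config("Pentium Pro 200", 200, 200),
    # Pentium II: L2 on the cartridge at half core speed.
    L2Config("Pentium II 300", 300, 300 / 2),
    # Socket 7 / Super 7: L2 on the motherboard, clocked at the bus speed.
    L2Config("K6 300 (66MHz Socket 7)", 300, 200 / 3),
    L2Config("K6-3D 300 (100MHz Super 7)", 300, 100),
    # K6+3D (planned): 256K on-chip L2 at full core speed.
    L2Config("K6+3D 400 (planned)", 400, 400),
]

for c in configs:
    ratio = c.l2_mhz / c.core_mhz
    print(f"{c.name:28s} L2 at {c.l2_mhz:5.1f} MHz ({ratio:.2f}x core)")
```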

Despite what you may say about the L2 cache speed on Socket 7 motherboards, the fact remains that on industry benchmarks such as WinBench 97, the AMD-K6 scores very close to equivalently configured Pentium II systems at the same clock speed. Since going to a 100MHz FSB would give a huge boost to both memory and L2 cache access, it is entirely conceivable that despite the slower cache speed, overall performance on K6-3D and particularly K6+3D will be better than on equivalently equipped and clocked Pentium II systems. I already have preliminary data indicating that this may indeed be the case.

Mark: Actually the recommendations aren't all that bad after all's said but for the most part they don't relate well to the "evidence" stated above them. They're about cost; good basis to work from.

This is a fair statement. However, if everyone's buying decision were based solely on cost, Intel would go out of business, wouldn't it? The fact remains that Intel spent USD 1.2 billion last year promoting its Intel Inside program. All this marketing activity translates into a higher end cost to the consumer. We're PAYING Intel to brainwash us.

The evidence I presented was a direct rebuttal to a good number of "points" that Intel has made regarding the "superiority" of the Pentium II. After refuting those points, I came back down to earth and made some common-sense recommendations.

However, I felt that it was important to inform consumers of their choices, especially consumers that are now looking at a major platform upgrade.

Mark: I did have to wince at the VooDoo2 buys you better speedup than a better CPU though... Let me read this back. This says that adding a VooDoo2 board to a 133MHz (or even the 166 if you like) is going to get you more speed up in our sims than switching to a Pentium II 300MHz??

The assumption was that most serious simulations in 1998 would have Glide and Voodoo/Voodoo2 support. This is pretty much a given seeing that 3Dfx owns 50% of the 3D accelerator market.

Voodoo2 supports hardware triangle setup, which offloads from the CPU a good portion of the computation needed to set up each triangle. Given the future versions of Glide that will come out with Voodoo2, is there any reason to doubt that the Glide API will support its hardware triangle setup engine?
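
For readers unfamiliar with triangle setup: it is the per-triangle work done before any pixels are filled, mainly deriving the triangle's edge equations and the rate at which each interpolated value (depth, color and so on) changes across the screen. The following is a minimal, generic textbook sketch; it is not 3Dfx's hardware algorithm, and the vertex layout and the `triangle_setup` name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical vertex layout: screen-space "x" and "y" plus any attributes
# to be interpolated (e.g. "z", "r", "g", "b").
Vertex = Dict[str, float]
Edge = Tuple[float, float, float]


def triangle_setup(v0: Vertex, v1: Vertex, v2: Vertex,
                   attrs: List[str]) -> Tuple[List[Edge], Dict[str, Tuple[float, float]]]:
    """Generic per-triangle setup.

    Returns the three edge equations E(x, y) = A*x + B*y + C and, for each
    attribute, its screen-space gradient (d/dx, d/dy).
    """
    dx1, dy1 = v1["x"] - v0["x"], v1["y"] - v0["y"]
    dx2, dy2 = v2["x"] - v0["x"], v2["y"] - v0["y"]
    det = dx1 * dy2 - dx2 * dy1  # twice the signed area
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")

    # Edge a->b: E is zero on the edge; for det > 0 (counter-clockwise
    # winding) interior points give E > 0 on all three edges.
    edges = [(a["y"] - b["y"], b["x"] - a["x"], a["x"] * b["y"] - b["x"] * a["y"])
             for a, b in ((v0, v1), (v1, v2), (v2, v0))]

    # Fit a plane through the three attribute values and take its slopes.
    gradients = {}
    for name in attrs:
        d1, d2 = v1[name] - v0[name], v2[name] - v0[name]
        gradients[name] = ((d1 * dy2 - d2 * dy1) / det,
                           (dx1 * d2 - dx2 * d1) / det)
    return edges, gradients


# Depth z = 2x + 3y + 1 sampled at three vertices.
v0 = {"x": 0.0, "y": 0.0, "z": 1.0}
v1 = {"x": 4.0, "y": 0.0, "z": 9.0}
v2 = {"x": 0.0, "y": 4.0, "z": 13.0}
edges, grads = triangle_setup(v0, v1, v2, ["z"])
print(grads["z"])  # (2.0, 3.0)
```

Every triangle in a scene pays this cost before a single pixel is drawn, which is why moving it onto the accelerator frees CPU time for the sim's own work.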

In this regard, it is entirely possible that a Pentium 133 or Pentium 166 with a Voodoo2, using Glide and the onboard triangle setup engine, will run as well as, if not faster than, a Pentium II-300 system using D3D alone. Voodoo2 performance obviously scales with the CPU; my point was simply that you could spend less money on a good accelerator and get reasonable performance instead of buying a brand new motherboard and a brand new Pentium II CPU. Got a problem with this?

Mark: Think about that for a second. For TAW or F4 type things that have *extensive* code managing real time campaign engines?? No, someone doesn't understand the workloads, I fear.

I do understand the workloads, and I stand by my assertion that a Voodoo2 with hardware triangle setup would be a better-value upgrade than a new PII motherboard and PII processor.

Mark: But hey, don't take it from me. Andy Hollis posted on the news last week that he's licking his lips for 400MHz CPUs in 100MHz FSB systems -- and he's not talking about a socket 7 system -- this is what he said will take sims he's designing to the next level: more CPU budget!

Well, if Andy was not talking about Socket 7, then pray tell, what other 100MHz FSB systems are available in the marketplace?

Mark: What was it that MPS said in your F4 interview about the speed of F4 on MP NT?? And MPS down-played Voodoo2 for speeding up current systems -- they said more memory first... Point examples perhaps, but ones that many of your readers will care about directly...

Good point, but that is more of an OS issue; again, a software issue. Memory is cheap now, and anyone running with less than 32MB of system memory is just begging for trouble, especially on Windows 95 or NT.

We're not talking about software or OS issues here. The issue was technical merit of Socket 7 (Super 7) vs. Slot-1. In that regard, I think the article presented the right level of information at the right level of complexity.

Thanks for your feedback. I will endeavor to be a little more even-handed in my future articles. I will also try not to simplify things to the point where people start to take technical issue with the points I'm making.


Letter 2: from Vernon

Thank you. I'm glad you enjoyed the article. Your comment on compatibility is a curious one, since it is often programmers who end up creating incompatibilities. Programmers sometimes like to be really clever and write code that exploits known "bugs" in Intel's CPUs. Many instructions change the bits in the CPU's flags register, and sometimes programmers rely on the exact way a particular instruction leaves those flags. This causes a problem because AMD and Cyrix CPUs adhere strictly to the documented x86 standard and don't necessarily reproduce that behavior. However, as more and more games are written in C and C++, with assembly used only in performance-critical areas, I think this issue will get better over time.

The x86 architecture is a standard and, as such, is very well documented. Prior to the Pentium class of CPUs, most CPU designs were derivatives of Intel's designs. Starting with the sixth generation of CPUs (Pentium Pro, K6, 6x86MX), the architectures started diverging from Intel's because 1) Intel doesn't like to share information with the public, much less with the competition, and 2) the P6 bus used by the Pentium Pro and Pentium II was never licensed.

This meant that AMD and Cyrix had to come up with different design methodologies that would deliver Pentium Pro/Pentium II-level performance using the Socket 7 infrastructure. Intel's move to Slot-1 was only an attempt to migrate people to a form factor that neither AMD nor Cyrix could follow. Note that the Pentium II uses the same P6 bus that Pentium Pro systems have used for the last few years.

Anyway, CPUs are really nothing more than sophisticated black boxes. For a given input, you expect a given output. This is ensured because, as I mentioned earlier, the x86 instruction set is an industry standard. How the CPU internally processes the input to produce the output is irrelevant so long as the output matches what is expected.
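
As a software analogy (not a description of how any real CPU is built), the two functions below implement the same 32-bit add with a carry-out flag in completely different ways. A caller cannot tell them apart, which is the black-box property described above. The function names are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF


def add_native(a: int, b: int) -> tuple:
    """32-bit add using the host's integer adder; returns (sum, carry_out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def add_ripple(a: int, b: int) -> tuple:
    """The same 32-bit add built from one-bit full adders (ripple carry)."""
    result, carry = 0, 0
    for i in range(32):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


# Different internals, identical observable results.
for _ in range(10_000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    assert add_native(a, b) == add_ripple(a, b)
print("implementations agree")
```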

This has allowed both AMD and Cyrix to use more aggressive CPU architectures and designs to maximize performance. Consider that despite being hobbled by an L2 cache that runs at only 66MHz, an AMD-K6 system comes very close to comparably clocked PII systems with a faster DIB cache. This says that the CPU makes up a good deal of the ground lost to the Socket 7 system architecture. That's why I fully believe that the move to 100MHz will really make AMD and Cyrix shine. Their CPUs are in many ways more architecturally advanced than Intel's.

Anyway, if you think you like K6-3D, just wait until K6+3D comes out... or how about K7? :)


Letter 3: from James

Mr. Lim: Congratulations on your very fine article about PC CPU issues on the Combatsim site - I've been evaluating PC hardware for over 30 years now (yes, I do go back to the 8008 - looked at it for an industrial control machine ...) and even in 'fussy' mode I really can't even quibble with either your statements or your presentation style. As the proud owner of a 249MHz (83MHz x 3) AMD K6 machine with a 3Dfx card, I can hardly take offense at your analysis; your reasons exactly recapitulate the process that led to my current system. However, pleased as we may be, we both need to recognize that the End-of-Life for Socket 7 may be as little as 12 months away. Why such a negative prognosis?

Because the immutable laws of physics appear to dictate some rather severe speed limitations for Socket 7. Specifically, around 350-400MHz CPU core speeds and a maximum 125MHz bus speed. Both limits are directly due to the inherently higher noise and 'time-base-spreading' effects caused by irreducible impedance mismatches at the pin-to-socket interface, combined with inter-lead signal coupling due to capacitive and RF skin-effects. I really can't see any way that even a Super-Super Socket 7 board will be speed competitive with a solution which mechanically tightly couples the L1 and L2 caches across a high speed data bus.

Unfortunately, unless some future CPU can incorporate sufficient high-speed L1 cache to isolate itself from today's relatively glacial memory transfer bandwidths we will continue to move large amounts of data through secondary caches which will become the primary I/O bottlenecks in our Socket 7 systems.

Fortunately, we can continue to expect that Intel, with its CPU-centric approach, will continue to provide the least elegant and most Byzantine solutions (Slot 1/P II, Slot 2/Katmai), thereby affording the alternate X86 processor folks some interesting markets, viz. the Cyrix MediaGX and upcoming MediaGXi (which may well be a harbinger of the future, where increasing CPU efficiency and integration combined with tightly coupled, high bandwidth memory (RamBus??) spell the end of this generation of hardware hackers) or the AMD K6+3D.

Anyway, thanks for the article. Wouldn't it be fun to look a little farther into the future, though?

James,

It appears that you're quite an able engineer and are quite well versed in PC architecture, so I'll cut to the chase.

Everything you pointed out is absolutely true. The Socket 7 bus architecture will likely peak at between 100 and 133MHz. The key to faster and more efficient CPU-to-system designs is to keep CPU clock multipliers relatively low and increase the speed of the system bus. Also, as clock speeds climb, CPU power dissipation and core temperatures rise steeply.

So what does this all mean? You are most likely correct in your assumption that Super 7 will be the last iteration of the Socket 7 architecture and that the lifespan of Super 7 is 12-18 months. On the surface, this doesn't appear to be a good recommendation. However, as most gamers are bound to a Socket 7 infrastructure, this is the most reasonable and cost effective upgrade path while maintaining performance on a par with the best that Intel has to offer.

The problem as core CPU speeds increase is that the system memory bus itself cannot sustain the data transfer rate needed to keep the CPU's execution pipelines full. This becomes an ever more serious issue because CPUs such as the K6 are superscalar and superpipelined, with six execution units. Maintaining the flow of data and instructions to the execution units becomes critical, and even a larger 1MB L2 cache at 100MHz may not be enough to do the trick.
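
A rough way to see the problem is to divide the bus's peak bandwidth by the core clock. This sketch assumes a 64-bit bus with one transfer per bus clock and ignores the on-chip caches, so it shows only the worst case in which every byte has to cross the bus:

```python
# Assumption: 64-bit (8-byte) data bus, one transfer per bus clock.
BUS_WIDTH_BYTES = 8


def bytes_per_core_cycle(core_mhz: float, bus_mhz: float) -> float:
    """Peak bytes the bus can deliver per core clock cycle."""
    return BUS_WIDTH_BYTES * bus_mhz / core_mhz


for core, bus in ((233, 200 / 3), (300, 200 / 3), (300, 100), (400, 100), (450, 100)):
    print(f"{core}MHz core on {bus:.1f}MHz bus: "
          f"{bytes_per_core_cycle(core, bus):.2f} bytes per core cycle")
```

Each step up in clock multiplier shrinks the bus bandwidth available per core cycle, which is the gap an on-chip cache running at core speed is meant to fill.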

That is the reason for K6+3D. The difference between this chip and the K6-3D is that the K6+3D will have 256K of on-chip L2 cache in addition to the 64K of data/instruction cache already present. This 256K L2 cache will run at full core clock speed to minimize pipeline stalls. It turns the motherboard cache into an L3 cache, which will be fully supported by chipsets such as the ALi Aladdin V and VIA MVP3.
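
The benefit of adding a full-speed cache level between L1 and the motherboard cache can be estimated with the standard average memory access time (AMAT) formula. All of the latencies and miss rates below are hypothetical placeholders chosen for illustration, not measured figures for any AMD part.

```python
def amat(levels, memory_cycles):
    """Average memory access time in core cycles.

    levels: list of (hit_time_cycles, local_miss_rate), from L1 outward.
    Each level's miss penalty is the AMAT of everything beyond it.
    """
    time = memory_cycles
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time


# Hypothetical figures for illustration only, not measurements.
MEMORY = 60  # core cycles to main memory

two_level = [(1, 0.05), (12, 0.20)]               # L1 + motherboard L2
three_level = [(1, 0.05), (3, 0.30), (12, 0.40)]  # L1 + on-chip L2 + board L3

print(f"L1 + board L2:             {amat(two_level, MEMORY):.2f} cycles")
print(f"L1 + on-chip L2 + board L3: {amat(three_level, MEMORY):.2f} cycles")
```

With these made-up numbers the three-level hierarchy averages about 1.69 cycles per access versus 2.2, because most L1 misses are caught by a fast on-chip cache instead of one clocked at bus speed.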

K6-3D will start at 300MHz and go to 350MHz, while the K6+3D will start at about 350MHz and progress to 400 or 450MHz. At clock multipliers over 3x, keeping the CPU constantly fed with data becomes a very big issue because of the great difference in operating speed between the CPU and the system bus. Just to give you an idea of the complexity of these chips:

K6 166/200/233 (current): 0.35 micron, 8.8M transistors, 162 mm² die, 66MHz bus.
K6 233/266/300 (current): 0.25 micron, 8.8M transistors, 68 mm² die, 66MHz bus.
K6-3D 300/350 (April '98): 0.25 micron, 9.3M transistors, 80 mm² die, 100MHz bus.
K6+3D 350/400/450? (July '98): 0.25 micron, 21.3M transistors, 138 mm² die, 100MHz bus.
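
The clock multipliers implied by this list show why the 100MHz bus matters. This sketch divides each core speed by its bus speed and rounds to the nearest half step, the granularity these CPUs' multipliers are set in; the 450MHz K6+3D speed is speculative, as the list itself notes.

```python
from fractions import Fraction

BUS_66 = Fraction(200, 3)  # the nominal "66MHz" bus is 66.67MHz

parts = {
    "K6 (0.35 micron)": ([166, 200, 233], BUS_66),
    "K6 (0.25 micron)": ([233, 266, 300], BUS_66),
    "K6-3D": ([300, 350], Fraction(100)),
    "K6+3D": ([350, 400, 450], Fraction(100)),
}


def multiplier(core_mhz: int, bus_mhz: Fraction) -> float:
    """Core/bus ratio rounded to the nearest half step."""
    return round(Fraction(core_mhz) / bus_mhz * 2) / 2


for name, (speeds, bus) in parts.items():
    ratios = ", ".join(f"{s}MHz = {multiplier(s, bus)}x" for s in speeds)
    print(f"{name}: {ratios}")
```

Moving to a 100MHz bus brings a 300MHz part back down from 4.5x to 3x, but the planned 400 and 450MHz parts climb to 4x and 4.5x again.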

To look further into the future, there's... K7. The system architecture for K7 will address many of the deficiencies of both the Socket 7 and P6 buses. AMD has announced that the K7 will use the Digital EV6 bus protocol. This is the same bus protocol used by Digital's Alpha chip... which has broken all land speed records in terms of raw performance.

This bus should be capable of 200~300MHz operation! AMD would have to retrofit PCI, AGP, USB, 1394 and legacy PC I/O support onto this bus through the system core logic chipset in order to support both old and new PC peripherals. Rumor has it, though, that the K7 should start at around 500MHz and could climb to nearly 1GHz before the end of its lifecycle.

The stuff that dreams are made of!

Sincerely, Shui-Che Lim






© 1997-2000 COMBATSIM.COM, Inc. All Rights Reserved.
