The real question is how much ILP the chip can squeeze out of the code (and compilers). If Intel can find more ILP and get more independent instructions dispatched to the execution units, they'll be ahead. I just don't know that there's much more ILP to get out of current code with the technologies known about and used in today's processors. Otherwise, wider execution paths would help only if there were HyperThreading (or some derivative) available to process two threads at once and fill up all the execution units with instructions.
Otherwise you might as well just use the extra die space and go multicore or hybrid multicore (main cores plus specialized cores for TCP/IP offloading, encryption, etc).
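To make the ILP point concrete, here is a minimal sketch of greedy list scheduling on an idealized machine. Assumptions: every instruction takes one cycle, there are enough execution units of every type, and up to `width` instructions issue per cycle. The dependency graph is a hypothetical example: a 4-instruction dependent chain plus 4 independent instructions.

```python
from typing import Dict, List


def schedule_cycles(deps: Dict[str, List[str]], width: int) -> int:
    """Greedy list scheduling on an idealized `width`-issue machine.

    deps maps each instruction (in program order) to the instructions it
    depends on. Every instruction is assumed to take exactly one cycle.
    """
    done: set = set()
    remaining = list(deps)
    cycles = 0
    while remaining:
        # Ready = all inputs finished in an earlier cycle.
        ready = [i for i in remaining if all(d in done for d in deps[i])]
        issued = ready[:width]
        if not issued:
            raise ValueError("cyclic or missing dependency")
        done.update(issued)
        remaining = [i for i in remaining if i not in issued]
        cycles += 1
    return cycles


# Hypothetical code: a -> b -> c -> d is a dependent chain; e..h are independent.
deps = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"],
        "e": [], "f": [], "g": [], "h": []}
for width in (1, 2, 3, 4, 6):
    print(width, schedule_cycles(deps, width))
```

Past a width of 2, the 4-long chain sets the cycle count (4 cycles), so extra issue slots sit idle unless a second thread supplies independent work.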
Which is why we say speculatively that it can go either way. 4-wide or 3-wide? I'd say it's 50-50 which one Conroe will be. What you say about HyperThreading is a good reason to pursue 4-wide, though. Take the current NetBurst HTT hack and make it into a more useful SMT design (like in POWER5 I think). Go with fully independent queues, maybe even split up caches. There's not that much point in going from a 1MB cache to 2MB cache IMO. Imagine HTT with each threading core getting its own 1MB L2 (that would be as fast as the Prescott L2 rather than the slower Prescott-2M L2).
Combined with more execution units, you could potentially increase performance of the core by 50% or more in multitasking scenarios without having to go all the way for four independent cores. I mean, current HTT doesn't add more than 5% to the die size. A second core doubles the die size. Take an in-between approach and go with a 15% increase to get a robust SMT solution, and you can get most of the benefits of SMP with far fewer transistors, right?
(Note: I am NOT a CPU designer, so maybe I'm totally wrong about what can and can't be done. The above sounds reasonable to me, however.) :)
No, not at all. Next year won't produce massive turbulence in the microprocessor market. We'd better keep our eyes open for 2007, 'cause quad core is on the horizon.
33 Comments
IntelUser2000 - Monday, August 22, 2005 - link
First Itanium is 6-wide. Itanium 2 is 6-wide.
Itanium 2 doesn't increase the issue rate; what it does is make an IPC of 6 more achievable by improving the architecture.
Brief overview of the Itanium architecture: the CPU issues EPIC instructions two bundles at a time, with 3 instructions per bundle, for a peak of 6 instructions per clock. Each bundle can hold only certain combinations of instruction types.
Itanium's main execution units come in 4 kinds: Branch (B), Floating Point (F), Memory (M), and Integer (I). In simple terms, the Memory and Integer units can both be thought of as ALUs, from what I understand.
Each bundle can only use certain combinations of those execution units, for example MMI (memory, memory, integer), MII, MFI, BBB, and so on. Of the 32 possible template encodings, 24 are defined (12 templates, each with and without a stop at the end) and the rest are reserved. So if the two bundles together need more units of one type than the chip has, they can't issue together, and 6-wide execution isn't possible.
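As a sketch of the bundle format: an IA-64 bundle is 128 bits, with a 5-bit template in bits 0-4 followed by three 41-bit instruction slots. The table maps each defined template encoding to its unit mix (stop-bit variants share a mix; MLX pairs an L and X slot for long immediates). `decode_bundle` is a hypothetical helper for illustration, not part of any real toolchain.

```python
# Template encoding -> unit types for slots 0, 1, 2 (stop-bit variants share a mix).
TEMPLATE_UNITS = {
    0x00: "MII", 0x01: "MII", 0x02: "MII", 0x03: "MII",
    0x04: "MLX", 0x05: "MLX",
    0x08: "MMI", 0x09: "MMI", 0x0A: "MMI", 0x0B: "MMI",
    0x0C: "MFI", 0x0D: "MFI",
    0x0E: "MMF", 0x0F: "MMF",
    0x10: "MIB", 0x11: "MIB",
    0x12: "MBB", 0x13: "MBB",
    0x16: "BBB", 0x17: "BBB",
    0x18: "MMB", 0x19: "MMB",
    0x1C: "MFB", 0x1D: "MFB",
}  # the remaining 8 of the 32 encodings are reserved

SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def decode_bundle(bundle: int):
    """Split a 128-bit bundle into its unit mix and three raw 41-bit slots."""
    template = bundle & 0x1F
    units = TEMPLATE_UNITS.get(template)
    if units is None:
        raise ValueError(f"reserved template {template:#04x}")
    slots = [(bundle >> (5 + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return units, slots


# Template 0x08 (MMI) with dummy slot contents 1, 2, 3.
b = 0x08 | (1 << 5) | (2 << 46) | (3 << 87)
print(decode_bundle(b))  # ('MMI', [1, 2, 3])
```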
The first Itanium had 2 M units, 2 I units, 2 FP units, and 3 B units. So if both bundles are MMI, they need 4 M units and can't execute 6-wide.
According to the article I read, the first Itanium can in theory reach only about 3.8 IPC because of its limited execution units, while Itanium 2 has a theoretical IPC of 5.6-5.7 thanks to more execution units, specifically 4 M units instead of the original Itanium's 2.
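A simplified sketch of why more M units matter: count the unit types two bundles need and compare against the unit counts given above. This is an assumption-heavy model; real dispersal also depends on which template slot maps to which issue port, and on ALU ops that can run on either M or I ports, none of which is modeled here.

```python
from collections import Counter

# Unit counts per the comment above (simplified model).
MERCED = {"M": 2, "I": 2, "F": 2, "B": 3}
MCKINLEY = {"M": 4, "I": 2, "F": 2, "B": 3}


def can_issue_together(bundle_a: str, bundle_b: str, units: dict) -> bool:
    """True if the combined slot types of two bundles fit the available units."""
    needed = Counter(bundle_a + bundle_b)
    return all(count <= units.get(kind, 0) for kind, count in needed.items())


print(can_issue_together("MMI", "MMI", MERCED))    # False: needs 4 M units
print(can_issue_together("MMI", "MMI", MCKINLEY))  # True
print(can_issue_together("MII", "MII", MCKINLEY))  # False: needs 4 I units
```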
There are two ways to run 32-bit x86 code on Itanium. One is the hardware emulator built into all current Itanium chips. With it, an 800MHz first-generation Itanium runs 32-bit x86 code anywhere from as badly as a 66MHz 486 to as well as a 200MHz Pentium MMX. Itanium 2 has a better hardware emulator plus a better overall architecture, so its 32-bit performance rises to roughly that of a 300MHz Pentium II (in native code, a 1GHz Itanium 2 has twice the performance of an 800MHz Itanium, or better). That's still pretty bad and makes running 32-bit code practically useless. According to the review, compatibility wasn't great either, as Quake 3 wouldn't install (not that running Quake 3 on the equivalent of a 100MHz Pentium wouldn't be something of a push anyway). Plus, the emulator costs extra die space and power; not much, but a lot for an almost useless feature.
So Intel introduced a dynamic software translator for Itanium called IA-32EL (Execution Layer). By translating x86 instructions into EPIC instructions and optimizing them at run time, it improves performance dramatically while removing the need for a hardware emulator. A 1.5GHz Itanium 2 with 6MB L3 cache now performs about as well as an equivalently clocked Xeon MP, or better (with the hardware emulator it would have been equal to a 450MHz Pentium II), which isn't that bad, and much better than the hardware approach.
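The reason software translation can win is that translation cost is paid once per block and then amortized over every later execution. Below is a toy two-phase model loosely in that spirit: a quick translation on first execution, then a single re-optimization once a block turns hot. The 50-execution threshold and everything else here are hypothetical; this does not reflect IA-32EL's actual internals.

```python
from collections import Counter
from typing import Dict, List, Tuple

HOT_THRESHOLD = 50  # hypothetical: re-optimize a block after 50 executions


def simulate(trace: List[str]) -> Tuple[int, int]:
    """Return (quick translations, optimized re-translations) for a block trace."""
    cache: Dict[str, str] = {}   # guest block -> which kind of translation we hold
    counts: Counter = Counter()  # executions seen per guest block
    quick = optimized = 0
    for block in trace:
        if block not in cache:
            cache[block] = "quick"      # cheap first-pass translation
            quick += 1
        counts[block] += 1
        if counts[block] == HOT_THRESHOLD and cache[block] == "quick":
            cache[block] = "optimized"  # hot block: pay for optimization once
            optimized += 1
        # a real translator would now jump into the cached native code
    return quick, optimized


print(simulate(["loop"] * 1000 + ["exit"]))  # (2, 1)
```

A loop body executed 1000 times costs one quick translation and one optimization pass; the other executions reuse the cached native code.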
Montecito doesn't seem to have the hardware emulator anymore.
JarredWalton - Tuesday, August 23, 2005 - link
Dang, I *swear* I read an article on HP.com or Intel.com stating Itanium 2 was 8-wide. I can't find it anymore, but there are many sources saying 6-wide. Weird. Anyway, I've read plenty about the rest of the Itanium architecture, and I don't know why you're suddenly going off about it. I'll correct the issue width statement, though.
Not like it matters now, as we all know Conroe is 4-wide now. (I really expected that to be the case, but was told to make it less certain and more speculative for the article.)
IntelUser2000 - Thursday, August 25, 2005 - link
http://www.intel.com/design/itanium2/datashts/2509...
The intro shows that it's a 6-wide architecture with an 8-stage pipeline. The 8 does stand for something, but I forgot what. I babbled on because it wasn't all directed at you; I hoped somebody who didn't know and wanted to know might read it.
JarredWalton - Friday, August 26, 2005 - link
Argh! WTF is going on? Am I senile? I'm positive I read something about Itanium 2 (McKinley, etc.) being more than 8 pipeline stages. It stated something about the 8 stages of Merced being part of the reason Itanium 1 never reached higher clock speeds. Damn... people must just make stuff up about these architectures. :|
IntelUser2000 - Thursday, September 1, 2005 - link
Itanium "Merced" has a 10-stage pipeline. Nearly everyone who looked at the architecture said it was a bloated design released in haste. With a tremendously improved design, Itanium 2 "McKinley" reduces that to an 8-stage pipeline while clocking 25% higher on the SAME process.
Itanium: 800MHz, 0.18 micron, 10-stage pipeline, 9-stage branch-miss penalty
Itanium 2: 1GHz, 0.18 micron, 8-stage pipeline, 7-stage branch-miss penalty
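A quick worked check of those figures, assuming the branch-miss numbers are the misprediction penalty in clock cycles and ignoring everything else that affects performance: the shorter penalty and the higher clock compound.

```python
def branch_miss_cost(clock_mhz: float, miss_cycles: int) -> float:
    """Nanoseconds lost per mispredicted branch at a given clock."""
    cycle_ns = 1000.0 / clock_mhz
    return miss_cycles * cycle_ns


merced_ns = branch_miss_cost(800, 9)      # 9 cycles x 1.25 ns = 11.25 ns
mckinley_ns = branch_miss_cost(1000, 7)   # 7 cycles x 1.0 ns = 7.0 ns
clock_gain = 1000 / 800 - 1               # 0.25, the "25% higher" clock
print(merced_ns, mckinley_ns, clock_gain)  # 11.25 7.0 0.25
```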
dwalton - Monday, August 15, 2005 - link
"Intel Q3'05 Roadmap: Conroe Appears, Speculation Ensues"
I almost spit my coffee onto the keyboard when I read that title. It came off to me as if Intel had released a roadmap showing Conroe launching in the third quarter of this year.
JarredWalton - Monday, August 15, 2005 - link
Sorry to disappoint. :p
Intel's lead time on the roadmap is about 18 months, though the initial details are often lacking. With Conroe/Merom being a new architecture, I doubt Intel will do so much as mention a clock speed without NDAs.
IntelUser2000 - Friday, August 12, 2005 - link
Intel's 45nm is supposed to bring high-K, metal gates, and possibly a tri-gate transistor structure. Using tri-gate, it's supposed to have a fully depleted substrate from the start. So, if they implement what their presentations say they will:
-High-K
-Metal gates
-Tri-gate, which brings FD-SOI
We should see Yonah before worrying about Conroe. Yonah's specs are pretty interesting.
JarredWalton - Saturday, August 13, 2005 - link
Yonah looks interesting in some ways, but as far as I can tell it's just Dothan on 65nm with dual cores, improved uops-fusion, and hopefully better FP/SIMD support. I haven't even heard anything to indicate it will have 64-bit extensions, which makes it less than Conroe in my book. Not that 64-bit is the be-all, end-all, but I'm pretty sure I've bought my last 32-bit CPU now. I'd hate to get stuck upgrading for Longhorn just because I didn't bother with a 64-bit enabled processor. Bleh... Longhorn and 64-bit is really just hype anyway, but we'll be forced that way like it or not. Hehehe.
IntelUser2000 - Thursday, August 18, 2005 - link
And can you tell me how that's not significant?? Yonah isn't a slapped-on dual core like Smithfield, because it has arbitration logic to manage data between the two cores. And even compared to the dual-core A64, it's not just two cores plus something SRQ-like; it has a bunch of other enhancements that shore up its weaknesses (FPU/SSE).
To: nserra
HT takes less than 5% of the die. Of course an IMC is good, but Pentium 4 could have an IMC too. I think HT and IMC are each good in their own ways.
Cache consumes little power and takes little die space relative to its transistor count. If you built a 4-core Athlon 64 today on 90nm, Prescott would look cool-running by comparison.
The 6MB cache in Itanium 2 takes 60% of the die but accounts for only 30% of the power consumption.
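Those two percentages imply a large power-density gap. A small worked sketch, assuming the remaining 40% of the die accounts for the remaining 70% of the power (the only inputs are the 60%/30% figures above):

```python
def relative_power_density(power_pct: float, area_pct: float) -> float:
    """Share of power divided by share of die area (1.0 = die average)."""
    return power_pct / area_pct


cache = relative_power_density(30, 60)   # 0.5x the die-average density
logic = relative_power_density(70, 40)   # 1.75x: everything that isn't the 6MB cache
print(cache, logic, logic / cache)       # 0.5 1.75 3.5
```

Under that assumption, the non-cache part of the die burns about 3.5x as much power per unit area as the cache.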
nserra - Friday, August 19, 2005 - link
I agree IntelUser2000, but even so, if each core used C&Q with some core-disable capability, it would be in the 30W-per-core range (120W total), right on track with Prescott 2M and Pentium D.
I don't know if you noticed, but AMD has raised the power ratings of their designs even though their processors are consuming less.... that must be because:
Good reasons first:
-AMD will achieve higher clock speeds, 3.4 GHz and up
-AMD is already thinking about 4-core processors
Bad reasons:
-AMD will come out with some bad 65nm tech
-or will come out with some bad core (a Prescott-like M2 with rev. F)
coldpower27 - Saturday, August 13, 2005 - link
Yeah, from current rumors Yonah has every checkbox feature besides EM64T :)
Like I have said, I can't wait till Intel brings Conroe technology, as I've always liked going the Intel route, but I don't want to go for NetBurst-based processors.
The 45nm generation looks to be quite a change for Intel, as they are moving to tri-gate transistors, high-K, and FD-SOI, though I believe it would be introduced at the end of 2007 at the earliest, rather than mid-2007. Conroe is expected to debut on 65nm technology; hopefully it doesn't need an optical shrink to get good like NetBurst did, and is good from the get-go, like Athlon 64 was.
IntelUser2000 - Friday, August 12, 2005 - link
I heard that due to the limits of Trace Cache throughput, it can only achieve an IPC of 2, not 3, so even in theory, Pentium 4 only reaches an IPC of 2.
About the HyperThreading technology, I sort of disagree. If the microprocessor is designed to accommodate such multi-threading technology, it doesn't need a 24% increase in die size like POWER5 did. I heard that with only a 5% increase in die size, Alpha EV8 was supposed to get a 2x performance increase, which happens to be greater than adding another core!!!
Pentium 4's HT takes LESS than 5% die size.
nserra - Wednesday, August 17, 2005 - link
Well, if you think the 5% of the die spent on HT is very well spent, what about the 5% of the AMD Athlon 64 spent on the integrated memory controller?
JarredWalton - Saturday, August 13, 2005 - link
In practice, I'd guess that NetBurst averages an IPC of around 1.3 overall. I'd say Athlon 64 is closer to 2.0. Obviously just a guess, but when you consider how a 2.4 GHz A64 3800+ compares to the 3.8 GHz P4 570, that seems about right. Heck, P4 might even be 1.1 to 1.2 IPC on average if K8 is 2.0. Branch misses kill IPC throughput on NetBurst, for example.
We also don't know precisely (well, I don't) what the various traces represent. It could be that many traces actually take up two of the "issue slots", as traces don't have to be a single micro-op.
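The estimate rests on treating performance as roughly IPC times clock speed. A minimal sketch of that arithmetic, assuming the two chips perform about equally and taking the guessed A64 IPC of 2.0 at 2.4 GHz as the reference; both 3.6 and 3.8 GHz are shown for the Pentium 4 clock:

```python
def implied_ipc(ref_ipc: float, ref_ghz: float, other_ghz: float) -> float:
    """IPC the other chip needs to match the reference, if perf ~ IPC x clock."""
    return ref_ipc * ref_ghz / other_ghz


for p4_ghz in (3.6, 3.8):
    print(p4_ghz, round(implied_ipc(2.0, 2.4, p4_ghz), 2))
# 3.6 1.33
# 3.8 1.26
```

Either way the implied NetBurst IPC lands near the 1.3 guess; the model ignores memory, cache, and workload effects entirely.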
HyperThreading in NetBurst is really pretty simplistic. It also doesn't really improve performance much except in very specific circumstances. I can't imagine any SMT configuration actually providing a bigger boost than SMP, though. (Otherwise everyone would already be doing it, rather than just NetBurst and high-end server chips.) I seriously doubt that a 5% die space increase would be able to get more than a 10% performance increase. A 10% increase I could see yielding 20 to 30%, and 15% could yield 50% or more; of course, those are all just guesses, and all under specific tests.
If you're not running multiple CPU-intensive threads, any form of SMT helps as much as SMP, which is to say not at all. Basically, this is all just guessing right now anyway, so there's no point in worrying about it too much. I have to think that Intel can get MUCH better performance with the next architecture than anything they've currently got, though. 2MB+ cache on CPUs is a lot of wasted space that could be better utilized, IMO.
nserra - Wednesday, August 17, 2005 - link
Yeah I completely agree!!
I was hoping AMD would release a 4-core processor with 128KB of L2 cache per core. That would give almost the same transistor count as 2 cores with 1MB L2, but “a lot” more speed.
Of course, in MARKETING terms a processor with a total of 512KB of L2 cache would be a budget part, but to me it would be an excellent, efficient design.
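A rough way to check the cache side of that transistor budget, assuming classic 6-transistor SRAM cells and ignoring tags, ECC, and decoders, so only the L2 data arrays are counted. Since "2 cores with 1MB L2" could mean 1MB in total or 1MB per core, both readings are computed. Whether the freed transistors would cover two more cores depends on the core logic's transistor count, which this sketch does not model.

```python
TRANSISTORS_PER_BIT = 6  # assumption: 6T SRAM cells; tags, ECC and decoders ignored


def l2_cell_transistors(kib_per_core: int, cores: int) -> int:
    """Rough transistor count of the L2 data arrays across all cores."""
    bits = kib_per_core * 1024 * 8 * cores
    return bits * TRANSISTORS_PER_BIT


quad_128k = l2_cell_transistors(128, 4)    # 25,165,824 (~25M)
dual_512k = l2_cell_transistors(512, 2)    # 50,331,648 (~50M), 1MB L2 in total
dual_1m = l2_cell_transistors(1024, 2)     # 100,663,296 (~101M), 1MB per core
print(dual_512k - quad_128k, dual_1m - quad_128k)  # 25165824 75497472
```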
IntelUser2000 - Tuesday, August 16, 2005 - link
Well, a point to make is this: because the designers of Alpha CPUs managed to beat every other CPU at every generation and every process generation with a simpler core, it's likely that the future generation would have done so too.
It's not that companies aren't using SMT because they don't know its benefits; it's that they don't know how to make it good. Did you think it made sense for Intel to make the Prescott core? IBM looks like it's doing SMT best, and they are one of only two companies that actually use SMT nowadays, the other being Intel with its desktop chips. Plus, server chip designs are usually pushed to their technical limits, while desktop chips are made mainly for mass production and profit.
(The exception is the multi-threading in the Itanium code-named Montecito, since it uses a different form of it.)
IntelUser2000 - Tuesday, August 16, 2005 - link
About the IPC: in theory, P4 can output an IPC of 2 and Athlon 64 an IPC of 3. So even with the same branch misses, Pentium 4 would in theory be slower per clock than Athlon 64, not to mention that the real chip adds more branch misses on top.
About SMT, look here: http://www.realworldtech.com/page.cfm?ArticleID=RW...
"The enormous potential of SMT is shown by the expectation that it can approximately double the instruction throughput of an already impressive monster like the EV8 at the cost of only about 6% extra die area over a single threaded version of the design. That is a bigger speedup than can be typically achieved by duplicating the entire MPU as done in a 2 way SMP system!"
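The quoted claim boils down to throughput per unit of die area. A minimal sketch using the projected EV8 figures from the quote (about 2x throughput for about 6% more area) against an idealized 2-way SMP that gets a perfect 2x for twice the silicon; perfect SMP scaling is an assumption, and both are best-case numbers.

```python
def throughput_per_area(speedup: float, relative_area: float) -> float:
    """Throughput relative to one single-threaded core, divided by die area."""
    return speedup / relative_area


ev8_smt = throughput_per_area(2.0, 1.06)     # ~2x throughput for ~6% more area
two_way_smp = throughput_per_area(2.0, 2.0)  # ideal case: 2x for 2x the silicon
print(round(ev8_smt, 2), two_way_smp)        # 1.89 1.0
```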
Though it looks as if P4's multi-threading is a simple implementation not meant to take advantage of the architecture, it's more the other way around.
Pentium 4's limited IPC throughput (2 max), limited number of registers (8, or 16 in 64-bit mode), and limited bandwidth are what cripple HT.
Alpha EV8 was supposed to have a theoretical IPC of 8, 1024 registers (!!), an integrated memory controller with 20GB/sec of bandwidth per CPU, and an architecture developed from the beginning to take advantage of SMT, which is why it shows SMT's full benefits.
Next-gen Itanium with multi-threading is a different story. Montecito doesn't use SMT; it uses a different form of multi-threading, so it's not really comparable.
Horshu - Friday, August 12, 2005 - link
Does Conroe's roadmap intersect early on with the 45 nm process (2007)? That was the point at which Intel was supposed to migrate over to the new high-K/metal transistor gates, although I recall something about those plans being dropped while Intel works on a new high-K process. The new gates were supposed to dramatically reduce heat dissipation, although I have no idea what to expect from the new high-K they are working on.
JarredWalton - Friday, August 12, 2005 - link
Heheh... 45nm? Let us reach 65nm first. ;)
The roadmaps don't typically go that far out. Once the transition to 65nm is complete, we'll probably start getting information on their 45nm transition. Rough guess would be that it will launch around 18 months after 65nm, so mid-2007 give or take.
nserra - Wednesday, August 17, 2005
65nm will put Intel in line with what AMD has already achieved with 90nm.
NFS4 - Wednesday, August 10, 2005
I can't wait for the Intel Cornrows :D
BitByBit - Thursday, August 11, 2005
I'm interested in what AMD's response to Conroe will be.
If Conroe is indeed going to be a wide-issue, efficient design, then its average IPC should easily exceed that of the K8, while its longer pipeline (in comparison with K8/Dothan) will enable it to hit higher clock speeds. The lessons Intel learned with NetBurst will likely complement its Next Generation architecture nicely, such as the importance of good branch prediction, along with innovations such as the Trace Cache.
If we assume Conroe will initially be released at speeds in the lower 2GHz range, then AMD should have time to hold out until K10 is ready.
The question is: what will K10 bring us?
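The IPC-versus-clock-speed reasoning above rests on one relationship: instructions per second = average IPC × clock frequency. A minimal Python sketch, where both chips and their IPC and clock figures are hypothetical placeholders rather than Conroe or K8 specifications:

```python
def instructions_per_second(avg_ipc, clock_ghz):
    """Throughput = average instructions per clock x clocks per second."""
    return avg_ipc * clock_ghz * 1e9


# Hypothetical chips, not real product specifications:
wide_slow = instructions_per_second(avg_ipc=1.5, clock_ghz=2.0)    # wider core, lower clock
narrow_fast = instructions_per_second(avg_ipc=1.0, clock_ghz=3.0)  # narrower core, higher clock

print(f"wide/slow:   {wide_slow:.2e} instructions/s")
print(f"narrow/fast: {narrow_fast:.2e} instructions/s")
```

Here a 50% IPC advantage is exactly cancelled by a 50% clock advantage (both come out at 3e9 instructions/s), which is why neither IPC nor clock speed alone settles the comparison.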
segagenesis - Friday, August 12, 2005
I would argue that AMD's response *was* the K8. It seems that Intel is playing catch-up as far as architecture goes, and we're not going to see these new CPUs for a long time still. Just as long as they don't come out and say "sorry, no more x86 even though the entire damn world still uses it".
nserra - Wednesday, August 17, 2005
Maybe the K8 on socket M2 will come with some surprises beyond just the DDR2 support.
ZobarStyl - Thursday, August 11, 2005
Yeah, I'm more interested in what the K10 can do better than what Conroe finally just manages to get right. Here's hoping AMD has something more than just clock speed and cache updates coming.
reactor - Friday, August 12, 2005
eliminate the southbridge? ;D
coldpower27 - Wednesday, August 10, 2005
Yeah, I am really interested to see how this 4th architecture will relate to the other 3 Intel has.
Doormat - Wednesday, August 10, 2005
The real question is how much ILP the chip can squeeze out of the code (and compilers). If Intel can get more ILP and independent instructions dispatched to the execution units, then they'll be ahead. I just don't know that there is more ILP to get out of current code with the technologies known about and used in today's processors. Otherwise, wider execution paths would help only if there was Hyperthreading (or some derivative) available to process two threads at once and fill up all the execution units with instructions to perform.
Otherwise you might as well just use the extra die space and go multicore or hybrid multicore (main cores plus specialized cores for TCP/IP offloading, encryption, etc).
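How much ILP a wider core can extract is bounded by dependence chains in the code. This toy Python scheduler illustrates that, under idealized assumptions added here: every instruction has single-cycle latency, register renaming is perfect (so only true read-after-write dependences matter), branch prediction and the instruction window are unlimited, and the eight-instruction `PROGRAM` is made up. Each cycle it issues the oldest ready instructions, up to `width`.

```python
# Each instruction: (destination register, [source registers]).
# Hypothetical program; loads are modeled as having no register sources.
PROGRAM = [
    ("r1", []),            # r1 = load a
    ("r2", []),            # r2 = load b
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r3"]),        # r4 = r3 * 2
    ("r5", []),            # r5 = load c
    ("r6", ["r5"]),        # r6 = r5 + 1
    ("r7", ["r4", "r6"]),  # r7 = r4 + r6
    ("r8", []),            # r8 = load d
]


def cycles_needed(program, width):
    """Greedy oldest-first scheduling with unit latency and true dependences only."""
    # Map each source operand to the most recent earlier instruction that wrote it.
    producers, last_writer = [], {}
    for i, (dest, srcs) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    issue_cycle = [None] * len(program)
    remaining = list(range(len(program)))
    cycle = 0
    while remaining:
        issued = []
        for i in remaining:
            if len(issued) == width:
                break
            # Ready if every producer issued in an earlier cycle (1-cycle latency).
            if all(issue_cycle[p] is not None and issue_cycle[p] < cycle
                   for p in producers[i]):
                issued.append(i)
        for i in issued:
            issue_cycle[i] = cycle
        remaining = [i for i in remaining if i not in issued]
        cycle += 1
    return cycle


if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        cycles = cycles_needed(PROGRAM, width)
        print(f"width={width}: {cycles} cycles, IPC={len(PROGRAM) / cycles:.2f}")
```

For this program, going from 1-wide to 2-wide halves the cycle count (8 to 4), but 4- and 8-wide stay at 4 cycles because the r1 → r3 → r4 → r7 chain is four instructions long. The extra width only pays off if something else, such as a second thread, fills the idle slots.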
JarredWalton - Thursday, August 11, 2005
Which is why we say speculatively that it can go either way. 4-wide or 3-wide? I'd say it's 50-50 which one Conroe will be. What you say about HyperThreading is a good reason to pursue 4-wide, though. Take the current NetBurst HTT hack and make it into a more useful SMT design (like in POWER5 I think). Go with fully independent queues, maybe even split up caches. There's not that much point in going from a 1MB cache to 2MB cache IMO. Imagine HTT with each threading core getting its own 1MB L2 (that would be as fast as the Prescott L2 rather than the slower Prescott-2M L2).
Combined with more execution units, you could potentially increase performance of the core by 50% or more in multitasking scenarios without having to go all the way for four independent cores. I mean, current HTT doesn't add more than 5% to the die size. A second core doubles the die size. Take an in-between approach and go with a 15% increase to get a robust SMT solution, and you can get most of the benefits of SMP with far fewer transistors, right?
(Note: I am NOT a CPU designer, so maybe I'm totally wrong about what can and can't be done. The above sounds reasonable to me, however.) :)
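The trade-off in that comment boils down to throughput per unit of die area. A back-of-the-envelope Python sketch using the figures from the comment (a 15% area increase for a robust SMT core giving a 50% multitasking gain, and a second core doubling the area); the assumption that the dual-core design scales perfectly to 2x throughput is an idealized best case added here, not a claim from the comment.

```python
def throughput_per_area(relative_throughput, relative_area):
    """Throughput per unit die area, both normalized to one single-threaded core."""
    return relative_throughput / relative_area


# Figures from the discussion; the 2.0x dual-core throughput is an idealized assumption.
designs = {
    "single core": (1.0, 1.0),
    "robust SMT (+15% area, +50% throughput)": (1.5, 1.15),
    "dual core (2x area, ideal 2x throughput)": (2.0, 2.0),
}

for name, (throughput, area) in designs.items():
    print(f"{name}: {throughput_per_area(throughput, area):.2f} throughput per unit area")
```

Even with perfect scaling, the dual core only matches the single core's efficiency (1.00), while the hypothetical SMT design comes out around 1.30. The dual core still wins on absolute throughput (2.0 vs 1.5), which is the other half of the trade-off.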
snedzad - Wednesday, August 10, 2005
No, not at all. Next year won't produce massive turbulence in the microprocessor market. We'd better keep our eyes open for 2007, 'cause quad core is on the horizon.
Thatguy97 - Tuesday, June 16, 2015
And Conroe changed the industry.