This could be very interesting, especially if it can be expanded to other products.
TSVs increase the price of certain products by a substantial margin and have pretty high failure rates; they also increase the internal resistance of components and can cause thermal issues.
HBM and 3D XPoint built on EMIB could reduce the price of these components substantially. It could also make it viable again to split the IGP and CPU dies, allowing Intel to offer SKUs with different IGP configurations, including EDRAM/SRAM, without having to maintain multiple die designs.
I don't think the cost of multiple die designs is significant for Intel since they amortize it over very high volume. (For example, they updated the Celeron/Pentium/i3 line to Kaby Lake even though customers probably won't even notice.) EMIB looks like it would always be more expensive than a single die or MCM so I don't see the benefit for consumer chips.
EMIB allows you to integrate dies that are built using different processes which means that you don't have to wait for all of your processes to align.
If you can't produce reliable SRAM at 10nm, no problem. If you want to integrate FPGAs built on a different process, no problem. If you want to scale up the same basic design of 4-core clusters, just make the same small dies and combine them on the package according to market demand.
This would allow Intel to sell Xeon SoCs/CPUs with 128 to 256 cores, which isn't viable if you put them all on the same die. It also means they don't have to disable cores for different SKUs: if they want to make a 10-core CPU, they just put in 10 cores rather than 12 with 2 of them disabled.
People also overlook the cost of masks: a set of master masks costs a lot of money, and even the copies cost a small fortune. As processes become more demanding in terms of radiation, the lifespan of these masks shrinks at an ever-increasing pace.
I'm assuming this might also mean they can leverage their legacy fabs to a greater degree for non-critical components in competitive spaces? That seems like a huge advantage.
It sounds like a move from "Do we assume the next node will be ready for our most important chip in X years?" to "What parts of chips is the next node currently ready for?"
Fabbing smaller dies on lower-yield, leading edge processes would also be nice. Curious what the trade-offs will look like in terms of power/thermal vs a monolithic die design.
The probability of a major flaw that forces you to throw away the whole chip goes up with die size, so presumably EMIB would allow Intel to manufacture and test smaller chips individually before assembling them into a bigger processor, and that may result in less waste and lower cost.
It also might mean large-core chips could be available sooner on new manufacturing processes (which tend to have a high failure rate until they're thoroughly debugged).
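For intuition, here is a minimal sketch of that yield argument using the classic Poisson yield model Y = exp(-D*A); the defect density and die areas below are made-up illustrative numbers, not Intel figures.

    package main

    import (
        "fmt"
        "math"
    )

    // Poisson yield model: fraction of defect-free dies for a die of
    // area a (cm^2) at defect density d (defects per cm^2).
    func yield(d, a float64) float64 {
        return math.Exp(-d * a)
    }

    func main() {
        const d = 0.2 // defects/cm^2 -- made-up figure for an immature process

        monolithic := 6.0 // cm^2, hypothetical large server die
        chiplet := 1.5    // cm^2, one quarter of the same logic

        fmt.Printf("monolithic die yield: %.0f%%\n", 100*yield(d, monolithic))
        fmt.Printf("single chiplet yield: %.0f%%\n", 100*yield(d, chiplet))
        // With EMIB-style assembly, chiplets can be tested before packaging,
        // so one defect scraps 1.5 cm^2 of silicon instead of the whole 6 cm^2.
    }

Even with these rough numbers, the quarter-size chiplet yields far better than the monolithic die, which is the whole economic case for assembling known-good dies.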
> For example, they updated the Celeron/Pentium/i3 line to Kaby Lake even though customers probably won't even notice.
The difference between Skylake Celeron and Kaby Lake Celeron is small, but it's not like it's a big difference for i7 either. I don't think a lot of people are tossing out their Skylake CPUs and putting in Kaby Lake at any part of the performance curve; it's just that if you're upgrading from something older (maybe Sandy or Ivy Bridge) or buying something new, you may want to go with the newer core design. E.g.: the Celeron G3920 (Skylake) and G3930 (Kaby Lake) are both 2.9GHz, but the G3930 is about $10 cheaper and may have a slightly nicer GPU.
>> make it viable again to split the IGP and CPU dies, allowing Intel to offer SKUs with different IGP configurations
Also, it allows Intel to increase the rate of innovation, for example by specifying how each silicon die (memory, interconnect, RF?, acceleration, etc.) connects to the EMIB, and letting internal and maybe external groups work independently and maybe compete on better modules.
Shame this would be wasted on someone as closed as Intel, though.
The concept is like AMD’s Infinity Fabric but one step further.
No more IGP for high-end desktop CPUs. They can easily mix and match CPU / GPU or even FPGA according to the customer's requirements. I wonder when this will come to market.
Another point is the data-center-first approach for the new node. Why? What pushed them to make this decision? They literally have 95% market share with healthy margins.
The $64 G4560 making the i3 redundant, Kaby Lake-X, and now this. This seems like a meltdown at Intel.
I think there is a real risk that if AMD delivers superb performance and value with Zen, Intel will lose a bit of its shine and a lot of consumers will no longer perceive it as innovative.
The sheer number of SKUs itself shows Intel has been on a bit of a downward spiral toward becoming something of a 'marketing company'. Kaby Lake has negligible to no gains over Skylake, making it a pure branding exercise. Risky move for reputation. Time to focus on innovation and value.
I can't wait until AMD releases Ryzen in March. I haven't owned an AMD system since socket 939, but Ryzen pricing is leaking now, and it looks like it's going to significantly undercut Intel's current prices. [0]
Let's hope this encourages Intel to discount their current and future chips. More competition is always good for the consumer.
Let's hope Ryzen is the real deal. I mean, the i7-3610 from 3 or 4 years ago is just as good as my i7-6700HQ. Without competition we are in trouble. I remember when the Athlon 64 came out: it made Intel come up with the Core 2 Duo, which was a quantum leap from the P4.
I agree with you that competition is important. I disagree that the improvements in generations of i7s are as minor as they superficially seem to be.
For at least my task, compiling large amounts of C++, the difference between the different i7s is huge. I get a new expensive laptop every two years (and a cheaper machine for portability or other purposes each off year). I have high-end 1st, 2nd, 4th, and 6th gen i7s; each machine is measurably faster, with the oldest taking almost 10 minutes and the newest just about 2 minutes to build the same codebase.
You are comparing various mobile CPUs for each generation, which isn't quite the same thing most people are likely talking about. Desktop and mobile CPUs aren't the same thing (usually).
I suspect you are comparing improvements along the lines Intel says it can still improve (lower-power chips), but that's simply because you keep buying laptops vs. desktops. If you had been buying desktops, I suspect you wouldn't notice the improvement.
The desktop I built 4 years ago compiles within ~20% of my desktop at work with its shiny new Xeon.
Every machine has a RAID 1 of "high performance" SSDs. Most have Corsair Force 3 GTs. The 4th gen is an ultrabook with room for only one SATA3 SSD, so I put a Mushkin M3 drive in for the RAID. The newest machine has some other SSD that is just slightly faster. The read speeds are something like this: 1st gen -> 1.2GB/s, 2nd gen -> 1.2GB/s, 4th gen -> 1.1GB/s (the M3 drive slows it slightly), 6th gen -> 1.3GB/s.
Each machine is faster than the previous by a wide margin, far more than the proportional difference in SSD speed from slowest to fastest. There is almost a whole minute of difference (3m to 2m) between the 4th and 6th gen.
The first gen has 2 cores with hyperthreading (looks like 4 in GNOME System Monitor); the quad cores with hyperthreading were far out of my price range. All the rest, the 2nd, 4th, and 6th gen, have 4 cores with hyperthreading (looks like 8 in GNOME System Monitor).
For reference, each of these was around $1,600, plus or minus $200, except the 6th gen. The 6th also has an nVidia 980 (not the normal mobile version) and 4 hard drives in total. The other two drives are large mechanical disks that I only mount when I am doing stuff with movies or other large offline data; I do not believe they have any impact on the benchmark. This machine cost $3,200 after all the crap I did to it.
Do SSDs make any noticeable difference? I think I once read a post (maybe from Joel) where they bought SSDs and the difference in compilation time was negligible.
I think it might even depend on the build complexity. It is possible to write code that requires an insane amount of CPU time with the right (wrong?!) kind of C++ template metaprogramming.
The stagnation in desktop single-core performance is taking a toll on my productivity as software continues to bloat. Yes, I have a massive number of cores in my desktop workstation, but most are idle at any given moment. With CPU-wasteful single-threaded software, there are far too many applications that routinely fail to keep up with my input.
I know it's popular to advise new programmers to avoid concurrency because it's a "hard problem," but the wasted cores of my CPU keep screaming for something to do. They yell: "It's a shame so few people write multithreaded applications! Alas, back to sleep for me."
Well, why don't you learn a concurrent language like Go, CUDA, or OpenCL and do something about that?
From my vantage point, I remain amazed that people have fled concurrent programming in an age where the hardware for it has not only thrived, but may also be the only path forward from here.
I write plenty of multithreaded code of my own. What I am grieving is our CS education teaching new programmers that they should avoid concurrency because it's "too hard." I don't have time to make Thunderbird use more CPU cores so that its user interface doesn't block at times.
Go isn't comparable to CUDA or OpenCL in terms of parallelism, which is what matters here. Go doesn't even have SIMD available without writing assembly.
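To make the "idle cores" complaint concrete, here's a minimal sketch of the coarse-grained parallelism being discussed: a CPU-bound loop split across one goroutine per core using Go's standard runtime and sync packages. It only illustrates plain concurrency, not the SIMD point above, and the workload is a toy sum chosen just to keep the example self-contained.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // sumSquares splits a CPU-bound loop across one goroutine per core.
    func sumSquares(n int) uint64 {
        workers := runtime.NumCPU()
        chunk := (n + workers - 1) / workers
        partial := make([]uint64, workers)

        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func(w int) {
                defer wg.Done()
                lo, hi := w*chunk, (w+1)*chunk
                if hi > n {
                    hi = n
                }
                var s uint64
                for i := lo; i < hi; i++ {
                    s += uint64(i) * uint64(i)
                }
                partial[w] = s // each worker writes only its own slot, so no lock needed
            }(w)
        }
        wg.Wait()

        var total uint64
        for _, s := range partial {
            total += s
        }
        return total
    }

    func main() {
        fmt.Println(sumSquares(1000000)) // small enough to stay within uint64
    }

Nothing fancy, but even this pattern keeps all cores busy for embarrassingly parallel work; the hard problems people avoid are the ones with shared mutable state.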
Only for some interpretations of Moore's law. The strictest readings count transistors in CPUs, but plenty of people take it as a general prediction of the doubling of hardware performance.
Computing still gets cheaper as more tech gets included. For example, someone working in AI or video games can drop $1,000 on a CUDA-enabled GPU and get more than 10x the performance they could reasonably get for that $1,000 three years earlier. That is much more than the two doublings (4x) some interpretations of Moore's law predict.
Not all performance increases are going to be inside the most complex part.
It's important to note that none of the process names actually describe a physical size anymore.
They used to; now it's just PR. A 10nm, 14nm, or 7nm process doesn't actually have any feature that is the process "size" in size.
Together with various optimizations, the process name is "analogous" to a hypothetical CMOS lithography of that scale.
When transistors stop shrinking, whether that's at 10, 7, or 5nm, the limit is going to be economic, not physical. Intel probably could make a chip with 1nm transistors; the question is whether it would ever deliver a positive ROI, since:
- the R&D and the fabs would probably cost a zillion dollars, in a shrinking market for high-performance CPUs
- yields would probably be very low, even as the process matured; this problem is becoming more and more pronounced with the latest generations
- the advantages over the current chips would not be great; note svantana's comment about diminishing returns with the most recent process shrink
Well, for chips that are, say, twice as fast as the best current chips for three times the money, I suspect that the market is in fact shrinking, because for most people, current chips are good enough.
There's still plenty of people who want more speed badly enough to pay for it, but my impression is that there aren't as many as there used to be.
Or in other words, the market for single-chip performance is shrinking.
The hyperscale folks care about how much physical space a given amount of performance takes, but that's likely constrained more by energy/performance than by chip-count/performance.
Agreed, there are still plenty of people hitting pain points; image/video processing, 3D development (design or programming), and 4K video/gaming/modelling are definitely still out there...
Upgraded to an i7-4790K almost 2 years ago, and while not willing to drop another $1500 to upgrade to the latest CPU/MB, I have been keeping an eye open.
There's a BIG difference between transistors being developed and having sustainable manufacturing processes and good yields. Intel actually taped out an internal chip on 10nm, but then "mysteriously" scrapped it. Likely because of bad yields...
And that's not even dealing with the demon of quantum tunneling, which IMO is even more likely to kill node shrinks.
Finally, the other comment is right, Dennard scaling is dead, and thermals have become the primary constraint in chip design. FinFETs have somewhat helped with this, but it's not enough.
Isn't the real problem that process shrinks have stopped giving performance gains? Broadwell halved the logic area of Haswell (see slide in article), but benchmarks showed improvements of only a few percent. Surely that stagnation can only get worse at 10 nm. If Intel actually saw a huge business case for 10 nm, my guess is they would have found a way by now.
>Broadwell halved the logic area of Haswell (see slide in article), but benchmarks showed improvements of only a few percent.
In the presence of competition, the Broadwell chip would have had double the number of cores of the Haswell chip it was replacing, and thus modern multithreaded benchmarks would have shown double the performance. The current situation clearly showcases what happens when there is no competition (an experience very familiar to anybody who has seen/ridden in/driven a USSR/Russian-made car, or who is aware of the history of Russian presidential elections over the last 17 years :)
Big laptop i7s have something like a 50W TDP. That's quite a lot, even for a somewhat "heavy" laptop... You can do maybe twice that on an extra-heavy laptop (for hardcore gamers), which probably only a few people want.
For more than a decade, the semiconductor industry has been trying to extend Moore's Law with CO2 laser technology that has essentially not had a generational advance since Moore's article was published. The laser wasn't the only issue, but now that those other barriers are being overcome, it will be the prime one. Far-from-acceptable cost of ownership and an inability to scale have been known limiting factors of the laser from the beginning. That makes the hesitancy of companies such as Intel, ASML/Cymer, and Trumpf to invest in CO2 laser breakthroughs that have been presented to them more than shocking.
The minimum commercial viability of 250W EUV@IF (Intermediate Focus) has still not been achieved. There is no chance of them scaling to the 1kW EUV@IF that Intel has stated is needed for truly sustainable viability.
Why do you think Moore's law won't progress beyond 3nm, and might even stop at 7nm or 10nm?
At that size, defects of just a couple of silicon atoms (approximately 0.2nm in diameter) start to become an issue. Photolithography requires higher frequencies of "light" for smaller features, which also becomes more of a pain to deal with.
And that doesn't even get into the quantum effects at that scale, which are a whole 'nother barrel of headaches.
I'm still surprised at the (seeming, to me) lack of research into molecular nanotechnology. We're rapidly approaching the point where we need to precisely control the placement of each individual atom for these products.
You'd think the boards of directors and major investors would be pushing these companies harder to develop the next-generation technologies. Though Wall Street itself barely cares about anything beyond the next three months.