Hacker News

Apple mainly does two things: Much larger caches ($$$ but they have the margins) and memory inside the CPU package (shorter and faster connections, but can't upgrade memory).

[Edit] + Buying up all state of the art production capacities so competition is one node behind.

There is no Apple secret sauce.

That holds as long as the others don't want to go that route, and they don't seem to need to cut into their margins (AMD's X3D shows how much extra cache helps with performance).

I think what is especially interesting for Intel/AMD is that Xiaomi drops legacy 32-bit ARM and translates apps to 64-bit.

Dropping 16/32-bit support can reduce die size, and the freed area can be used for larger caches at the same price.



> Buying up all state of the art production capacities so competition is one node behind.

This is such a funny statement. Do people think Apple is dumping wafers into the ocean? Or buying the capacity and not using it?

The economic reality is that Apple can pay more for cutting edge process because they have higher prices and margins. So, people paying a premium for hardware get more advanced hardware.

How is this in any way surprising? Is the theory that if only Apple wasn’t willing to pay a premium, TSMC would sell the same wafers cheaper to other manufacturers? Wouldn’t that make TSMC 1) dumb, and 2) less profitable and therefore less able to invest in the next process?


What's your point, exactly? You seem to acknowledge what the person you replied to said but it's somehow "funny".


The memory is not inside the package any more than on any other flip-chip or PoP SoC, i.e. every mobile AP SoC made in the past 5 years. Please stop propagating this myth.

One of Apple's actual secret sauces is that they can make their big caches fast. Typically latency increases with cache size, so it's a tradeoff. Apple trades off less here. And it's not some "only fast because TSMC" thing; it's just really solid engineering at both the architectural and physical design level.
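
To make the tradeoff concrete, here is a minimal sketch using the textbook average memory access time model (AMAT = hit time + miss rate x miss penalty). All numbers are hypothetical and chosen only to show the shape of the tradeoff; they are not measurements of any Apple, AMD or Intel part.

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time, in cycles, for a single cache level."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Hypothetical designs (illustrative numbers, not real chips).
small_fast = amat(hit_cycles=4, miss_rate=0.05, miss_penalty_cycles=40)
big_slow = amat(hit_cycles=6, miss_rate=0.02, miss_penalty_cycles=40)
big_fast = amat(hit_cycles=4, miss_rate=0.02, miss_penalty_cycles=40)

for name, cycles in [("small/fast", small_fast),
                     ("big/slow", big_slow),
                     ("big/fast", big_fast)]:
    print(f"{name}: {cycles:.1f} cycles on average")
```

In this toy model the bigger cache only wins if its hit latency does not grow along with its size, which is the point being made above.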


The Apple reality distortion field is in full swing:

"The memory is not inside the package"

vs.

"The SoC and RAM chips are mounted together in a system-in-a-package design." [0]

Every mobile SoC does the same? All Intel SoCs do this? Which one? Can you point out the 16 GB of RAM in this Meteor Lake SoC?

https://images.anandtech.com/doci/20046/Meteor_Lake_Hotchips...

The Wikipedia article on Meteor Lake doesn't even mention memory at all [1]

[0] https://en.wikipedia.org/wiki/Apple_M2#Memory

[1] https://en.wikipedia.org/wiki/Meteor_Lake


It's a board space and cost saving measure but it does not change performance. The tooling is also expensive and Intel have their own internal mature packaging processes.

The DRAMs on an Apple chip are still bog-standard LPDDR. Most benchmarks find the actual memory middle of the road at best.

Critically they aren't magically on the die or any more inside the package than most other high end mobile chips.


1. "It's not in a package, stop spreading the myth"

2. "It is in a package like no other vendor, but it's not changing performance"

3. ???


It's not packaged materially differently from the other chips it's compared against. Which is what I said originally.


Have you ever seen an M1 or M2 chip? Here you go: https://eandt.theiet.org/content/articles/2022/09/teardown-a... and the M2 Pro: https://www.ifixit.com/News/71442/tearing-down-the-14-macboo...

The RAM is ordinary PoP; you got lied to by Apple marketing. If you acted on this marketing and spent money, then re-programming will be very difficult, with your brain actively fighting at every step to prevent cognitive dissonance.


https://www.anandtech.com/show/17024/apple-m1-max-performanc...

It is cool to live in the future where 243 GB/s is middle of the road.

It is still impressive that Apple pulled it off 2 years ago, IMO.
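
For context, theoretical peak DRAM bandwidth can be sketched as transfer rate times bus width. The 6400 MT/s and 512-bit figures below are assumptions about the M1 Max configuration (LPDDR5-6400 on a 512-bit interface) and are not stated in this thread; 243 GB/s is the figure quoted above.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# Assumed configuration: LPDDR5-6400 on a 512-bit bus (not from the thread).
peak = peak_bandwidth_gb_s(6400, 512)
measured = 243.0  # GB/s, as quoted above

print(f"peak {peak:.1f} GB/s, quoted figure is {measured / peak:.0%} of peak")
```

Under those assumptions the bandwidth comes from the width of the interface rather than from an exotic DRAM type.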


This is what mobile SoCs use: https://en.m.wikipedia.org/wiki/Package_on_a_package If trace length were a big factor, surely PoP would offer an even greater improvement.

Anyway the point is, this is not a meaningful performance benefit as it's still just off the shelf LPDDR5. In fact the M SoCs tend to underperform in memory latency tests.


Yes, they use one package on another, but not one package.

"Anyway the point is, this is not a meaningful performance benefit"

Do you have a benchmark to read? This "Still LPDDR5" is hand waving.


> + Buying up all state of the art production capacities so competition is one node behind.

From what I read, before that, there’s “paying billions to get state of the art production capabilities built”

Chances are that capacity wouldn’t be there without Apple’s money, so if Apple didn’t exist, it still wouldn’t be available to others as rapidly as it is now.

> There is no Apple secret sauce.

They didn’t always have loads of money, so, historically, there must have been something other than “they have loads of money and large margins, so can afford to buy the best”.

I think there still is something more than that. For example, it also is about having the courage to decide that milled aluminum is a better way to build laptop chassis, so spending billions on buying/creating the capacity to build millions of such chassis is a good idea, or to decide that, at their size, building your own CPUs is worth doing.

I think part of their secret sauce also is that they have higher standards for what they want to sell. Take for example foldable screens. They must have prototypes with them, but don’t have a product because they don’t deem them good enough.


Having fairly high standards is, I think, a consistent perk of theirs. In particular, they don’t seem to let anything slip below a sort of entry-level enthusiast quality; might not be the best at anything in particular but there’s nothing the Apple device will be truly awful at.

But the Apple that stayed alive in the 90’s-early 00’s is pretty different from modern Apple. Modern Apple makes some of the best chips out there. Old Apple stayed afloat by selling a Unix clone on commodity x86.


> There is no Apple secret sauce.

Except for secret sauce like super wide instruction decode and enough registers to keep all their execution units filled[0], sure I guess there's no secret sauce.

Caches are only useful when they're serving execution units and Apple packed their chips with them. That's special sauce. If it wasn't special then every ARM chip would have the same levels of performance. It's not like the M1 was Apple's first chip. The A-series have been kicking the shit out of other ARM chips for almost a decade. If Apple didn't have any special sauce in their chip designs this wouldn't have been the case. It's not like Qualcomm doesn't have good chip designers and hasn't tried to compete with Apple's chips.

[0] https://news.ycombinator.com/item?id=25257932


Super wide instruction decode won't help you much unless you're able to feed and retire those instructions at a consistent pace. This means being able to keep your ALUs busy, and for that to happen there are plenty of problems one has to solve, but two major bottlenecks in CPU design are (1) branch prediction in the CPU frontend and (2) hiding the memory latency in the CPU backend. Both of those are tightly coupled to the instruction- and data-cache design.

Coincidentally, both of those caches in Apple M design are unusually large - 192KB for instruction cache size and 128KB L1 data cache size - per core (!). The same goes for L2 cache size - 3MB per (performance) core.

When compared to bleeding edge _server_ CPUs from AMD and Intel, it's crazy to see that those figures are several _times_ larger in the Apple M design. E.g. Zen3 Epyc - 32KB of instruction cache size, 32KB of L1 data cache size and 512KB L2. Intel Xeon Gold - 32KB of instruction cache size, 64KB of L1 data cache size and 1.25MB of L2.
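
Taking the figures quoted in this comment at face value (they are not independently verified here), a quick sketch of the size ratios per cache level:

```python
# Cache sizes in KB, copied from the comment above (not independently verified).
CACHES_KB = {
    "Apple M (performance core)": {"L1I": 192, "L1D": 128, "L2": 3 * 1024},
    "Zen3 Epyc": {"L1I": 32, "L1D": 32, "L2": 512},
    "Intel Xeon Gold": {"L1I": 32, "L1D": 64, "L2": 1.25 * 1024},
}

def size_ratios(big, small):
    """Return how many times larger each cache level of `big` is than `small`."""
    return {level: CACHES_KB[big][level] / CACHES_KB[small][level]
            for level in CACHES_KB[big]}

apple = "Apple M (performance core)"
for other in ("Zen3 Epyc", "Intel Xeon Gold"):
    ratios = size_ratios(apple, other)
    print(other, {level: f"{r:.1f}x" for level, r in ratios.items()})
```

With these quoted numbers the gap is between 2x and 6x per level.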


> Buying up all state of the art production capacities so competition is one node behind.

M2 is a 5nm (N5P) chip, AMD laptops already use 4nm.


> There is no Apple secret sauce.

I don't think that's quite true. It's clearly a combination of better microarchitecture (very wide decode, 128 byte cache lines, etc), and also massively bigger area budgets. Maybe more the latter, but it's pretty clear that Apple is right at the top of the "good microarchitecture" leader board.


> and memory inside the CPU package (shorter and faster connections, but can't upgrade memory).

So just like every other non-server ARM SoC?



