I'll admit to some reflexive skepticism here. I know GeekBench at least used to be considered an entirely unserious indicator of performance and any discussion relating to its scores used to be drowned out by people explaining why it was so bad.

Do those criticisms still hold? Are serious people nowadays taking Geekbench to be a reasonably okay (though obviously imperfect) performance metric?



I verified Geekbench results to be very tightly correlated with my use case and workloads (JVM, Clojure development and compilation) as measured by my wall times. So yes, I consider it to be a very reliable indicator of performance.


Curious how you verified that? I should possibly do the same.


Run Geekbench on a sample of hardware. Run your workload on the same hardware. Regress.


That's not very scientific at all. With how close the CPUs are, how would you compare the tiny differences?


Run my compilation on a CPU, note down the time it took and the Geekbench score for that CPU.

Run the compilation on another CPU, note down the time it took and the Geekbench score for that CPU.

Now look at the ratios — if Geekbench scores implied the faster CPU was, say, 20% faster, is my compilation 20% faster?

I'm looking at my notes and without digging too much, I can see two reasonably recent cases: 30% faster compilation (Geekbench said 30%), and another one: 40% faster compilation (Geekbench said 38%).
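For anyone wanting to reproduce this kind of check, here's a minimal sketch of the ratio comparison described above. All scores and timings are made-up placeholders, not real measurements:

```python
# Sanity-check whether Geekbench score ratios predict compile-time
# ratios. All numbers here are hypothetical placeholders.

# (Geekbench single-core score, compile wall time in seconds) per machine
machines = {
    "cpu_a": (2400, 120.0),
    "cpu_b": (3120, 92.0),  # ~30% higher score than cpu_a
}

score_a, time_a = machines["cpu_a"]
score_b, time_b = machines["cpu_b"]

score_ratio = score_b / score_a  # speedup Geekbench predicts
speedup = time_a / time_b        # speedup the compile actually shows

print(f"Geekbench predicts {score_ratio:.2f}x, compile shows {speedup:.2f}x")
```

If the two ratios keep tracking each other across several machines, the benchmark is a usable predictor for that particular workload.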

So yes, I do consider Geekbench to be a very reliable indicator of performance.


You’re not the only one — but just curious where this skepticism comes from.

This is M4 — Apple has now made four generations of chips and each one was class-leading upon release. What more do you need to see?


> Apple has now made four generations of chips and each one was class-leading upon release.

Buying up most of TSMC's latest node capacity certainly helps. Zen chips on the same node turn out to be very competitive, but AMD don't get first dibs.


It’s more like Apple fronts the cash for TSMC’s latest node. But regardless, in what way does that detract from their chips being class-leading at release?


Because the others can't use that node, there are no others in that same class. If one person in a race is on foot and the other is in a car, it's not surprising that the person in the car finishes first.


Eventually Apple moves off a node and the others move on.

People pretend like this isn’t a thing that’s already happened, and that there aren’t fair comparisons. But there are. And even when you compare like for like Apple Silicon tends to win.

Line up the node, check the wattages, compare the parts. I trust you can handle the assignment.


I have some sympathy with this view because it's in no way mass market.

Nevertheless, product delivery is a combination of multiple things of which the basic tech is just one component.


I disagree with the wording "AMD don't get first dibs". It's more like "AMD won't pay for first dibs"


I don’t think a lot of people fully understand how closely Apple works with TSMC on this: funding them, holding them accountable, and providing the capital needed for the big foundry bets. It’s kind of one of those IYKYK things, but Apple is a big reason TSMC actually is the market leader.


If that's all we cared about we wouldn't be discussing a Geekbench score in the first place. The OP could have just posted the statement without ever mentioning a benchmark.

I was just curious if people had experience with how reliable Geekbench has been at showing relative performance of CPUs lately.


I don’t think they are skeptical of the chip itself. Just asking about the benchmark used.

If I was reviewing cars and used the number of doors as a benchmark for speed, surely I’d get laughed at.


Right, but we keep repeating this cycle. A new M-series chip comes out, the Geekbench scores leak, and it's class-leading.

Immediately people “but geEk BeNcH”

And then actual people get their hands on the machines for their real workloads and essentially confirm the geekbench results.

If this was the first time, then fair enough. But it’s a Groundhog Day style sketch comedy at this point with M4.


I blame it on the PC crowd being unconsciously salty the most prestigious CPU is not available to them. You heard the same stuff when talking about Android performance versus iPhone.

There is a lot to criticize about Apple's silicon design, but they are leading the CPU market in terms of mindshare and attention. All the other chipmakers feel like they're just trying to follow Apple's lead. It's wild.


I was surprised and disappointed to see that the industry didn’t start prioritizing heat output more after the M1 generation came out. That was absolutely my favorite thing about it, it made my laptop silent and cool.

But anyway, what is it you see to criticize about Apple's Apple Silicon design? The way RAM is locked on package so it's not upgradable, or something else?

I’m kind of surprised, I don’t hear a lot of people suggesting it has a lot to be criticized for.


It was wild to see the still ongoing overclocking Ghz competition, while suddenly one could use a laptop with good performance, no fans, no noise and while using it mobile.


The lack of multiple-display support early on. The M2 generation produced much more heat than the M1, though that could be down to the new MacBook Air. GPU weakness with the bigger chips.


By the way, have you heard about the recent Xiaomi SU7 being the fastest four-door car on the Nürburgring Nordschleife?

It has 4 doors! It's all over the shitty car news media. The car is a prototype with only one seat though.


In power efficiency maybe, but not top performance


Literally yes top single core performance. (And incidentally also efficiency)



I don’t see the M4 on any of these charts


Why would that be necessary to prove the series has top performance?


As demonstrated by the M1-M3 series of chips, essentially all of that lead was due to being the first chips on a smaller process, rather than to anything inherent to the chip design. Indeed, the Mx series of chips tend to be on the slower side of chips for their process sizes.


Show your work.

Most people who say things like this tend to deeply misunderstand TDP and end up making really weird comparisons. Like high wattage desktop towers compared to fan-less MacBook Airs.

The process lead Apple tends to enjoy no doubt plays a huge role in their success. But you could also turn around and say that’s the only reason AMD has gained so much ground against Intel. Spoiler: it’s not. Process node and design work together for the results you see. People tend to get very stingy with credit for this though if there’s an Apple logo involved.


Geekbench is an excellent benchmark, and has a pretty good correlation with the performance people see in the real world where there aren't other limitations like storage speed.

There is a sort of whack-a-mole thing where adherents of particular makers or even instruction sets dismiss evidence that benefits their alternatives, and you find that at the root of almost all of the "my choice doesn't win in a given benchmark means the benchmark is bad" rhetoric. Then they demand you only respect some oddball benchmark where their favoured choice wins.

AMD fans long claimed that Geekbench was in cahoots with Intel. Then when Apple started dominating, that it was in cahoots with ARM, or favoured ARM instruction sets. It's endless.


Any proprietary benchmark that's compiled with the mystery meat equivalent of compiler/flags isn't "excellent" in any way.

SPECint compiled with either the vendor compiler (ICC, AOCC) or the latest gcc/clang would be a good neutral standard, though I'd also want to compare SIMD units more closely with x265 and Highway based stuff (vips, libjxl).

And how do you handle the fact that you can't really (yet) use the same OS for both platforms? Scheduler and power management counts, even for dumb number crunching.


Good points. gemma.cpp can also be an interesting benchmark, it also uses Highway.


Geekbench is a highly regarded benchmark because it effectively reflects the overall performance of various platforms as experienced by the average user. By "platform," we mean the combination of hardware and software—how systems are actually used in day-to-day scenarios.

Specint, on the other hand, is useful for assessing specific tasks if you plan to run identical workloads. However, its individual test results vary widely. For example, Apple Silicon chips generally perform well in Specint but might match a competing chip in one test and be three times faster in another. These tests focus on very narrow tasks that can highlight the unique strengths of certain instructions or system features but are not representative of overall real-world performance.

The debate over benchmarks is endless and, frankly, exhausting, as it often rehashes the same arguments. In practice, most people accept that Geekbench is a reliable indicator of performance, and I maintain it’s an excellent benchmark. You might disagree, but my stance stands.


Lots of appeal to popularity, "most people accept" a lot of things.

>Specint, on the other hand, is useful for assessing specific tasks if you plan to run identical workloads. [...] These tests focus on very narrow tasks that can highlight the unique strengths of certain instructions or system features but are not representative of overall real-world performance.

What? First, SPECint is an aggregate of 12 benchmarks (https://en.wikipedia.org/wiki/SPECint#Benchmarks), none of them synthetic in any way. They're also ranging from low to high level, it's not just number crunching. Sure, it's missing stuff like browser benchmarks to better represent the average user, but it's certainly not as useless as what you seem to imply.
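One reason a single SPECint number can coexist with a 3x outlier in one subtest: composite SPEC scores are geometric means of the per-benchmark ratios, which dampen outliers far more than a naive average would. A quick illustration with invented ratios (not real SPEC results):

```python
import math

# Twelve hypothetical per-benchmark ratios (chip A vs chip B),
# including one 3x outlier subtest.
ratios = [1.0, 1.1, 0.9, 3.0, 1.05, 1.0, 1.2, 0.95, 1.1, 1.0, 1.15, 1.0]

geo_mean = math.prod(ratios) ** (1 / len(ratios))  # SPEC-style composite
arith_mean = sum(ratios) / len(ratios)             # naive average

print(f"geometric mean:  {geo_mean:.2f}")  # the outlier barely moves this
print(f"arithmetic mean: {arith_mean:.2f}")
```

The geometric mean keeps the composite close to the typical subtest, which is exactly why one exceptional subtest doesn't dominate the headline number.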

Any "system wide" benchmark is aggregating too much into a single number to mean anything, in any case.

And this subthread is about using benchmarks to compare HARDWARE, not whole systems, so this discussion is pretty much meaningless.


I never said SPECint was synthetic though, did I? What are you arguing against?

Yet a benchmark of how Xalan-C++ transforms XML documents has shockingly little relevance to most of the things I do. And the M1 runs the 400.perlbench benchmark slower than the 5950X, yet it runs the 456.hmmer benchmark twice as quickly, both I guess mattering if I'm running those specific programs?

As with the strawman that I said it was synthetic, I also didn't say it was useless. Not sure why you're making things up. It's an interesting benchmark, but most people (yup, there's that appeal again) find Geekbench more informative.

And, again, most people, including the vast majority of experts in this field, respect geekbench as a decent broad-spectrum benchmark. As with all things there are always contrarians.

>And this subthread is about using benchmarks to compare HARDWARE, not whole system

Bizarre. This submission is specifically about Geekbench, specifically about the M4 running, of course, macOS. This subthread is someone noting that they can't escape the negatron contrarians who always pipe up with the No True Benchmark noise.


I'd be reflexively skeptical if I didn't have a M1 Mac. It really is something.


I'm not skeptical of Apple's M-series chips. They have proven themselves to be quite impressive and indeed quite competitive with traditional desktop CPUs even at very low wattages.

I'm skeptical of Geekbench being able to indicate that this specific new processor is robustly faster than, say, a 9950X in single-core workloads.


It's robustly faster at the things that Geekbench is measuring. You can find issue with the test criteria (measures meaningless things or is easy to game) but the tests themselves are certainly sound.


> You can find issue with the test criteria (measures meaningless things or is easy to game).

That's exactly their point.


On the other hand, I have yet to see any benchmark where people didn’t crawl out of the woodwork to complain about it.


It'll still be at the top of SPECint 2017 which is the real industry standard. Geekbench 6.3 slightly boosted Apple Silicon scores by adding SME - a very niche instruction set extension which is never used in SPECint workloads. So the gap may not be as wide as GB6.3 implies.


Does SPECint cover heavily memory bound pointer chasing stuff? Not up to date.



Did no one check the scores? It's not the top consumer CPU by quite a margin. It's probably the best performance per watt, but not the most powerful CPU.


The performance-per-watt isn’t necessarily the best. These scores are achieved when boosted and allowed to draw significantly more power. Apple CPUs may seem efficient because, most of the time, computers don’t require peak performance. Modern ARM microarchitectures have been optimized for standby and light usage, largely due to their extensive use in mobile devices. Some of MediaTek and Qualcomm's CPUs can offer better performance-per-watt, especially at lower than peak performance. The issue with these benchmarks is that they overlook these nuances in favor of a single number. Even worse, people just accept these numbers without thinking about what they mean.
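To make that concrete, here's a toy sketch of how a chip can post the higher boosted peak score while losing on performance per watt at every operating point. All curve points are invented for illustration:

```python
# Hypothetical (score, watts) operating points along two chips' curves.
chips = {
    "chip_a": [(900, 4), (1600, 10), (2100, 22)],  # efficiency-tuned
    "chip_b": [(700, 4), (1500, 10), (2300, 30)],  # higher boosted peak
}

for name, curve in chips.items():
    for score, watts in curve:
        print(f"{name}: {score} pts at {watts} W -> {score / watts:.0f} pts/W")

# chip_b wins on the single headline number (peak score), but chip_a
# delivers more points per watt at every operating point shown.
```

A single peak-score comparison collapses the whole curve into the rightmost point, which is the nuance the benchmark headlines lose.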


M4 is also an ARM architecture, why would Qualcomm's be more efficient?


Benchmarking itself is of limited usefulness, since in reality it is only an internally-relevant comparison.

Translating these scores into the real world is problematic. There are numerous examples of smart phones powered by Apple chips versus Qualcomm chips having starkly different performance with actual use. This is in spite of the chips themselves scoring similarly in benchmarks.

The interesting thing here isn't really how high it scored against other chip brands, but how it outperformed the M2 Ultra. There was some hum of expectation on HN that the differences between M1, M2, M3 etc. would be token and that Apple's chip division was losing its touch. Yet the M2 Ultra in the Mac Studio was released in June 2023, and the M4 Pro in the mini now for November 2024. That is quite the jump in performance over time and a huge change in bang for buck.


It's by no means a be-all-end-all "read this number and know everything you need to know" benchmark, but it tends to be good enough to give you a decent idea of how fast a device will be for a typical consumer.

If I could pick 1 "generic" benchmark to base things off of I'd pick PassMark though. It tends to agree with Geekbench on Apple Silicon performance but it is a bit more useful when comparing non-typical corner cases (high core count CPUs and the like).

Best of all is to look at a full test suite and compare for the specific workload types that matter to you... but that can often be overkill if all you want to know is "yep, Apple is pulling ahead on single thread performance".


GB6 is great. Older versions weren’t always very representative of real workloads. Mostly because their working data sets were way too small.

But GB6 aligns pretty well with SPEC2017.


You are thinking of AnTuTu.


If it shows a good result for Apple then it's perfectly accurate, otherwise it's flawed.



