In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code, and it's hard to define what's a realistic typical example.
The big one is multi-threading. In Rust, whether you use threads or not, all globals must be thread-safe, and the borrow checker requires memory access to be shared XOR mutable. When writing single-threaded code takes 90% of the effort of writing a multi-threaded one, Rust programmers may as well sprinkle threads all over the place regardless of whether that's a 16x improvement or a 1.5x improvement. In C, the cost/benefit analysis is different. Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/openmp. The risk of having to debug heisenbugs means that code typically won't be made multi-threaded unless really necessary, and even then preferably kept to simple cases or very coarse-grained splits.
To be honest, I think a lot of the justification here is just a difference in standard library and ease of use.
I wouldn't consider there to be any notable effort in making threads build on target platforms in C relative to normal effort levels in C, but it's objectively more work than `std::thread::spawn(move || { ... });`.
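For concreteness, a complete toy version of that one-liner (hypothetical values, just to show the full ceremony):

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3];
    // `move` hands ownership of `data` to the new thread, so the
    // compiler statically rules out unsynchronized sharing.
    let handle = thread::spawn(move || data.iter().sum::<i32>());
    println!("sum = {}", handle.join().unwrap());
}
```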
Despite the benefits, I don't actually think memory safety really plays a role in the usage rate of parallelism. Case in point: Go has no compile-time protection against data races, with both races and atomicity issues being easy to introduce, and yet it relies much more heavily on concurrency (with the degree of parallelism managed by the runtime) and with much less consideration than Rust. After all, `go f()` is even easier.
(As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
He straight-ported some C code to Rust and found the Rust code outperformed it by ~30% or something. The culprit ended up being that in C, he was using a hash table library he'd been copy-pasting between projects for years. In Rust, he used BTreeMap from the standard library, which turns out to be much better optimized.
This isn't evidence Rust is faster than C. I mean, you could just backport that BTreeMap to C and get exactly the same performance in C code. At the limit, I think both languages perform basically the same.
But most people aren't going to do that.
If we're comparing normal Rust to normal C - whatever that means - then I think Rust takes the win here. Even Bryan Cantrill - one of the best C programmers you're likely to ever run into - isn't using a particularly well optimized hash table implementation in his C code. The quality of the standard tools matters.
When we talk about C, we're really talking about an ecosystem of practice. And in that ecosystem, having a better standard library will make the average program better.
The only real question I have with this is: did the program have to hit any specific performance target?
I could write a small utility in Python that would be completely acceptable for use but at the same time be 15x slower than an implementation in another language.
So how do you compare code across languages when neither was written for performance, given one may have some set of functions that happens to favour one language in that particular app?
I think to compare you have to at least have the goal of performance for both when testing. If he needed his app to be 30% faster he would have made it so, but it didn't need to be, so he didn't. Which doesn't make it great for comparison.
Edit: I also see that your reply to the guy above was specifically about the point that the libs by themselves can help performance with no extra work, and I do agree with you on that.
Honestly I'm not quite sure what point you're making.
> If he needed his app to be 30% faster he would have made it so
Would he have? Improving performance by 30% usually isn't so easy. Especially not in a codebase which (according to Cantrill) was pretty well optimized already.
The performance boost came to him as a surprise. As I remember the story, he had already made the C code pretty fast and didn't realise his C hash table implementation could be improved that much. The fact Rust gave him a better map implementation out of the box is great, because it means he didn't need to be clever enough to figure those optimizations out himself.
It's not an apples-to-apples comparison. But I don't think comparing the world's fastest C code to the world's fastest Rust code is a good comparison either, since most programmers don't write code like that. It's usually incidental, low-effort performance differences that make a programming language "fast" in the real world. Like a good B-tree implementation just shipping with the language.
I did feel my post was a bit unneeded when I added my edit :)
My point about the 30% was that the speedup you mentioned he got in Rust came down to, essentially, better algorithms in the Rust library he used. Once you know that, it's hard to say that Rust itself is 'faster', but the point is valid and I accept that he gained performance by using the Rust library.
My other point was that the speed of his code probably didn't matter at the time. If it had been a problem in the past, he probably would have taken the time to profile and gain some more speed. Sure, you can't gain speed that isn't there to be had, but as you pointed out, it wasn't a language issue, it was a library implementation issue.
He could have arbitrarily picked a different program that used a good C library, and the results would have been reversed.
I also agree that most devs are not working down at that level of optimisation, so the default libraries can help. But at the same time, it mostly doesn't matter if something takes 30% longer if that overall time is not a problem. If you are working on something where speed really matters and you are trying to shave off milliseconds, then you have to be the kind of developer who can work C or Rust at that level.
What I think it illustrates more is how much classic languages could gain by having a serious overhaul of their standard library and maybe even a rebrand if that's the expected baseline of a conformant implementation.
> If he needed his app to be 30% faster he would have made it so
That still validates what the parent wrote: "In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code".
> He straight-ported some C code to Rust and found the Rust code outperformed it by ~30% or something. The culprit ended up being that in C, he was using a hash table library he'd been copy-pasting between projects for years. In Rust, he used BTreeMap from the standard library, which turns out to be much better optimized.
Are you surprised? Rust is never inherently faster than C. When it appears faster, it boils down to library quality and algorithm choice, not the language.
Also worth noting that hash tables and B-trees have fundamentally different performance characteristics. If BTreeMap won, it is either the hash table implementation, or access patterns that favor B-tree cache locality. Neither says anything about Rust vs C. It is a library benchmark, not a language one.
And especially having performant and actively maintained default choices built in. With C, as described in the post you responded to, you'll typically end up building a personal collection of dusty old libraries that work well enough most of the time.
I think Rust projects will accumulate their own cruft over time, they are just younger. And the Rust ecosystem's churn (constant breakage, edition migrations, dependency hell in Cargo.lock) creates its own class of problems.
Either way, I would like to reiterate that the comparison is flawed at a more fundamental level because hash tables and B-trees are different data structures with different performance characteristics. O(1) average lookup vs O(log n) with cache-friendly ordered traversal. These are not interchangeable.
If BTreeMap outperformed his hash table, that is either because the hash table implementation was poor, or because the access patterns favored B-tree cache locality. Neither tells you anything about Rust vs C. It is a data structure benchmark.
More importantly, choosing between a hash table and a tree is an architectural decision with real trade-offs. It is not something that should be left to "whatever the standard library defaults to". If you are picking data structures without understanding why, that is on you, not on C's lack of a blessed standard library (BTW one size cannot fit all).
> If BTreeMap outperformed his hash table, that is either because the hash table implementation was poor, or because the access patterns favored B-tree cache locality. Neither tells you anything about Rust vs C. It is a data structure benchmark.
The specific thing it tells you about Rust vs C is that Rust makes using an optimized BTreeMap the default, much-easier thing to do when actually writing code. This is a developer experience feature rather than a raw language performance feature, since you could in principle write an equally-performant BTreeMap in C. But in practice Bryan Cantrill wasn't doing that.
> More importantly, choosing between a hash table and a tree is an architectural decision with real trade-offs. It is not something that should be left to "whatever the standard library defaults to". If you are picking data structures without understanding why, that is on you, not on C's lack of a blessed standard library (BTW one size cannot fit all).
The Rust standard library provides both a hash table and a b-tree map, and it's pretty easy to pull in a library that provides a more specialized map data structure if you need one for something (because in general it's easier to pull in any library for anything in a Rust project set up the default way). Again, a better developer experience that leads to developers making better decisions writing their software, rather than a fundamentally more performant language.
> the Rust ecosystem's churn (constant breakage, edition migrations, dependency hell in Cargo.lock) creates its own class of problems.
What churn? Rust hasn't broken compatibility since 1.0, over a decade ago. These days it feels like Rust changes slower than C and C++.
> Either way, I would like to reiterate that the comparison is flawed at a more fundamental level because hash tables and B-trees are different data structures with different performance characteristics. O(1) average lookup vs O(log n) with cache-friendly ordered traversal. These are not interchangeable.
They're mostly interchangeable when used as a map! In Rust code, in most cases you can just replace HashMap with BTreeMap. In practice, O(log n) and O(1) are very similar bounds owing to how slowly log(n) grows with respect to n (log2 of a million is only ~20). Cache locality often matters much more than an O(log n) factor in your algorithm.
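As a sketch of how drop-in the swap usually is (toy word count, assuming nothing beyond std):

```rust
use std::collections::BTreeMap; // swap in HashMap here and nothing else changes

fn main() {
    let mut counts: BTreeMap<&str, u32> = BTreeMap::new();
    for word in ["a", "b", "a"] {
        // entry()/or_insert() is identical on both map types
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts.get("a"), Some(&2));
}
```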
If you read the actual article, you'll see that Cantrill benchmarked his library using Rust's B-tree and hash table implementations. Both maps outperformed his C-based hash table implementation.
> Neither tells you anything about Rust vs C.
It tells you Rust's standard library has a faster hash map implementation than Bryan Cantrill's. If you need a hash table, you're almost certainly better off using Rust's than rolling your own in C.
One point of clarification: the C version does not have (and never had) a hash table; the C version had a BST (an AVL tree). Moreover, the "Rust hash table implementation" is in fact still B-tree based; the hash table described in the post is a much more nuanced implementation detail. The hash table implementation has really nothing to do with the C/Rust delta -- which is entirely a BST/B-tree delta. As I described in the post, implementing a B-tree in C is arduous -- and implementing a B-tree in C as a library would be absolutely brutal (because a B-tree relies on moving data). As I said in the piece, the memory safety of Rust is very much affecting performance here: it allows for the much more efficient data structure implementation.
I wouldn't consider implementing a B-tree in C any more "arduous" than implementing any other notable container/algorithm in C, nor would making a library be "brutal" as moving data really isn't an issue. Libraries are available if you need them.
Quite frankly, writing the same in Rust seems far, far more "arduous", and you'd only realistically be writing something using BTreeMap because someone else did the work for you.
However, being right there in std makes use much easier than searching around for an equivalent library to pull into your C codebase. That's the benefit.
I don't often do this, but I'm sorry, you don't know what you're talking about. If you bother to try looking for B-tree libraries in C, you will quickly find that they are either (1) the equivalent of undergraduate projects that are not used in production systems or (2) woven pretty deeply into a database implementation. This is because the memory model of C makes a B-tree library nasty: it will either have low performance or a very complicated interface -- and that is because moving data is emphatically an issue.
Can you mention 3 cases of breakage the language has had in the last, let's say, 5 years? I've had colleagues in different companies responsible for updating company-wide language toolchains tell me that in their experience updating Rust was the easiest of their bunch.
> edition migrations
One can write Rust 2015 code today and have access to pretty much every feature from the latest version. Upgrading editions (at your leisure) can be done most of the time just by using rustfix, but even if done by hand, the idea that they are onerous is overstating their effect.
Last time I checked there were <100 checks in the entire compiler for edition gates, with many checks corresponding to the same feature. Adding support for new features that don't affect prior editions, and by extension existing code (like the async/await keywords, or support for k# and r# tokens), is precisely the point of editions.
> When it appears faster, it boils down to library quality and algorithm choice, not the language.
That's a thin, thin line of argumentation. The distinction between the ecosystem and language may as well not exist.
A lot of improvements of modern languages come down to convenience, and the more convenient something is, the more likely it is to be used. So it is meaningful to say that the average Rust program will perform better than the average C program given that there exist standard, well-performing, generic data structure libraries in Rust.
> It is a library benchmark, not a language one.
If you have infinite time to tune performance, perhaps. It is also meaningful to say that while importing a library may take a minute, writing equivalently performant code in C may take an hour.
> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
Is that beyond just "concurrency is tricky and a language that makes it easier to add concurrency will make it easier to add sneaky bugs"? I've definitely run into that, but have never written concurrent C to compare the ease of heisenbug-writing.
> Despite the benefits, I don't actually think memory safety really plays a role in the usage rate of parallelism.
I can see what you mean with explicit things like thread::spawn, but I think Tokio is a major exception. Multithreaded by default seems like it would be an insane choice without all the safety machinery. But we have the machinery, so instead most of the async ecosystem is automatically multithreaded, and it's mostly fine. (The biggest problems seem to be the Send bounds, i.e. the machinery again.) Cargo test being multithreaded by default is another big one.
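For illustration, a minimal sketch, assuming the tokio crate with its default multi-threaded runtime:

```rust
#[tokio::main] // defaults to the multi-threaded, work-stealing runtime
async fn main() {
    // Each task may land on any worker thread; the Send bounds on
    // tokio::spawn are the machinery that makes this safe.
    let handles: Vec<_> = (0..4)
        .map(|i| tokio::spawn(async move { i * 2 }))
        .collect();
    for h in handles {
        println!("{}", h.await.unwrap());
    }
}
```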
You raise a good point here. When I think about writing multi-threaded code, three things come to mind about why it is so easy in Java and C#: (1) The standard library has lots of support for concurrency. (2) Garbage collection. (3) Debuggers have excellent support for multi-threaded code.
Not really, especially as garbage collection doesn't achieve memory safety. Safety-wise, it only helps avoid UAF due to lifecycle errors.
Garbage collection is primarily just a way to handle non-trivial object lifecycles without manual effort. Parallelism often brings non-trivial object lifecycles with it, but that is not the hard part of parallelism.
In plain C, the common pattern is to try to keep lifecycles trivial, and the moment that either doesn't make sense or isn't possible, you usually just add a reference count member to the struct.
In both Go and C, all types used in concurrent code need to be reviewed for thread-safety and have appropriate serialization applied - in the C case, this just also includes the refcnt itself. And yes, you could have a UAF or leak if you don't call ref/unref correctly, but that's unrelated to parallelism - it's just everyday life in manual memory management land.
The issues with parallelism are the same in Go and C: you might have invalid application states, whether due to missing serialization - e.g., forgetting to lock things appropriately, or accidentally using types that are not thread-safe at all - or due to business logic flaws (say, two threads both sleeping, each waiting for the other to trigger an event and wake it up).
> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
Yes. All `&mut` references in Rust are equivalent to C's `restrict`-qualified pointers. In the past I measured a ~15% real-world performance improvement in one of my projects due to this (rustc has/had a flag, `-Zmutable-noalias`, where you can turn this on/off; it was disabled by default for quite some time due to codegen bugs in LLVM).
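A minimal sketch of the kind of code where this pays off (toy function, not from that project):

```rust
// In C, `a` and `b` could point to the same int, so *b would have to be
// reloaded after every store through *a. In Rust, a live &mut can't
// alias a shared &, so the compiler may keep *b in a register.
fn add_twice(a: &mut i32, b: &i32) -> i32 {
    *a += *b;
    *a += *b; // no reload of *b needed here
    *a
}
```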
I was confused by this at first since `&T` clearly allows aliasing (which is what C's `restrict` is about). But I realize that Steve meant just the optimization opportunity: you can be guaranteed that (in the absence of UB), the data behind the `&T` can be known to not change in the absence of a contained `UnsafeCell<T>`, so you don't have to reload it after mutations through other pointers.
Yes. It's a bit tricky to think about, because while it is literally called 'noalias', what it actually means is more subtle. I already linked to a version of the C spec below, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf but if anyone is curious, this part is in "6.7.4.2 Formal definition of restrict" on page 122.
In some ways, this is kind of the core observation of Rust: "shared xor mutable". Aliasing is only an issue if the aliasing leads to mutability. You can frame it in terms of aliasing if you have to assume all aliases can mutate, but if they can't, then that changes things.
I used to use it, but very rarely, since it's instant UB if you get it wrong. In tiny codebases which you can hold in your head it's probably practical to sprinkle it everywhere, but in anything bigger it's quite risky.
Nevertheless, I don't write normal everyday C code anymore since Rust has pretty much made it completely obsolete for the type of software I write.
restrict works by making some situations undefined behavior that would otherwise be defined without it. It is probably unwise to use casually or habitually.
But of course the only thing restrict does in C is potentially introduce certain kinds of undefined behavior into a program that would be correct without it (and the optimizer is then free to assume the code is never invoked in a way that would trigger that UB).
Aliasing info is gold dust to a compiler in various situations, although its absence in the past can mean that compilers start smoking crack when it's suddenly provided.
The simplest example is `memcpy(dst, src, len)` and similar iterative byte-copying operations. If the function did not use noalias, the compiler wouldn't be free to optimize individual byte reads/writes into register-sized ones, as the destination may overlap with the source. In practice this means up to 8x more CPU instructions per copy operation on a 64-bit machine.
Note that memcpy specifically may already be implemented this way under the hood, because its contract requires non-overlapping (noalias) pointers; but I imagine similar iterative copying operations can be optimized in the same ad-hoc manner when aliasing information is baked in, like it is with Rust.
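For comparison, a sketch of the Rust analogue, where the borrow rules already encode the no-overlap guarantee without any annotation:

```rust
// `dst: &mut [u8]` can never overlap `src: &[u8]`, so the compiler is
// free to widen these byte writes into register-sized or SIMD copies.
fn copy_bytes(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = *s;
    }
}
```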
Say you have 2 pointers (that might overlap). You (or the compiler) keep a value read through the first pointer in a register, since the value is needed multiple times.
You then write through the second pointer. Now the value you kept in the register is invalidated, since you might have overwritten that memory through the overlapping pointers.
Yes. Specifically, since Rust's design prevents shared mutability, if you have 2 mutable data structures you know that they don't alias, which makes auto-vectorization a whole lot easier.
What about generics (equivalent to templates in C++), which allow compile-time optimizations all the way down that may not be possible if the implementation is hidden behind a void*?
Unless you use `dyn`, all code is monomorphized, and that code on its own will get optimized.
This does come with code-bloat. So the Rust std sometimes exposes a generic function (which gets monomorphized), but internally passes it off to a non-generic function.
This is to avoid monomorphizing the underlying code more than necessary.
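The pattern looks roughly like this (hypothetical names; std::fs uses this shape for its path-taking functions):

```rust
use std::path::{Path, PathBuf};

// The thin generic shim is monomorphized once per caller type...
pub fn config_path<P: AsRef<Path>>(dir: P) -> PathBuf {
    inner(dir.as_ref())
}

// ...but immediately forwards to a single non-generic function, so the
// real body is compiled and optimized only once.
fn inner(dir: &Path) -> PathBuf {
    dir.join("config.toml")
}
```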
> This does come with code-bloat. So the Rust std sometimes exposes a generic function (which gets monomorphized), but internally passes it off to a non-generic function.
There's no free lunch here. Reducing the amount of code that's monomorphised reduces the code emitted & improves compile times, but it reduces the scope of the code that's exposed to the input type, which reduces optimisation opportunities.
In C, the only way to write a monomorphized hash table or array list involves horribly ugly macros that are difficult to write and debug. Rust does monomorphization by default, but you can also use `&dyn Trait` for vtable-like behaviour if you prefer.
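A toy sketch of both flavours:

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

// Monomorphized: a separate, fully inlinable copy per concrete T.
fn total_area<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Dynamic dispatch: one copy, called through a vtable, much like a
// hand-rolled C struct of function pointers.
fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```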
I think the way Rust checks borrows also makes it a lot more feasible to avoid allocations/copies; not because it is impossible to do in C, but because doing it in C requires writing very careful documentation and relying on the caller to actually read it. In (safe) Rust this is all checked by the compiler, such that libraries can leverage it without blowing their complexity budget.
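For instance, an API can hand out borrowed views instead of copies, with the lifetime contract enforced by the signature rather than by documentation (toy sketch):

```rust
// Returns a view into `input` rather than an allocation; the signature
// itself guarantees the result cannot outlive the buffer it borrows.
fn first_word(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}
```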
> When writing single-threaded code takes 90% of the effort of writing a multi-threaded one
That "when" is doing some heavy lifting! More seriously: You raise a very interesting point. When I moved from C++ to Java (10+ years ago), I was initially so nervous to add threads to my Java code. Why? Because it was (then) difficult and dangerous to do it in C++. C++ debuggers were awful, so I didn't think I could debug problems with multi-threaded C++ code. (Of course, the C++ ecosystem has drastically improved in the last 10 years, so I am sure it is now much more pleasant (and safe) to write multi-threaded C++ code.) When I finally sat down to add threads to some Java code, I could not believe how easy it was, including debugging. As a result, going forward, I was much more likely to add threads to my Java... or even start with a multi-threaded design, even if there is only a modest performance improvement.
The Rust version of this is "turn .iter() into .par_iter()."
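With rayon (assuming that crate), the diff really is that small:

```rust
use rayon::prelude::*;

fn sum_of_squares(v: &[i64]) -> i64 {
    // The entire change from the sequential version is iter -> par_iter.
    v.par_iter().map(|x| x * x).sum()
}
```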
It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
> to this I sigh in chrome.
I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.
Concurrency is easy by default. The hard part is when you are trying to be clever.
You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
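For example, a toy sketch of sharing a read-only Vec across two threads:

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // The Vec must be wrapped in Arc before it can be moved into
    // multiple threads; the borrow checker rejects plain sharing.
    let data = Arc::new(vec![1, 2, 3, 4]);
    let handles: Vec<_> = (0..2)
        .map(|i| {
            let data = Arc::clone(&data);
            // each thread sums every other element
            thread::spawn(move || data.iter().skip(i).step_by(2).sum::<i32>())
        })
        .collect();
    let total: i32 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    assert_eq!(total, 10);
}
```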
> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
This would be a good candidate for a specialised container that internally used unsafe. Well, for thread id at least: since the user of the API doesn't provide the id, you could mark the API safe, as you wouldn't have to worry about incorrect inputs.
Loop iteration would be an input to the API, so you'd mark the API unsafe.
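A rough sketch of what such a container might look like (hypothetical and illustrative only; the index-taking method stays unsafe, per the above):

```rust
use std::cell::UnsafeCell;

// One slot per worker. Sound only if no two threads ever touch the
// same index, which is exactly the contract the unsafe method states.
struct Slots<T> {
    slots: Vec<UnsafeCell<T>>,
}

// SAFETY: disjoint indices mean disjoint memory; T: Send allows each
// value to be mutated from whichever thread owns its index.
unsafe impl<T: Send> Sync for Slots<T> {}

impl<T> Slots<T> {
    fn new(mut init: impl FnMut() -> T, n: usize) -> Self {
        Self { slots: (0..n).map(|_| UnsafeCell::new(init())).collect() }
    }

    // SAFETY: caller must ensure no other thread uses `i` concurrently
    // (e.g. one fixed index per thread or per loop iteration).
    unsafe fn get_mut(&self, i: usize) -> &mut T {
        &mut *self.slots[i].get()
    }
}
```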
> Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/openmp.
This matches squarely with my experience, but it's not limited to threading, and Rust evades a large swath of these problems by relatively limited platform support. I look forward to the day I can run Rust wherever I run C!
While Rust doesn't have C's platform coverage, it has (by my last check) better coverage than something like CPython currently does.
The big thing though is that Rust is honest about its tiers of support, whereas for many projects "supported platform" often means, for minor platforms, "it still compiles (at least we think it does; when the maintainer tries it and it fails, they will fix it)".
Not to be too glib though, there are obviously tools out there that have as much or more rigor than Rust and cover more platforms. Just... "supported platforms" means different things in different contexts.
All too common (not just with compilers) for someone to port the subset they care about and declare it done. Rust's decision to create standards of compliance and be conscious about which platforms are viable targets and which don't meet their needs is a completely valid way to ensure that whole classes of trouble never come up, despite complaints from some.
In C, one can build data structures with pointers that would require reference counting and heap allocation in Rust. The performance would also depend on what kind of CPU/features it is compiled for.
I'm still confused as to why Linux requires linking against TBB for multithreading, thus breaking CMake configs that don't add an if(linux) branch for TBB. That stuff should be included by default without any effort by the developer.
I don't know the details since I'm mainly a Windows dev, but when porting to Linux, TBB has always been a huge pain in the ass, since it's suddenly an additional required dependency pulled in by gcc. Using C++ and std::thread.
Also clang, and in general the parallel algorithms aren't available on platforms that TBB doesn't support.
C++26 will get another similar dependency, because BLAS algorithms are going to be added, but apparently the expectation is to build on top of battle-tested C/Fortran BLAS implementations.
CPUs are most energy efficient sitting idle doing nothing, so finishing work sooner in wall-clock time usually helps despite overheads.
Energy usage is most affected by high clock frequencies, and CPUs will boost clocks for single-threaded code.
Threads waiting on cache misses let the CPU make use of hyperthreading, which is actually energy efficient (you get context switching in hardware).
You can waste energy in pathological cases if you overuse spinlocks or spawn so many threads that bookkeeping takes more work than what the threads do, but helper libraries for multithreading all have thread pools, queues, and dynamic work splitting to avoid extreme cases.
Most of the time low speed up is merely Amdahl's law – even if you can distribute work across threads, there's not enough work to do.
Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).
On a backend system where you already have multiple processes using various cores (databases, web servers, etc) it usually doesn’t make sense as a performance tool.
And on an embedded device you want to save power so it also rarely makes sense.
According to [1], the most important factor for the power consumption of code is how long the code takes to run. Code that spreads over multiple cores is generally more power efficient than code that runs sequentially, because the power consumption of multiple cores grows less than linearly (that is, it requires less than twice as much power to run two cores as it does one core).
Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so. Obviously if this is important in a particular context, it's probably worth measuring it in that context (e.g. embedded devices), but I suspect this is true more often than it isn't true.
> Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so
Only if it leads to better utilisation. But in the scenario that the parent comment suggests, it does not lead to better utilisation as all cores are constantly busy processing requests.
Throughput as well as CPU time across cores remains largely the same regardless of whether or not you parallelise individual programs/requests.
That's true, which is why I added the caveat that this is only true if parallelising reduces the overall runtime - if you can get in more requests per second through parallelisation. And the flip side of that is that if you're able to perfectly utilise all cores then you're already running everything in parallel.
That said, I suspect it's a rare case where you really do have perfect core utilisation.
> Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).
In addition to my sibling comments I would like to point out that multithreading quite often can save power. Typically the power consumption of an all-core load is within 2x the power consumption of a single-core load, while being many times faster, assuming your task parallelizes well. This makes sense because a fully loaded single core still needs all the L3 cache mechanisms, all the DRAM controller mechanisms, etc. to run at full speed. A fully idle system, on the other hand, can consume very little power if it idles well (which admittedly many CPUs do not do).
Edit:
I would also add that if your system is running a single-threaded database and a single-threaded web server, that still leaves over a hundred underutilized cores on many modern server-class CPUs.
If you use a LAMP-style architecture with a scripting language handling requests and querying a database, you can utilize N cores without ever writing a single line of multithreaded code.
Each web request can happen in a thread/process and their queries and spawns happen independently as well.
Multithreading can make an application more responsive, and more performant to the end user. If multithreading causes an end user to have to wait less, the code is more performant.
> Are people making user facing apps in rust with uis?
We are talking not only about Rust, but also about C and C++. There are lots of C++ UI applications. Rust poses itself as an alternative to C++, so it is definitely intended to be used for UI applications too - it was created to write a browser!
At work I am using tools such as uv [1] and ruff [2], which are user-facing (although not GUI), and I definitely appreciate a 16x speedup if possible.
The engine being written in C++ does not mean the application is. You're conflating the platform with what is being built on top of it. Your logic would mean that all Python applications should be counted as C applications.
When a basic question is asked, a basic answer is given. I didn’t say that I think that’s the coolest or most interesting answer. It’s just the most obvious, straightforward one. It’s not even about Rust!
(And also, I don’t think things like work stealing queues are relevant to editors, but maybe that’s my own ignorance.)
You cannot have it both ways though. Either these are meaningful examples of Rust's benefits, or they are not worth mentioning.
In a thread about Rust's concurrency advantages, these editors were cited as examples. "Don't block the UI thread" as justification only works if Rust actually provides something novel here. If it is just basic threading that every language has done for decades, it should not have been brought up as evidence in the first place.
Plus, if things like work-stealing queues and complex synchronization are not relevant to editors, then editors are a poor example for demonstrating Rust's concurrency story in the first place.
Well, what about small CLI tools like ripgrep? Does multithreading not matter when we open a large number of files and process them? What about compilers?
Sure. But the more obviously parallel the problem is (visiting N files) the less compelling complex synchronization tools are.
To over-explain: if you just need to make N forks of the same logic, then it's very easy to do this correctly in C. The cases where I'm going to carefully maintain shared mutable state with locking are the cases where the parallelism is less efficient (Amdahl's law).
Java style apps that just haphazardly start threads are what rust makes safer. But that’s a category of program design I find brittle and painful.
The example you gave of a compiler is canonically implemented as multiple processes making .o files from .c files, not threads.
> The example you gave of a compiler is canonically implemented as multiple processes making .o files from .c files, not threads.
This is a huge limitation of C's compilation model, and basically every other language since then does it differently, so I'm not sure that's a good example. You do want some "interconnection" between translation units, or at least less fine-grained units.
It reminds me of the joke where someone claims "I can do math very fast", gets tested with a multiplication, and immediately answers with some total bollocks number.
- "That's not even close"
- "Yeah, but it was fast"
Sure, it's not a trivial problem, but why wouldn't we want better compilation results/developer ergonomics at the price of more compiler complexity and some minimal performance penalty?
And it's not like the performance comes without its own set of negatives: header-only libraries, for example, are a hack that grew directly out of this compilation model.