Back in the 2008 standard, DO CONCURRENT went into Fortran. It says that the programmer claims all the iterations of a DO loop are non-interfering. This was never really that useful. It holds for matrix multiply, but not much else.
It's exactly what you want for the article's use case of adding things up in parallel.
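For what it's worth, here's a sketch of the adding-up case in Fortran. The REDUCE locality specifier that makes the shared sum legal was only added in Fortran 2023, so this assumes a recent compiler; the program and variable names are mine, not the article's:

    program parallel_sum
      implicit none
      integer, parameter :: n = 1000000
      integer :: i
      real :: a(n), total
      call random_number(a)
      total = 0.0
      ! Without REDUCE (Fortran 2023), updating TOTAL from every
      ! iteration would violate the DO CONCURRENT rules.
      do concurrent (i = 1:n) reduce(+: total)
        total = total + a(i)
      end do
      print *, 'sum =', total
    end program parallel_sum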
There are a few standard cases for parallelism:
- There's no interaction between tasks at all for long periods. Easiest case. Use cases are codebreaking and crypto mining. Best done on special-purpose hardware.
- You're running a service with a large number of incoming connections. The connections are essentially independent of each other. This is easy to do concurrently, because the threads don't talk to each other. For N > 1000 or so, this is the useful case for server-side async.
- You have some big area- or volume-oriented computation, like weather prediction, where there are many local regions being computed, and the regions talk to their neighbors a bit. This is the classic supercomputer application. (There's a minimal stencil sketch at the end of this comment.)
- You have lots of little tasks going on, queuing up events for each other. This was the vision Alan Kay had for Smalltalk. It sometimes shows up inside games, and inside discrete event simulations. The internals of some operating systems work this way.
- Anything that runs on a GPU. Very limited interaction between tasks, until you get to shadows, lighting, occlusion culling, reflections, and anything else where you don't want to do N lights x M meshes of processing. Vulkan gives you the low-level tools to deal with the interlocking needed to do that, which is why Vulkan is so complicated.
(I can't speak to LLM training; haven't been inside that.)
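To make the neighbors case concrete, here's a minimal 1-D smoothing-stencil sketch in Fortran (array names and coefficients are made up for illustration). Each point reads only the previous step's values, so the inner loop's iterations genuinely don't interfere and DO CONCURRENT fits cleanly:

    program stencil_sketch
      implicit none
      integer, parameter :: n = 1024
      integer :: i, step
      real :: u(0:n+1), unew(0:n+1)
      u = 0.0
      u(n/2) = 1.0               ! start with a single spike
      unew = 0.0
      do step = 1, 100
        ! Each point reads only its neighbors from the previous step,
        ! so the iterations don't interfere and may run in parallel.
        do concurrent (i = 1:n)
          unew(i) = 0.25*u(i-1) + 0.5*u(i) + 0.25*u(i+1)
        end do
        u(1:n) = unew(1:n)       ! boundary cells stay fixed at 0
      end do
      print *, 'center value after 100 steps:', u(n/2)
    end program stencil_sketch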
The Fortran committee botched DO CONCURRENT badly. The requirements imposed on the program only make it safe to execute its iterations in any serial order; they are not sufficient to allow safe parallel execution. So one can write a perfectly conforming DO CONCURRENT loop that will produce wrong answers when actually run in parallel. The problem in the spec seems to have been inadvertent, but they have refused to fix it (and don't seem to understand the problem either).
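Here's a minimal sketch of the kind of loop in question (my construction, not an example from the standard). TMP is always defined in an iteration before it's referenced in that iteration, so the loop conforms, and executing the iterations in any serial order gives the right answer. Run the iterations truly in parallel with TMP shared, though, and it's a data race. A compiler will privatize TMP easily in a case this trivial, but the standard doesn't require it to, and in murkier cases (TMP live after the loop, or a pointer target) it can't:

    program conforming_but_racy
      implicit none
      integer, parameter :: n = 1000
      integer :: i
      real :: a(n), b(n), c(n), tmp
      call random_number(a)
      call random_number(b)
      do concurrent (i = 1:n)
        tmp = a(i) + b(i)   ! TMP is defined before any reference in the
        c(i) = 2.0*tmp      ! same iteration, so the loop conforms, and
      end do                ! any SERIAL ordering of iterations is correct
      ! But executed truly in parallel, iterations race on the shared TMP.
      ! Fortran 2018 added locality specifiers so you can say what you mean:
      !   do concurrent (i = 1:n) local(tmp)
      print *, c(1)
    end program conforming_but_racy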