
This post is about training, not inference. And llama.cpp has similarly simple LoRA training code. There is nothing in neural networks themselves complex enough to justify the amount of complexity the Python ML community has piled up. MLX, for instance, is a similarly general-purpose research framework that is a fraction of the size.
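
For a sense of what that compactness looks like in practice, here is a minimal sketch of a training step in MLX's Python API (the layer sizes and the random batch are made up purely for illustration):

    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    class MLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.l1 = nn.Linear(784, 128)
            self.l2 = nn.Linear(128, 10)

        def __call__(self, x):
            return self.l2(nn.relu(self.l1(x)))

    def loss_fn(model, x, y):
        return nn.losses.cross_entropy(model(x), y, reduction="mean")

    model = MLP()
    optimizer = optim.SGD(learning_rate=0.1)
    loss_and_grad = nn.value_and_grad(model, loss_fn)

    x = mx.random.normal((32, 784))               # fake batch
    y = mx.random.randint(0, 10, (32,))
    loss, grads = loss_and_grad(model, x, y)      # loss value and gradients in one call
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)  # force the lazy computation to run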


Sure, neural networks in and of themselves are conceptually simple and not difficult to code. Andrew Ng's original Coursera class is all you need to go from zero knowledge to building MATLAB-based neural nets in this same hard-coded style.
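
To make the "hard-coded style" concrete, here is a rough sketch of that kind of network (in NumPy rather than MATLAB, and with ReLU/softmax instead of the course's sigmoids) - everything, including the backward pass, is written out by hand for this one fixed architecture:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(784, 64)) * 0.01, np.zeros(64)
    W2, b2 = rng.normal(size=(64, 10)) * 0.01, np.zeros(10)

    def train_step(x, y_onehot, lr=0.1):
        global W1, b1, W2, b2
        # forward pass, written by hand
        z1 = x @ W1 + b1
        a1 = np.maximum(z1, 0)                   # ReLU
        z2 = a1 @ W2 + b2
        p = np.exp(z2 - z2.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)             # softmax
        # backward pass, derived by hand for exactly this architecture
        dz2 = (p - y_onehot) / len(x)
        dW2, db2 = a1.T @ dz2, dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (z1 > 0)
        dW1, db1 = x.T @ dz1, dz1.sum(0)
        # plain SGD update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2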

However, there is a huge difference in functionality (hence complexity) between a framework such as PyTorch and hardcoding a single NN. It's a bit like the difference between writing a toy compiler in a CompSci class vs a production one that supports optimization, multiple targets, etc, etc.

The first step in convenience beyond hardcoding models was frameworks like the original Torch and the original TensorFlow. Those frameworks let you explicitly assemble a neural net out of modular "lego blocks" (tensor operations), then just call model.forward() or model.backward() - no need to write the forward and backward functions yourself.
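
As a rough illustration of that style, here is roughly what the explicit graph assembly looked like in original (1.x-era) TensorFlow - you wire the blocks together first, and only later push data through the finished graph (sizes here are illustrative):

    import tensorflow as tf  # TF 1.x-style API

    # Assemble the graph out of "lego blocks" - nothing runs yet.
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.int64, [None])
    h = tf.layers.dense(x, 128, activation=tf.nn.relu)
    logits = tf.layers.dense(h, 10)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # Only now does data flow through the pre-built graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # sess.run([loss, train_op], feed_dict={x: batch_x, y: batch_y})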

What PyTorch (the successor to Torch) did was increase the complexity of the framework but bring massive ease of use to the developer: it got rid of the explicit lego-block assembly process and instead lets the developer write arbitrary Python code describing what they want the model to do, while PyTorch builds the model internally and is therefore able to infer the backward function. This extra functionality and ease of use, with its corresponding internal complexity, is what differentiated PyTorch from TensorFlow, made it so successful, and caused most developers to switch to it.
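
In other words (a minimal sketch, not anyone's production code), the model is just ordinary Python, and autograd records the operations as they execute, so the backward pass comes for free:

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(784, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            h = torch.relu(self.fc1(x))
            if h.mean() > 0:        # arbitrary Python control flow, decided at runtime
                h = h * 2
            return self.fc2(h)

    model = TinyNet()
    x = torch.randn(32, 784)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()                 # backward pass inferred from the recorded forward ops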

There is also a lot of other functionality in PyTorch that adds to the complexity - supporting multiple backends, custom CUDA (and other) kernels beyond what is provided by cuDNN, etc, etc.
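
For example (a toy sketch - a real custom kernel would be written in CUDA/C++ and bound in), PyTorch exposes torch.autograd.Function so hand-written kernels can plug into the same autograd machinery:

    import torch

    class Square(torch.autograd.Function):
        # In practice forward/backward would dispatch to a hand-written kernel.
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x * x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return 2 * x * grad_out

    x = torch.randn(4, requires_grad=True)
    Square.apply(x).sum().backward()
    print(torch.allclose(x.grad, 2 * x))   # True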


I know all these things. Again: look at MLX.


I looked at the MLX codebase - certainly pretty tight, although also very poorly commented/documented.

I'm still not sure MLX vs PyTorch is really a fair comparison, though, since PyTorch is a much more mature framework and of course supports many different backends, as opposed to MLX, which is just Metal or CPU.

Comparing these is a bit like comparing the new programming language of the day vs older ones that have accumulated a lot of cruft/complexity. The shiny new language will likely look just as crufty after it has accumulated all the additional functionality of the older one.


Show me the profiling in MLX



