
This post is about training, not inference. And llama.cpp has similarly simple LoRA training code. There is nothing in neural networks themselves complex enough to justify the amount of complexity the Python ML community has piled up. MLX, for instance, is a similarly general-purpose research framework that is a fraction of the size.
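
For a sense of what that compactness looks like in practice, here is a minimal sketch of a training step in MLX's Python API (the layer sizes and the random batch are made up purely for illustration):

    import mlx.core as mx
    import mlx.nn as nn
    import mlx.optimizers as optim

    class MLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.l1 = nn.Linear(784, 128)
            self.l2 = nn.Linear(128, 10)

        def __call__(self, x):
            return self.l2(nn.relu(self.l1(x)))

    def loss_fn(model, x, y):
        return nn.losses.cross_entropy(model(x), y, reduction="mean")

    model = MLP()
    optimizer = optim.SGD(learning_rate=0.1)
    loss_and_grad = nn.value_and_grad(model, loss_fn)

    x = mx.random.normal((32, 784))               # fake batch
    y = mx.random.randint(0, 10, (32,))
    loss, grads = loss_and_grad(model, x, y)      # loss value and gradients in one call
    optimizer.update(model, grads)
    mx.eval(model.parameters(), optimizer.state)  # force the lazy computation to run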


Sure, neural networks in and of themselves are conceptually simple and not difficult to code. Andrew Ng's original Coursera class is all you need to go from zero knowledge to building MATLAB-based neural nets in this same hard-coded style.
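
To make the "hard-coded style" concrete, here is a rough sketch of that kind of network (in NumPy rather than MATLAB, and with ReLU/softmax instead of the course's sigmoids) - everything, including the backward pass, is written out by hand for this one fixed architecture:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(784, 64)) * 0.01, np.zeros(64)
    W2, b2 = rng.normal(size=(64, 10)) * 0.01, np.zeros(10)

    def train_step(x, y_onehot, lr=0.1):
        global W1, b1, W2, b2
        # forward pass, written by hand
        z1 = x @ W1 + b1
        a1 = np.maximum(z1, 0)                   # ReLU
        z2 = a1 @ W2 + b2
        p = np.exp(z2 - z2.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)             # softmax
        # backward pass, derived by hand for exactly this architecture
        dz2 = (p - y_onehot) / len(x)
        dW2, db2 = a1.T @ dz2, dz2.sum(0)
        dz1 = (dz2 @ W2.T) * (z1 > 0)
        dW1, db1 = x.T @ dz1, dz1.sum(0)
        # plain SGD update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2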

However, there is a huge difference in functionality (hence complexity) between a framework such as PyTorch and hardcoding a single NN. It's a bit like the difference between writing a toy compiler in a CompSci class vs a production one that supports optimization, multiple targets, etc, etc.

The first step in convenience beyond hardcoding models was frameworks like the original Torch and the original TensorFlow. Those frameworks let you explicitly assemble a neural net out of modular "lego blocks" (tensor operations), then just call model.forward() or model.backward() - no need to write the forward and backward functions yourself.
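
As a rough illustration of that style, here is roughly what the explicit graph assembly looked like in original (1.x-era) TensorFlow - you wire the blocks together first, and only later push data through the finished graph (sizes here are illustrative):

    import tensorflow as tf  # TF 1.x-style API

    # Assemble the graph out of "lego blocks" - nothing runs yet.
    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.int64, [None])
    h = tf.layers.dense(x, 128, activation=tf.nn.relu)
    logits = tf.layers.dense(h, 10)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # Only now does data flow through the pre-built graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # sess.run([loss, train_op], feed_dict={x: batch_x, y: batch_y})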

What PyTorch (the successor to Torch) did was increase the complexity of the framework but bring massive ease of use to the developer: it got rid of the explicit lego-block assembly process and instead lets the developer write arbitrary Python code describing what they want the model to do, while PyTorch builds the model internally and is therefore able to infer the backward function. This extra functionality and ease of use, with its corresponding internal complexity, is what differentiated PyTorch from TensorFlow, made it so successful, and caused most developers to switch to it.
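
In other words (a minimal sketch, not anyone's production code), the model is just ordinary Python, and autograd records the operations as they execute, so the backward pass comes for free:

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(784, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            h = torch.relu(self.fc1(x))
            if h.mean() > 0:        # arbitrary Python control flow, decided at runtime
                h = h * 2
            return self.fc2(h)

    model = TinyNet()
    x = torch.randn(32, 784)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()                 # backward pass inferred from the recorded forward ops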

There is also a lot of other functionality in PyTorch that adds to the complexity - supporting multiple backends, custom CUDA (and other) kernels beyond what is provided by cuDNN, etc, etc.
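
For example (a toy sketch - a real custom kernel would be written in CUDA/C++ and bound in), PyTorch exposes torch.autograd.Function so hand-written kernels can plug into the same autograd machinery:

    import torch

    class Square(torch.autograd.Function):
        # In practice forward/backward would dispatch to a hand-written kernel.
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return x * x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return 2 * x * grad_out

    x = torch.randn(4, requires_grad=True)
    Square.apply(x).sum().backward()
    print(torch.allclose(x.grad, 2 * x))   # True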


I know all these things. Again: look at MLX.


I looked at the MLX codebase - certainly pretty tight, although also very poorly commented/documented.

I'm still not sure MLX vs PyTorch is really a fair comparison, though, since PyTorch is a much more mature framework and of course supports many different backends, as opposed to MLX, which is just Metal or CPU.

Comparing these is a bit like comparing the new programming language of the day vs older ones that have accumulated a lot of cruft/complexity. The shiny new language will likely look just as crufty after it has accumulated all the additional functionality of the older one.


Show me the profiling in MLX



