You can run inference today on pretty much any hardware, not just Nvidia cards.
Download Ollama on a modern MacBook and you can run 13B models, or even larger if your RAM allows, at decent speeds. People run smaller models locally on their phones.
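For example, once Ollama is installed and serving locally, a few lines of Python are enough to query it over its HTTP API (a minimal sketch, assuming the default port 11434 and a 13B model you've already pulled; the model name here is just a placeholder):

```python
# Minimal sketch: query a locally running Ollama instance over its HTTP API.
# Assumes Ollama is serving on its default port (11434) and that a model such
# as "llama2:13b" has already been pulled; swap in whatever fits your RAM.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:13b",   # assumed model name; any pulled model works
        "prompt": "Summarize why local inference doesn't need an Nvidia GPU.",
        "stream": False,         # ask for one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```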
Google has trained their latest models on their own TPUs... not using Nvidia to my knowledge.
So, no, there are alternatives. CUDA has the largest mindshare on the training side, though.