
When you run that, what quantization do you get? Ollama's library page (https://ollama.com/library/gemma2:27b) isn't exactly good at surfacing useful information like what the default quantization is.


If you leave the :27b off that URL you'll see the default size, which is 9b. Ollama seems to always use Q4_0 even when other quants are better.
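If you already have the model pulled, you can also check locally: recent versions of the CLI print it with "ollama show gemma2:27b", and the local server exposes the same info over HTTP. A minimal sketch, assuming the default server address and the /api/show endpoint (exact field names may differ between Ollama versions):

  import json
  import urllib.request

  # Ask the locally running Ollama server what it knows about a pulled model.
  # Assumes the default address http://localhost:11434 and the /api/show
  # endpoint; "details.quantization_level" is the field of interest here.
  req = urllib.request.Request(
      "http://localhost:11434/api/show",
      data=json.dumps({"model": "gemma2:27b"}).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      info = json.load(resp)

  details = info.get("details", {})
  print(details.get("parameter_size"))      # e.g. "27B"
  print(details.get("quantization_level"))  # e.g. "Q4_0"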


Not sure how to tell, but here's the full output from ollama serve: https://pastes.io/ollama-run-gemma2-27b


If you hit the drop-down menu for the size of the model, then tap “view all”, you will see the size and hash of the model you have selected and can compare it to the full list below, which has the quantization specs in the names.


Still, I don't see a way (from the web library) to see the default quantization (from Ollama's POV) at all. Is that possible somehow?


The model displayed in the drop-down when you access the web library is the default that will be pulled. Compare the size and hash to the more detailed model listing below it and you will see what quantization you have.

Example: the default model weights for Llama 3.3 70B, after hitting “view all”, have this hash and size listed next to them: a6eb4748fd29 • 43GB

Now scroll down through the list and you will find that the one matching that hash and size is “70b-instruct-q4_K_M”. That tells you that the default weights for Llama 3.3 70B from Ollama are 4-bit quantized (q4), while the “K_M” suffix tells you a bit about what techniques were used during quantization to balance size and performance.
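Those suffixes follow the llama.cpp/GGUF naming scheme, so you can read them mechanically. A toy helper (mine, not part of Ollama) that splits a tag like “q4_K_M” into its bit width and variant:

  import re

  # Toy parser for GGUF-style quantization tags such as "q4_K_M" or "q8_0".
  # The leading number is the approximate bits per weight; the rest names the
  # scheme (e.g. "K_M" = k-quant, medium mix of precisions).
  def parse_quant(tag: str):
      m = re.fullmatch(r"q(\d+)_(.+)", tag, flags=re.IGNORECASE)
      if not m:
          return None
      return int(m.group(1)), m.group(2).upper()

  print(parse_quant("q4_K_M"))  # (4, 'K_M')
  print(parse_quant("q8_0"))    # (8, '0')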


Thanks, that seems to indicate Q4 for the quantization. FWIW, you're probably able to run that on the 4090 as well; the model is just 14.55 GiB.
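As a rough sanity check (the 14.55 GiB figure is from above; the ~27B parameter count, the ~4.5 effective bits per weight for Q4, and treating the 4090's 24 GB as GiB are my own ballpark assumptions):

  # Back-of-the-envelope fit check for a Q4-quantized ~27B model on a 24 GiB card.
  params = 27.2e9          # approximate Gemma 2 27B parameter count
  bits_per_weight = 4.5    # Q4 weights plus per-block scales, roughly
  vram_gib = 24.0          # RTX 4090

  weights_gib = params * bits_per_weight / 8 / 2**30
  print(f"approx weights: {weights_gib:.1f} GiB")                          # ~14.3 GiB, close to the 14.55 GiB reported
  print(f"headroom for KV cache etc.: {vram_gib - weights_gib:.1f} GiB")   # ~9.7 GiB to spare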



