
When you run that, what quantization do you get? Ollama's library page (https://ollama.com/library/gemma2:27b) isn't exactly good at surfacing useful information like what the default quantization is.


If you leave the :27b off that URL you'll see the default size, which is 9b. Ollama seems to always use Q4_0 even when other quants are better.
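If you already have the model pulled, you can also check locally: recent versions of the CLI print it with "ollama show gemma2:27b", and the local server exposes the same info over HTTP. A minimal sketch, assuming the default server address and the /api/show endpoint (exact field names may differ between Ollama versions):

  import json
  import urllib.request

  # Ask the locally running Ollama server what it knows about a pulled model.
  # Assumes the default address http://localhost:11434 and the /api/show
  # endpoint; "details.quantization_level" is the field of interest here.
  req = urllib.request.Request(
      "http://localhost:11434/api/show",
      data=json.dumps({"model": "gemma2:27b"}).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      info = json.load(resp)

  details = info.get("details", {})
  print(details.get("parameter_size"))      # e.g. "27B"
  print(details.get("quantization_level"))  # e.g. "Q4_0"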


Not sure how to tell, but here's the full output from ollama serve: https://pastes.io/ollama-run-gemma2-27b


If you hit the drop-down menu for the size of the model, then tap “view all”, you will see the size and hash of the model you have selected and can compare it to the full list below, which has the quantization specs in the names.


Still, I don't see a way (from the web library) to see the default quantization (from Ollama's POV) at all. Is that possible somehow?


The model displayed in the drop-down when you access the web library is the default that will be pulled. Compare the size and hash to the more detailed model listing below it and you will see what quantization you have.

Example: the default model weights for Llama 3.3 70B, after hitting “view all”, have this hash and size listed next to them: a6eb4748fd29 • 43GB

Now scroll down through the list and you will find that the one matching that hash and size is “70b-instruct-q4_K_M”. That tells you that the default weights for Llama 3.3 70B from Ollama are 4-bit quantized (q4), while the “K_M” suffix tells you a bit about what techniques were used during quantization to balance size and performance.
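Those suffixes follow the llama.cpp/GGUF naming scheme, so you can read them mechanically. A toy helper (mine, not part of Ollama) that splits a tag like “q4_K_M” into its bit width and variant:

  import re

  # Toy parser for GGUF-style quantization tags such as "q4_K_M" or "q8_0".
  # The leading number is the approximate bits per weight; the rest names the
  # scheme (e.g. "K_M" = k-quant, medium mix of precisions).
  def parse_quant(tag: str):
      m = re.fullmatch(r"q(\d+)_(.+)", tag, flags=re.IGNORECASE)
      if not m:
          return None
      return int(m.group(1)), m.group(2).upper()

  print(parse_quant("q4_K_M"))  # (4, 'K_M')
  print(parse_quant("q8_0"))    # (8, '0')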


Thanks, that seems to indicate Q4 for the quantization. FWIW, you're probably able to run that on the 4090 as well; the model is just 14.55 GiB.
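As a rough sanity check (the 14.55 GiB figure is from above; the ~27B parameter count, the ~4.5 effective bits per weight for Q4, and treating the 4090's 24 GB as GiB are my own ballpark assumptions):

  # Back-of-the-envelope fit check for a Q4-quantized ~27B model on a 24 GiB card.
  params = 27.2e9          # approximate Gemma 2 27B parameter count
  bits_per_weight = 4.5    # Q4 weights plus per-block scales, roughly
  vram_gib = 24.0          # RTX 4090

  weights_gib = params * bits_per_weight / 8 / 2**30
  print(f"approx weights: {weights_gib:.1f} GiB")                          # ~14.3 GiB, close to the 14.55 GiB reported
  print(f"headroom for KV cache etc.: {vram_gib - weights_gib:.1f} GiB")   # ~9.7 GiB to spare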



