When you run that, what quantization do you get? Ollama's library website (https://ollama.com/library/gemma2:27b) isn't exactly great at surfacing useful information like what the default quantization is.
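If you've already pulled it, I'd guess something like this would answer it locally. I'm assuming `ollama show` prints a quantization line (it does in the recent releases I've used), so treat this as a sketch rather than gospel:

```python
# Rough sketch: ask the local Ollama CLI what quantization a pulled model uses.
# Assumes `ollama show <model>` emits a line containing "quantization",
# which recent Ollama versions do; adjust the match if yours differs.
import subprocess

out = subprocess.run(
    ["ollama", "show", "gemma2:27b"],
    capture_output=True, text=True, check=True,
).stdout
for line in out.splitlines():
    if "quantization" in line.lower():
        print(line.strip())
```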
If you open the drop-down menu for the model size and tap "view all", you will see the size and hash of the tag you have selected. The model shown in the drop-down when you load the library page is the default that will be pulled; compare its size and hash against the full list below it, which has the quantization specs in the tag names, and you'll see exactly which quantization you have.
Example: after hitting "view all", the default model weights for Llama 3.3 70B have this hash and size listed next to them: a6eb4748fd29 • 43GB
Now scroll down through the list and you will find that the entry matching that hash and size is "70b-instruct-q4_K_M". That tells you the default weights for Llama 3.3 70B from Ollama are 4-bit quantized (q4), while the "K_M" marks it as a llama.cpp K-quant, medium variant: a scheme that mixes quantization types across tensors to balance size and quality.
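If you'd rather not eyeball the website, you can do the same digest check against the registry directly. This is a sketch that assumes Ollama's registry at registry.ollama.ai speaks the standard OCI/Docker registry v2 manifest API (it appears to, but that's an assumption, not documented behavior), and that the short hash on the site is a truncated sha256 digest:

```python
# Sketch: fetch the manifest for a tag from Ollama's registry and print each
# layer's digest and size, to match against the short hash/size pair the
# website shows (e.g. a6eb4748fd29 / 43GB). Endpoint and media type are
# assumptions based on the registry behaving like a Docker v2 registry.
import requests

model, tag = "llama3.3", "70b"  # the default tag for the library model
url = f"https://registry.ollama.ai/v2/library/{model}/manifests/{tag}"
headers = {"Accept": "application/vnd.docker.distribution.manifest.v2+json"}

manifest = requests.get(url, headers=headers, timeout=30).json()
for layer in manifest.get("layers", []):
    digest = layer["digest"].removeprefix("sha256:")
    print(f"{layer['mediaType']}: {digest[:12]} ({layer['size'] / 1e9:.0f} GB)")
```

The layer whose mediaType names the model should be the weights blob; if its truncated digest matches what the site shows, you've identified your quantization tag.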
Thanks, that seems to indicate Q4 for the quantization. FWIW, you can probably run that on the 4090 as well; the model is only 14.55 GiB.
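Quick sanity check on the 4090 claim, with the overhead figure being a rough assumption on my part rather than a measured number:

```python
# Back-of-envelope VRAM fit check for an RTX 4090 (24 GiB).
# The ~2 GiB overhead is an assumed ballpark for KV cache, CUDA
# context, and buffers, not a measurement.
model_gib = 14.55      # reported model size
vram_gib = 24.0        # RTX 4090 VRAM
overhead_gib = 2.0     # assumed headroom

fits = model_gib + overhead_gib <= vram_gib
print(f"{model_gib:.2f} GiB + ~{overhead_gib:.0f} GiB overhead -> "
      f"{'fits' if fits else 'does not fit'} in {vram_gib:.0f} GiB VRAM")
```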