
Pay attention to IO bandwidth if you’re building a machine with multiple GPUs like this!

In this setup the model is sharded across the cards, so data must shuttle over a PCIe 3.0 x16 link, which tops out around 16 GB/s. For reference, that's more than an order of magnitude below the ~346 GB/s memory bandwidth of the Tesla P40s being used.
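Back-of-envelope, as a quick Python sketch (16 GB/s is the PCIe 3.0 x16 theoretical maximum; 346 GB/s is the P40's spec-sheet memory bandwidth):

    # Rough ratio of on-card memory bandwidth to the inter-card link
    pcie3_x16_gb_s = 16.0    # PCIe 3.0 x16, theoretical max
    p40_mem_gb_s   = 346.0   # Tesla P40 spec-sheet memory bandwidth

    print(f"memory / PCIe ~= {p40_mem_gb_s / pcie3_x16_gb_s:.0f}x")  # ~22x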

The author didn't mention NVLink, so I'm presuming it wasn't used, but I believe these cards would support it.

Building on a budget is really hard. In my experience, 5-15 tok/s is a bit too slow for use cases like coding; I admit that once you've had a taste of 150 tok/s it's hard to go back (I've been spoiled by an RTX 4090 running vLLM).



Unless you run the GPUs in tensor parallel, which you have to go out of your way to configure, the inter-GPU bandwidth doesn't matter much. With the default layer-wise split, each card holds separate layers of the model; they're not working on the same layer together, so only a small activation tensor, a handful of kilobytes per token, crosses the link.
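For scale, here's a rough estimate of that traffic under a layer-wise split (a sketch; the hidden size, fp16 width, and decode rate are illustrative assumptions, not measurements from this build):

    # Activations handed between pipeline stages during decoding
    hidden_size    = 8192    # illustrative, e.g. a 70B-class model
    bytes_per_elem = 2       # fp16
    tokens_per_sec = 10      # a plausible budget-build decode rate

    per_token = hidden_size * bytes_per_elem  # 16 KiB per token
    print(f"{per_token / 1024:.0f} KiB/token, "
          f"{per_token * tokens_per_sec / 1024:.0f} KiB/s")

Either way, it's vanishingly small next to the ~16 GB/s the link can move.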


Which models do you enjoy most on your 4090? And why vLLM instead of Ollama?


> The author didn't mention NVLink, so I'm presuming it wasn't used, but I believe these cards would support it.

How would you set up NVLink, if the cards support it?
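Naively I'd expect: install the physical bridge, then check whether the driver exposes peer-to-peer between the cards. A sketch of that check, assuming PyTorch is available (device indices 0 and 1 are placeholders):

    import torch

    # True if device 0 can DMA directly into device 1's memory
    # (over NVLink if bridged, otherwise over PCIe peer-to-peer)
    print(torch.cuda.can_device_access_peer(0, 1))

I'd guess nvidia-smi topo -m would also show how the cards are linked, but I've never done this myself.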


I think you're mixing up the two bandwidth numbers.



