What's the performance difference between using a dedicated GPU from Nvidia, for example, and whatever Apple does with its unified memory?

So let's say we run a model on a Mac Mini M4 with 24 GB of RAM: how many tokens/s do we get? And if we run the exact same model on an RTX 3090 Ti, for example, how many tokens/s do we get?

Do these comparisons exist somewhere online already? I understand it's possible to run models on Apple hardware today thanks to the unified memory, but how fast is that really?
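
One way to get a number for your own setup, rather than hunting for scattered benchmarks: time a generation against a local OpenAI-compatible server (Ollama and llama.cpp's server both expose one). A rough Python sketch; the base URL and model name below are placeholders for whatever you're actually running, and it measures end-to-end throughput including prompt processing, not pure decode speed:

    # Rough tokens/s check against a local OpenAI-compatible endpoint.
    # BASE_URL and MODEL are placeholders -- point them at your own server.
    import time
    import requests

    BASE_URL = "http://localhost:11434/v1"  # e.g. Ollama's default port
    MODEL = "llama3.2:3b"                   # whatever model you pulled

    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Write a short story about a robot."}],
        "max_tokens": 256,
        "stream": False,
    }

    start = time.time()
    resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=600)
    resp.raise_for_status()
    elapsed = time.time() - start

    generated = resp.json()["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")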



Not the exact same comparison, but I have an M1 Mac with 16 GB RAM and get about 10 t/s with a 3B model. The same model on my 3060 Ti gets more than 100 t/s.

Needless to say, RAM capacity isn't everything.
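
The usual back-of-the-envelope explanation is memory bandwidth: generating a token means streaming essentially every weight through memory once, so decode speed is bounded by roughly bandwidth divided by model size. A sketch with approximate spec figures (~68 GB/s for the M1's unified memory, ~448 GB/s for a 3060 Ti) and an assumed 4-bit quant; real throughput lands well below these ceilings:

    # Decode-speed ceiling ~= memory bandwidth / bytes of weights read per token.
    # Bandwidth numbers are approximate specs; the quant size is an assumption.
    model_bytes = 3e9 * 0.56  # ~3B params at 4-bit quant, with overhead (~1.7 GB)

    for name, bw_gbs in [("M1 unified memory", 68), ("RTX 3060 Ti GDDR6", 448)]:
        ceiling = bw_gbs * 1e9 / model_bytes
        print(f"{name}: ~{ceiling:.0f} tok/s upper bound")

That bandwidth ratio (roughly 6-7x) is in the same ballpark as the gap reported above, which is why a 3B model that fits comfortably in 16 GB can still be several times slower on the Mac.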


Could you say which exact model and quant you're using for that test, plus the settings and runtime? Just so I can compare with other numbers I come across.



