What is the performance difference between using a dedicated GPU (from Nvidia, for example) compared to whatever Apple does?
So let's say we run a model on a Mac Mini M4 with 24GB RAM, how many tokens/s are you getting? Then if we run the exact same model but with an RTX 3090 Ti for example, how many tokens/s are you getting?
Do these comparisons exist somewhere online already? I understand it's possible to run the model on Apple hardware today, with the unified memory, but how fast is that really?
Not the exact same comparison, but I have an M1 Mac with 16GB RAM and can get about 10 t/s with a 3B model. The same model on my 3060 Ti gets more than 100 t/s.
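A rough sanity check on numbers like these: single-stream token generation is usually memory-bandwidth bound, since every generated token has to read all the model weights once. So a theoretical ceiling is bandwidth divided by model size. Here's a sketch using published bandwidth specs for these chips and assuming a ~3B model at 4-bit quant is roughly 2 GB (an assumption, not a measurement); real-world numbers land well below the ceiling.

```python
# Theoretical tokens/s ceiling: memory bandwidth / model size in memory.
# Bandwidth figures are published specs; real throughput is lower due to
# KV-cache reads, kernel overhead, and imperfect bandwidth utilization.

MEM_BANDWIDTH_GBPS = {
    "Apple M1": 68.25,
    "Apple M4": 120.0,
    "RTX 3060 Ti": 448.0,
    "RTX 3090 Ti": 1008.0,
}

def max_tokens_per_second(device: str, model_size_gb: float) -> float:
    """Upper bound assuming every token reads all weights once."""
    return MEM_BANDWIDTH_GBPS[device] / model_size_gb

if __name__ == "__main__":
    # Assumed ~2 GB for a 3B model at 4-bit quantization.
    for dev in MEM_BANDWIDTH_GBPS:
        print(f"{dev}: ~{max_tokens_per_second(dev, 2.0):.0f} t/s ceiling")
```

The bandwidth ratio (3060 Ti at 448 GB/s vs. M1 at ~68 GB/s) is about 6.5x, which is in the same ballpark as the ~10x observed above once you account for the GPU also having much more compute for prompt processing.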
Could you share the exact model + quant you used for that test, along with the settings and runtime? Just so I can compare with other numbers I come across.