So this should be referring to W8A8 (weights and activations in 8-bit).
So this is going to be 8-bit weights, 8-bit activations, group size of 256, symmetric quantization. I'm not sure how to map this to the GGUF variants because they don't mention whether they do activation quantization.
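For concreteness, here's a minimal sketch of what symmetric int8 quantization with group size 256 looks like in plain PyTorch. This is just an illustration of the scheme, not the torchao implementation; in a W8A8 setup the same idea applies to activations, which are typically quantized dynamically at runtime.

```python
# Illustrative sketch: symmetric per-group int8 quantization, group size 256.
# Not the torchao implementation, just the underlying scheme.
import torch

def quantize_symmetric_int8(w: torch.Tensor, group_size: int = 256):
    """Symmetric per-group int8 quantization along the last dim."""
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                       # split into groups of 256
    scale = w.abs().amax(dim=-1, keepdim=True) / 127.0  # symmetric: one scale, no zero point
    scale = scale.clamp(min=1e-8)                       # avoid division by zero
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q.reshape(orig_shape), scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor, group_size: int = 256):
    w = q.reshape(-1, group_size).to(torch.float32) * scale
    return w.reshape(q.shape)

w = torch.randn(1024, 1024)
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize_int8(q, scale)
print((w - w_hat).abs().max())  # worst-case quantization error
```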
Not that I know of for this study, at least for that specific scope. With torchao we want to make it easier for researchers to create new quantization algorithms in Python and have those algorithms run fast; you can see a lot of those algorithms here: https://github.com/pytorch/ao/tree/main/torchao/prototype
So, for example, we can accelerate AWQ and GPTQ by using a fast int4 kernel called tinygemm.
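As a rough sketch, the tinygemm-backed path is exposed through torchao's int4 weight-only quantization. The API names below are from recent torchao releases and may differ across versions; tinygemm also expects bfloat16 inputs on CUDA.

```python
# Hedged sketch: int4 weight-only quantization in torchao, backed by the
# tinygemm kernel. API names assume a recent torchao release.
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()
quantize_(model, int4_weight_only(group_size=128))  # routes matmuls to the tinygemm int4 path
```

The point is that algorithms like AWQ and GPTQ decide *what* the int4 weights should be, and at inference time the resulting weights can be run through the same fast int4 kernel.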