
They use less memory for inference but recall details less well. For instance, if you're working on code and ask for edits, the model will forget that various functions are part of the script. Even transformers aren't perfect at this, and SSMs are even worse. For many use cases that ability isn't needed as much, so the memory savings are the bigger lever.
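
To put rough numbers on the memory side of that tradeoff, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the layer counts and dimensions, and fp16 storage are all made up, and the SSM estimate counts only one recurrent state per layer (it ignores any convolution buffers or other per-layer state a real model might keep). The point is just that a transformer's KV cache grows with context length, while an SSM's state stays the same size and so has to compress everything it has seen.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Transformer KV cache size: keys and values for every token in context."""
    # 2 tensors (K and V) per layer, each seq_len x n_kv_heads x head_dim
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem


def ssm_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    """Simplified SSM state size: one fixed recurrent state per layer."""
    # No seq_len term: the state does not grow as more tokens are processed
    return n_layers * d_inner * d_state * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shapes, not taken from any specific model
    layers, kv_heads, head_dim = 32, 32, 128
    d_inner, d_state = 8192, 16

    for seq_len in (1_024, 4_096, 32_768):
        kv = kv_cache_bytes(seq_len, layers, kv_heads, head_dim)
        ssm = ssm_state_bytes(layers, d_inner, d_state)
        print(f"{seq_len:>6} tokens: KV cache {kv / 2**20:8.0f} MiB, "
              f"SSM state {ssm / 2**20:4.0f} MiB")
```

Under these made-up shapes the KV cache costs half a MiB per token, while the SSM state is a constant few MiB no matter how long the script is. That fixed budget is where both the savings and the forgetting come from.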

