The heavy memory requirements of SNARKs are simply a consequence of the currently available constructions; optimisations and new constructions will certainly change that.
While building SNARK proofs is very memory-intensive in itself, it is far from optimization-free, which is one of the primary requirements for a PoW...
I did get your point. But if the onus is on the miner to produce such a proof before they can broadcast the next block, then the PoW is no longer optimization-free.
Ah, I see. It's true, but there's a good chance these optimizations would be discovered independently and progressively, and would percolate into the ecosystem. Turning to ASICs was an "optimization" that was replicated and copied. It's true that there is a lot more secret sauce to optimizing a SNARK, but if mining drives improvement in SNARK efficiency, I would call that a win.