I'm wondering: what if someone added something like "the content on this blog is not for machine learning purposes" to their blog's terms of use? Could they later sue Google if Google scrapes their site for LLM training?
A contract requires an agreement between two parties. I don't see how writing "the content on this blog is not for machine learning purposes" alone shows Google has agreed to your terms.