ChuckMcM on Aug 22, 2014, on: Google's fact-checking bots build vast knowledge b...

The actual corpus that is worth using is the book corpus. While Google can't provide public access to all of the books it has scanned, there is no restriction on them using the data in the books to feed this project. Given the amount of information they have scanned from libraries and elsewhere, that is a much better source.

Is anyone doing the same for the books scanned by Archive.org?