(I am not a lawyer.) As far as I understand the legal precedents involved, random website terms of service are not effective in this scenario, as the public profiles do not require having any account or other relationship. This actually went to court: because Zappos didn't force users to click through a terms of service to access their service, the terms of service were invalid.
As for their ability to control what you do with the information: there might be a limited license on the data granted from users to LinkedIn that is not transferable, so maybe you couldn't build a service that redistributed that information, but I don't see why obtaining and holding it would be illegal.
As for the analogies to power and telephone and such, those are built on property owned by a local government and there are usually other extra laws related to them: it isn't due to some common law position that you can't mess with their stuff. Here, I am not a lawyer, but I am a government official with a particular interest in sewage; here is a link to the sewer use ordinances from our local sanitation district: pay particular attention to 2.03.
I worked at Google for four years, at an independent search engine for five more after that, and at IBM for 18 months after it acquired said search engine. Every one of those organizations spent many thousands of dollars on legal fees over just this question and reviewed tons of case law.
Every single one of them concluded that based on how the law was written and how the web worked, there is no legal way to scrape a web site without its explicit permission to do so.
That won't stop people from trying, of course, and it was a source of constant entertainment in the ops team at Blekko how people tried to sneak around while scraping (it can get very creative). But it isn't legal, you can and will get banned from all access for it, and if you use the results in another product or offering you will be found liable for damages.
If your robots.txt file says Allow then you did. If you have no robots.txt file then it's an open question. If you put a Disallow into your robots.txt file Google will stop scraping your site.
The implicit contract is that you let them scrape because you want to show up in their search results, which will send you traffic. If you don't care about Google traffic then set Disallow: / in your robots.txt and get back the bandwidth you were giving them.
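For anyone who wants to see what that check looks like mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the example.com URL, and the helper `may_fetch` are made up for illustration; this only shows how a well-behaved crawler might read the file, not how Google actually does it.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper: it parses robots.txt content passed in as a string,
    so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A site that tells every crawler to stay out.
DENY_ALL = "User-agent: *\nDisallow: /\n"

# A site that explicitly lets every crawler in.
ALLOW_ALL = "User-agent: *\nAllow: /\n"

if __name__ == "__main__":
    url = "https://example.com/profile/1"  # made-up URL
    print(may_fetch(DENY_ALL, "ExampleBot", url))
    print(may_fetch(ALLOW_ALL, "ExampleBot", url))
```

Note that this only tells you what the file says; whether honoring it settles the legal question is a separate matter.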
> If you have no robots.txt file then it's an open question.
Only for definitions of explicit I must be unfamiliar with.
If the presence of a robots.txt makes one's intent for a given resource explicit one way or the other, the lack of one (and the lack of some communication in some other channel) must mean there is no explicit permission.
That is correct. For what it's worth, IBM's legal team came down on the side of 'assume deny' and Google was (at the time I was there) 'assume allow.'
To the extent that is the case, though, it isn't due to the terms of service; it is also a matter of how you use the data later, which is a separate question from the scraping and collection process: it is very clear to me that a search engine is operating on the legal equivalent of thin ice, particularly with details like snippets and synthesis ;P. Whether the CFAA applies (as indicated in this article) is an open question, but that just isn't quite so obvious as "you also can't connect up to the public sewer".
> it is very clear to me that a search engine is
> operating on the legal equivalent of thin ice,
We may be saying similar things, but as a metaphor I think of search engines as operating on 'thick' ice. It has been litigated so much that there is a bevy of case law to refer to at all levels. Eric Goldman's blog used to have a pretty good list of the number of suits of various kinds and the searchengine blog covered many of them as well.
For a search engine it is super clear, robots.txt is all. If you say yes explicitly, great. If you say no explicitly, that has to be honored. If you say nothing, then it's up to the search engine to decide which way to interpret it, but if the site owner complains because you picked wrong you have to honor their wishes (which may include destroying any cached data as well).
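Those rules can be written down as a small decision function. This is only a sketch under my own assumptions: the names `SilentPolicy` and `should_crawl` are hypothetical, the "assume allow" and "assume deny" settings mirror the two readings mentioned in this thread, and I am assuming an owner complaint overrides everything else, which is a simplification of the above.

```python
from enum import Enum
from typing import Optional


class SilentPolicy(Enum):
    """How a crawler reads a site that says nothing (hypothetical names)."""
    ASSUME_ALLOW = "assume allow"
    ASSUME_DENY = "assume deny"


def should_crawl(
    explicit_permission: Optional[bool],
    silent_policy: SilentPolicy,
    owner_complained: bool = False,
) -> bool:
    """Decide whether to crawl a site.

    explicit_permission: True for an explicit yes, False for an explicit no,
    None when the site says nothing either way.
    """
    # Assumption: a complaint from the site owner always wins.
    if owner_complained:
        return False
    # An explicit yes or no is honored as given.
    if explicit_permission is not None:
        return explicit_permission
    # The site is silent, so the crawler's own default applies.
    return silent_policy is SilentPolicy.ASSUME_ALLOW


if __name__ == "__main__":
    print(should_crawl(None, SilentPolicy.ASSUME_ALLOW))
    print(should_crawl(None, SilentPolicy.ASSUME_DENY))
    print(should_crawl(None, SilentPolicy.ASSUME_ALLOW, owner_complained=True))
```

The sketch leaves out the cleanup side (destroying cached data after a complaint), which would live elsewhere in a real crawler.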
PadMapper, Perfect10, and the newspapers generated a ton of cases based on 'scraping a web site and using the data.' There are also about a dozen comparative shopping sites that have been dinged for the exact same issues. (look vs Amazon or vs Walmart).
Whether CFAA, DMCA, Torte law (contracts), or something else applies is constantly being discussed :-). I'm just the messenger here. I haven't found a single case that has held that the point of view of the scraper of someone else's web site should prevail. The argument that it should be allowed 'to help new businesses get off the ground' is like saying Apple should pay out some of its cash hoard as grants to startups trying to break into some business. I have yet to read anything that was sympathetic to that point of view.
yuummm Torte law. (It's tort, and tort law is generally considered to be distinct from contract law, because in tort rights and duties come from common law whereas in contract law they come from acts of agreement between two parties).
Chocolate Torte is my favorite :-) Thanks for the clarification. In the various articles I've read over the years on this topic they refer to tort law (no doubt because much of the argument references common law and the way in which the relations are argued) and I made the leap to 'contracts', which was incorrect.
http://goletawest.org/wp-content/uploads/2012/04/Ordinance-N...