What the google.be case is really about
09/28/06
Everybody’s been saying lots of things about the Google.be case, especially that the Belgian newspapers should have used robots.txt to tell Google what not to index. And that the fact they did not use robots.txt clearly shows that all they were interested in was getting money from Google…
Well, friends, I’m no lawyer or legal expert of any kind, but I’m French… and that lets me read and “almost” understand the terms of the ruling… I guess…
I think the ruling makes it pretty clear what the Belgian newspapers want, and I think this has been misunderstood:
- The papers welcome Google to index and display their news as part of Google News! (or at least they don’t care)
- The papers’ particular online business model is that news is free, but access to the archives requires payment. Example here.
- Once an article falls out of the news category and into the archives category, it should not be freely accessible any more.
- Google, via its world (in)famous Google Cache, often makes the content available forever, or at least for a very long time after it has gone off the official site’s free area.
I guess that’s it: what the Belgian papers really want is a way to get the content out of Google News once it is no longer news.
Now, I’m no robots.txt or Googlebot expert either, but from what I understand there was no convenient way for the papers to tell Google that it is okay to index some content for, let’s say 2 months, but not keep it in cache after that delay.
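To make that point a bit more concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `/news/` and `/archives/` paths are made up for illustration; I have not looked at what the papers actually serve. What it suggests is that robots.txt rules are keyed on the URL path only, with nothing in them about how old a page is:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a newspaper site; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archives/
"""


def can_google_fetch(path: str) -> bool:
    """Return True if the rules above let Googlebot fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("Googlebot", path)


# The decision depends on where the article lives, not on its age.
print(can_google_fetch("/news/2006/09/28/story.html"))
print(can_google_fetch("/archives/2006/07/28/story.html"))
```

So, under these assumptions, an article that keeps the same URL when it moves from "news" to "archives" cannot be expressed as "fine for 2 months, then off limits" with robots.txt alone.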
Google made some general comments on the case on their blog, but:
- They are not allowed to comment specifically on the ruling, so it’s not that useful;
- They failed to show up at the trial, which is quite unbelievable… but it would make it almost believable that they failed to understand the real issue that has been raised…

Note: again, I’m no legal expert. Just trying to make a little sense of all this noise…
15 comments
Kochise
I asked him by email if there would not be any side effects in telling Google not to cache. I mean, Google needs the cache to determine exact relevancy at search time. Not being in the cache could restrict you to the supplemental results only.
Danny answered that people were worried about this some time ago, but that he hasn't seen any worry like that for some time.
It is possible that meta noarchive just hides the archive link in Google but Google still caches internally. In that case everything is okay.
Now I wonder: why don't we all use meta noarchive? What good can it do to have content publicly available from Google's cache instead of the original site? ;)
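For what it's worth, here is a rough sketch of how a site could emit the robots meta tag depending on an article's age. The function name `robots_meta` and the 60-day `FREE_DAYS` window are my own assumptions (loosely the "2 months" from the post); `noindex` and `noarchive` are the usual robots meta directives, but how quickly a crawler revisits the page and acts on a changed tag is not something this code can guarantee:

```python
from datetime import date

FREE_DAYS = 60  # assumed free window, roughly the "2 months" in the post


def robots_meta(published: date, today: date) -> str:
    """Build a robots meta tag for an article page based on its age."""
    if (today - published).days <= FREE_DAYS:
        # Fresh news: may be indexed, but no cached copy should be offered.
        directives = "noarchive"
    else:
        # Archived article: ask search engines to drop it altogether.
        directives = "noindex, noarchive"
    return f'<meta name="robots" content="{directives}">'


print(robots_meta(date(2006, 9, 20), date(2006, 9, 28)))
print(robots_meta(date(2006, 6, 1), date(2006, 9, 28)))
```

The tag only takes effect the next time the page is crawled, so this is a hint to the search engine rather than a hard expiry.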
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
I think it should definitely unindex and uncache the page in that event.
The remaining hacky solution would be to replace the gone page with a blank page containing a meta nocache header ;)
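A small sketch of both ideas together, assuming a hypothetical `respond` function on the newspaper's server and the same made-up 60-day free window: once an article leaves the free area, the free URL answers 410 Gone, and the body is the near-empty page carrying a `noarchive` robots meta tag. Whether a given search engine drops its cached copy on seeing a 410 is exactly the open question here, so treat this as an illustration rather than a fix:

```python
from datetime import date
from http import HTTPStatus

FREE_DAYS = 60  # assumed free window; not taken from the ruling

GONE_PAGE = (
    '<html><head><meta name="robots" content="noarchive"></head>'
    "<body>This article has moved to the paid archives.</body></html>"
)


def respond(published: date, today: date) -> tuple[int, str]:
    """Return (status code, HTML body) for a request to the free article URL."""
    if (today - published).days <= FREE_DAYS:
        return HTTPStatus.OK.value, "<html><body>Full article text</body></html>"
    # Past the free window: say the resource is intentionally gone.
    return HTTPStatus.GONE.value, GONE_PAGE


status, body = respond(date(2006, 7, 1), date(2006, 9, 28))
print(status)
```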
http://www.google.com/search?q=obama%20site%3Anytimes.com
Some of those pages are cached, some aren't, but all are indexed. Google News in the US frequently lists articles that cannot be viewed without paying money first.
You can time when your cache expires exactly, according to this dude on reddit:
http://reddit.com/info/k3qv/comments/ck4pz
And even if this technical solution didn't exist, the Belgian papers are still idiots. If you do not get along with Google's policies, robots.txt can tell them to go away. They apparently want Google to anticipate their desires via mental telepathy. I trust there are a few geeks in Belgium that are rather embarrassed about this whole thing.
Another thing I understood from the ruling is that the papers were pretty much pissed off by the fact that Google never listened to their concerns in the first place. I can believe that... since they didn't show up in court! :>
Maybe Google could afford a little tech support... even to Belgians ;)
As said in the comment on Reddit, you practically need to be an SEO to know there is a solution. (That also applies to Danny Sullivan above I guess ;)
I find issues of copyright and the web non-sensical - you had to make about 15 copies of this comment (along with the rest of the text on the page) in order to view it, between the inter-router hops, your browser's cache, the in-memory version of the article, etc.
The fact that (at least in the US) everything is automatically copyrighted, and very few websites specifically grant people the right to copy flies in the face of their actions - stuff is on the web (generally) to be viewed (and that means copied) by everyone, as much as they want.
The entire situation simply doesn't make sense.
Google isn't that bad, if you fall on a 404, it just couldn't harm people anymore. Otherwise WebArchive may harm several people over the world, more than Google !
Kochise
Also, most people here are missing the main point of Google's objection to the ruling: their home page is 'sacred'. It is a key part of what makes them Google - the home page is simple and uncluttered.
There's no reason the court couldn't have compromised and permitted them to simply add a prominent link from their home page to the settlement. There's no reason the text of settlement itself has to appear on the home page.
It's sad that Google did not appear in court. The judge's ruling seems irrelevant in light of the fact that Google has always provided a technical means whereby the Belgian newspapers could easily prevent users from linking to a cached version of their copyrighted articles. There are many sites which correctly use the meta noarchive to do just that.
It looks like the judge was simply defending a group of incompetent publishers' right to continue being totally incompetent...
This fact becomes even clearer when you note that certain "archived" articles are available for a "1 credit" charge via LeSoir's search box, but the same article remains accessible for free via another link on exactly the same website:
link
or
link
Luckily, I do not own shares in Rossel et Cie SA (editor of Le Soir Magazine), 'cause it seems to me that they do not know what the hell they are doing...
