All over the net I am seeing discussions about 302 hijackings and claims that Google is evil. But the thing is, no one is discussing the actual cause. The actual cause is the HTTP protocol (RFC 2616), which says EXPLICITLY:
"10.3.3 302 Found
The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field." - Emphasis not mine.
You can read it for yourself at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
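To make the rule concrete, here is a minimal Python sketch of how a client or crawler that follows RFC 2616 to the letter would decide which URL to remember. The function name url_to_remember and the example.com addresses are my own illustration, not anything from Google or the RFC.

```python
from typing import Optional


def url_to_remember(request_url: str, status: int, location: Optional[str]) -> str:
    """Pick the URL a standards-following client keeps for future requests."""
    if status == 301 and location:
        # 301 Moved Permanently: future references should use the new URI.
        return location
    # 302 Found: the move is temporary, so keep using the Request-URI.
    return request_url


# A permanent redirect replaces the old URL ...
print(url_to_remember("http://old.example.com/", 301, "http://new.example.com/"))
# ... while a temporary one leaves the requesting URL in place.
print(url_to_remember("http://a.example.com/go", 302, "http://b.example.com/page"))
```

The second call is the hijack case in miniature: the content lives at the target, but the URL that gets remembered is the one that issued the 302.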
Now we all know the importance of a protocol. It's a language for communication. In this case the protocol was developed when the Web was pure and unadulterated, when people expected others to follow the protocol and not misuse it.
But with money always come greed and dishonesty. The Web was not originally built with business in mind. It was for the free interchange of information. But it has evolved to a state where it can be harnessed (exploited, whatever) commercially for its potential.
So now any search engine that follows the protocol to the letter is in effect aiding the hijacking. But is that the mistake of the search engine or of the protocol? Unlike human languages, protocols don't evolve uninhibited. If they did, then very soon no browser could understand all the servers and vice versa, i.e. you might need 10 kinds of browsers to access 10 different websites, because those 10 websites each talk a different language.
(Come to think of it, is this not what is happening in the DRM world? You download music from one site and you can't play it on another without a hack.) That is the reason there is a standard, and why it gets revised every so often so that it can keep up with the times.
So some of the suggestions, like "throw the redirecting page into the bin and keep the target page", would have web-wide repercussions for people who use 302 with the standard in mind and for a legitimate purpose. So you ask, who uses it and for what purpose?
Let me give an example.
Ever tried buying from Amazon.com?
Okay, how do you reach the homepage?
Well, I type amazon.com into my browser and I get the page. BUT the URL at which I get the page right now is exactly http://www.amazon.com/exec/obidos/subst/home/home.html/103-7996157-2162261
Use this server header tool to understand what happens: http://www.webrankinfo.com/english/tools/server-header.php
1) Enter www.amazon.com
It says
HTTP/1.1 301 Moved Permanently
Date: Thu, 24 Mar 2005 14:38:22 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: http://www.amazon.com:80/exec/obidos/subst/home/home.html
Connection: close
Content-Type: text/plain
So amazon.com doesn't exist (don't mistake me, I mean the page amazon.com); what exists is http://www.amazon.com:80/exec/obidos/subst/home/home.html.
2) Now enter http://www.amazon.com:80/exec/obidos/subst/home/home.html in the box.
It says
HTTP/1.1 302
Date: Thu, 24 Mar 2005 14:40:48 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1112256000; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Set-Cookie: session-id=002-8272699-5270422; path=/; domain=.amazon.com; expires=Thursday, 31-Mar-2005 08:00:00 GMT
Location: http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
Connection: close
Content-Type: text/html
So now the home page is temporarily at http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
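If you would rather read the two responses with a script than by eye, here is a rough Python sketch that pulls the status code and the Location header out of a raw header dump. The helper name parse_response_head is made up for illustration, and the two sample strings are just the status and Location lines copied from the tool output above (the other headers are left out for brevity).

```python
from typing import Dict, List, Tuple


def parse_response_head(raw: str) -> Tuple[int, Dict[str, List[str]]]:
    """Split a raw HTTP response head into (status code, headers)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    # Status line looks like "HTTP/1.1 302" or "HTTP/1.1 301 Moved Permanently".
    status = int(lines[0].split()[1])
    headers: Dict[str, List[str]] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        # Header names are case-insensitive; a name may repeat (e.g. Set-Cookie).
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status, headers


FIRST = """HTTP/1.1 301 Moved Permanently
Location: http://www.amazon.com:80/exec/obidos/subst/home/home.html
Connection: close
"""

SECOND = """HTTP/1.1 302
Location: http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422
Connection: close
"""

for raw in (FIRST, SECOND):
    status, headers = parse_response_head(raw)
    print(status, headers["location"][0])
```

The first hop is permanent (301) and the second is temporary (302), which is exactly the distinction the protocol asks clients to respect.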
If Google were to follow your advice, the home page of Amazon would be http://www.amazon.com/exec/obidos/subst/home/home.html/002-8272699-5270422. But this URL is so temporary that if you try to access the homepage even seconds later, you get redirected to a new URL with some other number at the end.
To this day, searching Google [google.com] for amazon gives the Amazon homepage as www.amazon.com/exec/obidos/subst/home/home.html and not some other URL. Personally, a thumbs up from my side.
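Here is a small Python simulation of why keeping the 302 target would go wrong for a site like this. Everything in it is an assumption for illustration: fake_redirect is a stand-in for the real server, and the session numbers are invented. It only shows the shape of the problem, not how any real crawler works.

```python
import itertools
from typing import Set, Tuple

HOME = "http://www.amazon.com/exec/obidos/subst/home/home.html"
_session_counter = itertools.count(1)


def fake_redirect(request_url: str) -> Tuple[int, str]:
    """Hypothetical server: answers 302 with a fresh session suffix every time."""
    session = f"000-{next(_session_counter):07d}-0000000"
    return 302, f"{request_url}/{session}"


def crawl(visits: int, keep_target: bool) -> Set[str]:
    """Return the set of URLs an index would hold after several visits."""
    indexed: Set[str] = set()
    for _ in range(visits):
        _status, location = fake_redirect(HOME)
        # keep_target=True is the "bin the redirecting page" suggestion;
        # keep_target=False is what RFC 2616 asks for on a 302.
        indexed.add(location if keep_target else HOME)
    return indexed


print(len(crawl(5, keep_target=True)))   # one throwaway URL per visit
print(len(crawl(5, keep_target=False)))  # a single stable URL
```

Keeping the target fills the index with URLs that are stale seconds later, while keeping the Request-URI leaves one entry that stays valid.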
What is the purpose of the Amazon URL dance? It's for tracking. Do you want Amazon to stop tracking on its own website because other sites are being hijacked? (I am not saying that your hijacked website is worth less than the big websites. The point is that your website was defrauded using a perfectly legitimate method, because the method has no failsafe built into it.)
Amazon isn't the only one. Lots of sites do it. And it is even more common in the education and open source worlds, where sites redirect to other domains because things are shared even more.
So now, coming back to one more comment. It was something akin to "When I get mugged and complain to the cops, the cops tell me that I am supposed to ask the mugger to stop it!"
So in one stroke the commenter has said that Google is the cop of the net and that it is useless as a cop. Let me just tell you, even in real life the cops are only as useful as the laws that back them up. And in this case the laws for handling mugging have not yet been written.
And do you really want any one company to change the protocol on its own, in its own way? You know what this will lead to? Three different search engines reading the protocol in three different ways. And this will only be the start. You know what a mess it will be trying to work with these engines.
This is a problem that has to be nipped at the source, i.e. the protocol. Let's stop laying the blame on the first step we come across.
One possible solution is to add a meta-tag protocol:
Meta-tag = 'Redirect, source url'
Value = 'accept'
And for a redirect within the same domain:
Meta-tag = 'Redirect, yourdomain.com'
Value = 'accept'
(There will be better methods, but that is for the W3 Consortium to decide.)
The protocol should specify that if the meta-tag is not there, its default value is 'not accept'. And the standards committee must ensure that the World Wide Web speaks the same language.
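Just to show how such an opt-in might be checked by a crawler, here is a rough Python sketch. No such tag exists in any standard. The tag name "redirect-accept", the idea of putting the source URL or domain in the content attribute, and the function redirect_accepted are all my own assumptions, one possible reading of the proposal above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _RedirectMetaParser(HTMLParser):
    """Collects the content of every hypothetical redirect-accept meta tag."""

    def __init__(self):
        super().__init__()
        self.accepted = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name == "redirect-accept" and attr.get("content"):
            self.accepted.append(attr["content"].strip())


def redirect_accepted(target_html: str, source_url: str) -> bool:
    """True only if the target page explicitly accepts redirects from the source."""
    parser = _RedirectMetaParser()
    parser.feed(target_html)
    source_host = (urlsplit(source_url).hostname or "").lower()
    for entry in parser.accepted:
        # Accept either the exact source URL or an exact match on its host.
        if entry == source_url or entry.lower() == source_host:
            return True
    return False  # tag missing or not matching: the default is 'not accept'


page = '<html><head><meta name="redirect-accept" content="yourdomain.com"></head></html>'
print(redirect_accepted(page, "http://yourdomain.com/old-page"))
print(redirect_accepted(page, "http://hijacker.example/r?id=1"))
```

The exact host match is a simplification (www.yourdomain.com would not match yourdomain.com here); a real specification would have to settle that kind of detail.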
AND THEN, if Google lets your page be hijacked, we can blame Google. Not now, when they are doing their work AS prescribed by the laws.
And in the interim, what's the solution?
I think Google's algo is pretty robust at weeding out MOST of the hijackings. How can I say that? From my experience alone. The thing is, what you all are seeing is just the tip of the iceberg. Pages still do get hijacked, but they are minuscule in number compared to the pages that don't get hijacked even though they are targeted with 302-redirected links.
I hope I have clearly explained where the problem lies. In the web hysteria surrounding the current discussion, the majority are not being told the facts. They are all being given the impression that it is Google's mistake alone.
And no. I am not a paid or unpaid spokesman for Google.