Douglas,
By "original" in this context, I imagine they mean that the literal
text must not appear elsewhere on the web - that's the only kind of
originality they could easily check for automatically.
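The standard trick for spotting copied text at web scale, for what
it's worth, is "shingling": hash every overlapping run of a few words
and compare the sets of hashes between pages. A toy sketch in Python
(the function names and the eight-word shingle size are my own
illustration, not anything Google has published):

    import hashlib

    def shingles(text, k=8):
        # Hash every overlapping k-word run. Two pages that share
        # many shingle hashes share long runs of literal text.
        words = text.lower().split()
        return {hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
                for i in range(len(words) - k + 1)}

    def resemblance(a, b):
        # Jaccard similarity of the shingle sets: 1.0 is a
        # word-for-word copy, 0.0 means no k-word run in common.
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0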
The point presumably is to lower the ranking of pages that simply
aggregate material from other sites. There are sites that do this
automatically, for instance by reading RSS feeds from many places and
presenting the results. Some do this as a service to users to provide
them with a "digest" of sites they're interested in (see
http://flourish.org/news for an example); others do it simply in order
to attract hits from search engines and have no real purpose besides
displaying advertisements or links to sites the owners want to
promote. Since the purpose of the page-ranking algorithm is to rate
the definitiveness of a site, it is logical that such para-sites
should get a lower ranking than the sites they take their material
from.
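Writing such an automatic aggregator is close to trivial, which is
part of the problem. A rough sketch using the Python feedparser
library (the feed URLs are placeholders):

    import feedparser  # third-party: pip install feedparser

    # Placeholder feeds - any RSS or Atom feed would do.
    FEEDS = ["http://example.com/feed.rss",
             "http://example.org/atom.xml"]

    def digest():
        # Pull every feed and print a flat digest of headlines -
        # essentially all a scraper site does before wrapping the
        # result in adverts.
        for url in FEEDS:
            for entry in feedparser.parse(url).entries[:5]:
                print(entry.title, "-", entry.link)

    if __name__ == "__main__":
        digest()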
People have tried various techniques to manipulate the page-ranking
algorithm, one of which has been to create many pages that consist of
nothing but links to the site they want to promote - the algorithm
used to rate a site based on how many other sites linked to it,
amongst other things. So lowering the ranking of pages with a large
number of links is a way of trying to defeat such manipulation.
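The underlying idea - in the published PageRank paper, at least; the
real ranking uses far more signals - is that a page's score is fed by
the scores of the pages linking to it, with each page's score split
evenly among its outgoing links, so a page stuffed with links passes
on very little per link. A toy power-iteration version in Python (the
example web is made up):

    def pagerank(links, damping=0.85, iterations=50):
        # links maps each page to the list of pages it links to.
        # Each page's score is split evenly among its outgoing
        # links, so a link-farm page with a low score passes on
        # almost nothing, however many farms point at the target.
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            rank = {p: (1 - damping) / len(pages) + damping * sum(
                        rank[q] / len(links[q])
                        for q in pages if p in links[q])
                    for p in pages}
        return rank

    # Made-up toy web: two "farm" pages exist only to boost "poker".
    web = {"poker": ["news"],
           "farm1": ["poker"],
           "farm2": ["poker"],
           "news":  ["poker"]}
    print(pagerank(web))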
Google is caught up in an arms race with professional "search engine
optimisers" - these are the people (if that's the word) flooding the
comments sections of weblogs with spam, creating vast numbers of
useless sites, and generally increasing the noise-to-signal ratio of
the web, with no other purpose than to keep some poxy online poker
site high up in the search ranking. There is big money in this: idiots
searching for ways to empty their wallets will generally click on the
first link they see, so competition for the top slot is pretty
fierce.
Website owners whose sites have been mistaken for the effluvia
produced by search engine optimisers must now, ironically, do
precisely what the optimisers themselves are presumably doing: find
ways to make their sites look as if they aren't bogus. We are nowhere
near the end of this process. It may not have an
end. It is a contest of wits (if that is the word) between Google's
PhDs and hackers in the pay of the Russian mafia. It may incidentally
result in the accidental creation of the first true AI. GAL 9000 will
not pilot spaceships: it will browse the web, trying to sort the wheat
from the chaff (it will also be adept at detecting porn, phishing
scams, and weblogs critical of the Chinese govt). "I'm sorry, Dave, I
can't let you visit that page".
Dominic