JISCMail - WRITING-AND-THE-DIGITAL-LIFE Archives

Millie,
  Google now has a "Blog search" Beta under their "more" menu. I tried 
searching for "Writing and the Digital Life" and it came up right at 
the top. Guess they have been thinking about the problem!

 -Peter Ciccariello
 ARTIST'S BLOG - http://invisiblenotes.blogspot.com/


 -----Original Message-----
 From: Millie Niss <[log in to unmask]>
 To: [log in to unmask]
 Sent: Mon, 19 Sep 2005 06:04:27 -0400
 Subject: [WDL] googling WDL blog, the dangers of Google

 I have twice in recent days (since I just moved between my two homes,
 and don't have access yet to my old internet bookmarks and old mail)
 Googled the WDL blog to try to find it. I was alarmed to discover that
 it is quite hard to find the blog on Google. The first few results (if
 one gives search terms such as "thomas", "digital" and so forth in
 addition to "WDL", which has many other meanings) are references to the
 blog, but most are from discussions of the blog before it ever existed,
 so they do not lead to the blog's address. The correct blog link is
 somewhere well down on the list, and is only there at all with certain
 choices of search terms (I had to try several to find the right ones).

 I don't know if Sue can do anything to make the blog easier to find on
 Google. As you know, getting good search engine results for sites is
 now an entire profession, although the tricky part is supposed to be
 making the client's website come up at the top when relevant general
 categories are searched; it is really unfortunate that searching for
 the site by its actual name doesn't work very well. But aside from what
 my experience means about marketing the WDL blog, it is worthwhile to
 consider that Google, usually much-praised for good reason, may be the
 weak link in the web's status as a useful and reliable research tool.

 If the only way -- or at least the usual way -- to find anything on the
 web is to Google it, sites which are not easy to Google can simply
 become lost on the web. Not only won't they get new visitors who find
 them through Googling, they will slowly lose their old visitors because
 people will forget to bookmark the site and then will be unable to find
 it again when they want it. This is bad enough when you know the site
 exists and know many details about it, but in that case you can at
 least start asking around for the URL and searching in cleverer ways
 until you find the site again. If someone puts up a valuable and
 excellent web site that isn't widely marketed, however, people will not
 discover it at all if Google doesn't lead them to it.

 This problem wasn't quite as bad when there was more competition in the
 search engine market. If Google didn't lead you to the site, at least
 the people who used some other search engine might get there, and many
 people even used multiple search engines for the same search to get
 wider results. Now Google is completely dominant, and many other
 apparently distinct search sites are actually "powered by Google", so
 they won't give unique results.

 I don't blame Google for this state of things. It is understandable for
 them to try to beat their competition, and they actually do provide a
 better service than most other search engines (and they aren't known
 for eliminating their competition in dishonest and/or unfair ways, the
 way Microsoft does). But I think it could become a really bad problem,
 especially regarding use of the web for academic and other serious
 research purposes. Too often, a web search forms the primary basis of
 initial research, even published research in journals, so that someone
 could conceivably write a survey article on something that completely
 omits a major point of view or even a major set of facts, if the
 omitted material isn't easily accessible by Google.

 If the researcher were instead to use a library, subject-specific
 databases on CD-ROM, indexes to periodicals, actual journals and their
 indexes, published collections of abstracts, and so forth, they would
 be much less likely to miss something major, because those sources of
 data have systematic indexing systems designed by librarians (even if
 the index seems much less flexible than a computer search) and are also
 edited by human beings so as not to omit things. (The academic field I
 studied was math. In math, there is a monthly publication called
 "Current Mathematical Publications" and it lists every paper in many
 journals, so that if you search the CMP index, you will find every
 paper on your topic, not just the ones which happen to accrete to
 search terms on Google by the secret algorithms of the Google
 webspiders.)

 I really fear that there will be an increasing number of "literature
 survey articles" or even supposedly scientific "meta-analyses" which
 purport to draw conclusions about an actual subject (not just about the
 state of the literature that is on the web about a subject) by
 analyzing what all the different papers one finds on the web say about
 the subject. For example, there is a respected tradition of
 "meta-analyses" in the medical research literature, where all the
 studies ever done on a certain subject are collected and the results
 are presented in aggregate, generally with some statistical methods
 which are supposed to measure how reliable the results are and weight
 better or bigger studies more heavily in the analysis and so forth.
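
 To make "weight better or bigger studies more heavily" concrete, here
 is a minimal sketch of one common weighting scheme, inverse-variance
 (fixed-effect) pooling, in which each study counts in proportion to
 1 / (standard error squared). This is only an illustration: nothing
 above says which method any particular meta-analysis uses, and the
 function name and the two (estimate, standard error) pairs are
 invented.

```python
from math import sqrt


def pool_fixed_effect(studies):
    """Inverse-variance pooling of (estimate, standard_error) pairs.

    Each study is weighted by 1 / standard_error**2, so precise studies
    (usually the bigger ones) dominate the pooled estimate.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical numbers, invented purely for illustration:
# a large, precise study and a small, noisy one.
studies = [
    (0.10, 0.05),  # (effect estimate, standard error)
    (0.50, 0.50),
]

pooled, pooled_se = pool_fixed_effect(studies)
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

 With these made-up inputs the noisy study carries about 1% of the total
 weight, so the pooled estimate stays close to the precise study's
 value. Note that the weights reflect only statistical precision; they
 say nothing about whether a study was well designed.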

 Hopefully the mathematics improves the quality of the results, but
 clearly a simpleminded meta-analysis could yield truly worthless
 results. Suppose one did a meta-analysis of whether internet use causes
 insanity. The meta-analysis collects a bunch of published studies on
 this topic. One study might be a randomized clinical trial in which a
 well-balanced sample of 10,000 random people was compiled, and each
 person's amount of internet use was correlated with their reported
 episodes of mental illness and also with the results of a standardized
 psychiatric examination. A second study might be a study of 10
 psychotic murderers (out of a bigger group of 30 psychotic murderers
 where the 10 were the ones who consented to be interviewed) whom an
 untrained investigator has asked whether or not they liked to go online
 before committing their crimes.

 The simpleminded meta-analysis would try to make a standard coding for
 all the studies (all two of them in my example) and would consider that
 the aggregate results of the studies were equivalent to a single larger
 study of the total number of subjects (10,010 in our example). Of
 course in our example the two studies are not at all comparable, even
 though they purport to answer the same question. Our results would not
 be total garbage only because the second, much less reliable study used
 many fewer subjects, so it counts for less in the final statistics. But
 the result of the meta-analysis would be substantially LESS reliable
 than the results of the better study. (Note that the 10 psychotic
 murderers would mess up the results more than proportionally to their
 number, because they are cases of actual insanity and many of them may
 have been internet users -- as many people in any sample are -- whereas
 out of the 10,000 people there would be maybe 100 psychotic people and
 perhaps no people AS psychotic as the psychotic murderers, and the
 study method, despite being well designed, would also fail to identify
 many people who really were psychotic.)
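
 The disproportionate effect of the 10 murderers can be checked with a
 little arithmetic. The sketch below lumps the two studies together as
 if they were one study of 10,010 people. The study sizes and the figure
 of roughly 100 psychotic people come from the example above; the split
 between internet users and non-users, and every function and variable
 name, are assumptions invented for illustration.

```python
def risk_ratio(users_ill, users_total, nonusers_ill, nonusers_total):
    """Rate of illness among internet users divided by the rate among non-users."""
    return (users_ill / users_total) / (nonusers_ill / nonusers_total)


def naive_pool(a, b):
    """Add the raw counts of two studies as if they were a single study."""
    return {key: a[key] + b[key] for key in a}


# Large study: 10,000 people, about 100 of them psychotic (from the example).
# ASSUMED split: 6,000 users and 4,000 non-users with the same 1% rate,
# i.e. no association at all between internet use and illness.
big_study = dict(users_ill=60, users_total=6000,
                 nonusers_ill=40, nonusers_total=4000)

# Small study: 10 psychotic murderers, every one of them a "case".
# ASSUMED: 8 of the 10 said they liked to go online.
small_study = dict(users_ill=8, users_total=8,
                   nonusers_ill=2, nonusers_total=2)

pooled = naive_pool(big_study, small_study)
rr_big = risk_ratio(**big_study)
rr_pooled = risk_ratio(**pooled)
share = small_study["users_total"] + small_study["nonusers_total"]
share /= pooled["users_total"] + pooled["nonusers_total"]

print(f"risk ratio, large study alone: {rr_big:.3f}")
print(f"risk ratio, naively pooled:    {rr_pooled:.3f}")
print(f"small study's share of sample: {share:.2%}")
```

 Under these assumptions the large study alone shows no association (a
 risk ratio of 1), but after adding ten people, about 0.1% of the
 combined sample, the ratio rises to roughly 1.08, which a careless
 reader could take as evidence that internet use raises the risk of
 psychosis.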

 Thus we can see that a simpleminded meta-analysis will give very lousy
 results. But a respectable meta-analysis has a systematic way of
 compiling the studies it uses; for example, every study published in
 every issue of a large group of journals during a fixed time period is
 included. It is hoped that by looking only at (supposedly) reliable
 sources for the studies, and then by including all of them that meet
 the set criteria, one exercises some quality control over the studies
 and does not omit important results. It is also thought that problems
 with one study that alter the results in one direction will be balanced
 out by errors in other studies that cause an opposite bias. I find the
 whole process to be rather suspect and can see a lot to criticize in
 it, but the point is that there is an accepted methodology for doing
 these meta-analyses, and it tries to address all the major problems
 with the process.

 Now imagine that the meta-analysis gets all the studies it uses by a
 Google search. You can immediately see that there will be big problems
 if Google leaves out a lot of important stuff, overrepresents other
 things, and so forth. The method I am describing (especially using
 Google) sounds so terrible that it may be hard to believe that anyone
 would consider it to be a valid type of medical research, but
 unfortunately some people do, and some of the studies really do use
 internet searches. This, then, is a case where Google's faults could
 lead to people getting the wrong medical treatments, if decisions are
 made using results of meta-analyses. (Fortunately, most medical
 authorities don't rely much on these kinds of studies, but they are
 increasingly being performed and published, precisely because the
 internet makes these studies easy to do!)

 Millie Niss

 **********

 * Visit the Writing and the Digital Life blog
   http://writing.typepad.com
 * To alter your subscription settings on this list, log on to
   Subscriber's Corner at
   http://www.jiscmail.ac.uk/lists/writing-and-the-digital-life.html
 * To unsubscribe from the list, email [log in to unmask] with a blank
   subject line and the following text in the body of the message:
   SIGNOFF WRITING-AND-THE-DIGITAL-LIFE

  
