I have twice in recent days (since I just moved between my two homes, and
don't have access yet to my old internet bookmarks and old mail) Googled the
WDL blog to try to find it.  I was alarmed to discover that it is quite hard
to find the blog on Google.  The first few results (if one gives search
terms such as "thomas", "digital" and so forth in addition to "WDL", which
has many other meanings) are references to the blog, but most are from
discussions of the blog before it ever existed, so they do not lead to the
blog's address.  The correct blog link is somewhere well down on the list,
and is only there at all with certain choices of search terms (I had to try
several to find the right ones).

I don't know if Sue can do anything to make the blog easier to find on
Google -- as you know, getting good search engine results for sites is now
an entire profession, although the tricky part is supposed to be making the
client's website come up at the top when relevant general categories are
searched; it is really unfortunate that searching for a site by its actual
name doesn't work very well...  But aside from what my experience means
about marketing the WDL blog, it is worth considering that Google, usually
much-praised and for good reason, may be the weak link in the web's status
as a useful and reliable research tool.

If the only way -- or at least the usual way -- to find anything on the web
is to Google it, sites which are not easy to Google can simply become lost
on the web.  Not only won't they get new visitors who find them through
Googling, they will slowly lose their old visitors because people will
forget to bookmark the site and then will be unable to find it again when
they want it.  This is bad enough when you know the site exists and know
many details about it; in that case you can at least ask around for the URL
and search in cleverer ways until you find the site again.  But if someone
puts up a valuable and excellent web site that isn't widely marketed, people
will never discover it if Google doesn't lead them to it.

This problem wasn't quite as bad when there was more competition in the
search engine market: if Google didn't lead you to the site, at least the
people who used some other search engine might get there, and many people
even used multiple search engines for the same search to get wider results.
But now Google is completely dominant, and many other apparently distinct
search sites are actually "powered by Google," so they won't give unique
results.

I don't blame Google for this state of things -- it is understandable for
them to try to beat their competition, and they actually do provide a better
service than most other search engines (and they aren't known for
eliminating their competition in dishonest and/or unfair ways, the way
Microsoft does).  But I think it could become a really bad problem,
especially regarding use of the web for academic and other serious research
purposes.  Too often, a web search forms the primary basis of initial
research, even published research in journals, so that someone could
conceivably write a survey article that completely omits a major point of
view, or even a major set of facts, if the omitted material isn't easy to
find through Google.

If the researcher were instead to use a library, subject-specific databases
on CD-ROM, indexes to periodicals, actual journals and their indices,
published collections of abstracts, and so forth, they would be much less
likely to miss something major, because those sources of data have
systematic indexing systems designed by librarians (even if the index seems
much less flexible than a computer search) and are also edited by human
beings so as not to omit things.  (The academic field I studied was math.
In math, there is a monthly publication called "Current Math Publications"
which lists every paper in many journals, so that if you search the CMP
index, you will find every paper on your topic, not just the ones which
happen to match search terms on Google according to the secret algorithms of
the Google webspiders.)

I really fear that there will be an increasing number of "literature survey
articles" or even supposedly scientific "meta-analyses" which purport to
draw conclusions about an actual subject (not just about the state of the
literature that is on the web about a subject) by analyzing what all the
different papers one finds on the web say about it.  For example, there is a
respected tradition of meta-analyses in the medical research literature, in
which all the studies ever done on a certain subject are collected and the
results are presented in aggregate, generally with some statistical methods
which are supposed to measure how reliable each result is, weight better or
bigger studies more heavily in the analysis, and so forth.
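
To make the weighting idea concrete, here is a minimal sketch of one
standard scheme, fixed-effect inverse-variance weighting, with entirely
made-up effect sizes and standard errors; both the numbers and the choice of
scheme are my own illustration, not taken from any particular published
meta-analysis.

    import math

    # Hypothetical studies: each reports an effect size (say, a risk
    # difference) and a standard error.  Bigger, better studies have
    # smaller standard errors.
    studies = [
        {"name": "large careful study", "effect": 0.02, "se": 0.01},
        {"name": "tiny interview study", "effect": 0.60, "se": 0.25},
    ]

    # Fixed-effect inverse-variance pooling: each study is weighted by
    # 1/variance, so a noisy study counts for much less than a precise one.
    weights = [1.0 / (s["se"] ** 2) for s in studies]
    pooled_effect = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))

    print(f"pooled effect = {pooled_effect:.4f} +/- {pooled_se:.4f}")
    # The noisy study barely moves the estimate: its weight is 16 vs 10000.

The point of the weighting is exactly what the paragraph above says: a
small, unreliable study is supposed to count for very little next to a
large, precise one.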

Hopefully the mathematics improves the quality of the results, but clearly a
simpleminded meta-analysis could yield truly worthless results.  Suppose one
did a meta-analysis of whether internet use causes insanity.  The
meta-analysis collects a bunch of published studies on this topic.  One
study might be a large, careful study in which a well-balanced random sample
of 10,000 people was compiled, and each person's amount of internet use was
correlated with their reported episodes of mental illness and also with the
results of a standardized psychiatric examination.  A second study might be
a study of 10 psychotic murderers (out of a bigger group of 30 psychotic
murderers, where the 10 were the ones who consented to be interviewed) whom
an untrained investigator has asked whether or not they liked to go online
before committing their crimes.

The simpleminded meta-analysis would try to impose a standard coding on all
the studies (all two of them in my example) and would treat the aggregate
results of the studies as equivalent to a single larger study of the total
number of subjects (10,010 in our example).  Of course, in our example the
two studies are not at all comparable -- even though they purport to answer
the same question.  Our results would avoid being total garbage only because
the second, much less reliable study used many fewer subjects, so it counts
for less in the final statistics.  But the result of the meta-analysis would
be substantially LESS reliable than the results of the better study alone.
(Note that the 10 psychotic murderers would skew the results more than
proportionally to their number, because they are cases of actual insanity
and many of them may have been internet users -- as many people in any
sample are -- whereas out of the 10,000 people there would be maybe 100
psychotic people and perhaps no one AS psychotic as the psychotic murderers,
and even a well-designed study method would fail to identify many people who
really were psychotic.)
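
To put rough numbers on this, here is a small sketch of the naive pooling;
every figure in it is invented purely for illustration (how many heavy
internet users each study contains, and how many of them turn out to be
psychotic), and the only point is that ten badly chosen subjects can shift
the apparent rate noticeably.

    # Hypothetical numbers, invented for illustration only.
    # Study A: the large, careful study of 10,000 people.
    a_heavy_users = 5000      # heavy internet users in the sample
    a_psychotic_users = 50    # of those, found to be psychotic (1%)

    # Study B: the 10 interviewed psychotic murderers.
    b_heavy_users = 8         # murderers who said they liked to go online
    b_psychotic_users = 8     # all of them are, by selection, psychotic

    # Naive pooling: lump both studies together as if they were one sample.
    pooled_rate = (a_psychotic_users + b_psychotic_users) / (a_heavy_users + b_heavy_users)
    better_rate = a_psychotic_users / a_heavy_users

    print(f"better study alone: {better_rate:.2%}")   # 1.00%
    print(f"naively pooled:     {pooled_rate:.2%}")   # about 1.16%
    # Ten subjects from the biased study inflate the estimate by roughly 16%.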

Thus we can see that a simpleminded meta-analysis will give very lousy
results.  But a respectable meta-analysis has a systematic way of compiling
the studies it uses; for example, every study published in every issue of a
large group of journals during a fixed time period is included.  It is hoped
that by looking only at (supposedly) reliable sources for the studies, and
then by including all of them that meet the set criteria, one exercises some
quality control over the studies and does not omit important results; it is
also thought that problems with one study that alter the results in one
direction will be balanced out by errors in other studies that cause an
opposite bias.  I find the whole process rather suspect and can see a lot to
criticize in it, but the point is that there is an accepted methodology for
doing these meta-analyses, and it tries to address all the major problems
with the process.

Now imagine that the meta-analysis gets all the studies it uses from a
Google search.  You can immediately see that there will be big problems if
Google leaves out a lot of important material, overrepresents other things,
and so forth.  The method I am describing (especially using Google) sounds
so terrible that it may be hard to believe that anyone would consider it a
valid type of medical research, but unfortunately this really is the case,
and some of the studies really do use internet searches.  This, then, is a
case where Google's faults could lead to people getting the wrong medical
treatments, if decisions are made using the results of such meta-analyses.
(Fortunately, most medical authorities don't rely much on these kinds of
studies, but they are increasingly being performed and published, precisely
because the internet makes them easy to do!)

Millie Niss
