ENVIROETHICS Archives
enviroethics@JISCMAIL.AC.UK

Subject: The Use and Abuse of P-Values
From: Someone <[log in to unmask]>
Reply-To: [log in to unmask]
Date: Wed, 16 Sep 1998 16:34:56 -0700 (PDT)
Content-Type: text/plain
Parts/Attachments: text/plain (370 lines)

While this article is not directly concerned with either the environment or ethics, it is a must-read for anyone who uses P-values in any kind of statistical work.

Steve

Insight from the Sunday Telegraph newspaper in Great Britain

 Copyright 1998 THE SUNDAY TELEGRAPH (UK)
 "The Great Health Hoax"
 by Robert Matthews
 September 13, 1998

 Subhead: Many scientific 'breakthroughs' are nothing but mirages
based on flawed research. They result in wasted taxes, false claims
for drugs and damaging health scares.

 Text: There seemed no doubt about it: if you were going to have a
heart attack, there was never a better time than the early 1990s.
Your chances of survival appeared to be better than ever. Leading
medical journals were reporting results from new ways of treating
heart attack victims whose impact on death-rates wasn't just good - it
was amazing.

In 1992, trials in Scotland of a clot-busting drug called anistreplase suggested that it could double the chances of survival. A year later, another "miracle cure" emerged: injections of magnesium, which studies suggested could also double survival rates. Leading cardiologists hailed the injections as an "effective, safe, simple and inexpensive" treatment that could save the lives of thousands.

 But then something odd began to happen. In 1995, the Lancet published
the results of a huge international study of heart attack survival
rates among 58,000 patients - and the amazing life-saving abilities
of magnesium injections had simply vanished. Anistreplase fared little
better: the current view is that its real effectiveness is barely
half that suggested by the original trial.

In the long war against Britain's single biggest killer, a few disappointments are obviously inevitable. And over the last decade or so, scientists have identified other heart attack treatments which in trials reduced mortality by up to 30 per cent.

 But again, something odd seems to be happening. Once these drugs get
out of clinical trials and onto the wards, they too seem to lose
their amazing abilities.

 Last year, Dr Nigel Brown and colleagues at Queen's Medical Centre in
Nottingham published a comparison of death rates among heart attack
patients for 1989-1992 and those back in the clinical "Dark Ages" of
1982-4, before such miracles as thrombolytic therapy had shown success
in trials. Their aim was to answer a simple question: just what
impact have these "clinically proven" treatments had on death rates
out on the wards?

Judging by the trial results, the wonder treatments should have led to death rates on the wards of just 10 per cent or so. What Dr Brown and his colleagues actually found was, to put it mildly, disconcerting. Out on the wards, the wonder drugs seemed to be having no effect at all. In 1982, the death rate among patients admitted with heart attacks was about 20 per cent. Ten years on, it was the same: 20 per cent - double the death rate predicted by the clinical trials.

 In the search for explanations, Dr Brown and his colleagues pointed
to the differences between patients in clinical trials - who tend to
be hand-picked and fussed over by leading experts - and the ordinary
punter who ends up in hospital wards. They also suggested that delays
in patients arriving on the wards might be preventing the wonder
drugs from showing their true value.

 All of which would seem perfectly reasonable - except that heart
attack therapies are not the only "breakthroughs" that are proving to
be damp squibs out in the real world.

 Over the years, cancer experts have seen a host of promising drugs
dismally fail once outside clinical trials. In 1986, an analysis of
cancer death rates in the New England Journal of Medicine concluded
that "Some 35 years of intense effort focused largely on improving
treatment must be judged a qualified failure". Last year, the same
journal carried an update: "With 12 more years of data and
experience", the authors said, "We see little reason to change that
conclusion".

Scientists investigating supposed links between ill-health and various "risk factors" have seen the same thing: impressive evidence of a "significant" risk - which then vanishes again when others try to confirm its existence. Leukaemias and overhead pylons, connective tissue disease and silicone breast implants, salt and high blood pressure: all have an impressive heap of studies pointing to a significant risk - and an equally impressive heap saying there isn't.

 It is the same story beyond the medical sciences, in fields from
psychology to genetics: amazing results discovered by reputable
research groups which then vanish again when others try to replicate
them.

 Much effort has been spent trying to explain these mysterious cases
of The Vanishing Breakthrough. Over-reliance on data from tiny
samples, the reluctance of journals to print negative findings from
early studies, outright cheating: all have been put forward as
possible suspects.

Yet the most likely culprit has long been known to statisticians. A clue to its identity comes from the one feature all of these scientific disciplines have in common: they all rely on so-called "significance tests" to gauge the importance of their findings. First developed in the 1920s, these tests are routinely used throughout the scientific community. Thousands of scientific papers and millions of pounds of research funding have been based on their conclusions. They are ubiquitous and easy to use. And they are fundamentally and dangerously flawed.

Used to analyse clinical trials, these textbook techniques can easily double the apparent effectiveness of a new drug, and turn a borderline result into a highly "significant" breakthrough. They can throw up convincing yet utterly spurious evidence for "links" between diseases and any number of supposed causes. They can even lend impressive support to claims for the existence of the paranormal.

The very suggestion that these basic flaws in such widely-used techniques could have been missed for so long is astonishing. Altogether more astonishing, however, is the fact that the scientific community has been repeatedly warned about these flaws - and has ignored them.

 As a result, thousands of research papers are being published every
year whose conclusions are based on techniques known to be unreliable.
The time and effort - and public money - wasted in trying to confirm
the consequent spurious findings is one of the great scientific
scandals of our time.

The roots of this scandal are deep, having their origins in the work of an English mathematician and cleric named Thomas Bayes, published over 200 years ago. In his "Essay Towards Solving a Problem in the Doctrine of Chances", Bayes gave a mathematical recipe of astonishing power. Put simply, it shows how we should change our belief in a theory in the light of new evidence.

One does not need to be a statistician to see the fundamental importance of "Bayes's Theorem" for scientific research. From studies of the cosmos to trials of cancer drugs, all research is ultimately about finding out how we should change our belief in a theory as new data emerge.

 For over 150 years, Bayes's Theorem formed the foundation of
statistical science, allowing researchers to assess the meaning of new
results. But during the early part of this century, a number of
influential mathematicians and philosophers began to raise objections
to Bayes's Theorem. The most damning was also the simplest: different
people could use Bayes's Theorem and get different results.

Faced with the same experimental evidence for, say, ESP, true believers could use Bayes's Theorem to claim that the new results implied that telepathy is almost certainly real. Sceptics, in contrast, could use Bayes's Theorem to insist they were still not convinced.

 Both views are possible because Bayes's Theorem shows only how to
alter one's prior level of belief - and different people can start out
with different opinions.
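To make this concrete, here is a minimal sketch in Python (not from the article; the numbers are invented purely for illustration) of how Bayes's Theorem combines a prior level of belief with the same evidence to give different posterior beliefs:

# Bayes's Theorem: P(theory | data) = P(data | theory) * P(theory) / P(data).
# All numbers below are invented purely for illustration.

def posterior(prior, p_data_if_true, p_data_if_false):
    """Update a prior belief in a theory in the light of new evidence."""
    p_data = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / p_data

# Suppose an ESP experiment yields data three times more likely if telepathy
# is real than if it is not (p_data_if_true = 0.03 vs p_data_if_false = 0.01).
believer = posterior(prior=0.5, p_data_if_true=0.03, p_data_if_false=0.01)
sceptic = posterior(prior=0.001, p_data_if_true=0.03, p_data_if_false=0.01)

print(f"Believer: prior 50%  -> posterior {believer:.0%}")   # about 75%
print(f"Sceptic:  prior 0.1% -> posterior {sceptic:.2%}")    # about 0.30%

Same evidence, same theorem, very different conclusions - which is exactly the "subjectivity" the article goes on to describe.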

 To non-scientists, this may not seem like an egregious failing at
all: what one person sees as convincing evidence may obviously fail to
impress others. No matter: the fact that Bayes's Theorem could lead
different people to different conclusions led to its being
inextricably linked to the most rebarbative concept known to
scientists: subjectivity.

It is hard to convey the emotions roused within the scientific community by the S-word. Subjectivity is seen as the barbarian at the gates of science, the enemy of objective truth, the destroyer of insight. It is seen as the mind-virus that has turned the humanities into an intellectual free-for-all, where the idea of "progress" is dismissed as bourgeois, and the belief in "facts" as naïve. Once allowed into the citadel of science, runs the argument, subjectivity would turn all research into glorified Lit. Crit.

By the 1920s, Bayes's Theorem had all but been declared heretical - which created a problem: what were scientists going to replace it with? The answer came from one of Bayes's most brilliant critics: the Cambridge mathematician and geneticist Ronald Aylmer Fisher, the father of modern statistics.

 Few scientists had greater need of a replacement for Bayes than
Fisher, who frequently worked with complex data from plant breeding
trials. Drawing on his great mathematical ability, he set about
finding a new and completely objective way of drawing conclusions from
experiments. By 1925, he believed he had succeeded, and published his
techniques in a book, "Statistical Methods for Research Workers". It
was to become one of the most influential texts in the history of
science, and laid the foundations for virtually all the statistics now
used by scientists.

 On the face of it, Fisher had achieved what Bayes claimed was
impossible: he had found a way of judging the "significance" of
experimental data entirely objectively. That is, he had found a way
that anyone could use to show that a result was too impressive to be
dismissed as a fluke.

All scientists had to do, said Fisher, was to convert their raw data into something called a P-value: a number giving the probability that chance alone would produce results at least as impressive as those actually seen. If this P-value was below 1 in 20, or 0.05, it was safe to conclude that a finding really was "significant".
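As a rough illustration (not from the article; the trial numbers are invented), a P-value can be estimated by simulating what chance alone would produce and counting how often it looks at least as impressive as the observed result:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trial: 200 patients per arm, 30 deaths on placebo, 18 on the drug.
n = 200
deaths_placebo, deaths_drug = 30, 18
observed_diff = (deaths_placebo - deaths_drug) / n

# Null hypothesis: the drug does nothing, so both arms share one pooled death rate.
pooled_rate = (deaths_placebo + deaths_drug) / (2 * n)

# Simulate many trials in which chance alone is at work, and count how often the
# simulated difference is at least as impressive as the one actually observed.
sims = 100_000
sim_diff = (rng.binomial(n, pooled_rate, sims) - rng.binomial(n, pooled_rate, sims)) / n
p_value = np.mean(sim_diff >= observed_diff)

print(f"One-sided P-value: about {p_value:.3f}")  # roughly 0.03, below Fisher's 0.05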

 Combining simplicity with apparent objectivity, Fisher's P-value
method was an immediate hit with the scientific community. Its
popularity endures to this day. Open any leading scientific journal
and you will see the phrase "P < 0.05" - the hallmark of a significant
finding - in papers on every conceivable area of research, from
astronomy to zoology. Every year, new statistics textbooks appear to
explain Fisher's simple little recipe to a new generation of
researchers.

But just as scientists were adopting P-values, a few awkward questions started to be asked by other statisticians. The most telling was raised by the distinguished Cambridge mathematician Harold Jeffreys. Writing in his own treatise on statistics, Theory of Probability, published in 1939, Jeffreys asked an obvious question: just why should the dividing line for significance be set at Fisher's value of 0.05?

This seemingly innocuous question has profound implications, for Fisher's figure of 0.05 is still the sine qua non for deciding if a scientific result is "significant". All scientists know that if their experiment gives a P-value meeting Fisher's standard they are on their way to having a publishable paper.

Fisher's standard is even more important for pharmaceutical companies, as national regulatory organisations still use Fisher's 0.05 figure to decide whether to approve a new drug for general release. Getting drug trial results with P-values that beat Fisher's standard can thus make the difference between millions in profits and bankruptcy.

So just what were the brilliant insights that led Fisher to choose that talismanic figure of 0.05, on which so much scientific research has since stood or fallen? Incredibly, as Fisher himself admitted, there weren't any. He simply decided on 0.05 because it was mathematically convenient.

The implications of this are truly disturbing. It means that key scientific questions such as whether a new heart drug is seen as effective or whether diet really is linked to cancer are being decided by an entirely arbitrary standard chosen over 70 years ago for mathematical "convenience".

 This would not matter if Fisher had been lucky, and chosen a figure
that makes the risk of being fooled by a fluke result very low. Yet
statisticians now know that his choice was a particularly bad one -
and that many supposedly "significant" findings are in fact entirely
spurious.

The first hints of this deeply worrying feature of Fisher's methods emerged as long ago as the early 1960s, following a resurgence of interest in Bayes's Theorem. Many of the supposedly "insuperable" objections to its use were shown to be baseless, and the theorem has since emerged as one of the axioms of the entire theory of probability. As such, its implications for statistics cannot be wished away - no matter how noisome scientists might find them.

And the most important of those implications is that - as Bayes himself had insisted 200 years ago - it is indeed impossible to judge the "significance" of data in isolation. Crucially, the plausibility of the data has to be taken into account.

 Using Bayes's Theorem, a number of leading statisticians began to
probe the reliability of P-values as a measure of significance. What
they discovered could hardly be more serious.

On the face of it, Fisher's standard of 0.05 suggests that the chances of a mere fluke being the real explanation for a given result are just 5 in 100 - plenty of protection against being fooled. But in 1963, a team of statisticians at the University of Michigan showed that the actual chances of being fooled could easily be 10 times higher. Because it fails to take plausibility into account, Fisher's test can see "significance" in results which are actually over 50 per cent likely to be utter nonsense.

 The team - which included Professor Leonard Savage, one of the most
distinguished experts on probability of modern times - warned
researchers that Fisher's little recipe was "startlingly prone" to see
significance in fluke results.
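A rough simulation (not from the article; the 1-in-10 plausibility, sample size and effect size below are assumptions chosen for illustration) shows how a "significant" P-value can easily come from a fluke when most of the hypotheses being tested are false:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumptions for illustration: 20,000 studies, only 1 in 10 tests a real effect,
# and real effects are modest (0.2 standard deviations) with 50 subjects per study.
n_studies, n_subjects = 20_000, 50
prop_real, true_effect = 0.10, 0.2

is_real = rng.random(n_studies) < prop_real
true_means = np.where(is_real, true_effect, 0.0)

# Each study runs a simple one-sided z-test of its sample mean against zero.
sample_means = rng.normal(true_means, 1.0 / np.sqrt(n_subjects))
z = sample_means * np.sqrt(n_subjects)
p_values = 1.0 - stats.norm.cdf(z)

significant = p_values < 0.05
flukes = significant & ~is_real

print(f"'Significant' findings that are flukes: {flukes.sum() / significant.sum():.0%}")
# With these invented numbers, roughly half of all p < 0.05 "discoveries" are nothing at all.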

Despite appearing in the prestigious Psychological Review, the warning went unheeded. Over the next 30 years, other statisticians also tried to sound the alarm bell, again without success. During the 1980s, Professor James Berger of Purdue University - a world authority on Bayes's Theorem - published an entire series of papers again warning of the "astonishing" tendency of Fisher's P-values to exaggerate significance. Findings that met the 0.05 standard, said Berger, "can actually arise when the data provide very little or no evidence in favour of an effect".

 Again, the warnings were ignored.

In 1986, one scientist decided to take direct action against the failings of Fisher's methods. Professor Kenneth Rothman of the University of Massachusetts, editor of the well-respected American Journal of Public Health, told all researchers wanting to publish in the journal that he would no longer accept results based on P-values.

It was a simple move that had a dramatic effect: the teaching in America's leading public health schools was transformed, with statistics courses revised to train students in alternatives to P-values. But two years later, when Rothman stepped down from the editorship, his ban on P-values was dropped - and researchers went back to their old ways.

It has been a similar story in Britain. In 1995, the British Psychological Society and its counterpart in America quietly set up a working party to consider introducing a ban on P-values in their journals. The following year, it was disbanded - having made no decision. "It just sort of petered out", said one insider. "The view was that it would cause too much upheaval for the journals".

Leading British medical journals have also examined the idea of banning P-values, but they too have pulled back. Instead, they merely suggest that researchers use other means of measuring significance. Yet these alternative methods are known to suffer similar flaws to P-values, exaggerating both the size of implausible effects and their significance.

More than 30 years after the first warnings were sounded, it has become clear that the scientific community has no intention of dealing with the flaws in significance tests. Yet the evidence of those flaws is everywhere to be seen: flaky claims of health risks from a host of implausible causes, "wonder drugs" that lose their amazing abilities outside clinical trials, bizarre "links" between genetics and personality.

 A striking feature of the excuses given for the lack of action is
that they centre on issues like "upheaval for our journals" and the
"radical changes" needed in the training of scientists. Curiously for
a profession supposedly dedicated to discovering truths, issues such
as "reliability of research conclusions" are never mentioned.

It is hard to avoid the conclusion that the real explanation for all the foot-dragging is not scientific at all. It is simply that if scientists abandoned significance tests like P-values, many of their claims would be seen for what they really are: meaningless flukes on which tax-payers' money should never have been spent.

The plain fact is that in 1925 Ronald Fisher gave scientists a mathematical machine for turning baloney into breakthroughs, and flukes into funding. It is time to pull the plug.

Robert Matthews' full account of the issues raised in this article, "Facts versus Factions: the use and abuse of subjectivity in scientific research", is available from the European Science and Environment Forum, 4 Church Lane, Barton, Cambridge CB3 7BE, price 3 pounds.
