Hi Allstat,
The response to my question exceeded my expectations. While different
authors offered different advice and opinions, what is clear is that this
question is a contentious one. In particular, I want to highlight the two
later replies given by Blaise Egan and Duncan Hedderly, which I think
address my concern best. Both of the articles they point to favoured the
view that no multiple testing procedure is needed for the usual type of
epidemiological research, but do look at the response to the BMJ article,
which argued that statistical adjustments are mandatory!
I think that the issue may be less complicated than it seems if we
consider the background of the authors. Epidemiologists tend to favour
not using these multiple testing procedures, while clinical trialists tend
to favour them. Epidemiologists tend to ask many loosely related questions
in a study, so using multiple testing procedures tends to add to the
confusion by somehow suggesting that the questions all belong to one
experiment. Clinical trialists, on the other hand, usually have one major
goal in mind in a study, and using multiple comparison procedures helps to
reduce false positives when such results are used for clinical decisions.
Lastly, if one does decide to adjust for multiple testing, there now seem
to be clearly better alternatives to the Bonferroni correction. I haven't
had time to read about the False Discovery Rate or the Holm method, but
the references are here if you want to look further.
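For anyone curious how these compare, here is a rough sketch in Python (my
own illustration rather than anything from the replies; it assumes the
statsmodels package is available, and the p-values are made up):

    # Compare Bonferroni, Holm and Benjamini-Hochberg (FDR) adjustments
    # on a set of invented p-values (for illustration only).
    from statsmodels.stats.multitest import multipletests

    raw_p = [0.001, 0.008, 0.020, 0.041, 0.090, 0.300]

    for method, label in [("bonferroni", "Bonferroni"),
                          ("holm", "Holm"),
                          ("fdr_bh", "Benjamini-Hochberg FDR")]:
        reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
        print(label)
        print("  adjusted p:", [round(p, 3) for p in adj_p])
        print("  rejected:  ", list(reject))

Holm rejects at least as many hypotheses as Bonferroni on the same data,
and the Benjamini-Hochberg step-up is usually more liberal still, at the
price of controlling the false discovery rate rather than the family-wise
error rate.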
Thanks again to all who responded. Given below is my original query
followed by all relevant answers.
Original query:
When I learnt statistics, I learnt that if you are going to test more than
once in your experiment, you should adjust for multiple testing, usually
by means of a Bonferroni correction. However, having actually done
statistics for psychiatric research for a year, I have found that in
practice one can't really do it. That's because in the work that I do,
people generally want to test many things in a single paper, not to
mention quality control tests such as testing for age differences at
baseline. I'm sure other medical statisticians will have seen these
papers, littered all over with p-values.
My question is: how have more experienced medical statisticians come to
terms with this? We usually collect massive amounts of information per
project. Each score has sub-scores, and the sub-scores are made up of
individual questions. Are the individual questions really not of interest?
But if we look at each individual question separately, then no doubt we'll
end up with a plethora of tests per paper. After all, is there really no
value in fishing for significant results? If we don't do this, how are we
going to discover something new?
Thanks for any comments.
******************************
Michael Meyners:
In brief, you might want to use the False Discovery Rate (FDR). To start
with, see Benjamini & Hochberg, J R Stat Soc B, 57, 1995, 289-300. Also,
you might want to browse a little through the literature on the analysis
of gene expression data, as they face a similar problem (I'd say that
their hypotheses are less "dependent" than yours might be, but it might
still give you some ideas).
*********************
Allan Reese:
I used to deal with many student surveys, generally in social science or
education. The advice I offered was that individual questions were
generally not of interest for testing as the questionnaire had been
designed with groups of related questions and often an expectation of
observing certain interactions. P-values should therefore be interpreted
in relation to what the researcher expected (an informal Bayesian
approach) and patterns of p-values should be looked for. In particular,
since most student studies have small samples subject to biases of
accessibility, having a set of questions that showed non-significant
effects but all in the expected direction should *not* be reported simply
as "no significant effects were found". In practice, I observed that
effects were generally nowhere near significance or were highly
significant, even for the small samples (generally about 100 cases). I
attributed this to the influence of researchers' prior knowledge - i.e.
they were demonstrating effects they anticipated, not looking at random
for correlations.
It seems to me that statistics should more commonly be presented as used
in two contexts: (1) exploratory, where a set of data is examined for
pattern and a reasonable question is to ask how often one is being misled
by chance coincidences, and (2) as a quality assurance technique for
measurements in the known presence of variation. Researchers too often
assume that ideas relevant to the latter (sample size, power) can be
arbitrarily applied to the former.
A final thought is to suggest that too many papers stop short at the
p-value. Authors should be coerced to take the next step and explain
*what* the (significant) effect is and *why* it is important. That would,
for example, put many claims of relative risk into clinical perspective.
**********************
Roger Newson:
The issue of multiple comparisons is a fast-moving field at this point in
the early 21st century, and there is no consensus regarding the best
approach, even amongst statisticians. However, I have written a paper on
the subject in The Stata Journal, summarizing other people's thoughts and
adding a few of my own, and have implemented a few multiple-test
procedures in Stata (Newson, 2003). A preprint of this reference can be
downloaded
from my website, where you can also download a presentation on the subject
that I gave at the 2003 UK Stata User Meeting.
********************
Tzippy:
The newest method is Benjamini and Hochberg's False Discovery Rate (FDR).
It controls the expected proportion of false significances among the
results declared significant, rather than the alpha for each individual
test.
SAS's PROC MULTTEST offers several options for multiple testing that are
less conservative than Bonferroni.
*********************
Allan White:
I, too, am concerned about this. One glaring discrepancy has been
bothering me for some time. If we conduct a one-way ANOVA which yields a
significant F value, we often follow it up with a Tukey test for all
possible pairwise comparisons. This gives p values which are adjusted to
allow for the fact that we are doing a number of tests, so that the
experiment-wise p value is at the desired level. That is fair enough.
However, this is in marked contrast to what is typically done when we do,
say, a 4-way ANOVA, which yields 15 effects (4 main effects, 6 2-way
interactions, 4 3-way interactions and a 4-way interaction). The p values
for these effects are NEVER adjusted for the fact that we are looking at
15 effects, i.e. we are giving ourselves 15 chances of finding something
significant!
However, in spite of the inconsistencies that we have noted, the problem
of multiple tests is real enough. In the example that I just quoted, the
chance of getting one or more of the 15 effects significant at a nominal
5 per cent level is approximately 50:50. We really need to be far more
rigorous and consistent in dealing with this type of problem than we
currently are.
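(As a quick check of that 50:50 figure, a rough calculation, assuming for
simplicity that the 15 tests are independent, which in practice they will
not be exactly:

    # P(at least one "significant" result at the 5% level among 15 null effects)
    p_any = 1 - (1 - 0.05) ** 15
    print(round(p_any, 3))    # about 0.54, i.e. roughly 50:50
)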
Nevertheless, you do have a point about the legitimacy of "fishing
expeditions". If we are too rigorous in correcting p values for multiple
tests, then we run the risk of missing something which is really there.
One solution that occurs to me (but which I have never seen used in
practice) is to split the data set in two on a random basis and to carry
out the same analysis on each half of the data. The chances are that only
effects that are really there will appear as significant in both analyses.
Effects that are significant in one half as a result of pure chance will
only rarely be significant in the other half. Of course, there is a loss
of power in splitting your data in two in this way but, with a large
dataset, this may matter a lot less than the benefit gained.
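A rough sketch of that split-half idea in Python (my own illustration of
the suggestion, not a tested procedure; it assumes a pandas DataFrame with
a binary 'group' column and several outcome columns, and all names here
are made up):

    # Split-half screening: an effect is only taken seriously if it is
    # significant in BOTH randomly chosen halves of the data.
    import numpy as np
    from scipy.stats import ttest_ind

    def split_half_screen(df, group_col, outcome_cols, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(df))
        half_a = df.iloc[idx[: len(df) // 2]]
        half_b = df.iloc[idx[len(df) // 2:]]

        replicated = []
        for col in outcome_cols:
            p_values = []
            for half in (half_a, half_b):
                g0 = half.loc[half[group_col] == 0, col].dropna()
                g1 = half.loc[half[group_col] == 1, col].dropna()
                p_values.append(ttest_ind(g0, g1).pvalue)
            if max(p_values) < alpha:   # significant in both halves
                replicated.append(col)
        return replicated

As Allan notes, each half has less power, so this is more of a screening
device than a formal procedure.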
**********************
Sue Richards:
I think the basic principle we work to is:
1. Pre-specify, before looking at the data, a limited set of 'primary'
analyses. Hopefully these are not too many, and if they are clearly
stated, then results can be viewed bearing in mind the multiplicity of
tests.
2. All other analyses should be regarded as 'hypothesis generating' only.
In papers, it should be made clear what tests were done, again so that
multiple testing can be borne in mind.
There remains the problem of over-interpretation by those who do not
understand the issue, and we all need to add 'health warnings'.
The most frequent problem is not what is reported, but the lack of detail
on what has been done and NOT reported, meaning that we are unaware of the
multiple testing.
***********************
Blaise Egan:
I suggest you read this excellent discussion in the British Medical
Journal:
http://bmj.bmjjournals.com/cgi/content/full/316/7139/1236?view=full&pmid=9553006
**********************
Duncan Hedderly:
I probably worry about this less than I ought. You might find the
articles by Schulz & Grimes in the Lancet (2005, vol 365, pp 1591-95 and
pp 1657-61) interesting.