JiscMail: Email discussion lists for the UK Education and Research communities

ALLSTAT Archives (allstat@JISCMAIL.AC.UK)

Subject: Summary: Power calculations and effect sizes for Wilcoxon rank-sum test
From: Tim Auton <[log in to unmask]>
Reply-To: Tim Auton <[log in to unmask]>
Date: Thu, 6 Nov 2003 12:08:48 -0000
Content-Type: text/plain
Parts/Attachments: text/plain (257 lines)

Thanks to all the allstat members who replied to my earlier query.  Your
replies have been very helpful in responding to the proposal.

The original question was:

>I have just been reviewing a study proposal.
>
>Two (medical) treatments are to be compared, and the primary efficacy 
>measure is a continuous variable expected to have a non-normal 
>distribution.
>
>The primary analysis is to use a Wilcoxon test to compare the 
>treatments.
>
>Sample size was calculated, correctly, to have 80% power when p1 = 
>0.75, where p1 is the probability that an observation in group 1 will 
>be less than an observation in group 2.  (This is the nomenclature and 
>definition used by
>nQuery)
>
>
>
>I have two questions for the list
>
>1.  Is there a short name for p1 defined as above, or some variant on 
>it?
>
>2.  Do others share my concern that p1 is not the way most people will 
>think about differences between treatments (although it is the obvious 
>parameter to use to power a Wilcoxon test), which means it is almost 
>impossible to assess whether p1 = 0.75 is an appropriate effect size to 
>design for?
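
As a rough cross-check on such a calculation, Noether's approximation for the
Wilcoxon rank-sum test gives a total sample size of about 42 (roughly 21 per
group) for 80% power at p1 = 0.75 with a two-sided 5% test.  The short Python
sketch below illustrates that approximation only; it is not necessarily the
calculation nQuery performs.

# Minimal sketch: Noether's approximation to the Wilcoxon rank-sum sample size.
# Illustration only; not necessarily the formula nQuery implements.
from scipy.stats import norm

def wilcoxon_total_n(p1, alpha=0.05, power=0.80, frac_group1=0.5):
    """N = (z_{1-alpha/2} + z_{power})^2 / (12 c (1 - c) (p1 - 0.5)^2),
    where c is the fraction allocated to group 1 and p1 = P(X1 < X2)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    c = frac_group1
    return z ** 2 / (12 * c * (1 - c) * (p1 - 0.5) ** 2)

print(wilcoxon_total_n(0.75))   # approximately 42 in total, i.e. about 21 per group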

1.  Names for p1:
'Individual exceedance probability' and 'dominance probability' have been
suggested.
Roger Newson pointed out that p1 is closely related to Somers' D (= p1 -
p2), and Robert Newcombe pointed out that p1 is the same as the AUROC (area
under the receiver operating characteristic curve).
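
To make these identities concrete, here is a minimal Python sketch (an
illustration only) showing that the empirical p1 equals U/(m*n) from the
Mann-Whitney test, which is also the AUROC:

# Minimal sketch: the empirical p1 = P(X1 < X2) equals U/(m*n), i.e. the AUROC.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
x1 = rng.normal(0.00, 1.0, size=60)    # group 1
x2 = rng.normal(0.95, 1.0, size=60)    # group 2, shifted so that p1 is about 0.75

# Direct estimate of p1 (ties, if any, get half weight)
p1_direct = np.mean([(a < b) + 0.5 * (a == b) for a in x1 for b in x2])

# The same quantity via the Mann-Whitney U statistic; in recent SciPy versions
# the statistic returned is U for the first argument (pairs with x2 > x1).
u, _ = mannwhitneyu(x2, x1, alternative="two-sided")
p1_from_u = u / (len(x1) * len(x2))

print(p1_direct, p1_from_u)   # the two estimates agree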


2. Concerns about use of p1:
Most respondents affirmed my view that treatment differences are usually
more relevant, and Stephen Senn pointed out that p1 effectively divides the
treatment difference by the error standard deviation.
However, as Robert Newcombe notes, this is less of a concern if the outcome
scale is unfamiliar or ad hoc, in which case a relative measure such as p1
may be the more appropriate summary.

Detailed responses are given below:

Best wishes

Tim Auton

The views, opinions and judgements expressed in this message are solely
those of the author.  The message contents have not been reviewed or
approved by Protherics.

T R Auton PhD MSc C.Math
Head of Biomedical Statistics
Protherics Molecular Design Ltd
The Heath Business and Technical Park
Runcorn
Cheshire
WA7 4QF
UK
email: [log in to unmask]


From: Roger Newson 
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom
Email: [log in to unmask]


Hello Tim

A possible name for your "p1" is a dominance probability. However, I 
personally tend to think in terms of Somers' D, which (in the case of a 
2-sample Wilcoxon test) can be defined as p1-p2, where p1 is the 
probability that a randomly-chosen member of Population A has a greater 
value than a randomly-chosen member of Population B and p2 is the 
probability that a randomly-chosen member of Population B has a greater 
value than a randomly-chosen member of Population A. Somers' D has the 
advantage that it is still a measure of dominance, even if there are tied 
values.
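
To make this concrete, here is a minimal Python sketch of estimating Somers' D
as p1 - p2 from two samples, with tied pairs counting in neither term (an
illustration only, not the Stata implementation mentioned below):

# Minimal sketch: two-sample Somers' D = p1 - p2, where p1 = P(A > B) and
# p2 = P(B > A); tied pairs contribute to neither term.
import numpy as np

def somers_d(a, b):
    a = np.asarray(a)[:, None]
    b = np.asarray(b)[None, :]
    p1 = np.mean(a > b)   # proportion of (A, B) pairs in which A is greater
    p2 = np.mean(a < b)   # proportion in which B is greater
    return p1 - p2

print(somers_d([1, 2, 2, 5], [0, 2, 3, 4]))   # 0.0 for this small example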

An alternative parameter tested by the Wilcoxon test is the median 
difference (or the median ratio in the case of positive-valued outcome 
variables). However, median differences and ratios are defined in terms of 
Somers' D. Also, power calculations are more easily defined in terms of 
detectable levels of Somers' D than in terms of detectable levels of median 
difference or ratio, because the Central Limit Theorem typically works a 
lot quicker for Somers' D than for the median difference or ratio. (I 
suspect that the nQuery package assumes that the 2 population distributions 
differ only in location. This assumption will underestimate the standard 
error of Somers' D when the larger of the 2 samples is from the less 
variable of the 2 populations, and will overestimate the standard error of 
Somers' D when the larger of the 2 samples is from the more variable of the 
2 populations.) If you do not want to talk about either Somers' D or median 
differences or ratios, then your best bet is probably to transform the data 
and to measure ratios between geometric means (using log-transformed data) 
or ratios or differences between algebraic means (a less common option, 
using power-transformed data).
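
For concreteness, the two-sample median difference can be estimated as the
median of all pairwise differences (the Hodges-Lehmann estimate, which I
believe is the quantity meant here), and the median ratio as its analogue on
the log scale; a minimal Python sketch, purely for illustration:

# Minimal sketch: two-sample median difference as the median of all pairwise
# differences x2 - x1 (Hodges-Lehmann), and the median ratio via the log scale.
import numpy as np

def median_difference(x1, x2):
    x1, x2 = np.asarray(x1), np.asarray(x2)
    return np.median(x2[:, None] - x1[None, :])

rng = np.random.default_rng(2)
x1 = rng.lognormal(0.00, 1.0, size=50)
x2 = rng.lognormal(0.95, 1.0, size=50)
print(median_difference(x1, x2))                            # median difference, raw scale
print(np.exp(median_difference(np.log(x1), np.log(x2))))    # median ratio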

You might like to read an article of mine about the parameters behind 
so-called "non-parametric" statistics (Newson, 2002), for which a 
pre-publication draft may be downloaded from my website at
http://www.kcl-phs.org.uk/rogernewson/
where you can also download some papers, and a presentation, about 
calculating confidence intervals for these parameters using the Stata 
statistical package.

I hope this helps.

Best wishes

Roger

References

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, 
Somers' D and median differences. The Stata Journal 2002; 2(1): 45-64.


From: Ted Harding <[log in to unmask]>

This is quite likely to be true. What people will want to know about the
difference between two treatments is, well, their difference! That is to say
(since apparently there are quantitative data here) what is (X2-X1) likely
to be where X2 is from Group 2 and X1 from Group 1.

One way to get a feel for this would be to see what, for different types of
quantitative distribution (Normal, log-Normal, ...), the parameter
difference corresponding to P(X1<X2) = 0.75 is. In the case of a Normal
N(mu, sigma^2) this will be in terms of mu/sigma, for instance, so would
still leave some questions hanging in the air. In the case of a log-Normal,
the parameter difference would be effectively the same, but it corresponds
to the difference log(X2)-log(X1). It could be worked out what
difference (X2-X1) this corresponded to, but it would be even more sensitive
to sigma than (X2-X1) for a Normal distribution. You would need to make a
judgement about which scale (raw or log) was most meaningful in real life.
And so on.
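
A small simulation along these lines (a sketch under the assumptions Ted
describes, with the shift chosen purely for illustration) shows that the same
p1 = 0.75 corresponds to quite different raw differences (X2-X1) under Normal
and log-Normal models:

# Minimal sketch: the same p1 ~ 0.75 gives very different raw differences
# under Normal and log-Normal models.  Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
d = 0.954     # standardised shift giving p1 ~ 0.75 (cf. Robert Newcombe's reply below)
n = 100_000

# Normal case: X1 ~ N(0, 1), X2 ~ N(d, 1)
x1, x2 = rng.normal(0, 1, n), rng.normal(d, 1, n)
print(np.mean(x1 < x2), np.mean(x2 - x1))      # p1 ~ 0.75, mean difference ~ 0.95

# log-Normal case: the same shift on the log scale leaves p1 unchanged
y1, y2 = np.exp(x1), np.exp(x2)
print(np.mean(y1 < y2), np.mean(y2 - y1))      # p1 ~ 0.75, mean difference ~ 2.6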

I suspect the study designers may have got cold feet about facing up to the
quantitative issues, preferring to hide behind a distribution free test
which sweeps most quantitative aspects under the carpet!

Best wishes,
Ted.

From: Dr Dennis O. Chanter
Director
Statisfaction Statistical Consultancy Ltd
Tel: +44 (0)1424 219202
Fax: +44 (0)7005 982219 
Mobile: +44 (0)7904 101470
E-mail: [log in to unmask]

Hello Tim,

Good question!

However, if p1 (no, I don't know a succinct name for it) is not the way 
most people think about differences between treatments, then I assume you 
mean that most people think in terms of a difference in treatment means 
(or maybe medians if the distributions are skewed).  But by electing to 
use a Wilcoxon test based on ranks, aren't you (or they) already saying 
that such methods of thinking about treatment differences are not 
appropriate in this case?  If you want to think about treatment 
differences in terms of estimates of location parameters, then (if there 
are distributional difficulties) the obvious choice of test would surely 
be a randomisation test.  OK, it might mean that the power calculations 
have to be done by simulation, but at least the characterisation of the 
difference and the choice of test statistics are compatible.
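
A power calculation by simulation along these lines is straightforward; the
minimal Python sketch below (a randomisation test on the difference in means,
with a skewed outcome distribution and an effect size chosen purely for
illustration) estimates power as the proportion of simulated trials that
reject at the 5% level:

# Minimal sketch: power of a randomisation test on the difference in means,
# estimated by simulation.  Distribution and effect size are illustrative only.
import numpy as np

rng = np.random.default_rng(4)

def randomisation_p(x1, x2, n_perm=499):
    """Two-sided randomisation-test p-value for the difference in means."""
    pooled = np.concatenate([x1, x2])
    observed = abs(x2.mean() - x1.mean())
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        count += abs(pooled[len(x1):].mean() - pooled[:len(x1)].mean()) >= observed
    return (count + 1) / (n_perm + 1)

def power(n_per_group=21, shift=0.95, n_sims=400, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        x1 = rng.lognormal(0.00, 1.0, n_per_group)     # skewed outcome, group 1
        x2 = rng.lognormal(shift, 1.0, n_per_group)    # group 2, shifted on the log scale
        hits += randomisation_p(x1, x2) < alpha
    return hits / n_sims

print(power())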

Regards

Dennis

From: Robert G. Newcombe, PhD, CStat, Hon MFPH
Reader in Medical Statistics
University of Wales College of Medicine
Heath Park
Cardiff CF14 4XN, UK.
Phone 029 2074 2329 or 2311
Fax 029 2074 3664
Email [log in to unmask]

I don't think anyone has come up with a snappy name for p1 - which 
after all strictly isn't a parameter of any distribution.  At the 
analysis stage, the corresponding quantity is U/mn, the Mann-Whitney-
Wilcoxon U statistic divided by the product of the two sample sizes.  
It is the same as the AUROC (area under receiver operating 
characteristic curve).  Care is needed when distributions are 
discrete and hence ties are likely - the definition must include half 
the probability that the two observations are equal, and also, even 
when this is done, the observed value is then a biased estimate of 
the true value - see Hanley JA, McNeil BJ (1982),  The meaning and 
use of the area under a receiver operating characteristic (ROC) 
curve,  Radiology 143, 29-36.  Even though you refer to a continuous 
distribution above, this may still be an issue as in practice all 
continuous variables other than those captured digitally are recorded 
in discrete form.

I agree that this pseudo-parameter is an obvious choice to base a MWW 
power calculation on - and indeed also to use as a sample-size-free 
measure for summarising the results.  Unfortunately it isn't 
very familiar to most statisticians, let alone users - a situation I 
would very much like to alter.  If the outcome scale is obscure or 
developed ad hoc, then there is little point in summarising the 
results by giving the estimated median difference with a CI, as 
others will have little idea of what size of difference on this scale 
is important.  In this situation, a relative measure seems more 
appropriate, and p1 alias U/mn alias AUROC seems the obvious choice.  

The best way to visualise what a p1 value amounts to is to draw two 
Gaussian distributions each with SD 1, and with peaks d units apart, 
i.e. d is the standardised difference.  Then the p1 corresponding to 
any value of d is PHI(d/sqrt(2)), where PHI is the cdf of the 
standard Normal distribution.  For p1 = 0.75, d/sqrt(2) is 0.674, so 
this choice of p1 corresponds to two Gaussian distributions with 
peaks 0.674 * sqrt(2) = 0.95 SDs apart.  This admittedly only 
converts one relative measure, p1, into another, d, but it does 
facilitate an interpretable visual display.
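
The conversion works in both directions; a minimal Python sketch of the
formula above, for illustration:

# Minimal sketch: for two Normal distributions with SD 1 and means d apart,
# p1 = PHI(d / sqrt(2)), and conversely d = sqrt(2) * PHI^{-1}(p1).
from math import sqrt
from scipy.stats import norm

def p1_from_d(d):
    return norm.cdf(d / sqrt(2))

def d_from_p1(p1):
    return norm.ppf(p1) * sqrt(2)

print(d_from_p1(0.75))     # about 0.95 SDs apart
print(p1_from_d(0.954))    # about 0.75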

Hope this helps.

   
From: Stephen Senn
Professor of Statistics
Department of Statistics
15 University Gardens
University of Glasgow <http://www.gla.ac.uk>
G12 8QQ
email [log in to unmask]

Dear Tim,
This is related in my view to the issue of sigma-divided measures. I have 
called p1 the 'individual exceedence probability' in

S.J. Senn. (1997) Testing for individual and population equivalence based 
on the proportion of similar responses, Statistics in Medicine, 16,
1303-1306. and also in

In my view, as a concept it makes no sense except in the context of random 
sampling and hence is not really relevant to clinical trials.

Regards

Stephen
