In addition to suggestions for references reported to the list earlier, here is a more discursive reply, and the only reply giving a reference that explicitly recommends stars:
-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]]
Sent: 11 May 2008 19:30
I used to work in psychological research, and the use of super-scripted
asterisks is not only common there, but is recommended in the American
Psychological Association's publication manual. Specifically, in section
3.69 it states, "Identify statistically significant F ratios with
asterisks, and provide the probability values in a probability footnote
(see section 3.70)" (APA Pub. Man., 5th ed., p. 160; see also Table
Example 7 on p. 162, and Table Example 8 on p. 163). In section 3.70, it
goes on to state, "A probability note indicates the results of tests of
significance. Asterisks indicate those values for which the null
hypothesis is rejected, with the probability (p value) specified in the
probability note. Include a probability note only when relevant to
specific data within the table. Assign a given alpha level the same
number of asterisks from table to table within your paper, such as *p <
.05 and **p < .01; the largest probability receives the fewest
asterisks" (p. 170).
My understanding of the nature of the problem is as follows.
Traditionally, there are three approaches to statistical inference:
Bayesian, Frequentist (Neyman & Pearson), and Fiducial/Fisherian (R.A.
Fisher).
Under the Frequentist approach, the researcher compares the observed
test statistic to the corresponding critical value for a specified alpha
level. Equivalently, the researcher can compare the p-value of the
observed test-statistic to the alpha level. (For simplicity, I am
ignoring adjustments for multiple comparisons, and two-tailed versus
one-tailed tests.) Of course, the alpha level is supposed to be chosen
before collecting the data, and the specific value should depend upon
what the researcher thinks is an appropriate level given the various
trade-offs. In practice, conventions of .10, .05, and .01 are used in
the mistaken belief that this is somehow objective. Even worse is when
researchers decide to use either .05 or .01 after they have collected
the data and run the analyses, something which I saw a lot of in
psychology, but I digress.
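
To illustrate the equivalence described above, here is a minimal Python
sketch. It assumes a one-tailed z test with a standard normal null
distribution; the function names and the example values of z and alpha
are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed null distribution: standard normal


def reject_by_critical_value(z_obs, alpha):
    """Reject H0 if the observed statistic exceeds the critical value."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # upper-tail critical value
    return z_obs > z_crit


def reject_by_p_value(z_obs, alpha):
    """Reject H0 if the upper-tail p-value is below alpha."""
    p_value = 1 - STD_NORMAL.cdf(z_obs)
    return p_value < alpha


# Hypothetical observed statistics and conventional alpha levels.
for z_obs in (1.2, 1.7, 2.5):
    for alpha in (0.10, 0.05, 0.01):
        by_crit = reject_by_critical_value(z_obs, alpha)
        by_p = reject_by_p_value(z_obs, alpha)
        print(f"z={z_obs}, alpha={alpha}: reject={by_crit}, same decision={by_crit == by_p}")
```

The two rules always give the same decision; the alpha level is still
an input that the researcher has to choose beforehand.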
The use of a single super-scripted asterisk makes some sense for
hypothesis testing in the Frequentist approach, but the use of multiple
super-scripted asterisks (e.g., "*p<.10, **p<.05, ***p<.01, ****p<.001")
does not because there is only one alpha level chosen by the researcher
for that test. Other researchers may feel that a different alpha level
would be more appropriate, and thus it is desirable to report the
observed test statistic (and possibly its associated p-value).
In the Fiducial/Fisherian approach, the p-value serves as a form of
evidence against the null hypothesis. (The Bayesians are quick to point
out that it is rather a strange form of evidence, in part because it
includes the probability of the observed result *and anything more
extreme than it* given that the null hypothesis is true.) In this sense,
multiple super-scripted asterisks (e.g., "*p<.10, **p<.05, ***p<.01,
****p<.001") serve as a crude summary of the degree of "evidence". It is
much more desirable to report the p-value itself because how much
"evidence" is sufficient to reject the null hypothesis is again a matter
of opinion, and researchers may differ.
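
A small Python sketch shows how crude that summary is. The thresholds
are the ones from the example notation above; the function name and the
sample p-values are hypothetical.

```python
def stars(p_value, thresholds=(0.10, 0.05, 0.01, 0.001)):
    """Return one asterisk for each threshold that the p-value falls below."""
    return "*" * sum(p_value < t for t in thresholds)


# Hypothetical p-values: 0.049 and 0.011 differ by more than a factor
# of four, yet they receive the same number of asterisks.
for p in (0.2, 0.049, 0.011, 0.0005):
    print(f"p = {p}: '{stars(p)}'")
```

Reporting the p-value itself keeps the information that the asterisks
discard.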
Unfortunately, as is all too common, I foolishly never took a course on
Bayesian statistical analyses whilst I was in graduate school, and so my
knowledge of the Bayesian approach is rather poor. Nevertheless, I doubt
that the use of super-scripted asterisks makes any sense under that
approach either.
(If any of the above is in error, I am sure that someone on allstat will
correct me. :-)
My understanding of this topic is based heavily upon (my recollection
of) the following articles:
Goodman, S. N. (1999a). Toward evidence-based medical statistics. 1: The
p value fallacy. Annals of Internal Medicine, 130 (12), 995-1004.
http://www.annals.org/content/vol130/issue12/
Gigerenzer, G. (1993). The Superego, the Ego, and the Id in statistical
reasoning. In G. Keren and C. Lewis (Eds.), A handbook for data analysis
in the behavioral sciences: Methodological issues (pp. 311-339).
Hillsdale, NJ: Lawrence Erlbaum Associates.
Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What
you always wanted to know about significance testing but were afraid to
ask. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology
for the social sciences (pp. 391-408). Thousand Oaks, CA: SAGE
Publications.
Hubbard, R., & Bayarri, M. J. (2003). Confusion over measures of
evidence (p's) versus errors (alpha's) in classical statistical testing
(with discussion). The American Statistician, 57 (3), 171-182.
It is possible that one or more of them may serve as the authoritative
reference you seek.
Finally, I don't know if you were already thinking about this, but this
would be a great topic for those wonderful short articles you write in
the RSS journal, "Significance". (I hope I am not confusing you with
another author.)
Kindest regards,
Chris
------------------------------
Date: Wed, 7 May 2008 13:39:44 +0100
From: "Allan Reese (Cefas)" <[log in to unmask]>
Subject: Use of asterisk (stars / * ** ***) when reporting statistics
Dear colleagues
I recently commented to a journal editor that the * notation was
regarded as outmoded and widely deplored, and he responded that he'd not
seen any condemnation in the places he read. I'm sure he is right, and
the same probably goes for most other editors. In the allstat archive
(20 July 2000), there is a summary of statisticians' comments on the
reporting of p values.
QUESTION: can anyone recommend a cogent and authoritative reference for
editors that will persuade them that current practices on the reporting
of statistical results can and should be improved?
A Google search shows asterisks used by UK Department of Health, Social
Services and Public Safety (DHSSPS), Home Office, the National Office of
Animal Health, and many other groups. Wikipedia says "Popular levels of
significance are 5%, 1% and 0.1%" but quotes J. Scott Armstrong that
attempts to educate researchers on how to avoid pitfalls of using
statistical significance have had little success.
The strongest advice against it that I've found is:
Demographic Research:
"Submissions to our journal should present indicators of statistical
significance in a manner that facilitates the interpretation of results,
perhaps in separate table columns when appropriate. Significance
asterisks are a poor substitute for this."
Political Analysis:
In most cases, the uncertainty of numerical estimates is better conveyed
by confidence intervals or standard errors (or complete likelihood
functions or posterior distributions), rather than by hypothesis tests
and p-values. However, for those authors who wish to report "statistical
significance," statistics with probability levels of less than .001,
.01, and .05 may be flagged with 3, 2, and 1 asterisks, respectively,
with notes that they are significant at the given levels.
Allan
***********************************************************************************
This email and any attachments are intended for the named recipient
only. Its unauthorised use, distribution, disclosure, storage or copying
is not permitted. If you have received it in error, please destroy all
copies and notify the sender. In messages of a non-business nature, the
views and opinions expressed are the author's own and do not necessarily
reflect those of the organisation from which it is sent. All emails may
be subject to monitoring.
***********************************************************************************
------------------------------