Dear All,
Sorry for the long e-mail: several people asked me to distribute the
answers I received to my three questions; these are given below.
Many thanks to all who took the time to give me assistance with these
matters.
Regards
Graham
QUESTION 1:
I have a data set to analyse regarding a range of professionals'
responses to a questionnaire. I have done PCA on all the results and
then follow up investigation to see which professional groups differ
from other professional groups with respect to the derived components.
However in looking to publish the results it is clear that some journals
would be more interested in a subset of those professional groups. So
how to proceed? I could use the PCs as derived from all the
questionnaires to generate 'scores' for all individuals which I then
analyse for the subset of professions; or I could generate PCs derived
only from individuals belonging to the subset of professions and
analyse those (these PCs are similar but not identical).
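A minimal sketch of the first option - scoring everyone on components derived from the full sample, then restricting attention to the subset - on simulated stand-in data (the sizes, group codes and variable names here are illustrative, not the real survey):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical questionnaire data: 1200 respondents x 10 items,
# with profession codes (all names and sizes are illustrative).
X = rng.normal(size=(1200, 10))
groups = rng.integers(0, 5, size=1200)  # profession codes 0..4

# PCA via SVD on the full, centred data set (option 1 in the text).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]                # first two components

# Score every respondent on the full-sample components...
scores_all = Xc @ loadings.T

# ...then analyse only the subset of professions of interest.
subset = np.isin(groups, [0, 1])
scores_subset = scores_all[subset]
print(scores_subset.shape)
```

The second option would instead repeat the SVD on `X[subset]` alone; comparing the two loading matrices shows how much the subset-specific components drift from the full-sample ones.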
In doing further (effectively post-hoc) analysis on all the professions,
my subset professions are often placed in different 'homogeneous groups';
however, when I simply work with the subset of professions, the
analysis more frequently says that these subset professions belong to
one 'homogeneous group' (presumably because the reduction in overall
sample size is reducing power). (One of my subset professions has only
8 members, compared with ~50 and ~100 in the other two subset professions;
the overall - all professions - sample size is about 1200.)
So really, my question is how to proceed with analysing the subset of
professions. What is the more effective/rigorous analysis, both in terms
of statistics and in terms of what journal editors may accept?
Any advice/opinions/references would be welcomed.
REPLIES:
(a) what is wrong with you deciding the appropriate analysis, not the
'journal'? [but take advice from their peer reviewers of course] And
discuss with a statistician face to face – more productive for initial
consultations.
And what is wrong with publishing the results for each subset,
indicating that the data were not collected for such a comparison? [the
fact that each individual profession is 'homogeneous' is of some
interest, even if the information is indicative only and not
'statistically significant' - and ideally you should compare/contrast
with whether that is what would be expected from underlying theory or
other evidence]
I suggest that you first consider how to plot/display the data both
overall, and for each subgroup - "a picture is worth a thousand words" -
ideally in a format that allows you to identify any outliers [as these
will have strong effects, particularly on very small samples].
You can then consider reporting the differences, whether these are
statistically significant or not at the 5% level, stating what test you
have used. And if your sample size is really too small for that, then
simply say so [or gather a larger sample]. For statistical purposes
though, it is generally considered that a sample subgroup size of 8 is
very small for statistical testing, so you'll want to use a robust test.
There are several possible hypotheses [and thus tests], e.g. whether all
groups are the same, pairwise tests between groups, whether particular
groups have higher average scores than the 'general' population, whether
the variance within each profession is the same etc. Hard to assess what
the appropriate tests are [and/or whether these should be based on your
PCA scores or on the underlying data] without knowledge of the data
collection and results thus far.
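As one concrete example of a robust test of the 'all groups the same' hypothesis, the rank-based Kruskal-Wallis test makes no normality assumption; a sketch with group sizes matching those in the question (the data themselves are simulated, not the real survey):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative PC scores for three professions of the sizes mentioned
# in the question (8, ~50, ~100); purely simulated data.
g1 = rng.normal(0.0, 1.0, size=8)
g2 = rng.normal(0.0, 1.0, size=50)
g3 = rng.normal(0.5, 1.0, size=100)   # one group shifted for illustration

# Kruskal-Wallis: a rank-based test of the null that all groups come
# from the same distribution.
stat, p = stats.kruskal(g1, g2, g3)
print(f"H = {stat:.2f}, p = {p:.4f}")
```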
[When considering making comparisons between a particular group and the
'overall average' you may wish to consider if your sample is
'representative' of the population in terms of the mix of those in each
profession, and apply re-weighting as needed]
'Practical Nonparametric Statistics' [W. Conover] has various tests that
are more robust in the sense of not making parametric assumptions - any
book on non-parametric statistics will cover some. There are lots of
tests in '100 Statistical Tests' [G. Kanji] which may also help. Another
way to be more 'robust' is to test at the 1% level rather than the 5%
level.
Another important point is that as part of the usual scientific method,
you shouldn't really use the same data to both form the hypothesis and
test the hypothesis. You can use the whole data set only if your
hypothesis is derived from some other source [underlying theory, other
previous evidence etc]. Otherwise you need to split your data in half in
some way [random selection ideally], work out the hypothesis/hypotheses
from 1/2 the data, then test these using the other 1/2.
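The split-half idea can be sketched as follows (simulated data; the variable names and 50/50 split are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical derived scores for the 1200 respondents mentioned above.
scores = rng.normal(size=1200)

# Random half-split: form hypotheses on one half and test on the other,
# so the same data are never used both to suggest and to test a claim.
idx = rng.permutation(1200)
explore, confirm = idx[:600], idx[600:]

exploration_set = scores[explore]   # inspect, plot, generate hypotheses
confirmation_set = scores[confirm]  # reserve for the formal test
print(len(exploration_set), len(confirmation_set))
```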
On your point on length, maybe the solution is to draft a full paper on
the complete findings and put it on [your?] a website, and in your
journal articles mention the context/key findings in the opening
paragraph or two as context, and provide a web reference for those who
want to see the complete analysis. If the full results are interesting,
that provides access for those interested, whilst dealing with
journal-article length issues.
QUESTION 2:
In terms of ranking-type data, simpler analyses can be carried out
'non-parametrically' using techniques found in texts like Siegel and
Castellan. However, I was wondering how to approach analyses where - had
the data been normal etc. - you would use a multiple regression/factorial
ANOVA. Any advice on this would be welcome, as would any recommendations
for texts I could read that would give me information on this (I do have a
maths degree, but it is a bit 'rusty', so application rather than theory
would be preferred).
REPLIES:
(a) it would be worth having a look at Brunner, Domhof and Langer,
Nonparametric Analysis of Longitudinal Data in Factorial Experiments,
Wiley 2002 and also at the following:
1. Akritas MG, Arnold SF, Brunner E. Nonparametric hypotheses and rank
statistics for unbalanced factorial designs. Journal of the American
Statistical Association 1997;92(437):258-265.
and other papers by Akritas and Brunner as well as
2. Koch GG, Tangen CM, Jung JW, Amara IA. Issues for covariance analysis
of dichotomous and ordered categorical data from randomized clinical
trials and non-parametric strategies for addressing them. Statistics in
Medicine 1998;17(15-16):1863-1892.
3. Koch GG, Tangen C, Tudor G, Stokes ME. Strategies and issues for the
analysis of ordered categorical data from multifactor studies in
industry - discussion. Technometrics 1990;32(2):137-149.
and
4. Lesaffre E, Senn S. A note on non-parametric ANCOVA for covariate
adjustment in randomized clinical trials. Statistics in Medicine
2003;22(23):3583-3596.
which gives a correction to reference 2.
QUESTION 3:
I have data, based upon a 5-point Likert scale, that I wish to analyse in
a manner akin to factorial ANOVA (actually ANCOVA): I have a number of
dichotomous independent variables and one covariate, and I would like to
know to what degree they influence the Likert results.
How 'dangerous' is it to drop the Likert data into a standard ANCOVA? If
it is 'possible', are there diagnostics to check that it has worked? Or
is it just a really bad idea?
REPLIES:
(a) if you're willing to accept that a unit change along the Likert
scale means roughly the same thing wherever you are positioned along the
scale, then an ANCOVA is a good option, equivalently fitted as a linear
regression. An assumption is that the errors are normally distributed in
the population; so you could check how consistent the residuals are with
a normal distribution. Secondly you could check another assumption, that
the variability of these residuals is constant across the range of the
predicted values from the model. If those are without problem, then you
might check that the assumed linear relationship between the Likert
outcome and the continuous covariate is not masking a systematically
non-linear relationship. If the sample size is pretty small, perhaps <20,
then you may question whether you have enough data to reliably estimate
all of those parameters. You might also question the assumption of
normality, which is hard to check in small samples.
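The diagnostics described here might be sketched as follows, fitting the ANCOVA as a linear regression on simulated stand-in data (the design, effect sizes and variable names are all illustrative, not the questioner's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 120

# Simulated stand-in for the design described: two dichotomous factors,
# one continuous covariate, a 5-point Likert outcome.
f1 = rng.integers(0, 2, n)
f2 = rng.integers(0, 2, n)
cov = rng.normal(size=n)
latent = 3 + 0.5 * f1 - 0.3 * f2 + 0.4 * cov + rng.normal(0, 0.7, n)
y = np.clip(np.round(latent), 1, 5)   # discretise onto the 1..5 scale

# ANCOVA as a linear regression: design matrix with an intercept.
Xd = np.column_stack([np.ones(n), f1, f2, cov])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta
resid = y - fitted

# Diagnostic 1: are the residuals consistent with normality?
_, p_norm = stats.shapiro(resid)

# Diagnostic 2: does residual spread drift with the fitted values?
rho, p_spread = stats.spearmanr(fitted, np.abs(resid))
print(f"normality p = {p_norm:.3f}, spread-vs-fitted p = {p_spread:.3f}")
```

In practice one would also plot residuals against fitted values and against the covariate, as the formal tests alone can miss obvious patterns.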
If assumptions are not met, or sample size is too small to allow them to
be assessed, one next option to consider might be dichotomising the
outcome variable and using logistic regression, provided this is an
acceptable thing to do from an interpretation point of view and you
are not losing too much information by doing so - which you could
assess from the distribution of the Likert outcome (e.g. if the distribution
is pretty much zero/redundant at either end of the scale and only two or
three of the outcome values are predominant). Another option may be to
use non-parametric methods between each independent variable in turn and
the Likert outcome (e.g. Chi-squared test for trend or Spearman
correlation coefficient) and if none or only one is significant to stop
there without need for a model with multiple predictors. Ordinal
response models are also available which I think you could find in a
stats journal article from a few years ago with M Campbell as a
co-author: Lall R, Campbell MJ, Walters SJ, Morgan K. A review of
ordinal regression models applied on health-related quality of life
assessments. Stat Methods Med Res. 2002 Feb;11(1):49-67.
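The one-predictor-at-a-time screen mentioned above might look like this with Spearman's rank correlation (simulated, illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Illustrative data: a 5-point Likert outcome and one continuous covariate.
covariate = rng.normal(size=80)
likert = np.clip(np.round(3 + 0.6 * covariate + rng.normal(0, 1, 80)), 1, 5)

# Spearman's rank correlation respects the ordinal nature of the outcome;
# one such test per candidate predictor is the screen described above.
rho, p = stats.spearmanr(covariate, likert)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```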
(b) Usually not a terrible idea. However, if you have a five point scale
you might consider using a proportional odds (logistic regression for
categorical data) model. This is an extension of generalised linear
models and gives you the flexibility of incorporating factors and
covariates while respecting the ordinal nature of the data.
Run the analysis and look at the residuals. If they are distributed
unimodally and symmetrically, then the parametric procedure will give
fairly accurate results.
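The proportional-odds (cumulative logit) model suggested here can be fitted with any ordinal-regression routine in a proper statistics package; purely to show the mechanics, the sketch below hand-rolls a minimal fit on simulated data via maximum likelihood (all names and data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(4)
n, K = 300, 5   # respondents, Likert categories

# Simulated predictor and ordinal outcome from a latent-logistic model.
x = rng.normal(size=n)
latent = 0.8 * x + rng.logistic(size=n)
cuts = np.array([-1.5, -0.5, 0.5, 1.5])
y = np.searchsorted(cuts, latent)          # categories 0..4

def negloglik(params):
    # params: slope beta, then K-1 unconstrained threshold parameters.
    beta = params[0]
    # Enforce ordered thresholds: first is free, rest are +ve increments.
    thr = np.cumsum(np.concatenate([[params[1]], np.exp(params[2:])]))
    eta = beta * x
    # Cumulative probabilities P(Y <= k | x) under the logit link.
    cum = expit(thr[:, None] - eta[None, :])         # (K-1, n)
    cum = np.vstack([np.zeros(n), cum, np.ones(n)])  # pad with 0 and 1
    probs = np.diff(cum, axis=0)                     # per-category probs
    return -np.log(probs[y, np.arange(n)] + 1e-12).sum()

res = minimize(negloglik, x0=np.zeros(K), method="BFGS")
print(f"estimated slope = {res.x[0]:.2f}")
```

The single slope acting on every cumulative split is exactly the proportional-odds assumption; it is what lets the model incorporate factors and covariates while respecting the ordering of the categories.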
(c) I worry about this a lot. I haven't come across much in the way of
detailed exploration of the dangers of analysing Likert-scale data as
though it were interval data. A starting point (but also possibly the
existing terminus) is given in the following (flawed?) papers:
Rasmussen, J.L. (1989) Analysis of Likert-scale data: a
re-interpretation of Gregoire & Driver, Psych. Bull. 105(1) 167-170.
Gregoire, T.G. (1989) Analysis of Likert-scale data revisited. Psych.
Bull. 105(1), 171.
Gregoire, T.G. & Driver, B.L. (1987) Analysis of ordinal data to detect
population differences. Psych. Bull. 101(1) 159-165.
I haven't chased this for a couple of years. Obviously there are
alternatives, e.g. proportional odds/cumulative logit models, which can
prove more satisfactory in some situations. A clear difficulty for, for
example, paired t-tests for Likert-scale data, is that there does not
seem an obvious and simple chance mechanism to provide the null.
(d) First, the term "likert" can be traced to people naming stuff after
Rensis Likert, so it is a proper name. Even within psychological
disciplines there appear to be ding-dong debates on what qualifies as a
Likert scale, some views being much narrower than others. If you are
using the term as a label for an ordered categorical scale with integer
grades, then that is broader than many interpretations, and, also more
importantly, not a term that will be universally understood within
statistical sciences.
I don't have a strong feeling for what "danger" might mean here. A good
analysis for data of the form I think you have might well be an ordered
logit model. That said, I suspect that if you throw the data, as is,
into an ANCOVA, the scientific conclusions might well be very similar.
The best strategy in cases like this is often to try different methods
and see how much difference that makes. Otherwise you get into rather
anguished debates about what is valid and invalid, which often turn out
to be based on personal preferences and prejudices.
(e) There are ways to take ordinal scores and transform them into linear
measurement, given certain assumptions. If you key in Rasch into
MEDLINE, you will find 600+ articles using this methodology. The Rasch
model is a probabilistic model which operationalises the axioms of
additive conjoint measurement. Thus if the data from your scale meets
the model expectations, then a logit transformation provides you with a
metric which can be used in ANOVAs, ANCOVAs, and the like, given
appropriate distributional properties. The 'assumptions' are that the
data meet model expectations, which include unidimensionality to give a
valid summed score. (Actually this is still required for ordinal data,
and you can use Mokken scaling to test whether you have achieved ordinal
measurement.)
However, if you have just a single Likert-style variable (and this was
not entirely clear in your message), then there is not much that can be
done with this approach, although the MEDLINE articles will provide you
with sufficient empirical evidence to show that all such items are
ordinal in nature. Faced with that situation I might try ordinal
regression, or collapse to use a binary logistic regression.
(f) You may or may not be aware that, in ordinary linear model analysis
such as would be used for an outcome measure on a continuous scale,
ANOVA and regression analysis are essentially the same, in that they
have the same form of underlying model, but the latter is used where the
explanatory variables are not designed-in in a balanced way. ANCOVA is
the most general case, where there are some design factors and some
non-design factors that need to be adjusted for. But wherever there is
imbalance and potential confounding or interaction between factors
(effect modification), the interpretation of results is likely to be
much more complicated than in the designed-experiment ANOVA case, and if
that's your situation you would be wise to involve an experienced
statistician.
The extent to which treating Likert-scale variables as if they were
continuous will give you misleading answers will depend on how many
categories are actually used, and how the responses are distributed
across the categories. In many cases, a linear approach will give
decent approximations. The standard diagnostic plots, e.g. residuals vs
fitted values, may give some insight into the model fit.
Since you asked for help on "non-parametric" ANOVA, you may get a lot of
stuff that's not particularly relevant to your case. I think what you
need is the ability to use regression techniques on an ordered
categorical response, which is what Likert scales are. Probably the most
widely used now is that of McCullagh (see McCullagh and Nelder,
Generalized Linear Models), which I have found very useful. In
psychology, models originated by Rasch have been used for some time.
Obviously, you'll need access to software that can fit your chosen
model, but proper statistical packages should have no trouble with this.
Again, if you haven't used this kind of model I'd advise getting expert
help.
(g) This is something that routinely comes up here and I'd like to see
what the other Allstatters think. Personally I believe it to be a bad
idea unless your Likert scale fits the Rasch model and can be
demonstrated to act like a truly interval scale instrument. If there are
not equal distances between points on the scale, then I for one would
not be happy to trust the results of the AN(C)OVA.
--
Dr G.S.Clarke
Lecturer in Physiology & Biometry
Faculty of Health Studies
University of Wales, Bangor
Fron Heulog
Ffriddoedd Road
Bangor
Gwynedd LL57 2EF
Tel: 01248 383157
e-mail: [log in to unmask]