Dear All,
Sorry for the long e-mail: several people asked me to distribute the
answers I received to my three questions; these are given below.
Many thanks to all who took the time to give me assistance with these
matters.
Regards
Graham
QUESTION 1:
I have a data set to analyse regarding a range of professionals'
responses to a questionnaire. I have done PCA on all the results and
then follow up investigation to see which professional groups differ
from other professional groups with respect to the derived components.
However in looking to publish the results it is clear that some journals
would be more interested in a subset of those professional groups. So
how to proceed? I could use the PCs as derived from all the
questionnaires to generate 'scores' for all individuals which I then
analyse for the subset of professions; or I could generate PCs derived
only from individuals belonging to the subset of professions and
analyse those (these PCs are similar but not identical).
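A minimal sketch of the first option - scoring everyone on components derived from the full sample, then restricting attention to the subset - on simulated stand-in data (the sizes, group codes and variable names here are illustrative, not the real survey):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical questionnaire data: 1200 respondents x 10 items,
# with profession codes (all names and sizes are illustrative).
X = rng.normal(size=(1200, 10))
groups = rng.integers(0, 5, size=1200)  # profession codes 0..4

# PCA via SVD on the full, centred data set (option 1 in the text).
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]                # first two components

# Score every respondent on the full-sample components...
scores_all = Xc @ loadings.T

# ...then analyse only the subset of professions of interest.
subset = np.isin(groups, [0, 1])
scores_subset = scores_all[subset]
print(scores_subset.shape)
```

The second option would instead repeat the SVD on `X[subset]` alone; comparing the two loading matrices shows how much the subset-specific components drift from the full-sample ones.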
In doing further (effectively post-hoc) analysis on all the professions,
my subset professions are often placed in different 'homogeneous groups';
however, when I simply work with the subset of professions, the
analysis more frequently says that these subset professions belong to
one 'homogeneous group' (presumably because the reduction in overall
sample size is reducing power). (One of my subset professions has only
8 members, compared with ~50 and ~100 in the other two subset professions;
the overall - all professions - sample size is about 1200.)
So really, my question is how to proceed with analysing the subset of
professions. What is the more effective/rigorous analysis, both in terms
of statistics and in terms of what journal editors may accept?
Any advice/opinions/references would be welcomed.
REPLIES:
(a) what is wrong with you deciding the appropriate analysis, not the
'journal'? [but take advice from their peer reviewers of course] And
discuss with a statistician face to face – more productive for initial
consultations.
And what is wrong with publishing the results for each subset,
indicating that the data were not collected for such a comparison? [the
fact that each individual profession is 'homogeneous' is of some
interest, even if the information is indicative only and not
'statistically significant' - and ideally you should compare/contrast
with whether that is what would be expected from underlying theory or
other evidence]
I suggest that you first consider how to plot/display the data both
overall, and for each subgroup - "a picture is worth a thousand words" -
ideally in a format that allows you to identify any outliers [as these
will have strong effects, particularly on very small samples].
You can then consider reporting the differences, whether these are
statistically significant or not at the 5% level, stating what test you
have used. And if your sample size is really too small for that, then
simply say so [or gather a larger sample]. For statistical purposes
though, it is generally considered that a sample subgroup size of 8 is
very small for statistical testing, so you'll want to use a robust test.
There are several possible hypotheses [and thus tests], e.g. whether all
groups are the same, pairwise tests between groups, whether particular
groups have higher average scores than the 'general' population, whether
the variance within each profession is the same etc. Hard to assess what
the appropriate tests are [and/or whether these should be based on your
PCA scores or on the underlying data] without knowledge of the data
collection and results thus far.
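As one concrete example of a robust test of the 'all groups the same' hypothesis, the rank-based Kruskal-Wallis test makes no normality assumption; a sketch with group sizes matching those in the question (the data themselves are simulated, not the real survey):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative PC scores for three professions of the sizes mentioned
# in the question (8, ~50, ~100); purely simulated data.
g1 = rng.normal(0.0, 1.0, size=8)
g2 = rng.normal(0.0, 1.0, size=50)
g3 = rng.normal(0.5, 1.0, size=100)   # one group shifted for illustration

# Kruskal-Wallis: a rank-based test of the null that all groups come
# from the same distribution.
stat, p = stats.kruskal(g1, g2, g3)
print(f"H = {stat:.2f}, p = {p:.4f}")
```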
[When considering making comparisons between a particular group and the
'overall average' you may wish to consider if your sample is
'representative' of the population in terms of the mix of those in each
profession, and apply re-weighting as needed]
'Practical Nonparametric Statistics' [W. Conover] has various tests that
are more robust in the sense of not making parametric assumptions - any
book on non-parametric statistics will cover some. There are lots of
tests in '100 Statistical Tests' [G. Kanji] which may also help. Another
way to be more 'robust' is to test at the 1% level rather than the 5%
level.
Another important point is that as part of the usual scientific method,
you shouldn't really use the same data to both form the hypothesis and
test the hypothesis. You can use the whole data set only if your
hypothesis is derived from some other source [underlying theory, other
previous evidence etc]. Otherwise you need to split your data in half in
some way [random selection ideally], work out the hypothesis/hypotheses
from 1/2 the data, then test these using the other 1/2.
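The split-half idea can be sketched as follows (simulated data; the variable names and 50/50 split are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical derived scores for the 1200 respondents mentioned above.
scores = rng.normal(size=1200)

# Random half-split: form hypotheses on one half and test on the other,
# so the same data are never used both to suggest and to test a claim.
idx = rng.permutation(1200)
explore, confirm = idx[:600], idx[600:]

exploration_set = scores[explore]   # inspect, plot, generate hypotheses
confirmation_set = scores[confirm]  # reserve for the formal test
print(len(exploration_set), len(confirmation_set))
```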
On your point on length, maybe the solution is to draft a full paper on
the complete findings and put it on [your?] a website, and in your
journal articles mention the context/key findings in the opening
paragraph or two as context, and provide a web reference for those who
want to see the complete analysis. If the full results are interesting,
that provides access for those interested, whilst dealing with
journal-article length issues.
QUESTION 2:
In terms of ranking-type data, simpler analyses can be carried out
'non-parametrically' using techniques found in texts like Siegel and
Castellan. However, I was wondering how to approach analyses where - had
the data been normal etc. - you would use a multiple regression/factorial
ANOVA. Any advice on this would be welcome, as would any recommendations
for texts I could read that would give me information on this (I do have a
maths degree, but it is a bit 'rusty', so application rather than theory
would be preferred).
REPLIES:
(a) it would be worth having a look at Brunner, Domhof and Langer,
Nonparametric Analysis of Longitudinal Data in Factorial Experiments,
Wiley 2002 and also at the following:
1. Akritas MG, Arnold SF, Brunner E. Nonparametric hypotheses and rank
statistics for unbalanced factorial designs. Journal of the American
Statistical Association 1997;92(437):258-265.
and other papers by Akritas and Brunner as well as
2. Koch GG, Tangen CM, Jung JW, Amara IA. Issues for covariance analysis
of dichotomous and ordered categorical data from randomized clinical
trials and non-parametric strategies for addressing them. Statistics in
Medicine 1998;17(15-16):1863-1892.
3. Koch GG, Tangen C, Tudor G, Stokes ME. Strategies and issues for the
analysis of ordered categorical data from multifactor studies in
industry - discussion. Technometrics 1990;32(2):137-149.
and
4. Lesaffre E, Senn S. A note on non-parametric ANCOVA for covariate
adjustment in randomized clinical trials. Statistics in Medicine
2003;22(23):3583-3596.
which gives a correction to reference 2.
QUESTION 3:
I have data, based upon a 5-point Likert scale, that I wish to analyse in
a manner akin to factorial ANOVA (actually ANCOVA): I have a number of
dichotomous independent variables and one covariate, and I would like to
know to what degree they influence the Likert results.
How 'dangerous' is it to drop the Likert data into a standard ANCOVA? If
it is 'possible', are there diagnostics to check that it has worked? Or
is it just a really bad idea?
REPLIES:
(a) if you're willing to accept that a unit change along the Likert
scale means roughly the same thing wherever you are positioned along the
scale, then an ANCOVA is a good option, equivalently fitted as a linear
regression. An assumption is that the errors are normally distributed in
the population; so you could check how consistent the residuals are with
a normal distribution. Secondly you could check another assumption, that
the variability of these residuals is constant across the range of the
predicted values from the model. If those are without problem, then you
might check that the assumed linear relationship between the Likert
outcome and the continuous covariate is not masking a systematically
non-linear relationship. If the sample size is pretty small, perhaps <20,
then you may question whether you have enough data to reliably estimate
all of those parameters. You might also question the assumption of
normality, which is hard to check in small samples.
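The diagnostics described here might be sketched as follows, fitting the ANCOVA as a linear regression on simulated stand-in data (the design, effect sizes and variable names are all illustrative, not the questioner's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 120

# Simulated stand-in for the design described: two dichotomous factors,
# one continuous covariate, a 5-point Likert outcome.
f1 = rng.integers(0, 2, n)
f2 = rng.integers(0, 2, n)
cov = rng.normal(size=n)
latent = 3 + 0.5 * f1 - 0.3 * f2 + 0.4 * cov + rng.normal(0, 0.7, n)
y = np.clip(np.round(latent), 1, 5)   # discretise onto the 1..5 scale

# ANCOVA as a linear regression: design matrix with an intercept.
Xd = np.column_stack([np.ones(n), f1, f2, cov])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta
resid = y - fitted

# Diagnostic 1: are the residuals consistent with normality?
_, p_norm = stats.shapiro(resid)

# Diagnostic 2: does residual spread drift with the fitted values?
rho, p_spread = stats.spearmanr(fitted, np.abs(resid))
print(f"normality p = {p_norm:.3f}, spread-vs-fitted p = {p_spread:.3f}")
```

In practice one would also plot residuals against fitted values and against the covariate, as the formal tests alone can miss obvious patterns.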
If assumptions are not met, or sample size is too small to allow them to
be assessed, one next option to consider might be dichotomising the
outcome variable and using logistic regression, provided this is an
acceptable thing to do from an interpretation point of view and you
are not losing too much information by doing so - which you could
assess from the distribution of the Likert outcome (e.g. if the distribution
is pretty much zero/redundant at either end of the scale and only two or
three of the outcome values are predominant). Another option may be to
use non-parametric methods between each independent variable in turn and
the Likert outcome (e.g. Chi-squared test for trend or Spearman
correlation coefficient) and if none or only one is significant to stop
there without need for a model with multiple predictors. Ordinal
response models are also available which I think you could find in a
stats journal article from a few years ago with M Campbell as a
co-author: Lall R, Campbell MJ, Walters SJ, Morgan K. A review of
ordinal regression models applied on health-related quality of life
assessments. Stat Methods Med Res. 2002 Feb;11(1):49-67.
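The one-predictor-at-a-time screen mentioned above might look like this with Spearman's rank correlation (simulated, illustrative data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Illustrative data: a 5-point Likert outcome and one continuous covariate.
covariate = rng.normal(size=80)
likert = np.clip(np.round(3 + 0.6 * covariate + rng.normal(0, 1, 80)), 1, 5)

# Spearman's rank correlation respects the ordinal nature of the outcome;
# one such test per candidate predictor is the screen described above.
rho, p = stats.spearmanr(covariate, likert)
print(f"rho = {rho:.2f}, p = {p:.4f}")
```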
(b) Usually not a terrible idea. However, if you have a five point scale
you might consider using a proportional odds (logistic regression for
categorical data) model. This is an extension of generalised linear
models and gives you the flexibility of incorporating factors and
covariates while respecting the ordinal nature of the data.
Run the analysis and look at the residuals. If they are distributed
unimodally and symmetrically, then the parametric procedure will give
fairly accurate results.
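The proportional-odds (cumulative logit) model suggested here can be fitted with any ordinal-regression routine in a proper statistics package; purely to show the mechanics, the sketch below hand-rolls a minimal fit on simulated data via maximum likelihood (all names and data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(4)
n, K = 300, 5   # respondents, Likert categories

# Simulated predictor and ordinal outcome from a latent-logistic model.
x = rng.normal(size=n)
latent = 0.8 * x + rng.logistic(size=n)
cuts = np.array([-1.5, -0.5, 0.5, 1.5])
y = np.searchsorted(cuts, latent)          # categories 0..4

def negloglik(params):
    # params: slope beta, then K-1 unconstrained threshold parameters.
    beta = params[0]
    # Enforce ordered thresholds: first is free, rest are +ve increments.
    thr = np.cumsum(np.concatenate([[params[1]], np.exp(params[2:])]))
    eta = beta * x
    # Cumulative probabilities P(Y <= k | x) under the logit link.
    cum = expit(thr[:, None] - eta[None, :])         # (K-1, n)
    cum = np.vstack([np.zeros(n), cum, np.ones(n)])  # pad with 0 and 1
    probs = np.diff(cum, axis=0)                     # per-category probs
    return -np.log(probs[y, np.arange(n)] + 1e-12).sum()

res = minimize(negloglik, x0=np.zeros(K), method="BFGS")
print(f"estimated slope = {res.x[0]:.2f}")
```

The single slope acting on every cumulative split is exactly the proportional-odds assumption; it is what lets the model incorporate factors and covariates while respecting the ordering of the categories.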
(c) I worry about this a lot. I haven't come across much in the way of
detailed exploration of the dangers of analysing Likert-scale data as
though it were interval data. A starting point (but also possibly the
existing terminus) is given in the following (flawed?) papers:
Rasmussen, J.L. (1989) Analysis of Likert-scale data: a
re-interpretation of Gregoire & Driver, Psych. Bull. 105(1) 167-170.
Gregoire, T.G. (1989) Analysis of Likert-scale data revisited. Psych.
Bull. 105(1), 171.
Gregoire, T.G. & Driver, B.L. (1987) Analysis of ordinal data to detect
population differences. Psych. Bull. 101(1) 159-165.
I haven't chased this for a couple of years. Obviously there are
alternatives, e.g. proportional odds/cumulative logit models, which can
prove more satisfactory in some situations. A clear difficulty for, for
example, paired t-tests for Likert-scale data, is that there does not
seem an obvious and simple chance mechanism to provide the null.
(d) First, the term "likert" can be traced to people naming stuff after
Rensis Likert, so it is a proper name. Even within psychological
disciplines there appear to be ding-dong debates on what qualifies as a
Likert scale, some views being much narrower than others. If you are
using the term as a label for an ordered categorical scale with integer
grades, then that is broader than many interpretations, and, also more
importantly, not a term that will be universally understood within
statistical sciences.
I don't have a strong feeling for what "danger" might mean here. A good
analysis for data of the form I think you have might well be an ordered
logit model. That said, I suspect that if you throw the data, as is,
into an ANCOVA, the scientific conclusions might well be very similar.
The best strategy in cases like this is often to try different methods
and see how much difference that makes. Otherwise you get into rather
anguished debates about what is valid and invalid, which often turn out
to be based on personal preferences and prejudices.
(e) There are ways to take ordinal scores and transform them into linear
measurement, given certain assumptions. If you key in Rasch into
MEDLINE, you will find 600+ articles using this methodology. The Rasch
model is a probabilistic model which operationalises the axioms of
additive conjoint measurement. Thus if the data from your scale meets
the model expectations, then a logit transformation provides you with a
metric which can be used in ANOVAs, ANCOVAs, and the like, given
appropriate distributional properties. The 'assumptions' are that the
data meet model expectations, which include unidimensionality to give a
valid summed score. (Actually this is still required for ordinal data,
and you can use Mokken scaling to test whether you have achieved ordinal
measurement.)
However, if you have just a single Likert-style variable (and this was
not entirely clear in your message), then there is not much that can be
done with this approach, although the MEDLINE articles will provide you
with sufficient empirical evidence to show that all such items are
ordinal in nature. Faced with that situation I might try ordinal
regression, or collapse to use a binary logistic regression.
(f) You may or may not be aware that, in ordinary linear model analysis
such as would be used for an outcome measure on a continuous scale,
ANOVA and regression analysis are essentially the same, in that they
have the same form of underlying model, but the latter is used where the
explanatory variables are not designed-in in a balanced way. ANCOVA is
the most general case, where there are some design factors and some
non-design factors that need to be adjusted for. But wherever there is
imbalance and potential confounding or interaction between factors
(effect modification), the interpretation of results is likely to be
much more complicated than in the designed-experiment ANOVA case, and if
that's your situation you would be wise to involve an experienced
statistician.
The extent to which treating Likert-scale variables as if they were
continuous will give you misleading answers will depend on how many
categories are actually used, and how the responses are distributed
across the categories. In many cases, a linear approach will give
decent approximations. The standard diagnostic plots, e.g. residuals vs
fitted values, may give some insight into the model fit.
Since you asked for help on "non-parametric" ANOVA, you may get a lot of
stuff that's not particularly relevant to your case. I think what you
need is the ability to use regression techniques on an ordered
categorical response, which is what Likert scales are. Probably the most
widely used now is that of McCullagh (see McCullagh and Nelder,
Generalized Linear Models), which I have found very useful. In
psychology, models originated by Rasch have been used for some time.
Obviously, you'll need access to software that can fit your chosen
model, but proper statistical packages should have no trouble with this.
Again, if you haven't used this kind of model I'd advise getting expert
help.
(g) This is something that routinely comes up here and I'd like to see
what the other Allstatters think. Personally I believe it to be a bad
idea unless your Likert scale fits the Rasch model and can be
demonstrated to act like a truly interval scale instrument. If there are
not equal distances between points on the scale, then I for one would
not be happy to trust the results of the AN(C)OVA.
--
Dr G.S.Clarke
Lecturer in Physiology & Biometry
Faculty of Health Studies
University of Wales, Bangor
Fron Heulog
Ffriddoedd Road
Bangor
Gwynedd LL57 2EF
Tel: 01248 383157
e-mail: [log in to unmask]