ALLSTAT Archives
allstat@JISCMAIL.AC.UK

Subject: LONG: COMPILED REPLIES FROM LIST: to questions on Analysis of subgroups, Analysis of Ranked/Likert data
From: "G.S.Clarke" <[log in to unmask]>
Date: Wed, 30 Nov 2005 09:22:38 +0000


Dear All,

Sorry for the long e-mail: several people asked me to distribute the 
answers I received to my three questions, so these are given below.

Many thanks to all who took the time to give me assistance with these 
matters.

Regards

Graham

QUESTION 1.

I have a data set to analyse regarding a range of professionals' 
responses to a questionnaire.  I have done PCA on all the results and 
then a follow-up investigation to see which professional groups differ 
from other professional groups with respect to the derived components.

However, in looking to publish the results it is clear that some journals 
would be more interested in a subset of those professional groups.  So 
how should I proceed?  I could use the PCs as derived from all the 
questionnaires to generate 'scores' for all individuals, which I then 
analyse for the subset of professions; or I could generate PCs derived 
only from individuals belonging to the subset of professions and 
analyse those (these PCs are similar but not identical).
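For concreteness, a minimal Python/scikit-learn sketch of the two options, with made-up data and purely illustrative names and sizes:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_all = rng.normal(size=(1200, 10))       # questionnaire items, all professions
in_subset = rng.random(1200) < 0.13       # stand-in flag for the subset professions

# Option 1: components derived from ALL respondents, scores computed for the subset
scaler_all = StandardScaler().fit(X_all)
pca_all = PCA(n_components=3).fit(scaler_all.transform(X_all))
scores_option1 = pca_all.transform(scaler_all.transform(X_all[in_subset]))

# Option 2: components derived from the subset only
X_sub = X_all[in_subset]
scaler_sub = StandardScaler().fit(X_sub)
pca_sub = PCA(n_components=3).fit(scaler_sub.transform(X_sub))
scores_option2 = pca_sub.transform(scaler_sub.transform(X_sub))

# Comparing the loading matrices shows how far the "similar but not identical"
# components actually differ
print(np.round(pca_all.components_, 2))
print(np.round(pca_sub.components_, 2))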

In doing further (effectively post-hoc) analysis on all the professions, 
my subset professions are often placed in different 'homogeneous groups'; 
however, when I simply work with the subset of professions, the 
analysis more frequently says that these subset professions belong to 
one 'homogeneous group' (presumably because the reduction in overall 
sample size reduces power).  (One of my subset professions has only 
8 members compared to ~50 and ~100 in the other two subset professions; 
the overall - all professions - sample size is about 1200.)

So really, my question is how to proceed with analysing the subset of 
professions.  What is the more effective/rigorous analysis, in terms of 
statistics and also in terms of what journal editors may accept?

Any advice/opinion/references would be welcomed

REPLIES:

(a) what is wrong with you deciding the appropriate analysis, not the 
'journal'? [but take advice from their peer reviewers of course] And 
discuss with a statistician face to face – more productive for initial 
consultations.

And what is wrong with publishing the results for each subset, 
indicating that the data was not collected for such a comparison?? [the 
fact that each individual profession is 'homogeneous' is of some 
interest, even if the information is indicative only and not 
'statistically significant' - and ideally you should compare/contrast 
with whether that is what would be expected from underlying theory or 
other evidence]

I suggest that you first consider how to plot/display the data both 
overall and for each subgroup - "a picture is worth a thousand words" - 
ideally in a format that allows you to identify any outliers [as these 
will have strong effects, particularly on very small samples].
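A minimal plotting sketch along these lines in Python/matplotlib, with made-up scores and group sizes:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
groups = {"Profession A": rng.normal(0.0, 1.0, 8),      # the very small subgroup
          "Profession B": rng.normal(0.3, 1.0, 50),
          "Profession C": rng.normal(0.5, 1.2, 100)}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))        # points beyond the whiskers flag outliers
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("PC1 score (illustrative)")
ax.set_title("Component scores by profession")
plt.show()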

You can then consider reporting the differences, whether or not these are 
statistically significant at the 5% level, stating what test you 
have used. And if your sample size is really too small for that, then 
simply say so [or gather a larger sample]. For statistical purposes, 
though, a sample subgroup size of 8 is generally considered very small 
for statistical testing, so you'll want to use a robust test.

There are several possible hypotheses [and thus tests], e.g. whether all 
groups are the same, pairwise tests between groups, whether particular 
groups have higher average scores than the 'general' population, whether 
the variance within each profession is the same, etc. It is hard to assess 
what the appropriate tests are [and/or whether these should be based on your 
PCA scores or on the underlying data] without knowledge of the data 
collection and results thus far.
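For example, rank-based versions of the first and last of these can be run in Python with SciPy (the data and group sizes below are made up purely for illustration):

from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = {"A": rng.normal(0.0, 1.0, 8),     # illustrative group sizes only
          "B": rng.normal(0.3, 1.0, 50),
          "C": rng.normal(0.5, 1.2, 100)}

# Overall test: do all groups come from the same distribution? (rank-based)
h, p = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.3f}")

# Pairwise comparisons (remember to allow for multiple testing)
for a, b in combinations(groups, 2):
    u, p = stats.mannwhitneyu(groups[a], groups[b])
    print(f"Mann-Whitney {a} vs {b}: p={p:.3f}")

# Is the variability within each group the same?
w, p = stats.levene(*groups.values())
print(f"Levene: W={w:.2f}, p={p:.3f}")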

[When considering making comparisons between a particular group and the 
'overall average' you may wish to consider if your sample is 
'representative' of the population in terms of the mix of those in each 
profession, and apply re-weighting as needed]

'Practical Non-parametric Statistics' [W. Conover] has various tests that 
are more robust in the sense of not making parametric assumptions - any 
book on non-parametric statistics will cover some. There are lots of 
tests in '100 Statistical Tests' [G. Kanji] which may also help. Another 
way to be more 'robust' is to test at the 1% level rather than 
the 5% level.

Another important point is that, as part of the usual scientific method, 
you shouldn't really use the same data both to form the hypothesis and 
to test the hypothesis. You can use the whole data set only if your 
hypothesis is derived from some other source [underlying theory, other 
previous evidence, etc.]. Otherwise you need to split your data in half in 
some way [random selection ideally], work out the hypothesis/hypotheses 
from one half of the data, then test these using the other half.
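A minimal sketch of that split in Python/NumPy (the sample size and the 'scores' array are purely illustrative):

import numpy as np

rng = np.random.default_rng(3)
n = 1200
scores = rng.normal(size=n)              # stand-in for the real PC scores

idx = rng.permutation(n)
explore = scores[idx[: n // 2]]          # use this half to generate hypotheses
confirm = scores[idx[n // 2:]]           # use this half only for the formal tests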

On your point on length, maybe the solution is to draft a full paper on 
the complete findings and put it on a [your?] website, mention the 
context/key findings in the opening paragraph or two of your journal 
articles, and provide a web reference for those who want to see the 
complete analysis. If the full results are interesting, that provides 
access for those interested whilst dealing with journal article length 
issues.


QUESTION 2

In terms of ranking-type data, simpler analyses can be carried out 
'non-parametrically' using techniques found in texts like Siegel and 
Castellan.  However, I was wondering how to approach analyses for which - 
had the data been normal etc. - you would use a multiple regression/factorial 
ANOVA. Any advice on this would be welcome, as would any recommendations 
for any text I could read that would give me info on this (I do have a 
maths degree, but it is a bit 'rusty', so application rather than theory 
would be preferred).


REPLIES:

(a) it would be worth having a look at Brunner, Domhof and Langer, 
Nonparametric Analysis of Longitudinal Data in Factorial Experiments, 
Wiley 2002 and also at the following:
1. Akritas MG, Arnold SF, Brunner E. Nonparametric hypotheses and rank 
statistics for unbalanced factorial designs. Journal of the American 
Statistical Association 1997;92(437):258-265.
and other papers by Akritas and Brunner as well as
2. Koch GG, Tangen CM, Jung JW, Amara IA. Issues for covariance analysis 
of dichotomous and ordered categorical data from randomized clinical 
trials and non-parametric strategies for addressing them. Statistics in 
Medicine 1998;17(15-16):1863-1892.
3. Koch GG, Tangen C, Tudor G, Stokes ME. Strategies and Issues for the 
Analysis of Ordered Categorical Data from Multifactor Studies in 
Industry - Discussion. Technometrics 1990;32(2):137-149.
and
4. Lesaffre E, Senn S. A note on non-parametric ANCOVA for covariate 
adjustment in randomized clinical trials. Statistics in Medicine 
2003;22(23):3583-3596.
for a correction to 2
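As a rough illustration only - this is the simple rank-transform approach (rank the response, then fit an ordinary factorial model to the ranks), not the methodology of the papers above - a Python/statsmodels sketch with made-up data might look like this:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from scipy.stats import rankdata

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "y": rng.integers(1, 6, 120),                     # ordinal outcome, grades 1-5
    "factor_a": rng.choice(["low", "high"], 120),
    "factor_b": rng.choice(["x", "y", "z"], 120),
})
df["y_rank"] = rankdata(df["y"])                      # mid-ranks for ties

# Ordinary factorial model fitted to the ranks rather than the raw grades
model = smf.ols("y_rank ~ C(factor_a) * C(factor_b)", data=df).fit()
print(anova_lm(model, typ=2))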


QUESTION 3:

I have data, based upon a 5-point Likert scale, that I wish to analyse in 
a manner akin to factorial ANOVA (actually ANCOVA). I have a number of 
dichotomous independent variables and one covariate, and I would like to 
know to what degree they influence the Likert results.

How 'dangerous' is it to drop the Likert data into a standard ANCOVA? If 
it is 'possible', are there diagnostics to check that it has worked? Or 
is it just a really bad idea?

(a) If you're willing to accept that a unit change along the Likert 
scale means roughly the same thing wherever you are positioned along the 
scale, then an ANCOVA is a good option, equivalently fitted as a linear 
regression. One assumption is that the errors are normally distributed in 
the population, so you could check how consistent the residuals are with 
a normal distribution. Secondly, you could check another assumption: that 
the variability of these residuals is constant across the range of the 
predicted values from the model. If those are without problem, then you 
might check that the relationship between the Likert outcome and the 
continuous covariate, assumed linear, does not show some other systematic 
(non-linear) form. If the sample size is pretty small, perhaps <20, then 
you may question whether you have enough data to reliably estimate 
all of those parameters. You might also question the assumption of 
normality, which is hard to check in small samples.
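A minimal Python/statsmodels sketch of that ANCOVA-as-regression and the checks just described, using simulated data and assumed variable names:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from scipy import stats

rng = np.random.default_rng(5)
n = 150
df = pd.DataFrame({
    "likert": rng.integers(1, 6, n),     # 5-point outcome
    "group1": rng.integers(0, 2, n),     # dichotomous independent variables
    "group2": rng.integers(0, 2, n),
    "covariate": rng.normal(size=n),
})

fit = smf.ols("likert ~ group1 + group2 + covariate", data=df).fit()
print(fit.summary())

resid, fitted = fit.resid, fit.fittedvalues

# 1. Are the residuals roughly consistent with a normal distribution?
print("Shapiro-Wilk on residuals:", stats.shapiro(resid))

# 2. Is the residual spread roughly constant across the fitted values?
#    (a residual-vs-fitted plot is the usual check; this is a crude numeric stand-in)
print("Spearman(|resid|, fitted):", stats.spearmanr(np.abs(resid), fitted))

# 3. Does a quadratic term in the covariate add anything? If not, the assumed
#    linear relationship looks reasonable.
fit2 = smf.ols("likert ~ group1 + group2 + covariate + I(covariate**2)", data=df).fit()
print(anova_lm(fit, fit2))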

If the assumptions are not met, or the sample size is too small to allow 
them to be assessed, one next option to consider might be dichotomising 
the outcome variable and using logistic regression, provided this is an 
acceptable thing to do from an interpretation point of view and you are 
not losing too much information by doing it - which you could assess from 
the distribution of the Likert outcome (e.g. if the distribution is pretty 
much zero/redundant at either end of the scale and only two or three of 
the outcome values are predominant). Another option may be to use 
non-parametric methods between each independent variable in turn and 
the Likert outcome (e.g. a chi-squared test for trend or the Spearman 
correlation coefficient) and, if none or only one is significant, to stop 
there without the need for a model with multiple predictors. Ordinal 
response models are also available, which I think you could find in a 
stats journal article from a few years ago with M Campbell as a 
co-author: Lall R, Campbell MJ, Walters SJ, Morgan K. A review of 
ordinal regression models applied on health-related quality of life 
assessments. Stat Methods Med Res. 2002 Feb;11(1):49-67.
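A hedged Python sketch of the dichotomise-and-logistic-regression option and the one-at-a-time nonparametric checks (statsmodels and SciPy assumed; the data are simulated):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(6)
n = 150
df = pd.DataFrame({
    "likert": rng.integers(1, 6, n),
    "group1": rng.integers(0, 2, n),
    "covariate": rng.normal(size=n),
})

# Check the distribution first: does a cut point throw away too much information?
print(df["likert"].value_counts().sort_index())

# Dichotomise, e.g. 'agree' (4 or 5) versus the rest, then logistic regression
df["agree"] = (df["likert"] >= 4).astype(int)
print(smf.logit("agree ~ group1 + covariate", data=df).fit(disp=False).summary())

# One-at-a-time nonparametric associations with the ordinal outcome
print("Spearman (covariate):", stats.spearmanr(df["covariate"], df["likert"]))
print("Mann-Whitney (group1):",
      stats.mannwhitneyu(df.loc[df["group1"] == 1, "likert"],
                         df.loc[df["group1"] == 0, "likert"]))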

(b) Usually not a terrible idea. However, if you have a five point scale 
you might consider using a proportional odds (logistic regression for 
categorical data) model. This is an extension of generalised linear 
models and gives you the flexibility of incorporating factors and 
covariates while respecting the ordinal nature of the data.

Run the analysis and look at the residuals. If they are distributed 
unimodally and symmetrically, then the parametric procedure will give 
fairly accurate results.
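A minimal sketch of such a proportional-odds fit, assuming statsmodels' OrderedModel and using simulated data purely for illustration:

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 200
X = pd.DataFrame({
    "group1": rng.integers(0, 2, n),
    "group2": rng.integers(0, 2, n),
    "covariate": rng.normal(size=n),
})
# Simulated 5-point outcome loosely driven by the predictors
latent = 0.8 * X["group1"] + 0.5 * X["covariate"] + rng.logistic(size=n)
likert = pd.cut(latent, bins=5, labels=[1, 2, 3, 4, 5])   # ordered categorical

# Proportional-odds (ordered logit) model: factors and a covariate, with the
# five categories kept ordinal rather than treated as equal-interval numbers
res = OrderedModel(likert, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())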

(c)  I worry about this a lot. I haven't come across much in the way of 
detailed exploration of the dangers of analysing Likert-scale data as 
though it were interval data. A starting point (but also possibly the 
existing terminus) is given in the following (flawed?) papers:

Rasmussen, J.L. (1989) Analysis of Likert-scale data: a 
re-interpretation of Gregoire & Driver, Psych. Bull. 105(1) 167-170.

Gregoire, T.G. (1989) Analysis of Likert-scale data revisited. Psych. 
Bull. 105(1), 171.

Gregoire, T.G. & Driver, B.L. (1987) Analysis of ordinal data to detect 
population differences. Psych. Bull. 101(1) 159-165.

I haven't chased this for a couple of years. Obviously there are 
alternatives, e.g. proportional odds/cumulative logit models, which can 
prove more satisfactory in some situations. A clear difficulty for, for 
example, paired t-tests for Likert-scale data, is that there does not 
seem an obvious and simple chance mechanism to provide the null.

(d) First, the term "Likert" can be traced to people naming stuff after 
Rensis Likert, so it is a proper name. Even within psychological 
disciplines there appear to be ding-dong debates on what qualifies as a 
Likert scale, some views being much narrower than others. If you are 
using the term as a label for an ordered categorical scale with integer 
grades, then that is broader than many interpretations and, more 
importantly, not a term that will be universally understood within the 
statistical sciences.

I don't have a strong feeling for what "danger" might mean here. A good 
analysis for data of the form I think you have might well be an ordered 
logit model. That said, I suspect that if you throw the data, as is, 
into an ANCOVA, the scientific conclusions might well be very similar.

The best strategy in cases like this is often to try different methods 
and see how much difference that makes. Otherwise you get into rather 
anguished debates about what is valid and invalid, which often turn out 
to be based on personal preferences and prejudices.

(e) There are ways to take ordinal scores and transform them into linear 
measurement, given certain assumptions. If you key Rasch into 
MEDLINE, you will find 600+ articles using this methodology. The Rasch 
model is a probabilistic model which operationalises the axioms of 
additive conjoint measurement. Thus, if the data from your scale meet 
the model expectations, then a logit transformation provides you with a 
metric which can be used in ANOVA, ANCOVA and the like, given 
appropriate distributional properties. The 'assumptions' are that the 
data meet model expectations, which includes unidimensionality to give a 
valid summed score (actually this is still required for ordinal data, and 
you can use Mokken scaling to test whether you have achieved ordinal 
measurement).
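For reference, a sketch of the dichotomous Rasch model's form in Python (not a fit to real data; just where the logit metric comes from):

import numpy as np
from scipy.special import expit    # expit(x) = 1 / (1 + exp(-x))

def rasch_prob(theta, b):
    """Dichotomous Rasch model: P(person of ability theta endorses item of difficulty b)."""
    return expit(theta - b)

abilities = np.array([-1.0, 0.0, 1.5])   # person parameters, in logits
difficulty = 0.5                         # item parameter, in logits
print(rasch_prob(abilities, difficulty)) # probabilities rise with ability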

However, if you have just a single Likert-style variable (and this was 
not entirely clear in your message), then there is not much that can be 
done with this approach, although the MEDLINE articles will provide you 
with sufficient empirical evidence to show that all such items are 
ordinal in nature. Faced with that situation I might try ordinal 
regression, or collapse to use a binary logistic regression.

(f) You may or may not be aware that, in ordinary linear model analysis 
such as would be used for an outcome measure on a continuous scale, 
ANOVA and regression analysis are essentially the same, in that they 
have the same form of underlying model, but the latter is used where the 
explanatory variables are not designed-in in a balanced way.  ANCOVA is 
the most general case, where there are some design factors and some 
non-design variables that need to be adjusted for.  But wherever there is
imbalance and potential confounding or interaction between factors 
(effect modification), the interpretation of results is likely to be 
much more complicated than in the designed-experiment ANOVA case, and if 
that's your situation you would be wise to involve an experienced 
statistician.

The extent to which treating Likert-scale variables as if they were 
continuous will give you misleading answers will depend on how many 
categories are actually used, and how the responses are distributed 
across the categories.  In many cases, a linear approach will give 
decent approximations.  The standard diagnostic plots, e.g. residual vs 
fitted, may give some insight into the model fit.

Since you asked for help on "non-parametric" ANOVA, you may get a lot of 
stuff that's not particularly relevant to your case.  I think what you 
need is the ability to use  regression techniques on an ordered 
categorical response, which is what Likert scales are. Probably the most 
widely used now is that of McCullagh (see McCullagh and Nelder,
Generalized Linear Models), which I have found very useful.  In 
psychology, models originated by Rasch have been used for some time. 
Obviously, you'll need access to software that can fit your chosen 
model, but proper statistical packages should have no trouble with this. 
Again, if you haven't used this kind of model I'd advise getting expert
help.

(g)  This is something that routinely comes up here and I'd like to see 
what the other Allstatters think. Personally I believe it to be a bad 
idea unless your Likert scale fits the Rasch model and can be 
demonstrated to act like a truly interval scale instrument. If there are 
not equal distances between points on the scale, then I for one would 
not be happy to trust the results of the AN(C)OVA.

-- 
Dr G.S.Clarke
Lecturer in Physiology & Biometry
Faculty of Health Studies
University of Wales, Bangor
Fron Heulog
Ffriddoedd Road
Bangor
Gwynedd LL57 2EF

Tel: 	01248 383157
e-mail: [log in to unmask]
