JISCMail - ALLSTAT Archives

Dear all,

Thank you for all your replies to my query (see below).

When referring to modelling, some people raised the issue of including
ordinal variables in a regression model. I know that there have been
many discussions on the use of ordinal variables in a regression. One
major point is that we can't say for certain that the distance between,
say, levels 1 and 2 is half the distance between levels 1 and 3, etc.
Usually, categorical independent variables with i categories are entered
into the regression as i-1 binary dummy variables. However, I have seen
texts/papers which give examples of ordinal variables being entered
into a multiple regression. For example, Lewis-Beck (1993) "Regression
Analysis Vol 2" discusses the inclusion of an ordinal variable in a
multiple regression (as an independent variable) versus the inclusion of
the associated dummy variables.
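
As a minimal sketch of the contrast described above (not taken from
Lewis-Beck or from any reply), the following compares entering an
ordinal predictor as a single numeric score against entering it as i-1
dummy variables. The data and the function names are hypothetical. With
one categorical predictor, the dummy-variable fit reduces to the mean
response at each level, so no matrix algebra is needed.

```python
from statistics import mean

# Hypothetical data: x is an ordinal predictor with levels 1..3,
# y is a numeric response.
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 8.0, 9.0, 10.0]


def rss_linear_score(x, y):
    """Residual sum of squares when x enters as one numeric score.

    This assumes equal spacing between adjacent levels.
    """
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def rss_dummies(x, y):
    """Residual sum of squares when x enters as i-1 dummy variables.

    With a single categorical predictor the fitted value for each
    observation is simply the mean response at its level.
    """
    level_mean = {
        lev: mean(yi for xi, yi in zip(x, y) if xi == lev) for lev in set(x)
    }
    return sum((yi - level_mean[xi]) ** 2 for xi, yi in zip(x, y))


print("RSS, numeric score:", rss_linear_score(x, y))
print("RSS, dummy coding: ", rss_dummies(x, y))
```

The dummy coding can never fit worse than the numeric score, because the
straight line is a special case of it. A large gap between the two
residual sums of squares suggests that the equal-spacing assumption is
doing a poor job for these data.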

I have also seen ordinal logistic regression examples which have a
subjective variable as the response (categories of back pain
1=worse,....,6=complete relief), e.g. "Anderson & Philips - Regression,
Discrimination and Measurement Models for Ordered Categorical Variables,
Applied Statistics, 1981, Vol 30, pp22-31". This same paper also
includes an ordinal subjective independent variable (pain change
categories 1=getting better....3=worse) in the model as an alternative
to using binary dummy variables.

I have a few points:

1) Does anyone know of any other references where we have a model (be it
multiple regression or logistic regression) which uses a *subjective
ordinal* variable as an independent variable?  

2) A few of your replies suggest that subjective responses should never
be averaged. Why is this? Is it because of the reason I give above,
i.e. we can't say for certain that the distance between levels 1 and 2
is half the distance between levels 1 and 3, etc.? I can see your point
but I'm a little confused, as I have seen examples of conjoint analysis
using the 'aggregate' model, whereby respondents' subjective responses
are averaged to form an 'aggregate' response. Have you any views on
this?
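
One way to see the concern about averaging is sketched below, assuming
two hypothetical groups of 1-5 responses. Relabelling the categories
with a different but still order-preserving set of scores can reverse
which group has the higher mean, whereas the comparison of medians is
unaffected. The scores in `recode` are an arbitrary illustration, not a
recommended scoring.

```python
from statistics import mean, median

# Hypothetical responses on a 1-5 scale from two groups.
group_a = [1, 1, 5, 5, 5]
group_b = [3, 3, 4, 4, 4]

# An order-preserving relabelling: the categories keep their ranking,
# but the top category is placed much further from the rest.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded_a = [recode[v] for v in group_a]
recoded_b = [recode[v] for v in group_b]

print("means, original scores:", mean(group_a), mean(group_b))
print("means, recoded scores: ", mean(recoded_a), mean(recoded_b))
print("medians, original:     ", median(group_a), median(group_b))
print("medians, recoded:      ", median(recoded_a), median(recoded_b))
```

If the spacing between categories is unknown, both sets of scores are
equally defensible, so a conclusion that depends on the choice of scores
is fragile.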

3) On another topic, if I had subjective responses of the form 
1=strongly disagree...5=strongly agree for n people and I wanted to do a
cluster analysis to group individuals, which form of distance measure
and linkage method would be most appropriate? I assume that I would
have to try several combinations of linkage method and distance measure
and compare results?
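
A rough sketch of trying several combinations is given below, using only
the standard library. The respondents, the two distance measures and
the two linkage rules are illustrative assumptions; a statistics package
would normally be used for real data, and this is not a statement about
which combination is most appropriate.

```python
from itertools import combinations


def manhattan(a, b):
    """City-block distance between two response vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two response vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def agglomerate(rows, k, dist, linkage):
    """Merge the two closest clusters repeatedly until k clusters remain.

    linkage is min for single linkage or max for complete linkage.
    Returns the clusters as sorted lists of row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        def gap(pair):
            a, b = pair
            return linkage(dist(rows[i], rows[j])
                           for i in clusters[a] for j in clusters[b])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a].extend(clusters[b])  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical respondents answering four 1-5 agreement items.
rows = [
    [1, 2, 1, 2],
    [2, 1, 1, 1],
    [1, 1, 2, 2],
    [5, 4, 5, 5],
    [4, 5, 4, 5],
    [5, 5, 5, 4],
]

results = {}
for dist in (manhattan, euclidean):
    for name, linkage in (("single", min), ("complete", max)):
        results[(dist.__name__, name)] = agglomerate(rows, 2, dist, linkage)
        print(dist.__name__, name, results[(dist.__name__, name)])
```

When the groups are as well separated as in this toy data, every
combination agrees. Disagreement between combinations on real data is a
sign that the cluster structure is weak or depends on the metric.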


Many thanks,
Kim.

*****************************************************
Hello all.

A quick query. My data comprise some subjective and some objective
variables. Can models be constructed which contain subjective
variables as both the predictors and the response?

Can subjective variables be used in standard nonparametric tests (e.g.
Wilcoxon matched pairs, Friedman two-way ANOVA, Mann-Whitney,
Kruskal-Wallis, Phi coefficient, etc.)?
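
To illustrate why rank-based tests only need the ordering of the
responses, here is a small sketch of the Mann-Whitney U statistic with
tied responses given their average rank. The two samples are
hypothetical, and the sketch stops at the statistic; obtaining a p-value
(with a correction for ties) is left to a statistics package.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) computed from the pooled midranks."""
    ranks = midranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical 1-5 responses from two independent groups.
group_a = [1, 2, 2, 3]
group_b = [3, 4, 5, 5]
print(mann_whitney_u(group_a, group_b))
```

Any order-preserving relabelling of the five categories leaves the ranks,
and therefore U, unchanged.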

Many thanks,
Kim.
************************************************************
Replies

Before I leap off into the abyss of total ignorance, let me ask you a few
questions for clarification.

Fact: the terms 'subjective' and 'objective' do not fully explain the
significance and usefulness of the item 'measured.' Better, think of an
'objective' measure as one which has little variation relative to the
value reported. A ruler marked in 0.5 cm, for example, has a stdev of
about 0.1 cm between people measuring. If you are measuring a distance
of 3 m (300 cm), this degree of 'error' is usually acceptable.

Think of a 'subjective' measure as one with a lot of variation between
reporters. This is usually because the different people involved do not
have a common scale to use. I report the product finish is 2 on a scale
of 1-5 (i.e., rejectable), and the sales person reports it as 4 (ship
the puppy!). We are looking at the same item, but we have different
concerns and these lead to different locations on the scale.

I'm sure you can think of similar situations in your technical area.
The doctor may report that the drug reduced the symptoms, while the
patient still doesn't feel 'cured.'

Now, if you rephrase your question into 'data with low coefficient of 
variation' and 'data with high coefficient of variation,' does it still 
make sense?
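
The rephrasing above can be made concrete with a short sketch. The
measurements and ratings are invented for illustration, loosely
following the ruler and surface-finish examples. Note that taking a
coefficient of variation of 1-5 ratings already treats the scores as
numeric, which is itself an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical repeat measurements of a 300 cm length by different people.
lengths_cm = [299.9, 300.0, 300.1, 300.0, 299.9, 300.1]

# Hypothetical surface-finish ratings (1-5) of one item by different people.
finish_ratings = [2, 4, 3, 5, 1, 4]

print("CV of lengths:", coefficient_of_variation(lengths_cm))
print("CV of ratings:", coefficient_of_variation(finish_ratings))
```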

Perhaps by 'subjective' you mean an ordinal or even binomial scale, e.g.
'the patient died' or 'the patient lived.' I would urge you to refine
the measurement so that everything winds up on a more-or-less linear
scale. You have to get serious about what you are measuring - the
example of surface finish above started out with a print spec, "surface
finish of castings shall be acceptable."

Does this help as much as it confuses?
*********************************
Hello All

Just as a follow on to my email yesterday....

The variables I have are subjective in the sense that they ask for the
respondent's beliefs on an issue and the respondent has a choice of, for
example, 1=strongly disagree...5=strongly agree.

The objective questions are questions like 'how many employees do you
have?'.

Many thanks for your time, 
All the best,
Kim.
*****************************************
Hi Kim,

The subjective variables are subjective in a conceptual sense. In a
statistical sense they may be dependent (or sometimes even independent).
And, yes, nonparametric statistics apply to them, in particular if their
scaling is only (approximately) ordinal or nominal. If nonparametric
statistics did not apply to subjective variables, which variables would
they apply to?
*********************************************
Kim

my response would be 'why not?'

your "subjective" variables measure attitudes and there may be
variability in the data because one individual's interpretation of the
question and the categories can be different from another's. But this is
not the only source of variability.

similarly even for 'objective' variables such as numbers of employees
you 'have', this can be interpreted differently by different people
[headcount vs full time equivalent, whether sub-contractors, temporary
staff and consultants are included, whether an average over time or a
point in time snapshot], all of which can cause variability in the data.

but if there is a relationship between, say, a variable measuring a
chief executive's views on the extent to which it is important to treat
staff fairly, and the proportion of staff in the firm who get dismissed
during a downturn, then there seems no obvious reason not to collect
data to try to describe and model that hypothesised phenomenon.

Similarly you may well find correlations between different attitudes, e.g.
between liking the content of the Guardian and holding particular
political beliefs. The real consideration is how much variability there
is in your variables, whether they accurately record the underlying
attitudes or latent variables, and therefore what any relationship found
actually means. A real concern would be to make sure the "subjective"
variables used as response and predictor don't actually measure the same
thing.

One other important point is to base any hypothesis on one data set
and test it using another [if necessary, split the data into two
subsets]. Another point is that identifying the patterns in the data,
relationships and clusters is far more important than carrying out
formal tests.
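
A minimal sketch of splitting one data set into an exploration half and
a confirmation half is shown below. The function name, the fixed seed
and the 50/50 split are assumptions made for illustration.

```python
import random


def split_in_two(records, seed=0):
    """Shuffle a copy of the records and cut it into two halves.

    The first half is for forming hypotheses, the second for testing
    them. A fixed seed makes the split reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical respondent identifiers.
explore, confirm = split_in_two(range(10))
print("explore:", explore)
print("confirm:", confirm)
```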

You may also want to try converting categorical data [certainly where
the categories are ordered] into binary data [e.g. agrees/disagrees] and
then try logit or probit regression or multi-level modelling:
http://multilevel.ioe.ac.uk/intro/index.html
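
The dichotomising step might look like the sketch below. The cut-off
(4 or 5 counts as 'agrees'), the two groups of firms and their responses
are all hypothetical. With a single binary predictor, the slope of a
logistic regression equals the difference in log-odds between the two
groups, so it can be computed directly here without a fitting routine.

```python
from math import log


def agrees(response, cutoff=4):
    """Collapse a 1-5 response to 1 (agrees) or 0 (does not agree).

    Where the neutral category 3 belongs is a judgement call; here it
    is counted as not agreeing.
    """
    return 1 if response >= cutoff else 0


def log_odds(binary):
    """Log-odds of a 1 in a list of 0/1 values (undefined if all equal)."""
    p = sum(binary) / len(binary)
    return log(p / (1 - p))


# Hypothetical 1-5 responses from two groups of firms.
small_firms = [5, 4, 4, 2, 3, 4, 5, 1]
large_firms = [2, 3, 1, 4, 2, 5, 3, 2]

small_binary = [agrees(r) for r in small_firms]
large_binary = [agrees(r) for r in large_firms]

# Equivalent to the logit coefficient for a small-vs-large indicator.
slope = log_odds(small_binary) - log_odds(large_binary)
print("log odds ratio (small vs large):", slope)
```

Collapsing five categories to two discards information, so it is worth
checking whether the conclusion changes with a different cut-off.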

hope this helps

regards
****************************************************
My comments from yesterday still apply (I think!).

a)    The responses to your 'subjective' questions will be better if
each respondent perceives the statement in a similar fashion. But you
knew that. In the US right now, how you state something about 'gay
marriage' can be very sensitive. Today I had someone tell me, almost
spontaneously, that they rated 'this gay marriage thing' very high as a
concern. Well, in WI, we recently had our legislature posturing for a
week or two, with no end result. My respondent thought the whole
charade was a waste of time, and "everybody should be left alone."
I.e., gay marriage was fine by her.

b)    Even if the respondents perceive the statement similarly, one
person's 'strongly agree' is another's 'tepid agree.' If you could
measure the level of emotional response, perhaps you would see that
difference easily. Or not - we can't tell yet. The difference between
emotional involvement level and stated response is a variance, and
usually a large one.

c)    Once you accept that the 1 - 5 scale is approximately an interval 
scale, you can go ahead and use it like any other interval scale - 
average, stdev, and (if assumed approx Normal) subject to multiple 
regression analysis.  If you insist that it is an ordinal scale, then 
don't go reporting any averages or stdevs, OK?

d)    An arcsin transform may help achieve a more Normal distribution,
but you have to trust the data validity first.

*****************************************
Kim,

Your subjective variables can be regarded as being measured on an
ordinal scale of measurement. Therefore, any use of these variables that
is valid for such variables is OK. Thus the nonparametric tests that you
mention could legitimately be used.

As for the construction of models, if you were thinking of something
like a regression model, doesn't this depend on the data being on at
least an interval scale of measurement? If so, then such an application
would not, strictly speaking, be valid for ordinal data such as yours.