Dear all,

Thank you for all your replies to my query (see below).

When referring to modelling, some people raised the issue of including ordinal variables in a regression model. I know that there have been many discussions on the use of ordinal variables in regression. One major point is that we can't say for certain that the distance between, say, levels 1 and 2 is half the distance between levels 1 and 3, etc. Usually, categorical independent variables with i categories are entered into the regression as i-1 binary dummy variables. However, I have seen texts and papers which give examples of ordinal variables being entered into the multiple regression. For example, Lewis-Beck (1993), "Regression Analysis Vol 2", gives a discussion detailing the inclusion of an ordinal variable in a multiple regression (as an independent variable) versus the inclusion of the associated dummy variables. I have also seen ordinal logistic regression examples which have a subjective variable as the response (categories of back pain 1=worse,....,6=complete relief), e.g. "Anderson & Philips - Regression, Discrimination and Measurement Models for Ordered Categorical Variables, Applied Statistics, 1981, Vol 30, pp22-31". This same paper also includes an ordinal subjective independent variable (pain change categories 1=getting better....3=worse) in the model as an alternative to using binary dummy variables.

I have a few points:

1) Does anyone know of any other references where we have a model (be it multiple regression or logistic regression) which uses a *subjective ordinal* variable as an independent variable?

2) A few of your replies suggest that subjective responses should never be averaged. Why is this? Is it for the reason I give above, i.e. that we can't say for certain that the distance between levels 1 and 2 is half the distance between levels 1 and 3, etc.?
I can see your point, but I'm a little confused, as I have seen examples of conjoint analysis using the 'aggregate' model, whereby respondents' subjective responses are averaged to form an 'aggregate' response. Do you have any views on this?

3) On another topic, if I had subjective responses of the form 1=strongly disagree...5=strongly agree for n people and I wanted to do a cluster analysis to group individuals, which distance measure and linkage method would be most appropriate? I assume that I would have to try several combinations of linkage method and distance measure and compare the results?

Many thanks,
Kim.

*****************************************************

Hello all.

A quick query. My data comprise some subjective and some objective variables. Can models be constructed which contain subjective variables as both the predictors and the response? Can subjective variables be used in standard nonparametric tests (e.g. Wilcoxon matched pairs, Friedman two-way ANOVA, Mann-Whitney, Kruskal-Wallis, phi coefficient, etc.)?

Many thanks,
Kim.

************************************************************

Replies

Before I leap off into the abyss of total ignorance, let me ask you a few questions for clarification.

Fact: the terms 'subjective' and 'objective' do not fully explain the significance and usefulness of the item 'measured.' Better, think of an 'objective' measure as one which has little variation relative to the value reported. A ruler marked in 0.5 cm, for example, has a stdev of about 0.1 cm between people measuring. If you are measuring a distance of 3 m (300 cm), this degree of 'error' is usually acceptable.

Think of a 'subjective' measure as one with a lot of variation between reporters. This is usually because the different people involved do not have a common scale to use. I report that the product finish is 2 on a scale of 1-5 (i.e., rejectable), and the sales person reports it as 4 (ship the puppy!).
We are looking at the same item, but we have different concerns, and these lead to different locations on the scale. I'm sure you can think of similar situations in your technical area. The doctor may report that the drug reduced the symptoms, while the patient still doesn't feel 'cured.'

Now, if you rephrase your question in terms of 'data with low coefficient of variation' and 'data with high coefficient of variation,' does it still make sense?

Perhaps by 'subjective' you mean an ordinal or even binomial scale - 'the patient died,' or 'the patient lived.' I would urge you to refine the measurement so that everything winds up on a more-or-less linear scale. You have to get serious about what you are measuring - the example of surface finish above started out with a print spec, "surface finish of castings shall be acceptable."

Does this help as much as it confuses?

*********************************

Hello all,

Just as a follow-on to my email yesterday....

The variables I have are subjective in the sense that they ask for the respondent's beliefs on an issue, and the respondent has a choice of, for example, 1=strongly disagree...5=strongly agree. The objective questions are questions like 'how many employees do you have?'.

Many thanks for your time,
All the best,
Kim.

*****************************************

Hi Kim,

The subjective variables are subjective in a conceptual sense. In a statistical sense they may be dependent (or sometimes even independent). And, yes, nonparametric statistics apply to them, in particular if their scaling is only (approximately) ordinal or nominal. If nonparametric statistics did not apply to subjective variables, which variables would they apply to?

*********************************************

Kim,

My response would be 'why not?' Your "subjective" variables measure attitudes, and there may be variability in the data because one individual's interpretation of the question and the categories can be different from another's.
But this is not the only source of variability. Similarly, even an 'objective' variable such as the number of employees you 'have' can be interpreted differently by different people [headcount vs full-time equivalent, whether sub-contractors, temporary staff and consultants are included, whether it is an average over time or a point-in-time snapshot], all of which can cause variability in the data.

But if there is a relationship between, say, a variable measuring a chief executive's views on the extent to which it is important to treat staff fairly and the proportion of staff in the firm who get dismissed during a downturn, then there seems to be no obvious reason not to collect data to try to describe and model that hypothesised phenomenon. Similarly, you may well find correlations between different attitudes, e.g. between liking the content of the Guardian and holding particular political beliefs.

The real consideration is how much variability there is in your variables, whether they accurately record the underlying attitudes or latent variables, and therefore what any relationship found actually means. A real concern would be to make sure the "subjective" variables used as response and predictor don't actually measure the same thing.

One other important thing to remember is to base any hypothesis on one data set and test it using another [if necessary, split the data into two subsets]. The other point is that identifying the patterns in the data, relationships and clusters is far more important than carrying out formal tests.

You may also want to try converting categorical data [certainly where the categories are ordered] into binary data [e.g. agrees/disagrees] and try logit or probit regression or multi-level modelling: http://multilevel.ioe.ac.uk/intro/index.html

Hope this helps.

Regards

****************************************************

My comments from yesterday still apply (I think!).
a) The responses to your 'subjective' questions will be better if each respondent perceives the statement in a similar fashion. But you knew that. In the US right now, how you state something about 'gay marriage' can be very sensitive. Today I had someone tell me, almost spontaneously, that they rated 'this gay marriage thing' very high as a concern. Well, in WI, we recently had our legislature posturing for a week or two, with no end result. My respondent thought the whole charade was a waste of time, and "everybody should be left alone." I.e., gay marriage was fine by her.

b) Even if the respondents perceive the statement similarly, one person's 'strongly agree' is another's 'tepid agree.' If you could measure the level of emotional response, perhaps you would see that difference easily. Or not - we can't tell yet. The difference between emotional involvement level and stated response is a variance, and usually a large one.

c) Once you accept that the 1-5 scale is approximately an interval scale, you can go ahead and use it like any other interval scale - average, stdev, and (if assumed approx Normal) subject it to multiple regression analysis. If you insist that it is an ordinal scale, then don't go reporting any averages or stdevs, OK?

d) An arcsin transform may help achieve a more Normal dist., but you have to trust the data validity first.

*****************************************

Kim,

Your subjective variables can be regarded as being measured on an ordinal scale of measurement. Therefore, any use of these variables that is valid for ordinal variables is OK. Thus the nonparametric tests that you mention could legitimately be used.

As for the construction of models, if you were thinking of something like a regression model, doesn't this depend on the data being on at least an interval scale of measurement? If so, then such an application would not, strictly speaking, be valid for ordinal data such as yours.
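
To make the two codings discussed in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the data are made up, and the helper names (dummy_code, fit_line, level_means) are hypothetical, not taken from any of the references above. It contrasts entering a 1-5 response as a single numeric predictor (which assumes equal spacing between levels) with entering it as i-1 dummy variables (which, with one predictor, amounts to fitting a separate mean for each level).

```python
from statistics import mean

# Made-up illustration data: x is a 1-5 agreement score, y a numeric outcome.
x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y = [2.0, 2.2, 2.1, 2.3, 4.0, 4.2, 4.1, 4.3, 4.2, 4.4]


def dummy_code(level, levels):
    """Return i-1 indicator variables; the first level is the reference."""
    return [1 if level == k else 0 for k in levels[1:]]


def fit_line(xs, ys):
    """Least-squares (intercept, slope), treating the scores as interval data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def level_means(xs, ys):
    """Mean outcome per level: what a regression on the dummies alone fits."""
    return {k: mean(b for a, b in zip(xs, ys) if a == k) for k in sorted(set(xs))}


levels = sorted(set(x))
intercept, slope = fit_line(x, y)
means = level_means(x, y)

# Columns: level, its dummy coding, fitted value from the straight line,
# fitted value from the dummy-variable model (the level mean).
for k in levels:
    print(k, dummy_code(k, levels), round(intercept + slope * k, 2), round(means[k], 2))
```

Where the two columns of fitted values disagree, the equal-spacing assumption is doing real work. Comparing them is one informal way to judge whether treating the scale as approximately interval is reasonable for a given data set.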