Dear list members,

Some time ago I asked for information about the assumptions of statistical analysis methods and the measurement level of the data. Thank you for responding to my question; below you will find a summary. I have not yet had time to study every suggested article, so I may come back to this topic in the future.

Gabry

Ruth Helm: May I suggest you read Streiner (1995), "Health Measurement Scales", Oxford: Oxford Medical Press, page 38.

Knut M. Wittkowski: Except for a few special cases (Spearman/Pearson correlation coefficient, Friedman/ANOVA), methods based on the linear model, which assume an interval or absolute scale, cannot be meaningfully applied to ranks. You may want to follow the discussion rejuvenated by the seemingly easy approach advocated by Conover, WJ; Iman, R (1981) Rank transformations as a bridge between parametric and nonparametric statistics. Am Statist 35: 124-134, which is still recommended by SAS; see http://www.bio.ri.ccf.org/docs/ASA/plunch.html:

September 9, 1998. Is the Rank Transformation Method a Bad Idea? Guang-Hwa "Andy" Chang, Ph.D., Youngstown State University

The rank transformation (RT) refers to the replacement of data by their ranks, followed by an analysis using the usual normal-theory procedure, calculated on the ranks rather than on the original data. This idea was originally suggested by Lemmer and Stoker (1967) and advocated by Conover and Iman. The availability of statistical packages for parametric tests makes the rank transformation method appealing, and SAS has also added this option to its package. However, Blair, Sawilowsky and Higgins (1987) showed that, for 4x3 factorial designs, the RT statistic for testing interaction suffers a severe inflation in Type I error as either the cell size or the row and column main effects become large. It was a huge disappointment. Is the rank transformation method a bad idea? Some research results obtained after the simulation study by Blair et al. will be presented in this talk.
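To make the rank transformation idea concrete, here is a minimal Python sketch (an illustration added to this summary, not code from any respondent or from SAS). It replaces each variable by its ranks, giving tied values the average rank, and then computes the ordinary Pearson coefficient on those ranks. This is one of the special cases Wittkowski mentions where the approach is exact: Pearson's coefficient computed on ranks is Spearman's coefficient. The function names ranks, pearson and rank_transform_correlation are hypothetical, and the sketch says nothing about the factorial-design interaction tests where Blair, Sawilowsky and Higgins found the Type I error inflation.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    """Ordinary (normal-theory) Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_transform_correlation(x, y):
    """RT idea: replace the data by ranks, then apply the usual procedure."""
    return pearson(ranks(x), ranks(y))
```

For example, with x = [1, 2, 3, 4, 5] and y = [1, 4, 9, 16, 1000], the Pearson coefficient on the raw data is pulled down to about 0.72 by the last value, while the rank-based version is exactly 1 because the relationship is perfectly monotone.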
See also:
Akritas, MG; Arnold, SF; Brunner, E (1997) Nonparametric hypotheses and rank statistics for unbalanced factorial designs. Journal of the American Statistical Association 92: 258-265.
Haas, CN (1999) On modeling correlated random variables in risk assessment. Risk Anal 19: 1205-1214.

Osher Doctorow: I do not represent the majority opinion in statistics/probability, but a small minority opinion. That might actually be enough justification for considering my opinions seriously, but that will probably depend on your own opinion about what I call the Planet of the Apes. The mathematics that statisticians use is actually called probability, and there are all kinds of fads concerning which name is given to any particular study. In practice, more theoretical research that does not involve real data usually goes under the name of probability, while studies that are mostly data-oriented go under the name of statistics, but in between there is much difference of opinion. Statisticians are often more concerned with using real data to test theories or hypotheses, or to estimate parameters: important theoretical or population constants (such as the population mean) of which statistics (such as the sample mean) are considered to be estimates. Logic-based probability (LBP), which I introduced in 1980, differs from mainstream Bayesian conditional probability (BCP for short here) in a way so simple that it is almost laughable: instead of dividing important probabilities, we subtract them under quite general conditions. The nice thing about LBP is that you don't run into the difficulty of dividing by zero, which leads to mathematical contradictions. BCP is not defined when the denominator probability is zero, which makes it harder to handle, among other things, rare events (events with probability at or near zero).
Abstracts of 46 of my papers are available on the internet at http://www.logic.univie.ac.at at the Institute for Logic of the University of Vienna (select ABSTRACTS, then BY AUTHOR, then Osher Doctorow). If you want to know what LBP does, I suggest reading them, or at least those that seem relevant. I have tried as much as I can to write them for general as well as specialized readers, because I am very interested in what most people learn rather than only the top 1% or so. I have applied LBP to criminology, military strategy and tactics, politics, economics, management, and so on, and it differs from its rival BCP in giving much more elaborate answers. Roughly speaking, BCP (the mainstream approach) gives you some sample estimate of the population as your answer, and its only advice is to keep on sampling to get better estimates or estimates that change over time. Roughly speaking, LBP will embarrass some people in the opposite direction: it gets at the causes and influences in the problem. Factor analysis tries to do that in psychology, but it is not an LBP method, and its "factors" are, roughly speaking, summaries of how the data cluster or hang together. Cluster analysis in mainstream statistics is similar to factor analysis in this respect. In LBP, for example, if you tell me that you need to decide whether or not to occupy the Golan Heights in Israel in order to control the surrounding territory, you will get the direct answer: occupy it, because a height is a critical extremum point (a maximum or minimum point), and such points are among those with the most (military) influence in LBP theory. No nonsense about that, provided that you don't mind the Missouri "show me" philosophy. Your particular problem, which involves categorical data as I recall, would be especially well handled by LBP if all else is constant.
Generally speaking, LBP does best when the problem involves influence/causation with rare or fairly rare events; events that influence each other moderately or strongly (unlike independent or weakly influencing events); boundary/border events (including geographical boundaries, surfaces of objects or organisms, interdisciplinary problems, problems on the boundary of two real or abstract fields, etc.); events that are subsets of (contained in) the events they influence; and/or events with probability at or near zero. Whereas BCP often uses normal (Gaussian, bell-shaped) and t statistics, LBP uses uniform or equiprobable statistics and non-symmetric or lopsided statistics (skewed to one side in their probability graphs) such as the gamma (including exponential and chi-square) and F statistics. You might recall that analysis of variance (ANOVA) and regression use F statistics in mainstream statistics, but that is more a coincidence than a general choice in BCP: both approaches happen to agree in just those cases, although LBP might still check whether uniform/equiprobable statistics work better (they often do). Let me know what you think, either after reading this email or after reading the abstracts. It has been fun writing this. Do keep my name on file for criminology statistics/probability, since it's more fun for me than the usual problem areas.

Paul Wicks: Just a few points:
1) I appreciate the factor analysis problem: assuming normality is all you can do. Alternatively, if you can, use principal component analysis.
2) If you are unable to regress the mean, try regressing the median (see quantile regression) or use ordinal regression. But there are few other tricks at your disposal. Fortunately, psychology/psychiatry are quite lenient about this. Of course, they shouldn't be.

Lilian de Menezes: The Statistical Approach to Social Measurement, David J. Bartholomew, Academic Press, 1996, may be helpful.
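Paul Wicks's suggestion to regress the median can be illustrated with a small Python sketch of median (least-absolute-deviations) regression for a single predictor, next to ordinary least squares. This is an illustration added here, not a recommended implementation: quantile regression software normally solves a linear program, whereas this sketch relies on the fact that, for one predictor, some optimal median-regression line passes through two of the data points, so it simply tries every pair. The data and the function names lad_line and ols_line are made up for illustration.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Median (least-absolute-deviations) regression line y = a + b*x.

    For a single predictor, some optimal LAD line passes through at least two
    data points with distinct x, so checking every such pair finds an optimum.
    This brute force is O(n^3): fine for illustration, not for large data sets.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a candidate
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least-squares line y = a + b*x, for comparison (regressing the mean)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Illustrative data: roughly y = 2x, with one outlying observation at x = 6.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.0, 8.1, 9.9, 40.0]
a_lad, b_lad = lad_line(xs, ys)
a_ols, b_ols = ols_line(xs, ys)
```

With these made-up data, the median line keeps a slope close to 2 (about 2.1), while the least-squares slope is pulled up to about 6 by the single outlier.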
Jay Warner: If you can possibly arrange your categories into ordered sequences (ordinal data), then you can do it. There is a good deal of discussion about whether the increments in such a scale are equal, but frankly, this is a small sacrifice to make in exchange for the improved information. For example, the increments on a Likert scale are definitely not equal, and they are probably different for different respondents. We should spend time 'norming' the scale so people will be more precise in their answers. I ran across one case where the researcher threw out every response in which the respondent put the check mark midway between two choices. The respondent was trying to use the scale as continuous, while the researcher was unable to use this more precise information.

Eric Wong: Item response theory (particularly Rasch models) will solve the problems.

Ivailo Partchev: There are basically two approaches, probit and logit.
Probit: In Lisrel, you can start with Prelis and estimate a matrix of polychoric correlations along with their asymptotic covariance matrix. These are then used as inputs to a Lisrel run. We have been taking a fairly close look at this for a certain class of models, using both real and simulated data, and it seems to work fine.
More recently, Muthen & Muthen's Mplus seems to have adopted the logit approach, but I am afraid this is still on my to-do list. On the other hand, there is also software for multivariate item response models, such as Acer ConQuest, which can do things like latent regression, or estimate the covariance matrix of latent variables measured via a partial credit model. We have been trying it out, and the results are quite consistent with what we get from Prelis-Lisrel. For an example, you may take a look at our latest article: Steyer, R. & Partchev, I.
(2000) Latent state-trait modeling with logistic item response models. A PDF file is available at http://www.uni-jena.de/svw/metheval/publikationen/start.html

Mitchum Bock: A recent volume in the Springer "Statistics for Social Science and Public Policy" series may be of interest, although I've only skim-read it myself: Valen E. Johnson & James H. Albert, Ordinal Data Modeling, ISBN 0-387-98718-5.

Miland Joshi: You might find Alan Agresti's Introduction to Categorical Data Analysis useful. He deals with ordinal as well as nominal data.

Jarl Kampen: Consider the article Kampen, J.K. & M. Swyngedouw (2000), "The Ordinal Controversy Revisited", published in the February issue of Quality & Quantity. You'll find many helpful references there.
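Eric Wong's Rasch suggestion and the partial credit model that Ivailo Partchev mentions both belong to the logistic (logit) family of item response models. As a minimal Python sketch (added here for illustration, not taken from ConQuest, Mplus or any other package), the function below computes the category probabilities of a partial credit item for a person with latent location theta and step difficulties thresholds; with a single threshold it reduces to the dichotomous Rasch model. The function names are hypothetical, and the sketch only evaluates the model for given parameter values: estimating those parameters from data is what the software mentioned above does.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for an item scored 0..m under the partial credit model.

    theta: the person's location on the latent scale.
    thresholds: step difficulties delta_1..delta_m.
    P(X = k) is proportional to exp(sum over j <= k of (theta - delta_j)),
    where the empty sum for k = 0 equals 0.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating, for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))."""
    return pcm_probabilities(theta, [b])[1]
```

For example, rasch_probability(0.0, 0.0) is 0.5, since a person located exactly at the item's difficulty is equally likely to answer either way, and pcm_probabilities(0.0, [-1.0, 1.0]) makes the middle category the most likely one.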