Dear list members,
Some time ago I asked for information about the assumptions of statistical analytical methods and the measurement level of the data. Thanks for responding to my question; below you will find a summary. I haven't yet had time to study every suggested article, so I may get back to this topic in the future.
Gabry


Ruth Helm:
May I suggest you read Streiner (1995), "Health Measurement Scales", Oxford: Oxford Medical Press, page 38.


Knut M. Wittkowski:
Except for a few special cases

        Spearman/Pearson correlation coefficient
        Friedman/ANOVA

methods based on the linear model (interval/absolute scale) cannot be
meaningfully applied to ranks. You may want to follow the discussion
rejuvenated by the seemingly easy approach advocated by

Conover, WJ; Iman, RL (1981)
        Rank transformations as a bridge between
        parametric and nonparametric statistics.
        The American Statistician 35: 124-129

which is still recommended by SAS; see

http://www.bio.ri.ccf.org/docs/ASA/plunch.html

September 9, 1998
Is the Rank Transformation Method a Bad Idea?
Guang-Hwa "Andy" Chang, Ph.D., Youngstown State University

The rank transformation (RT) refers to the replacement of data by their ranks, with a subsequent analysis using the usual normal-theory procedure, but calculated on the ranks rather than on the original data. This idea was originally suggested by Lemmer and Stoker (1967) and advocated by Conover and Iman. The availability of statistical packages for parametric tests makes the rank transformation method appealing, and SAS has also added this option to its package. However, Blair, Sawilowsky and Higgins (1987) showed that, for 4x3 factorial designs, a severe inflation in the Type I error of the RT statistic for testing interaction is observed as either the cell size becomes large or the row and column main effects become large. This was a huge disappointment. Is the rank transformation method a bad idea? Some research results following the simulation study by Blair et al. will be presented in this talk.

See also:

Akritas, MG; Arnold, SF; Brunner, E (1997)
        Nonparametric hypotheses and rank statistics for unbalanced factorial designs.
        Journal of the American Statistical Association 92: 258-265

Haas, CN (1999)
        On modeling correlated random variables in risk assessment.
        Risk Anal 19: 1205-1214
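
To illustrate the special case and the general recipe above, here is a minimal sketch in Python (my own illustration, not from the thread): Spearman's coefficient is exactly Pearson's coefficient computed on the ranks, while the general rank transformation, running a normal-theory test on ranks, is the step the references above warn about for interaction tests.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=30)
    y = x + rng.normal(size=30)

    # Special case: Spearman's rho is Pearson's r computed on the ranks
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    print(stats.spearmanr(x, y)[0])   # Spearman rho
    print(stats.pearsonr(rx, ry)[0])  # identical value

    # General RT recipe (use with caution for interaction tests): replace
    # the pooled data by their overall ranks, then run the usual ANOVA
    g1, g2, g3 = rng.normal(0, 1, 20), rng.normal(0.5, 1, 20), rng.normal(1, 1, 20)
    ranks = stats.rankdata(np.concatenate([g1, g2, g3]))
    print(stats.f_oneway(ranks[:20], ranks[20:40], ranks[40:]))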



Osher Doctorow:
I do not represent the majority opinion in statistics/probability, but a small minority opinion. That might actually be enough justification for considering my opinions seriously, but that will probably depend on your own opinion about what I call the Planet of the Apes.

The mathematics which statisticians use is actually called probability, and there are all kinds of fads concerning which name is given to any particular study. In practice, more theoretical research that does not involve real data usually goes under the name of probability, while mostly data-oriented studies go under the name of statistics, but in between there is much room for opinion. Statisticians are often more concerned with using real data to test theories or hypotheses, or to estimate parameters, which are important theoretical constant values or population constant values (like the population mean) of which statistics (such as the sample mean) are considered to be estimates.

Logic-based probability (LBP), which I introduced in 1980, differs from the mainstream Bayesian conditional probability (BCP for short here) in such a simple way that it is almost laughable: instead of dividing important probabilities, we subtract them under quite general conditions. The nice thing about LBP is that you don't run into the difficulty of dividing by zero, which results in mathematical contradictions. BCP is not defined when the denominator probability is zero, which makes it harder to handle rare events (events of probability at or near zero), among others.
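
To make the zero-denominator point concrete, here is a tiny Python sketch (my own illustration; the email does not give LBP's exact formula, so the subtractive expression below is only a placeholder assumption, not Doctorow's definition):

    import math

    def bcp(p_a_and_b, p_b):
        # Bayesian conditional probability P(A|B) = P(A and B) / P(B);
        # it is undefined whenever the conditioning event has probability zero
        return p_a_and_b / p_b if p_b > 0 else math.nan

    def subtractive(p_a_and_b, p_b):
        # Placeholder subtractive form, 1 + P(A and B) - P(B): only my guess
        # at the kind of expression meant, but it illustrates that subtraction
        # stays defined even when P(B) = 0
        return 1.0 + p_a_and_b - p_b

    print(bcp(0.0, 0.0))          # nan: the division-based definition breaks down
    print(subtractive(0.0, 0.0))  # 1.0: the subtraction-based form remains defined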

Abstracts of 46 of my papers are available on the internet at http://www.logic.univie.ac.at at the Institute for Logic of the University of Vienna (select ABSTRACTS, then BY AUTHOR, then Osher Doctorow). I advise you to read them, or those which seem relevant, if you want to know what LBP does. I have tried to write them for general as well as specialized readers as much as I can, because I am very interested in what most people learn rather than only the top 1% or so. I have applied LBP to criminology, military strategy and tactics, politics, economics, management, and so on, and it differs from its rival BCP in giving much more elaborate answers.

Roughly speaking, BCP (the mainstream approach) gives you some sample estimate of the population as your answer, and its only advice is: keep on sampling to get better estimates, or estimates that change in time.

Roughly speaking, LBP will embarrass some people in the opposite direction: it gets at the causes and influences in the problem. Factor analysis tries to do that in psychology, but it is not an LBP method, and its "factors" are, roughly speaking, summaries of how the data cluster or hang together in a sense. Cluster analysis in mainstream statistics is similar to factor analysis. In LBP, for example, if you tell me that you need to decide whether or not to occupy the Golan Heights in Israel in order to control the surrounding territory, you will get the direct answer: occupy it, because a height is a critical extremum point (maximum or minimum point), and such points are among those which have the most (military) influence in LBP theory. No nonsense about that, provided that you don't mind the Missouri "show me" philosophy.

Your particular problem, which involves categorical data as I recall, would be especially well handled by LBP if all else is constant. Generally speaking, LBP does best when the problem involves influence/causation with rare or fairly rare events; events that influence each other fairly strongly (unlike independent or low/non-influence events); boundary/border events (including geographical boundaries, surfaces of objects or organisms, interdisciplinary problems, problems on the boundary of two real or abstract fields, etc.); events which are subsets of (contained in) the events which they influence; and/or events which have probability at or near zero. Whereas BCP often uses normal/Gaussian (bell-shaped) statistics and t statistics, LBP uses uniform or equiprobable-type statistics and non-symmetric or lopsided statistics (skewed or bent to one side in their probability graphs) such as the gamma (including exponential and chi-square) and F statistics. You might recall that analysis of variance (ANOVA) and regression use F statistics in mainstream statistics, but that is more of a coincidence than a general choice in BCP: both approaches happen to agree just in those cases, although LBP might still try to see whether uniform/equiprobable statistics work better (they often do).

Let me know what you think, either after reading this email or after reading the abstracts. It has been fun writing this. Do keep my name on file in criminology statistics/probability, since it's more fun for me than the usual problem areas.

Paul Wicks:
Just a few points:

1) I appreciate the factor analysis problem - assuming normality is all you can do. Alternatively, if you can, use principal component analysis.

2) If you are unable to regress the mean, try regressing the median (see quantile regression, sketched below) or use ordinal regression.

But there are few other tricks at your disposal; unfortunately, psychology/psychiatry are quite lean at this. Of course, they shouldn't be.
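
A minimal sketch of regressing the median via quantile regression in Python with statsmodels (my own illustration; Wicks names the technique but no software, and the variable names are made up):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    data = pd.DataFrame({'x': rng.uniform(0, 10, 200)})
    data['y'] = 2 * data['x'] + rng.standard_t(2, size=200)  # heavy-tailed noise

    # q=0.5 regresses the conditional median instead of the mean
    median_fit = smf.quantreg('y ~ x', data).fit(q=0.5)
    print(median_fit.params)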

Lilian de Menezes:
The Statisrical Approach to Social Measurement, David J. Bartholomew,
Academic Press, 1996 may be helpful.

Jay Warner:
If you can possibly set up your catagories into ordered sequences -
ordinal
data - then you can do it.  There is a good deal of discussion on
whether
the increments in such a deal are even, but frankily, this is a small
sacrifice to make in exchange for the improved informaiton.

As an example, the increments on a Likert scale are defintely not equal.

But the increments are probably different for diffreent respondents.  We

should spend time 'norming' the scale so peole will be more precise in
their answers.  I ran across one case where the researcher threw out
every
response where the resondent put the check mark midway between two
choices.  The resondent was trying to use the scale as continuous, while

the researcher was unable to use this more precise information.

Eric Wong:
Item response theory (particularly Rasch models) will solve the problems.
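
For context, a minimal sketch of the core Rasch model equation in Python (my own illustration, not from Wong's note): the probability of a correct response is logistic in the difference between person ability theta and item difficulty b.

    import math

    def rasch_prob(theta, b):
        # Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b))
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    print(rasch_prob(0.0, 0.0))  # 0.5 when ability equals difficulty
    print(rasch_prob(1.0, 0.0))  # ~0.73 when ability is one logit higher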

Ivailo Partchev:
There are basically the probit and the logit approaches.

Probit: in Lisrel, you can start with Prelis and evaluate a matrix of polychoric correlations along with their asymptotic covariance matrix. These are then used as inputs in a Lisrel run. We have been taking a fairly close look at this for a certain class of models, using both real and simulated data, and it seems to work fine.

More recently, Muthén & Muthén's Mplus seems to have adopted the logit approach, but I am afraid this is still on my to-do list. On the other hand, there is also software for multivariate item response models, such as ACER ConQuest, which can do things like latent regression, or estimate the covariance matrix of latent variables measured via a partial credit model. We have been trying it out -- results are quite consistent with what we get from Prelis-Lisrel. For an example, you may take a look at our latest article: Steyer, R. & Partchev, I. (2000), Latent state-trait modeling with logistic item response models - there is a PDF file at http://www.uni-jena.de/svw/metheval/publikationen/start.html
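
For intuition about the probit approach, here is a small simulation in Python of the latent-variable idea behind polychoric correlations (my own sketch, not tied to Prelis/Lisrel): correlated normal latent variables are cut into ordered categories at thresholds, and polychoric estimation aims to recover the latent correlation from the resulting ordinal table.

    import numpy as np

    rng = np.random.default_rng(2)
    latent_corr = 0.6
    cov = [[1.0, latent_corr], [latent_corr, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

    # Cut each latent normal variable into four ordered categories at fixed
    # thresholds; this is the observation model the probit approach assumes
    ordinal = np.digitize(z, [-1.0, 0.0, 1.0])  # category codes 0..3

    # The plain Pearson correlation of the coarse ordinal codes understates
    # the latent correlation of 0.6; polychoric estimation corrects for this
    print(np.corrcoef(ordinal[:, 0], ordinal[:, 1])[0, 1])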

Mitchum Bock:
A recent volume in the Springer "Statistics for Social Science and Public Policy" series may be of interest, although I've only skim-read it myself.

Ordinal Data Modeling
Valen E. Johnson
James H. Albert

ISBN 0-387-98718-5

Miland Joshi:
You might find Alan Agresti's Introduction to Categorical Data Analysis useful. He deals with ordinal as well as nominal data.

Jarl Kampen:
Consider the article

Kampen, J.K. & M. Swyngedouw (2000), "The Ordinal Controversy Revisited"

published in the February issue of Quality & Quantity. You'll find many helpful references there.




