Hi Allstat members,
I am reviewing a manuscript in which the authors have used principal
component analysis (PCA) to attempt to classify patients into two groups
(schizophrenic and non-schizophrenic) based on post mortem neurochemical
measurements (I understand what they have done using this analysis). The
authors also use a PLS technique to confirm the analysis from the PCA.
However, I don't feel their terminology is consistent and I am not sure
how they have used PLS to classify their patients.
I have reproduced the statistics section of their paper below and
would welcome some help in interpreting what they have done.
Specifically, have they in fact used DISCRIMINANT ANALYSIS (where they
refer to PLSDA), or principal component regression through the use of,
for example, Proc PLS in SAS? If they have used the latter, does this
technique classify subjects into groups as PCA does? I know it
will give factors like PCA; is this all they have done?
"We applied principle component analysis (PCA) and partial least square
discriminant analysis (PLSDA) to the large data table. Partial least
square regression (PLS), here used for discriminant analysis purposes,
is a regression method with some similarities to multiple regression
(MR), but without some of the short comings of the latter. In all of
our calculations the variables were scaled to zero mean and unit
variance (auto-scaling). All calculations reported here were
statistically significant by the cross-validation criterion. In
addition....."
It sounds to me as if they are using PLS for principal component
regression to pull out the factors that best separate or classify
(group) the multivariate data set into just a few factors or components,
i.e. as a data reduction technique similar to PCA. Can one do this with
Proc PLS in SAS?
The authors go on to say "PLSDA: to obtain a projection that better
displays the between group separation a PLSDA calculation was carried
out. Here the different schizophrenic groups observed in the PCA, and
the controls were assigned dummy variables denoting their respective
group belonging (1 or 0 giving the possible combinations 100, 010, 001
denoting the different groups). In this case a three component model
accounted for......"
I may be having problems understanding the use of PLS (or PCR,
whichever) since I thought one would use PLS for predicting a value,
e.g. the concentration of a chemical when one has a calibration data
set, or the pharmacological activity of a new drug when one has just
the physico-chemical properties of this drug and, again, a training data
set that includes pharmacological activity measures.
I hope someone is able to help me better understand the use of PLS (or
PCR or discriminant analysis) in this paper.
Kind regards.
Alex.
--
Alex M. Gray, Ph.D
PPD Development
Research Triangle Park
3900 North Paramount Parkway
Morrisville, NC 27560
USA.