Dear all,
I have a small question concerning PCA again.
I have questionnaire data consisting of binary (yes/no) responses. The
responses are to questions on a variety of topics.
Can PCA be carried out when all of the variables are binary? If so, is
a correlation-based PCA (where all the variables are standardised) or a
covariance-based PCA (where the variables are not standardised)
appropriate?
Many thanks,
Kim
********************************************************
PS I had several replies to my previous question. Here they are:
Dear all,
Just a quick question...
My data consist of replies from respondents to x questions; each
respondent has to give an opinion on each of the questions. There are 5
categories: 1=strongly disagree...5=strongly agree.
a) I have z subjects who each gave opinions both before and after an
event took place. b) I also have two independent groups. One group gave
opinions before the event and a different group gave opinions after the
event.
Bearing in mind the nature of the data, my question is, to test if the
'event' changed the subjects' opinions:
For a) is it appropriate to use the Wilcoxon signed ranks test, and for
b) is it appropriate to use the Mann Whitney U test?
Many thanks for your advice,
Kim.
****************************************
Dear Kim,
That is completely correct. Just bear in mind that at least the Wilcoxon
Matched-Pairs Signed-Ranks Test assumes identical (but not necessarily
normal) distributions. If that is not the case a good alternative would
be the Sign Test. The Mann Whitney U Test is also called the Wilcoxon
Rank Sum Test and also has alternatives, the Median Test, the
Kolmogorov-Smirnov Two-Sample Test, etc.
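
As a rough illustration of the Sign Test alternative for the paired case a), here is a minimal Python sketch using only the standard library. The function name sign_test and the ten before/after scores are made up for illustration; they are not Kim's data. Zero differences are dropped before the exact binomial calculation, which is the usual convention.

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test for paired ordinal responses."""
    diffs = [a - b for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg  # zero differences carry no sign and are dropped
    if n == 0:
        return n_pos, n_neg, 1.0
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)

# Hypothetical 1-5 opinions from the same ten subjects
before = [2, 3, 3, 1, 4, 2, 3, 2, 5, 3]
after = [3, 4, 3, 2, 5, 4, 3, 3, 4, 4]

n_pos, n_neg, p_value = sign_test(before, after)
print(n_pos, n_neg, p_value)
```

The test uses only the direction of each change, not its size, so it makes fewer assumptions than the signed-ranks test at the cost of some power.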
Regards
*****************************
Kim;
I have a colleague here who has just carried out a similar analysis
(questionnaire applied before and after a course). Note that the
questionnaire was anonymous, and so she could not pair the data. She
used a Mann Whitney U test, basically to see if the mean value given
was the same/different. Whilst I believe that this is OK, I have a
couple of minor reasons for disquiet:
1: although the U test copes with ties in the rank, I'm not sure how
well it copes when more or less all the data will be tied in rank
(since there are only 5 possible outcomes).
2: I feel that by looking only at the mean value you are losing some of
the information. For example there may be no significant change in the
mean value, but one group may be more concentrated in the extremes than
the other. This, I believe will be missed by the U test.
Because of these I would be tempted to do a chi-square test to see if
the ratio of responses before and after was the same. This will have 4
degrees of freedom, and so could be decomposed, if significant, into 4
orthogonal comparisons, each with one degree of freedom (although
finding a text on exactly how to do this with anything more complex than
a 1xn table has eluded me so far).
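
A minimal sketch of the overall chi-square test described above, for a 2x5 table of before/after counts, in plain Python. The counts and the function names are hypothetical, and the sketch covers only the overall 4-degree-of-freedom test, not the decomposition into orthogonal comparisons. The p-value uses the closed-form upper tail of the chi-square distribution that holds when the degrees of freedom are even.

```python
from math import exp

def chi_square_2xk(before_counts, after_counts):
    """Pearson chi-square statistic for a 2 x k table of counts.

    Assumes every response category was used at least once overall.
    """
    rows = [before_counts, after_counts]
    row_tot = [sum(r) for r in rows]
    col_tot = [b + a for b, a in zip(before_counts, after_counts)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(before_counts) - 1

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df % 2 != 0:
        raise ValueError("closed form used here needs even df")
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return exp(-x / 2) * total

# Hypothetical counts per category 1..5
before_counts = [10, 20, 30, 25, 15]
after_counts = [5, 15, 25, 30, 25]

stat, df = chi_square_2xk(before_counts, after_counts)
p_value = chi2_upper_tail_even_df(stat, df)
print(stat, df, p_value)
```

Note that this test treats the five categories as unordered, so it picks up any difference in the response distribution but ignores the ordering that the rank tests exploit.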
I would be interested in any other responses that you get, so would
encourage you to summarise and post to the newsgroup.
Good luck with your analysis
******************
Hello Kim,
It might be appropriate to use Cohen's
kappa statistic which is normally used
to assess inter-rater agreement. In your
case you could use it to compare before
and after.
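
For reference, a small Python sketch of how kappa would be computed on paired before/after answers. The function name cohens_kappa and the scores are invented for illustration, and this is the unweighted form. Kappa measures how far the two sets of answers agree beyond chance, so a low value says that individuals changed their answers, not in which direction.

```python
from collections import Counter

def cohens_kappa(before, after, categories=(1, 2, 3, 4, 5)):
    """Unweighted Cohen's kappa between two paired sets of ratings."""
    n = len(before)
    pairs = Counter(zip(before, after))
    # Observed proportion of subjects giving the same answer twice
    p_obs = sum(pairs[(c, c)] for c in categories) / n
    before_marg = Counter(before)
    after_marg = Counter(after)
    # Agreement expected by chance from the two marginal distributions
    p_exp = sum(before_marg[c] * after_marg[c] for c in categories) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical 1-5 opinions from the same ten subjects
before = [1, 2, 3, 4, 5, 3, 3, 2, 4, 4]
after = [1, 2, 3, 4, 5, 3, 4, 3, 4, 5]

kappa = cohens_kappa(before, after)
print(kappa)
```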
Best Regards,
***********************
Kim,
The answer to each of your questions is YES.
***************************
a,
Wilcoxon should work ok.
Bear in mind if you get lots of 'zero' differences you could be in
trouble!
b,
Mann-Whitney should be fine for this.
Bear in mind the assumptions of Mann-Whitney: random samples, independence
(obviously!) and that the data differ in location only (I would imagine
both samples follow the same distribution and have the same variation).
You may also want to consider Kruskal-Wallis for this.
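
A minimal sketch of the Mann-Whitney calculation for case b), written in plain Python so that the handling of ties is visible. With only five possible answers almost every value is tied, so the sketch assigns midranks and uses the tie-corrected variance in a normal approximation (no continuity correction). The function name and the two groups of scores are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """U for sample x, with z and a two-sided normal-approximation p-value.

    Ties get midranks and the variance is corrected for them.
    Fails if every observation is identical (variance is zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))
    rank, start = {}, 0
    for value in sorted(counts):
        t = counts[value]
        rank[value] = start + (t + 1) / 2  # mean of ranks start+1 .. start+t
        start += t
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return u, z, erfc(abs(z) / sqrt(2))

# Hypothetical 1-5 opinions from two independent groups
before_group = [1, 2, 2, 3, 3, 3, 4, 4]
after_group = [2, 3, 4, 4, 4, 5, 5, 5]

u, z, p_value = mann_whitney_u(before_group, after_group)
print(u, z, p_value)
```

With samples this small an exact or permutation p-value would be preferable to the normal approximation; the sketch is only meant to show where the ties enter.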
Hope that helps
********************************
Dear Mrs Pearce,
Basically, your question fits the analysis of a multiway frequency
table using a multinomial model, which is equivalent to fitting a
Poisson regression.
The dimensions of this table are defined by the x questions plus any
other relevant factors you consider necessary for the problem at hand.
The response variable in the Poisson regression is the count in each
cell of the multiway table, while the explanatory variables are the x
questions plus any other factors.
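
To make the data layout concrete, here is a heavily simplified Python sketch. It assumes a single question cross-classified by period (before/after), with invented counts, and lays the table out as one row per cell with the count as the response. It then evaluates the simplest Poisson log-linear model, main effects only, for which the fitted counts have the closed form row total x column total / n and the deviance is the likelihood-ratio statistic G2. A real analysis with several questions would need proper model-fitting software.

```python
from math import log

# Hypothetical counts: period x response category (1..5)
table = {
    "before": [10, 20, 30, 25, 15],
    "after": [5, 15, 25, 30, 25],
}

# Long format: one row per cell; the count is the response variable,
# period and category are the explanatory factors.
cells = [(period, category, count)
         for period, counts in table.items()
         for category, count in enumerate(counts, start=1)]

def independence_deviance(cells):
    """Deviance (G2) and df of the main-effects-only log-linear model."""
    n = sum(count for _, _, count in cells)
    row_tot, col_tot = {}, {}
    for period, category, count in cells:
        row_tot[period] = row_tot.get(period, 0) + count
        col_tot[category] = col_tot.get(category, 0) + count
    g2 = 0.0
    for period, category, count in cells:
        fitted = row_tot[period] * col_tot[category] / n
        if count > 0:  # empty cells contribute nothing to the sum
            g2 += 2 * count * log(count / fitted)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return g2, df

g2, df = independence_deviance(cells)
print(len(cells), g2, df)
```

A large deviance relative to its degrees of freedom indicates that the period-by-response interaction is needed, that is, that the response distribution differs before and after.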
The analysis is straightforward and, if you are using S+, you can find
more details in the well-known textbook "Modern Applied Statistics with
S+", written by Venables and Ripley and published by Springer (page 200
of the 4th edition), while if you are using SAS, an exposition of this
approach can be found in "Categorical Data Analysis Using the SAS System"
by Stokes et al., published by the SAS Institute (Chapter 9).
Best regards
****************************************