Hello everyone,
Many thanks to all those who replied to my email last week (see below). I greatly appreciate it.
The most suitable method is 'multiple correspondence analysis'. I found the following references particularly useful:
Greenacre, M. (2007). Correspondence Analysis in Practice. Chapman & Hall.
Greenacre, M. (2010). Biplots in Practice. Available for free online at www.multivariatestatistics.org
And the following links:
http://www.statsoft.com/textbook/correspondence-analysis/
marketing-bulletin.massey.ac.nz/V14/MB_V14_T2_Bendixen.pdf
http://www.utd.edu/~herve/Abdi-MCA2007-pretty.pdf
The most useful packages that I have found for analysis and plotting are:
SAS (for multiple correspondence analysis: proc CORRESP) and
Minitab (for simple correspondence analysis).
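For illustration only, here is a minimal Python sketch of the data layout that multiple correspondence analysis usually starts from: a full indicator matrix with one 0/1 column per category of every variable, rather than q-1 dummies per variable. The function names, the second variable ('sex') and the five made-up subjects are assumptions for the example, not taken from the references above. The sketch also computes the total inertia of the indicator matrix (the Pearson chi-square statistic divided by the grand total), which for an indicator matrix with Q variables and J categories in total should equal J/Q - 1.

```python
def indicator_matrix(data, levels):
    """Expand categorical rows into a full 0/1 indicator matrix.

    data:   list of rows, one categorical value per variable
    levels: one list of categories per variable (all q categories kept)
    """
    Z = []
    for row in data:
        z = []
        for value, cats in zip(row, levels):
            z.extend(1 if value == c else 0 for c in cats)
        Z.append(z)
    return Z


def total_inertia(Z):
    """Pearson chi-square of the table divided by its grand total.

    Assumes every category occurs at least once (no all-zero column).
    """
    n = sum(sum(r) for r in Z)
    row_tot = [sum(r) for r in Z]
    col_tot = [sum(c) for c in zip(*Z)]
    chi2 = 0.0
    for i, r in enumerate(Z):
        for j, obs in enumerate(r):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    return chi2 / n


# Hypothetical data: Q = 2 variables, J = 4 + 2 = 6 categories in total.
levels = [["yes", "no", "don't know", "confidential"], ["F", "M"]]
data = [
    ("yes", "F"),
    ("no", "M"),
    ("don't know", "F"),
    ("confidential", "M"),
    ("yes", "M"),
]

Z = indicator_matrix(data, levels)
inertia = total_inertia(Z)
print(Z[0])     # indicator row for the first subject
print(inertia)  # should be close to J/Q - 1 = 6/2 - 1
```

A full analysis would go on to decompose this inertia along principal axes, which is what procedures such as proc CORRESP do; that step is not shown here.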
Thanks again to everyone,
Kindest Regards,
Kim
-----Original Message-----
From: Kim Pearce
Sent: 20 September 2011 11:05
To: [log in to unmask]
Subject: PCA categorical variables
Hello everyone,
I would appreciate your views on the following...
For a Principal Component Analysis, we have N subjects and p variables. Say one of our variables is categorical (nominal) with categories corresponding to either 'yes', 'no', 'don't know' or 'confidential'. Would the 4 categories be entered into the PCA as 3 dummy binary variables...i.e. x1, x2 and x3 coded, perhaps, like so:
              x1  x2  x3
yes            1   0   0
no             0   1   0
confidential   0   0   1
don't know     0   0   0
(here 'don't know' is the reference category), i.e. just as in regression, where a q-category variable is entered as q-1 dummy variables?
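The coding described above can be sketched in Python as follows. This is only an illustration of reference-category (q-1) dummy coding; the helper name dummy_code and the sample responses are hypothetical, and the column order x1, x2, x3 follows the table above.

```python
# Categories of the nominal variable; the reference category is coded as all zeros.
CATEGORIES = ["yes", "no", "confidential", "don't know"]
REFERENCE = "don't know"


def dummy_code(value, categories=CATEGORIES, reference=REFERENCE):
    """Return the q-1 dummy variables [x1, x2, x3] for one response."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    kept = [c for c in categories if c != reference]
    return [1 if value == c else 0 for c in kept]


# Hypothetical responses from four subjects.
responses = ["yes", "don't know", "confidential", "no"]
coded = [dummy_code(r) for r in responses]

for r, row in zip(responses, coded):
    print(f"{r:<13}", *row)
```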
Thanks so much for your views,
Kim