Hi Everyone,

Thanks to all those who took the time to respond to my email. For my part, I went ahead with CHAID (AnswerTree) and PCA. Since I was using SAS, I used a procedure called PROC VARCLUS, which uses a PCA-based approach to group the variables into clusters. Once the clusters are created, it's much simpler to pick a few meaningful variables from each cluster to model on.
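
For anyone who wants to see the general idea outside SAS, here is a rough Python sketch. It is not PROC VARCLUS and does not reproduce its algorithm: it swaps in a much simpler greedy grouping on absolute Pearson correlation, then keeps one variable per group. The function names, the 0.7 threshold, and the toy demographic columns are all made up for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_variables(columns, threshold=0.7):
    """Greedy grouping (a stand-in, NOT the VARCLUS algorithm).

    Take the first unassigned variable as a seed and put every remaining
    variable whose |correlation| with the seed is >= threshold in its group.
    Repeat until no variables are left.
    """
    unassigned = list(columns)
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members, rest = [seed], []
        for name in unassigned:
            if abs(pearson(columns[seed], columns[name])) >= threshold:
                members.append(name)
            else:
                rest.append(name)
        unassigned = rest
        clusters.append(members)
    return clusters


def pick_representative(columns, members):
    """Keep the member with the highest mean |correlation| to the others."""
    if len(members) == 1:
        return members[0]

    def mean_abs_corr(name):
        others = [m for m in members if m != name]
        return sum(abs(pearson(columns[name], columns[m])) for m in others) / len(others)

    return max(members, key=mean_abs_corr)


# Toy data (hypothetical): two pairs of strongly correlated variables.
random.seed(0)
n_obs = 700
income = [random.gauss(0, 1) for _ in range(n_obs)]
household_size = [random.gauss(0, 1) for _ in range(n_obs)]
columns = {
    "income": income,
    "spend": [v + random.gauss(0, 0.1) for v in income],
    "household_size": household_size,
    "rooms": [v + random.gauss(0, 0.1) for v in household_size],
}

clusters = cluster_variables(columns, threshold=0.7)
representatives = [pick_representative(columns, c) for c in clusters]
print(clusters)
print(representatives)
```

In practice I would still choose the variable from each cluster using business sense rather than purely by a statistic, and the result of a greedy grouping like this depends on the order of the variables and on the threshold.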

Regards,
Indrajit




----- Original Message ----
From: Indrajit Sengupta <[log in to unmask]>
To: [log in to unmask]
Sent: Wednesday, September 9, 2009 11:48:35 AM
Subject: Variable reduction problem

Hi All,

I am faced with a modeling scenario where there are too many variables (almost 1000) and too few observations (approx. 700). I am trying to reduce the number of variables to around 100-150 before I go into modeling. I thought of using Principal Component Analysis and Factor Analysis to reduce the variables, but these techniques require an observations-to-variables ratio of at least 5, which I don't have. Is there any other technique available to reduce the variables? My independent variables are mostly demographic and household-related, and hence would be highly correlated.

Thanks in advance,
Indrajit