If you wouldn't mind sparing a few minutes, I'd like to ask for help with a
statistics problem I've been working on.
The problem is similar to the following:
100 X variables, covering the gamut of 'youth' characteristics, collected
mostly in elementary schools. These values include scores on various
standardized tests; extra-curricular activities involvement; number of
books read per year; various physical characteristics; estimates of
neighborhood income levels; distance from home to school, etc.,
etc. Outliers have been trimmed. Some of the X variables appear to follow
a normal distribution, while others are clearly bimodal.
6 or so Y variables, assessing measures of 'success' later in life, such as
annual income, net worth, size of home, subjective measures of
happiness with status in life, etc. Also included are some 'failure'
measures, such as number of days spent in jail; number of divorces; and,
days spent in hospitals. Some Y variables follow a normal distribution,
while others are bimodal or considerably skewed.
Total N is approximately 30,000.
I've compiled all the X's and Y's into a single large worksheet of columns
and rows. Additionally, I've constructed a second worksheet by converting
all the continuous variables to categorical variables, which should allow
for the use of statistical approaches suitable for categorical variables.
A number of cells throughout the worksheet are blank where data were not
available.
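
To make the conversion step concrete, here is a minimal sketch of the kind of
binning I mean. It is an illustration only, not the exact procedure used on the
worksheet: the function name `to_categories`, the choice of quartile bins, the
sample values, and the use of `None` for blank cells are all assumptions.

```python
from statistics import quantiles


def to_categories(values, n_bins=4):
    """Map each numeric value to a bin index 0..n_bins-1 using quantile cut points.

    Blank cells are represented as None and are left as None.
    (Hypothetical helper; quartile bins are an assumed choice.)
    """
    # Compute cut points from the observed (non-blank) values only.
    observed = [v for v in values if v is not None]
    cuts = quantiles(observed, n=n_bins)  # returns n_bins - 1 cut points

    categories = []
    for v in values:
        if v is None:
            categories.append(None)  # keep the blank as a blank
        else:
            # The bin index is the number of cut points the value exceeds.
            categories.append(sum(v > c for c in cuts))
    return categories


# Made-up example column: books read per year, with two blank cells.
books_per_year = [2, 5, None, 12, 7, 30, 1, None, 9, 4]
print(to_categories(books_per_year))
```

Blank cells stay blank in the categorical worksheet, so whatever approach is
suggested would still need to handle the missing data.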
My objective is to attempt to ascertain which ***several-variable
constellation(s)*** of the 100 predictor variables are statistically most
important (in this data set) for predicting 'success,' as well as for
avoiding 'failure.'
Although I definitely hope various 'constellations' emerge, if in the end
no model turns out to have much predictive value, that's okay.
Perhaps attempting to make such predictions is far more difficult than
first might appear.
Minitab is the statistics package I'm most accustomed to using (from my old
college days, and since). However, I also am familiar with SAS, and other
statistics programs.
Please help me by suggesting which statistical approach(es) **you** would
most likely use to squeeze the relevant relationships from this data set.
I very much appreciate any help you can provide, and look forward to
hearing back from you.
Sincerely,
Nicholas Kormanik
Salt Lake City, Utah