I'd greatly appreciate any suggestions you might be able to offer regarding
the following problem.
The data are all **categorical**, interval in nature. The data are not
necessarily normally distributed. Shown below is a small sub-sample of
the large database (do not attempt to analyze this sub-sample). Here are
the challenges I face:
1) What statistical approach could one use to ascertain which two (or
more) of the input columns, taken together, are most consistently associated
with high levels of the output column?
2) Which specific levels of the above two (or more) input columns would
one use to achieve the highest level of the output column?
3) List the top five 'combinations' yielding the highest output.
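
To make challenges 1) through 3) concrete, a naive tabulation could look like the following Python sketch. It only illustrates the question and is not offered as a sound statistical method: it groups cases by every pair of input columns, averages the output within each level combination, and lists the top five. The function name `rank_combinations` and the `min_cases` cutoff are hypothetical, and the few rows are copied from the sub-sample purely to show the layout.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Column layout of each row: (Input1, Input2, Input3, Input4, Output).
# A handful of rows copied from the sub-sample, for layout only;
# they are far too few to support any conclusion.
COLUMNS = ["Input1", "Input2", "Input3", "Input4"]
ROWS = [
    (5, 3, 4, 1, 5),
    (4, 1, 4, 1, 5),
    (4, 1, 1, 0, 4),
    (4, 1, 2, 1, 3),
    (3, 2, 2, 1, 2),
    (2, 3, 5, 1, 1),
]


def rank_combinations(rows, columns, size=2, min_cases=1, top=5):
    """Rank level combinations of `size` input columns by mean output.

    Hypothetical helper: `min_cases` drops combinations seen in fewer
    than that many cases, since a mean over one or two cases is noise.
    """
    groups = defaultdict(list)
    for row in rows:
        inputs, output = row[:-1], row[-1]
        # Visit every subset of `size` input columns for this case.
        for idx in combinations(range(len(columns)), size):
            key = tuple((columns[i], inputs[i]) for i in idx)
            groups[key].append(output)

    ranked = [
        (key, mean(outputs), len(outputs))
        for key, outputs in groups.items()
        if len(outputs) >= min_cases
    ]
    # Highest mean output first; ties go to the combination with more cases.
    ranked.sort(key=lambda item: (-item[1], -item[2]))
    return ranked[:top]


if __name__ == "__main__":
    for key, avg, n in rank_combinations(ROWS, COLUMNS, min_cases=2):
        print(key, "mean output:", round(avg, 2), "cases:", n)
```

Setting `size=3` would cover the "or more" case. A raw ranking like this says nothing about how consistent or significant an association is, which is the part where a proper statistical approach is needed.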
I really appreciate your help, and look forward to hearing back from you
with an approach that I could try.
Nicholas Kormanik
Salt Lake City, Utah
P.S. -- This is not a 'homework' assignment, or something like that. This
is quite serious and important to me in work I am now attempting.
Cases Input1 Input2 Input3 Input4 Output
O 5 3 4 1 5
P 4 1 4 1 5
V 1 2 3 1 5
Y 4 3 2 1 5
Z 2 1 3 0 5
K 4 1 1 0 4
N 1 3 1 0 4
R 4 1 5 1 4
T 4 2 3 0 4
U 2 2 1 1 4
C 4 3 5 0 3
D 4 1 2 1 3
E 3 1 5 1 3
J 3 1 3 1 3
L 4 2 4 0 3
W 1 3 2 0 3
B 3 2 2 1 2
F 3 2 5 1 2
G 4 1 5 1 2
I 3 3 2 1 2
Q 3 1 5 0 2
S 3 3 4 1 2
A 2 3 5 1 1
H 1 3 5 0 1
M 4 3 1 1 1
X 2 3 2 0 1