I'd greatly appreciate any suggestions you might be able to offer regarding
the following problem.
The data are all **categorical**, interval in nature. The data are not
necessarily normally distributed. Shown below is a small sub-sample of
the large database. Here are the challenges:
1) What statistical approach could one use to ascertain which two (or
more) of the input columns, taken together, are most consistently
associated with high levels of the output column?
2) Which specific levels of those two (or more) input columns would
one use to achieve the highest level of the output column?
3) List the top five 'combinations' yielding the highest output.
I really appreciate your help, and look forward to hearing back from you
with an approach that I could try.
Nicholas Kormanik
Salt Lake City, Utah
Cases  Input1  Input2  Input3  Input4  Output
O      5       3       4       1       5
P      4       1       4       1       5
V      1       2       3       1       5
Y      4       3       2       1       5
Z      2       1       3       0       5
K      4       1       1       0       4
N      1       3       1       0       4
R      4       1       5       1       4
T      4       2       3       0       4
U      2       2       1       1       4
C      4       3       5       0       3
D      4       1       2       1       3
E      3       1       5       1       3
J      3       1       3       1       3
L      4       2       4       0       3
W      1       3       2       0       3
B      3       2       2       1       2
F      3       2       5       1       2
G      4       1       5       1       2
I      3       3       2       1       2
Q      3       1       5       0       2
S      3       3       4       1       2
A      2       3       5       1       1
H      1       3       5       0       1
M      4       3       1       1       1
X      2       3       2       0       1
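
To make the three questions concrete, here is a minimal brute-force sketch in Python. It is only a descriptive tabulation, not a statistical test: for every pair of input columns it groups the cases by their combination of levels and ranks the combinations by mean Output. The names `load`, `rank_combinations`, and the `min_cases` cutoff are my own assumptions for illustration, and with a sub-sample this small most combinations contain only one or two cases, so the ranking should not be trusted beyond showing the idea.

```python
from collections import defaultdict
from itertools import combinations

# The sub-sample from the post, copied verbatim.
DATA = """
Cases Input1 Input2 Input3 Input4 Output
O 5 3 4 1 5
P 4 1 4 1 5
V 1 2 3 1 5
Y 4 3 2 1 5
Z 2 1 3 0 5
K 4 1 1 0 4
N 1 3 1 0 4
R 4 1 5 1 4
T 4 2 3 0 4
U 2 2 1 1 4
C 4 3 5 0 3
D 4 1 2 1 3
E 3 1 5 1 3
J 3 1 3 1 3
L 4 2 4 0 3
W 1 3 2 0 3
B 3 2 2 1 2
F 3 2 5 1 2
G 4 1 5 1 2
I 3 3 2 1 2
Q 3 1 5 0 2
S 3 3 4 1 2
A 2 3 5 1 1
H 1 3 5 0 1
M 4 3 1 1 1
X 2 3 2 0 1
"""

def load(text):
    """Parse the whitespace-separated listing into a header and row dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        parts = line.split()
        row = {header[0]: parts[0]}
        row.update({name: int(v) for name, v in zip(header[1:], parts[1:])})
        rows.append(row)
    return header, rows

def rank_combinations(rows, inputs, output="Output", size=2, min_cases=2):
    """Rank level combinations of every `size`-subset of inputs by mean output.

    min_cases is an arbitrary (assumed) cutoff that drops combinations
    observed in fewer than that many cases.
    """
    results = []
    for cols in combinations(inputs, size):
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[c] for c in cols)].append(row[output])
        for levels, outs in groups.items():
            if len(outs) >= min_cases:
                results.append((sum(outs) / len(outs), len(outs), cols, levels))
    # Highest mean first; ties broken by the larger number of cases.
    results.sort(key=lambda r: (-r[0], -r[1]))
    return results

header, rows = load(DATA)
ranked = rank_combinations(rows, inputs=header[1:-1])

# Question 3: the five top combinations under this crude criterion.
for mean, n, cols, levels in ranked[:5]:
    print(f"{cols} = {levels}: mean Output {mean:.2f} over {n} cases")
```

Passing `size=3` would look at triples of input columns instead of pairs, though the cells get even sparser. On the full database a larger `min_cases` would presumably be sensible.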