I am struggling with the interpretation of the cutoff value in the case of
unequal sample sizes.
Denote by z Fisher's linear discriminant function for separating two
groups.
Let z1 be the mean score of the first group and z2 the mean score of the
second group.
The standard (very simple) classification rule then states that an
observation x0 is classified into group 1 if its score z0 is closer to z1
than to z2, which is equivalent to comparing the score z0 with the
cutoff value (z1+z2)/2.
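To make the setup concrete, here is a minimal sketch in pure Python, assuming two features per observation. The function names are hypothetical, and the discriminant direction is taken as w = S^-1 (m1 - m2), with S the pooled within-group covariance matrix, which is one standard form of Fisher's discriminant. Because w points from group 2 towards group 1, z1 > z2, and "closer to z1 than to z2" is the same as z0 > (z1+z2)/2.

```python
# Sketch (hypothetical helper names): Fisher's linear discriminant for two
# groups with two features, plus the midpoint classification rule.

def mean(rows):
    # Componentwise mean of a list of feature tuples.
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_cov(g1, g2):
    # Pooled within-group 2x2 covariance matrix (divisor n1 + n2 - 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (g1, g2):
        m = mean(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fisher_weights(g1, g2):
    # w = S^-1 (m1 - m2), using the explicit inverse of a 2x2 matrix.
    (a, b), (c, d) = pooled_cov(g1, g2)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    m1, m2 = mean(g1), mean(g2)
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    # Discriminant score z = w'x.
    return w[0] * x[0] + w[1] * x[1]

def midpoint_rule(g1, g2, x0):
    # Assign x0 to group 1 if its score z0 is closer to z1 than to z2;
    # this is equivalent to z0 > (z1 + z2) / 2 because z1 > z2 here.
    w = fisher_weights(g1, g2)
    z1 = sum(score(w, x) for x in g1) / len(g1)
    z2 = sum(score(w, x) for x in g2) / len(g2)
    z0 = score(w, x0)
    return 1 if abs(z0 - z1) < abs(z0 - z2) else 2

g1 = [(0, 0), (1, 0), (0, 1), (1, 1)]
g2 = [(3, 3), (4, 3), (3, 4), (4, 4)]
print(midpoint_rule(g1, g2, (1.5, 1.5)))  # prints 1
print(midpoint_rule(g1, g2, (2.5, 2.5)))  # prints 2
```

Rescaling w by any positive constant leaves the rule unchanged, so the particular scaling of the discriminant function does not matter here.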
The problem is that some books (and possibly also the SAS software) use a
different cutoff value when the sample sizes are unequal:
they compare z0 with the weighted average (n1*z1 + n2*z2)/(n1 + n2).
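The two cutoffs can be compared directly on the score scale. The sketch below (hypothetical function name) works from already-computed discriminant scores. Note that the weighted cutoff (n1*z1 + n2*z2)/(n1 + n2) is simply the grand mean of all n1 + n2 scores, which is why it is pulled towards the mean of the larger group.

```python
def cutoffs(scores1, scores2):
    # Return (midpoint cutoff, sample-size-weighted cutoff) for two lists of
    # discriminant scores.
    n1, n2 = len(scores1), len(scores2)
    z1 = sum(scores1) / n1
    z2 = sum(scores2) / n2
    midpoint = (z1 + z2) / 2
    # Weighted average of the group means = mean of all pooled scores.
    weighted = (n1 * z1 + n2 * z2) / (n1 + n2)
    return midpoint, weighted

mid, wtd = cutoffs([0, 1, 2], [9, 11])  # z1 = 1, z2 = 10, n1 = 3, n2 = 2
print(mid, wtd)  # prints 5.5 4.6
```

With n1 > n2 the weighted cutoff (4.6) lies closer to z1 = 1 than the midpoint (5.5) does.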
This implies that if the first group is much larger than the second
(n1 >> n2), the cutoff value is shifted towards z1, so more group 1
observations will be classified into the second (smaller) group.
Wouldn't you expect more classifications into the larger group, not into
the smaller one?
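The shift described above can be checked with a small, entirely made-up set of scores. This sketch assumes the scores are oriented so that group 1 has the lower mean (z1 < z2) and that an observation goes to group 1 when z0 < cutoff; the data and function name are hypothetical.

```python
# Hypothetical scores with n1 >> n2, oriented so that z1 < z2.
scores1 = list(range(-20, 21))  # n1 = 41, z1 = 0
scores2 = [35, 45, 55]          # n2 = 3,  z2 = 45

def misclassified_from_group1(scores1, cutoff):
    # Number of group-1 scores on the group-2 side of the cutoff.
    return sum(1 for z in scores1 if z >= cutoff)

n1, n2 = len(scores1), len(scores2)
z1 = sum(scores1) / n1
z2 = sum(scores2) / n2
midpoint = (z1 + z2) / 2                    # 22.5
weighted = (n1 * z1 + n2 * z2) / (n1 + n2)  # 135/44, about 3.068

for name, c in (("midpoint", midpoint), ("weighted", weighted)):
    k = misclassified_from_group1(scores1, c)
    print(f"{name} cutoff {c:.3f}: {k} of {n1} group-1 scores go to group 2")
# midpoint cutoff 22.500: 0 of 41 group-1 scores go to group 2
# weighted cutoff 3.068: 17 of 41 group-1 scores go to group 2
```

None of the three group-2 scores falls below either cutoff, so in this toy example the weighted cutoff only moves observations from the larger group into the smaller one, exactly the behaviour the question is about.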
Can anyone explain the logic of this approach or point out where my
reasoning is incorrect?
Thanks!
Martina Vandebroek