Dear Allstatters!
I've been reading through Philip B. Stark's amazing on-line statistics texts
and came upon a question.
In chapter 28 on the multinomial distribution (
http://www.stat.berkeley.edu/~stark/SticiGui/Text/chiSquare.htm)
he derives the chi-square statistic by treating each category count as having
a binomial probability distribution and then standardizing the errors:
(X-np) / sqrt(np(1-p))
This is followed by squaring and summing across the table.
Then he goes on to say "There are theoretical reasons, beyond the scope of
this book, that make it preferable to omit the factors (1 - p_i) in the
denominators of the terms in the sum. (If there are many categories, and
none of the category probabilities is large, then (1 - p_i)^(1/2) is nearly
unity, and it does not matter whether we include the factors.)"
This results in the well-known formula chi sq = SUM((X-np)^2/(np))
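
To make the comparison concrete for myself, here is a minimal Python sketch of
the two versions of the sum, with and without the (1-p) factors. The counts and
probabilities are invented for illustration and do not come from the text, and
the function names are my own.

```python
# Hypothetical data: four categories with equal hypothesized probabilities.
observed = [18, 22, 29, 31]            # X_i, observed counts (made up)
probs = [0.25, 0.25, 0.25, 0.25]       # p_i, hypothesized probabilities (made up)


def sum_squared_z(observed, probs):
    """Sum of squared binomial z-scores: (X - np)^2 / (np(1 - p))."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p * (1 - p))
               for x, p in zip(observed, probs))


def chi_square(observed, probs):
    """Usual chi-square statistic: (X - np)^2 / (np), (1 - p) factors omitted."""
    n = sum(observed)
    return sum((x - n * p) ** 2 / (n * p)
               for x, p in zip(observed, probs))


print("with (1-p) factors:   ", sum_squared_z(observed, probs))
print("without (1-p) factors:", chi_square(observed, probs))
```

Since every (1-p) is below 1, the version that keeps the factors always comes
out at least as large as the usual statistic; the two get closer as the
category probabilities get smaller.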
Now, I am (clearly) not a statistician, but I really appreciated this
step-by-step derivation. However, I have been unable to find any references
for this "omission" of the (1-p) term. (Yes, I had a look at the 1900 Pearson
article, but it is way beyond my understanding.)
I am also curious about the relationship between this derivation and
z-scores. It looks to me as if chi-square is effectively a (simplified) sum
of squared z-scores. Am I completely off base?
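
As a small check on this reading, here is a sketch for the special case of only
two categories, where the chi-square statistic (without the (1-p) factors)
appears to coincide with the squared z-score of the first category. The values
of n, p and the counts are made up for illustration.

```python
import math

# Hypothetical two-category example; n, p and the counts are made up.
n = 100
p = 0.3                       # hypothesized probability of category 1
x1, x2 = 36, n - 36           # observed counts in categories 1 and 2

# z-score of the category 1 count under a binomial(n, p) model
z = (x1 - n * p) / math.sqrt(n * p * (1 - p))

# Chi-square statistic summed over both categories, (1 - p) factors omitted
chi_sq = ((x1 - n * p) ** 2 / (n * p)
          + (x2 - n * (1 - p)) ** 2 / (n * (1 - p)))

print("z squared: ", z ** 2)
print("chi-square:", chi_sq)
```

The two printed values agree up to rounding, which follows from
1/p + 1/(1-p) = 1/(p(1-p)). This only covers the two-category case and does not
by itself explain the general omission of the (1-p) factors.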
I would really appreciate any enlightening comments or pointers. I can sum
them up for the list, although it is more likely I'll get the impression
that I asked a completely ridiculous question :)
Thank you for taking the time to read this!
Sincerely,
Maja Zaloznik