Hi,
I'm using R to assess the normality of a variable (yield) that is
supposed to be continuous. However, presumably because of the limited
precision of the measuring device, the data are rounded, so many values
are repeated. The normal Q-Q plot of the data looks quite normal overall,
but it has a "staircase" shape because the rounding makes the data look
discrete.
I first tried a KS test but got a warning that the presence of ties
(repeated values) prevents the p-values from being computed correctly. So I
thought about using a chi-square goodness-of-fit test to check whether
these "discrete-like" data can be assumed normal. First I standardized the
data and cut them into 12 bins of approximately equal width. I computed the
observed count in each bin; every bin contains at least 7 observations.
Then I computed the expected counts for these bins under the null
hypothesis (normality). The resulting chi-square statistic was 72.4. Using
12-(2+1)=9 degrees of freedom (12 bins, and I estimated 2 parameters when
standardizing the data), this gives a p-value on the order of 10^-12, thus
strongly rejecting the hypothesis of normality.
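A minimal sketch of the procedure described above, for anyone who wants to reproduce the setup. It assumes simulated rounded data in place of the real yields: the `yield` vector, its mean, sd, sample size and rounding step are all hypothetical, and the check that every bin holds at least 7 observations is not included.

```r
# Hypothetical stand-in for the yield data: normal draws rounded to integers,
# which creates many ties (not the real measurements).
set.seed(1)
yield <- round(rnorm(500, mean = 50, sd = 5))

# Standardize; this estimates two parameters (mean and sd).
z <- (yield - mean(yield)) / sd(yield)

# Cut into 12 bins of equal width over the observed range.
k <- 12
breaks <- seq(min(z), max(z), length.out = k + 1)
observed <- as.numeric(table(cut(z, breaks = breaks, include.lowest = TRUE)))

# Expected counts under N(0, 1); the two outer bins absorb the tails
# so that the expected counts sum to the sample size.
edges <- breaks
edges[1] <- -Inf
edges[k + 1] <- Inf
expected <- length(z) * diff(pnorm(edges))

# Chi-square statistic with k - 1 - 2 degrees of freedom
# (two estimated parameters).
chisq <- sum((observed - expected)^2 / expected)
df <- k - 1 - 2
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
```

With rounded data, equal-width bins need not each contain the same number of distinct rounded values, so comparing `observed` with `expected` bin by bin shows where the statistic comes from.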
I assume something went wrong in how I did this, because the Q-Q plot
looks really close to normal (see it here:
http://img18.imageshack.us/img18/7787/yieldh.jpg).
I guess there is a problem with how the bins were created, or a
chi-square test is simply not appropriate in this situation. So I have a
few questions:
- Is there a proper way to bin a "continuous BUT discrete-like" variable
(due to rounding) for a chi-square test of normality?
- What should I watch out for when creating the bins?
- Is the problem related to bins containing the same value repeated many
times?
- Is there any other goodness-of-fit test for normality that would be
suitable for rounded data and would provide accurate p-values?
Thanks a lot!
Aziz