To add a little to the previous answer.
Chi-square tests are riddled with complications.
If you've got a 2x2 table and one of your four expected frequencies is
below 5, then 25% of your cells are below 5. With a larger table (say
3x5, which has 15 cells), you need 3 cells with expected frequencies
below 5 before you cross the 20% threshold.
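To make that 20%/25% arithmetic concrete, here's a quick sketch of how the expected frequencies come out of the row and column totals (the observed counts are made up for illustration):

```python
import numpy as np

# Hypothetical 2x2 table of observed counts
observed = np.array([[2, 8],
                     [9, 6]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected = row_totals * col_totals / n

print(expected)
print("cells with expected count below 5:",
      (expected < 5).sum(), "of", expected.size)
```

Here one cell has an expected count of 4.4, so 1 cell out of 4 (25%) is below 5.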
However, this is a bit like the normal distribution assumption or
interval data assumption in lots of other tests - if it's not
satisfied, it matters. But how much does it matter?
As well as the problem of expected values, you have the problem of
whether to use Yates' correction (which SPSS calls the Continuity
Correction). If you don't use Yates' correction, your p-value will be
a little too low; if you do use it, your p-value will be a little too
high.
The solution is to use Fisher's exact test. Always. If you compare
the p-values from the three tests (Pearson chi-square, continuity-corrected
chi-square, and Fisher's exact), you'll find that the exact test's
p-value is almost always between the other two.
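You can see this with a few lines of scipy (the table below is made up for illustration, and chosen so that one expected count falls below 5 - exactly the situation under discussion):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table; one of its expected counts is below 5
table = np.array([[2, 8],
                  [9, 6]])

pearson_p = chi2_contingency(table, correction=False)[1]  # Pearson chi-square
yates_p = chi2_contingency(table, correction=True)[1]     # continuity-corrected
fisher_p = fisher_exact(table)[1]                         # Fisher's exact

print(f"Pearson: {pearson_p:.4f}")
print(f"Yates:   {yates_p:.4f}")
print(f"Fisher:  {fisher_p:.4f}")
```

For this table the exact p-value lands between the other two - the typical (though not strictly guaranteed) pattern.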
It's an(other) example of a historical legacy - statistics, and
particularly statistics taught to/by psychologists, is burdened with
them. Before we had computers, Fisher's exact test was hard to do
with large sample sizes, and with anything larger than a 2x2 table it
was almost impossible. Nowadays we've got computers, so it's easy.
But textbooks are written as if we don't have computers, so they go on
about chi-square tests and assumptions of chi-square tests and cell
sizes.
(Another interesting one is the issue of independence. Independence
is an assumption of every statistical test [at least the ones that
aren't designed to take account of non-independence*], but why is it
that people always mention it for chi-square tests, but not for the
others?)
Jeremy
*Multilevel models, Huber-White sandwich estimators. Not the sort of
thing you come across often.
On 31/01/2008, Fiona Kennedy <[log in to unmask]> wrote:
>
>
>
> Hi all,
>
> In chi-square analysis I understand that the expected count must be above 5.
> I have read various texts; some say that no expected count should fall
> below 5 and others say that if more than 20% do, this is problematic. Does
> anyone know of any definitive rules for this?
>
> In my data I have a couple of significant results where my expected counts
> do fall below 5, and I am unsure whether to ignore these results
> completely because of the inaccuracy of not quite meeting the expected count
> assumption, or whether to use Fisher's exact test (when it is calculated in
> 2x2 tables).
>
> Also, does anyone know whether there are any rules of thumb regarding the
> strength of association tests (Phi/Cramer's V) in terms of what constitutes a
> valuable/strong association? Finally, can anyone suggest a good text that
> explains all of these complexities?
>
> Thanks!
>
> Fiona
>
>
> Fiona Kennedy
> Postgraduate researcher
> Centre for Appearance Research
>
>
> Faculty of Health & Life Sciences University of the West of England
> Bristol, BS16 1QY
>
> tel: 0117 3281890
> fax: 0117 3283645
>
> A date for your diary! The Centre for Appearance Research is pleased to
> announce "Appearance Matters 3" Conference - 1-2 July 2008. Full details
> available at http://science.uwe.ac.uk/appearancematters or
> contact [log in to unmask]
>
>
>
--
Jeremy Miles
Learning statistics blog: www.jeremymiles.co.uk/learningstats
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com