Dear Allstaters,
I would be very grateful for any help and advice with the following
problem.
Can anyone help me resolve whether or not using Fisher's exact test to test
for an association is appropriate in a situation when data is available for
the whole population? Specialist gene expression software (GeneData
Expressionist and some others) includes this test for the null
hypothesis of no association between differentially expressed genes and a
particular GO category (e.g. genetic pathway) and we are unsure about the
validity of this test.
The problem is as follows:
Suppose that we randomly assign plants to one of two groups; one of which
is treated with a chemical and the other of which is left untreated. x
hours after treatment, RNA is extracted from each plant and tested on an
Affymetrix GeneChip (microarray). This results in gene expression
measurements for about 10,000 genes for each plant. An analysis to
identify those genes which show evidence of differential expression is
conducted (with appropriate adjustment for multiple comparisons) and a list
of 100 'differentially expressed' genes is identified.
The question that is then posed is e.g. 'Is there any evidence that Pathway
A is implicated in response to the chemical?' (Note that this question may
be posed for a particular pathway of interest or for many pathways). The
genes are then used to construct a 2 way table: Differentially Expressed /
Not Differentially Expressed versus In Pathway A / Not in Pathway A (note
that neither of these classifications is without error). A Fisher's Exact
Test is then carried out and if the p-value is significant, the conclusion
is that Pathway A is over- (or under-) represented in the group of DE genes
and that this over-representation cannot be explained by 'pure chance'.
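To make the construction of the table concrete, here is a minimal Python sketch of the cross-classification. The gene identifiers and the function name two_by_two are made up purely for illustration; they are not taken from GeneData Expressionist or any other package.

```python
def two_by_two(all_genes, de_genes, pathway_genes):
    """Cross-classify every gene on the array by DE status and pathway membership.

    Returns [[DE & in pathway,     DE & not in pathway],
             [not DE & in pathway, not DE & not in pathway]].
    """
    all_genes = set(all_genes)
    de = set(de_genes) & all_genes        # ignore genes not on the array
    pw = set(pathway_genes) & all_genes
    a = len(de & pw)                      # DE and in the pathway
    b = len(de - pw)                      # DE, not in the pathway
    c = len(pw - de)                      # in the pathway, not DE
    d = len(all_genes - de - pw)          # neither
    return [[a, b], [c, d]]


# Toy example with hypothetical gene names g1..g10
genes = [f"g{i}" for i in range(1, 11)]
de_list = ["g1", "g2", "g3"]
pathway_a = ["g2", "g3", "g4", "g5"]
table = two_by_two(genes, de_list, pathway_a)
print(table)
```

Every gene on the array lands in exactly one cell, so the four counts always sum to the number of genes on the array.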
Our problem with the use of this test is that we are familiar with its use
in inferential statistics, where we would test a sample and apply
inferences to a population (about whether there is a true association
between A and B). However in this case, the 'sample' actually represents
the entire population of genes on the microarray (i.e. 10,000) - and we
know how each of these is classified with regard to both DE and Pathway A
(although as I have said above, there will be error associated with both
classifications). For example, our 2x2 table may show that 10 out of the
100 DE genes are classified as being part of Pathway A - and that there are
20 genes in Pathway A in total. We can see from these numbers that Pathway
A is over-represented in the DE genes (10/100 v 10/9900). What is the
Fisher's Exact test testing in these circumstances and is it valid? If we
find that the test is significant what would we conclude?
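For reference, the calculation for the numbers above can be sketched in Python as follows. This assumes a one-sided test for over-representation, in which case the p-value is the hypergeometric probability of seeing 10 or more Pathway A genes among 100 genes drawn at random, without replacement, from the 10,000 on the array. The function name hypergeom_upper_tail is only illustrative, and whether this reference distribution is the appropriate one is exactly what we are unsure about.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k), where X is the number of pathway genes among n genes
    drawn without replacement from N genes, K of which are in the pathway."""
    total = comb(N, n)                    # all ways to choose the n 'DE' genes
    upper = min(K, n)                     # X cannot exceed either margin
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


N = 10_000   # genes on the array
K = 20       # genes in Pathway A
n = 100      # differentially expressed genes
k = 10       # DE genes that are in Pathway A

expected = n * K / N                      # pathway genes expected in the DE list by chance
p_over = hypergeom_upper_tail(N, K, n, k)
print(f"expected by chance: {expected}, observed: {k}, one-sided p-value: {p_over:.3g}")
```

Both margins of the table (100 DE genes, 20 pathway genes) are held fixed here, so the only randomness in the calculation is in which genes end up labelled as DE.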
I would be very grateful for any help/comments/references here as I
currently seem to be going round in circles!
Many thanks,
Carol Yarrow