The important thing to recognise about the old-fashioned tables of critical values for the F ratio is that the F distribution is used for more than one purpose. The main issue here is the distinction between using the F ratio as a hypothesis test in an ANOVA table arising from any kind of linear model, and using a ratio of empirical variances as a direct test of H0: var1 = var2 vs. H1: var1 ≠ var2. (There is a third use, constructing Clopper-Pearson 'exact' confidence limits for a proportion, but that needn't concern us here - it is more readily done using the beta distribution facilities in software; even Excel is fine for this.)
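For what it's worth, the beta-based calculation is a few lines in most packages; a minimal sketch in Python's scipy.stats (my choice of library here, purely for illustration):

    from scipy.stats import beta

    def clopper_pearson(x, n, alpha=0.05):
        # 'Exact' Clopper-Pearson limits for a proportion x/n,
        # via beta quantiles rather than the F distribution.
        lower = 0.0 if x == 0 else beta.ppf(alpha / 2, x, n - x + 1)
        upper = 1.0 if x == n else beta.ppf(1 - alpha / 2, x + 1, n - x)
        return lower, upper

    print(clopper_pearson(8, 20))  # e.g. 8 successes out of 20: roughly (0.19, 0.64)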
As far as I can tell (experts on the history of statistics may correct me on this), Fisher et al. had the F distribution tabulated specifically with the F test in mind. Consider for simplicity F with 1 and 60 df. The critical value for a default 5% alpha level is 4.00. This is the square of the critical value of t with 60 df for the usual two-tailed test, viz. 2.00. (It's worth remembering that this is 1.96 + 2.4/df, to an excellent approximation, for all but very small df.) An F test with 1 and 60 df is essentially the square of a t test statistic (which could be either unpaired or paired, i.e. one-sample, based on paired differences). We run the t-test two-sided by default, and the single-tail F probability corresponds to it, because F will be > 4 whenever t > 2 or t < -2. F is a squared measure and t an unsquared one, so a two-tailed t-test translates into a one-sided interpretation of F. (Sometimes the resulting F will be < 1, i.e. if |t| < 1; in this case H0 is simply not rejected.) In this situation the numerator df is small - one less than the number of groups being compared.
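Anyone who wants to verify this correspondence numerically can do so in a few lines; a quick sketch in Python's scipy.stats (again, just my choice of tool):

    from scipy.stats import f, t

    df = 60
    print(f.ppf(0.95, 1, df))      # upper 5% point of F(1, 60): about 4.00
    print(t.ppf(0.975, df) ** 2)   # square of two-tailed 5% point of t(60): the same
    print(1.96 + 2.4 / df)         # the approximation quoted above: about 2.00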
Comparing two empirical variance estimates is a totally separate issue. Here, both df1 and df2 are usually large. It is usual to calculate F = max/min, then refer to F tables with the appropriate df; this gives a ONE-sided p-value for comparing them. When n1 and n2 are unequal, the df in the numerator and denominator will depend on which sample variance is the larger. I think it's normal to double the one-sided p-value, for consistency, though a case could be made for instead adding the opposite-tail probability relating to 1/F.
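In code the whole procedure amounts to something like the following (a sketch in Python; the function name is mine, and the Gaussian assumption discussed in the next paragraph applies):

    import numpy as np
    from scipy.stats import f

    def variance_ratio_test(x, y):
        # F = larger sample variance / smaller, with the df ordered to match;
        # returns F and the doubled one-sided p-value.
        v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
        if v1 >= v2:
            F, df1, df2 = v1 / v2, len(x) - 1, len(y) - 1
        else:
            F, df1, df2 = v2 / v1, len(y) - 1, len(x) - 1
        p_one_sided = f.sf(F, df1, df2)
        return F, min(1.0, 2 * p_one_sided)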
HOWEVER, I wouldn't recommend this test. The snag is that it is highly non-robust: extremely sensitive to departures from the tacitly assumed Gaussian distributional form. In fact, it works just as effectively as a test of non-normality as of heteroscedasticity (the two tend to co-exist anyway). If you really want to compare the spread of two samples only (i.e. disregarding location), something much more robust is needed. One possibility is the ancillary Levene test that SPSS uses to help choose between the equal-variances and unequal-variances t-tests (the classical t-test and the Welch test). Pretend you're going to compare the two samples for location using a t-test, but disregard all the output apart from the first two columns of the pivot table, which give the ancillary test. (When using the SPSS unpaired t routine I always disregard this test as such, as I prefer to use the more robust unequal-variances form of the test - unless we have to generalise into an ANOVA model. Like ancillary tests in general, it is more likely to signal cause for concern by p < 0.05 when sample sizes are large, but that is precisely when there is less cause for concern - so such tests are arguably unhelpful to the issue of comparing means.)
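Outside SPSS, Levene's test is readily available too; for instance in Python's scipy.stats (the data below are illustrative only; center='mean' matches the classical Levene statistic, while center='median' gives the still more robust Brown-Forsythe variant):

    import numpy as np
    from scipy.stats import levene

    rng = np.random.default_rng(1)           # illustrative data only
    x = rng.normal(0, 1, 40)
    y = rng.normal(0, 2, 40)
    stat, p = levene(x, y, center='median')  # Brown-Forsythe variant
    print(stat, p)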
Hope this helps.
Robert G. Newcombe PhD CStat FFPH
Professor of Medical Statistics
Department of Primary Care and Public Health
Centre for Health Sciences Research
Cardiff University
4th floor, Neuadd Meirionnydd
Heath Park, Cardiff CF14 4YS
Tel: 029 2068 7247
Fax: 029 2068 7236
Home page http://www.cardiff.ac.uk/medicine/epidemiology_statistics/research/statistics/newcombe
For location see http://www.cardiff.ac.uk/locations/maps/heathpark/index.html
>>> Jay Warner <[log in to unmask]> 10/06/07 06:23:16 >>>
I believe the convention is to always use the F value larger than 1
(i.e., select var-1 and var-2 so that the F ratio is > 1). Applying
this convention forces the F-test to be one-tailed.
I believe it came from the days when the tables didn't always have
enough precision in the smaller values of F, below 1. But what do I
know -- I wasn't there.
This is neither more nor less conservative.
Of course, if you are going to set up CIs for the variance ratio, then
you need both tails.
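For instance, the equal-tails interval for var1/var2 divides the
observed ratio by both F quantiles; a sketch in Python (illustrative
names, with the Gaussian assumption discussed above):

    import numpy as np
    from scipy.stats import f

    def variance_ratio_ci(x, y, alpha=0.05):
        # Equal-tails CI for var(x)/var(y): uses BOTH tails of F.
        ratio = np.var(x, ddof=1) / np.var(y, ddof=1)
        df1, df2 = len(x) - 1, len(y) - 1
        lower = ratio / f.ppf(1 - alpha / 2, df1, df2)
        upper = ratio / f.ppf(alpha / 2, df1, df2)
        return lower, upper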
On Jun 9, 2007, at 8:28 PM, David B. Klein <[log in to unmask]> wrote:
> I'm wondering what the rationale would be for using one-tailed
> hypotheses on an F-ratio test of variances as a default. I notice
> Excel does it this way. It's more conservative, to be sure, but why
> not just lower the alpha level if that's what you're after? I don't
> see what in this situation implies a one-tailed test ... you are
> interested in equality of variances, not in one being greater than
> the other. (?)