Esther
I think your email raises a number of different issues concerning the use of
statistics for contaminated land.
The CIEH/CL:AIRE guidance indicates that it is preferable to assume a normal
distribution for concentrations rather than use the Chebychev method, because
the Chebychev method lacks statistical power: it can fail to reject the Null
hypothesis (in this case, that the land is not contaminated) when other, more
powerful methods would provide sufficient evidence to overturn it. The
spreadsheet may all too readily suggest there is insufficient evidence to use
a normal distribution, despite the fact that the Chebychev method will
frequently give a false negative. The most obvious
case of this is found when the sample distribution is positively skewed and
the lower bound estimate for the population mean at a 95% level of
confidence becomes negative: the upshot being there is no critical value low
enough to suggest that the site has a problem based on the LCL95%! This is
far more likely to occur for the Chebychev method than classical normal
distribution-based statistics. Some other methods, e.g. simple
bootstrapping, are immune to this problem, although inexplicably the use of
bootstrap methods is discounted in the CIEH/CL:AIRE guidance because results
are usually similar to those obtained assuming a normal distribution:
something most people would probably see as confirmation of the bootstrap
method!
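
To illustrate the negative lower bound, here is a minimal Python sketch. The
concentrations are made up for illustration, the Chebychev multiplier is
assumed to take the one-sided form sqrt(1/alpha - 1), and the bootstrap is a
plain percentile bootstrap; none of this is taken from the CIEH/CL:AIRE
spreadsheet itself.

```python
import math
import random
import statistics

# Hypothetical, positively skewed concentrations (mg/kg); not real site data.
concs = [5, 8, 12, 15, 20, 25, 40, 60, 150, 400]

n = len(concs)
mean = statistics.mean(concs)
se = statistics.stdev(concs) / math.sqrt(n)  # standard error of the mean

alpha = 0.05
t_crit = 1.833                     # one-sided 95% Student's t value for 9 degrees of freedom
k_cheb = math.sqrt(1 / alpha - 1)  # assumed one-sided Chebychev multiplier, about 4.36

lcl_t = mean - t_crit * se         # normal-theory (t-based) LCL95%
lcl_cheb = mean - k_cheb * se      # Chebychev LCL95%


def bootstrap_lcl(data, alpha, n_boot=5000, seed=0):
    """Percentile-bootstrap lower bound for the mean (illustrative helper)."""
    rng = random.Random(seed)
    size = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(data, k=size)) / size for _ in range(n_boot))
    return means[int(alpha * n_boot)]


lcl_boot = bootstrap_lcl(concs, alpha)

print(f"t-based LCL95%:   {lcl_t:.1f}")
print(f"Chebychev LCL95%: {lcl_cheb:.1f}")
print(f"bootstrap LCL95%: {lcl_boot:.1f}")
```

For these made-up data the Chebychev lower bound is negative while the t-based
bound stays above zero. The bootstrap bound cannot fall below the smallest
observed concentration, because every resampled mean is an average of observed
values.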
In the first instance I suggest you question if the sample distribution
really should be so non-normal. Are you sure all the samples come from the
same type of stratum (similar soil texture and colour)? Do you have
sufficient samples? If there is an expectation that the stratum could be a
problem then around thirty samples would usually be sufficient to reliably
define the distribution unless it is highly skewed. Fewer samples may be
required for a uniformly contaminated material: but in this case one would
expect a more normal distribution of sample concentration. Concentrations
should be plotted by location and depth to see if they are spatially
correlated. A non-normal distribution of concentration could be obtained by
combining samples from clean and dirty strata.
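
As a rough illustration of that last point, the sketch below pools two
made-up sets of results, each symmetric on its own, and computes the sample
skewness. The values and the skewness helper are assumptions for illustration
only.

```python
def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (population-moment form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical concentrations (mg/kg) from two different strata.
clean = [7, 8, 9, 9, 10, 10, 10, 10, 11, 11, 12, 13]  # symmetric about 10
dirty = [80, 100, 120]                                # symmetric about 100
combined = clean + dirty

print(f"clean stratum skewness:    {skewness(clean):.2f}")
print(f"dirty stratum skewness:    {skewness(dirty):.2f}")
print(f"combined sample skewness:  {skewness(combined):.2f}")
```

Each stratum is symmetric here, yet the pooled sample is strongly positively
skewed, which is why it is worth checking the logs and plotting results by
location and depth before concluding the data are non-normal.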
Assuming that samples are all from the same material, there are sufficient
samples and there is a pollutant linkage, then determination under Part IIA
can be based on the balance of probabilities. The best estimate of the
population mean concentration is the sample mean. If the sample mean
concentration exceeds the unacceptable limit for the contaminant, based on
the exposure model and relevant toxicological data, then on the balance of
probabilities the site is contaminated. In the spreadsheet you should use a
51% confidence level, which in effect checks that the sample mean exceeds the
critical value.
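
To show why a 51% level amounts to comparing the sample mean with the
critical value, here is a minimal sketch. The mean, standard error and
critical concentration are made-up figures, and a normal quantile is used in
place of Student's t purely to keep the example in the standard library.

```python
from statistics import NormalDist

# Hypothetical summary statistics (mg/kg); not real site data.
mean = 73.5      # sample mean
se = 38.75       # standard error of the mean
critical = 70.0  # critical concentration from the exposure model


def lower_bound(confidence):
    """One-sided lower confidence bound on the mean (normal approximation)."""
    z = NormalDist().inv_cdf(confidence)
    return mean - z * se


lcl_51 = lower_bound(0.51)
lcl_95 = lower_bound(0.95)

print(f"LCL51% = {lcl_51:.2f}, exceeds critical value: {lcl_51 > critical}")
print(f"LCL95% = {lcl_95:.2f}, exceeds critical value: {lcl_95 > critical}")
```

At 51% the multiplier on the standard error is only about 0.025, so the lower
bound sits almost on top of the sample mean. At 95% the same data fall well
short of the critical value.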
If the data are truly non-normal then for something as contentious as Part
IIA I suggest you get advice regarding alternative methods for calculating
confidence intervals. There is free alternative software available, such as
ProUCL, but this does not substitute for spatial analysis of the data and
testing for the completeness and suitability (absence of bias) of the data.
In addition you should consider the likely scale of variability for your
non-normal distribution of concentrations, since this may have an impact on
whether there is likely to be a significant pollutant linkage, depending on
the mode of exposure (ingestion, inhalation etc.).
Kind regards,
Jonathan.
These are my personal views and do not necessarily reflect the views of my
employer.
(for Paul's benefit)
Dr Jonathan Welch
Associate Director
AECOM Ltd
----- Original Message -----
From: "Esther MacRae" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Friday, September 16, 2011 1:47 PM
Subject: Statistics upper or lower bound?
Hello all,
One for those of you familiar with the CIEH/CL:AIRE statistics calculator...
In a Part IIA scenario when you have a non-normal set of data the CIEH
calculator defaults to calculating the evidence based on Chebychev. You are
therefore given a choice for your Evidence against Null hypothesis: an upper
bound and a lower bound.
What would you base your decision on if your upper bound is >51%, but your
lower bound is <51%?
My view is to base it on the lower bound; however, the User Manual is ambiguous,
stating (on pg 22) "there is the requirement for the user to make a
judgement about whether the Null Hypothesis can be rejected given the upper
and lower bounds of the evidence against the Null Hypothesis." Or am I
misinterpreting the manual?
thanks
Esther MacRae
Scientific Officer
The Highland Council