Kevin, thanks for the chat. My answer, for what it’s worth:
My understanding is that you should consider any exceedance > Cc within the
strictures of your conceptual model.
And you should consider what the data tell you about the underlying
distribution, using the various tests.
If the level of data, the uncertainty and the conceptual model are acceptable,
then you can accept the results of the tests and accept the exceedance.
However, if you have outliers then I think two answers are available.
Given data collection that is sufficient for the risks and uncertainties, you
can choose to keep the outliers in and demonstrate that the contamination is
not at such levels or extent that it jeopardizes the “clean” status of the
site, hence proving that the distribution is substantially and significantly
below the GAC/Cc, i.e. at 95%.
However, if outliers are identified and removed from the data set, either
because data collection is not sufficient to prevent them producing an
unacceptable answer, or because you choose to remove the outliers to improve
the distribution, then this raises the question of what the distribution and
extent of other, unknown outliers might be. Really, it should fundamentally
test your conceptual model and the data collection. I take your point, Kevin,
but to me, once you decide you have outliers for whatever reason, you need to
really scrutinize the underlying SI and probably re-zone and redefine the
problem.
As regards the mechanics, I think you are right that there is an issue with
choosing normal or log-normal, but I would argue that the user should be aware
that each set of data could have a different distribution and that the world
is not normal or log-normal (these are just statistical conveniences). You
should realize that failure to be normal doesn’t mean it is log-normal.
However, I think this is getting overly involved: all we are doing at this
stage is estimating the distribution, and if you want a really correct answer,
get expert help. I would also suggest that Paolo has done what he can to make
the calculator fail-safe by giving the Chebychev test. Chebychev probably isn’t
right; it is just more conservative and less wrong. And when you are operating
at the upper 95th/90th, the distribution issues are probably less worrisome
than in Part IIA tests, especially when the distributions can be heavily
skewed.
Hopefully a useful response.
Rob Ivens
Scientific Officer
01306 879232
From: Contaminated Land Management Discussion List [mailto:[log in to unmask]] On Behalf Of Kevin Privett
Sent: 23 July 2008 12:28
To: [log in to unmask]
Subject: CIEH stats - outlier test
Can anyone offer clarification of the outlier test?
Appendix B (3.) of the CIEH guidance states that
Grubbs’ Test assumes the other data values in a dataset, except for the
suspect observation, are normally distributed. It tells you to check the
normality of the remaining dataset using the method in Appendix C.
Appendix B (4.) states that if the (remaining)
dataset is non-normal, consider using a log transform and check if this is
normal.
However, when you use the Statistics Calculator
spreadsheet it appears to do something different.
On the outlier test sheet you have a choice of
drop-down “use normal distribution to check for outliers” or
“use log-normal distribution to check for outliers”.
This appears to check for outliers based on the
distribution of the whole dataset, i.e. including any suspect values, not on
the dataset once outliers are removed.
Here is an example: take the following dataset and
assume the critical value (SGV) is 20:-
14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38
If you follow the instructions for the stats
calculator, the summary page tells you this is a non-normal dataset, so you
choose “log-normal to check for outliers” and it says there are no
outliers. As it is non-normal, the Chebychev Test is used to calculate
the UCL (=27.6), which exceeds the critical value. Let’s say that means an
exceedance of the SGV, so remediation is required for planning purposes.
[Conclusion 1]
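For anyone who wants to check the arithmetic behind Conclusion 1, here is a minimal Python sketch of a one-sided Chebychev UCL on the mean. It assumes the multiplier is k = sqrt(1/alpha - 1), roughly 4.36 at 95%; the function name and the exact rounding are my own assumptions, not taken from the spreadsheet.

```python
import math
import statistics

# Dataset from the example above; the critical value (SGV) is 20.
data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]
critical_value = 20


def chebyshev_ucl(values, confidence=0.95):
    """One-sided Chebychev upper confidence limit on the mean.

    Assumption: multiplier k = sqrt(1/alpha - 1), about 4.36 at 95%.
    The calculator spreadsheet may round k differently.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    alpha = 1 - confidence
    k = math.sqrt(1 / alpha - 1)
    return mean + k * sd / math.sqrt(n)


ucl = chebyshev_ucl(data)
print(f"mean = {statistics.mean(data):.1f}, UCL = {ucl:.1f}, "
      f"exceeds SGV: {ucl > critical_value}")
```

On this dataset the result lands within about 0.1 of the 27.6 reported by the calculator, and well above the critical value of 20.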
However, if you follow Appendix B you have to determine
if the dataset is normal once suspect values have been removed. The only
way I can see to do this, apart from just visual assessment, is to choose
“normal distribution to check for outliers” in the
calculator. This procedure indicates 28 and 38 in this dataset are
outliers (something that you might suspect simply by looking at the values
without having to use the calculator).
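The difference between the two drop-down options can be reproduced with a short Python sketch of a repeated Grubbs-style test on the largest value. The critical values are one-sided 5% figures from standard published tables and are an assumption on my part; the calculator may use a different table or significance level, and the function name is hypothetical.

```python
import math
import statistics

data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]

# One-sided 5% Grubbs critical values for n = 10..12 from standard tables.
# Assumption: the spreadsheet may use different values.
GRUBBS_CRIT_5PCT = {10: 2.176, 11: 2.234, 12: 2.285}


def grubbs_high_outliers(values, transform=None):
    """Test the largest value repeatedly, removing it while it is flagged.

    Only high-side outliers are checked, and only for sample sizes
    present in the table above.
    """
    remaining = sorted(values)
    outliers = []
    while len(remaining) in GRUBBS_CRIT_5PCT:
        scaled = [transform(v) for v in remaining] if transform else remaining
        mean = statistics.mean(scaled)
        sd = statistics.stdev(scaled)
        g = (scaled[-1] - mean) / sd  # Grubbs statistic for the maximum
        if g <= GRUBBS_CRIT_5PCT[len(remaining)]:
            break
        outliers.append(remaining.pop())
    return outliers


normal_outliers = grubbs_high_outliers(data)
lognormal_outliers = grubbs_high_outliers(data, transform=math.log)
print("normal assumption:", normal_outliers)
print("log-normal assumption:", lognormal_outliers)
```

Under these assumptions the raw values flag 38 and then 28, while the log-transformed values flag nothing, which matches what the two calculator options report.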
Now here is the interesting bit. If you remove
these two outliers (let’s assume they represent part of another dataset) the
remaining data are normally distributed. According to my reading of
Appendix B (3.) this means that Grubbs’ Test for outliers is appropriate (i.e.
the use of the calculator’s “normal distribution”
method, which removes 28 and 38, is justified). It also means that the
one-sample t-test is used to calculate the UCL (16.2), which is less than the
critical value. Let’s say this means the bulk of the site does not exceed
the SGV but there are potentially 2 hotspots of 28 & 38 requiring remediation.
[Conclusion 2]
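The UCL behind Conclusion 2 can be checked with a few lines of Python. This is a sketch only: it assumes a one-sided 95% limit of the form mean + t * s / sqrt(n), with the t quantile for 9 degrees of freedom (1.833) taken from standard tables rather than from the spreadsheet.

```python
import math
import statistics

# Dataset with the two suspect values (28 and 38) removed.
trimmed = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18]
critical_value = 20

# One-sided 95% Student's t quantile for n - 1 = 9 degrees of freedom,
# from standard tables (assumption: the calculator uses the same level).
T_95_DF9 = 1.833

mean = statistics.mean(trimmed)
sd = statistics.stdev(trimmed)  # sample standard deviation (n - 1 divisor)
ucl = mean + T_95_DF9 * sd / math.sqrt(len(trimmed))
print(f"mean = {mean:.1f}, UCL = {ucl:.1f}, below SGV: {ucl < critical_value}")
```

This gives a UCL of about 16.2, consistent with the figure above and comfortably below the critical value of 20.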
So, if you follow the calculator instructions you
arrive at Conclusion 1 but if you follow Appendix B of the guidance you arrive
at Conclusion 2.
Am I doing something stupid, have I missed something,
or is there an inconsistency between the approaches in the guidance and the
calculator?
Feedback would be welcomed.
Regards,
Kevin Privett.
Dr Kevin Privett
Geo-Environmental Associate
Hydrock
Consultants Ltd
Over Court Barns
Over Lane
Almondsbury
BS32 4DF
Tel: (01454) 619533
Fax: (01454) 614125
Cell phone: (07799) 430870