
Kevin, thanks for the chat. My answer, for what it's worth:

 

My understanding is that you should consider any exceedance (> Cc)
within the constraints of your conceptual model.

You should also consider what the data tell you about the underlying
distribution, using the various tests.

If the amount of data, the uncertainty and the conceptual model are
acceptable, then you can accept the results of the tests and accept the
exceedance. However, if you have outliers then I think two answers are
available.

 

Given sufficient data collection for the risks and uncertainties, you
can choose to keep the outliers in and demonstrate that the
contamination is not at such levels or extent that it jeopardizes the
"clean" status of the site, hence proving that the distribution is
substantially and significantly below the GAC/Cc, i.e. at the 95% level.

 

However, if outliers are identified and removed from the data set,
either because data collection is not sufficient to prevent them
producing an unacceptable answer, or because you choose to remove them
to improve the distribution, then this raises the question of the
distribution and extent of other, unknown outliers, and it should really
test your conceptual model and the data collection at a fundamental
level. I take your point, Kevin, but to me, once you decide you have
outliers for whatever reason, you need to really scrutinize the
underlying SI and probably re-zone and redefine the problem.

 

As regards the mechanics, I think you are right that there is an issue
with choosing normal or log-normal, but I would argue that the user
should be aware that each set of data could have a different
distribution and that the world is not normal or log-normal (these are
just statistical conveniences). You should realize that failure to be
normal doesn't mean the data are log-normal. However, I think this is
getting overly involved: all we are doing at this stage is estimating
the distribution, and if you want a really correct answer, get expert
help. I would also suggest that Paolo has done what he can to make the
calculator fail-safe by providing the Chebychev test. Chebychev probably
isn't right; it is just more conservative and less wrong. And when you
are operating at the upper 95th/90th, the distribution issues are
probably less worrisome than in Part IIA tests, especially when the
distributions can be heavily skewed.
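
To put a rough number on "more conservative", here is a minimal sketch.
It assumes the calculator uses the one-sided Chebychev inequality at 95%
confidence, which is an assumption and not confirmed anywhere in this
thread. The t value is the standard table figure for 11 degrees of
freedom (n = 12, chosen only for illustration).

```python
import math

alpha = 0.05  # 95% confidence (assumed)

# One-sided Chebychev multiplier applied to the standard error of the mean.
# Assumption: this is the form the calculator uses.
k_chebychev = math.sqrt(1 / alpha - 1)

# One-sided 95% Student's t value for 11 degrees of freedom (standard table).
t_95_df11 = 1.796

print(f"Chebychev multiplier: {k_chebychev:.2f}")
print(f"t multiplier (n = 12): {t_95_df11:.2f}")
print(f"ratio: {k_chebychev / t_95_df11:.1f}")
```

On these assumptions the Chebychev margin above the mean is more than
twice the t-based margin, whatever the shape of the data.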

 

Hopefully a useful response.

 

 

Rob Ivens

Scientific Officer

01306 879232

 

________________________________

From: Contaminated Land Management Discussion List
[mailto:[log in to unmask]] On Behalf Of Kevin
Privett
Sent: 23 July 2008 12:28
To: [log in to unmask]
Subject: CIEH stats - outlier test

 

Can anyone offer clarification of the outlier test?

 

Appendix B (3.) of the CIEH guidance states the Grubbs' Test assumes the
other data values in a dataset, except for the suspect observation, are
normally distributed.  It tells you to check the normality of the
remaining dataset using the method in Appendix C.

 

Appendix B (4.) states that if the (remaining) dataset is non-normal,
consider using a log transform and check if this is normal.

 

However, when you use the Statistics Calculator spreadsheet it appears
to do something different.

 

On the outlier test sheet you have a choice of drop-down "use normal
distribution to check for outliers" or "use log-normal distribution to
check for outliers".  

 

This appears to be checking for outliers based on the distribution of
the whole dataset, ie including any suspect values, not on the dataset
once outliers are removed.

 

Here is an example: take the following dataset and assume the critical
value (SGV) is 20:- 

14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38

 

If you follow the instructions for the stats calculator the summary page
tells you this is a non-normal dataset, so you choose "log-normal to
check for outliers" and it says there are no outliers.  As it is
non-normal, the Chebychev Test is used to calculate the UCL (=27.6)
which exceeds the critical value.  Let's say that means an exceedance of
SGV so remediation is required for planning purposes. [Conclusion 1]
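
The UCL in Conclusion 1 can be reproduced with a short sketch. This
assumes the calculator applies the one-sided Chebychev inequality at 95%
confidence to the standard error of the mean; that form is an assumption
on my part, not something stated in the guidance extracts quoted here.

```python
import math
import statistics

data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]
critical_value = 20  # the SGV assumed in the example

n = len(data)
mean = statistics.mean(data)
std_err = statistics.stdev(data) / math.sqrt(n)  # uses the sample standard deviation

# One-sided Chebychev multiplier at 95% confidence (assumed form).
k = math.sqrt(1 / 0.05 - 1)
ucl_chebychev = mean + k * std_err

print(f"mean = {mean:.2f}, UCL = {ucl_chebychev:.2f}")
print("exceeds critical value:", ucl_chebychev > critical_value)
```

Under that assumption the sketch gives a UCL of about 27.7, in line with
the 27.6 quoted above.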

 

However, if you follow Appendix B you have to determine if the dataset
is normal once suspect values have been removed.  The only way I can see
to do this, apart from just visual assessment, is to choose "normal
distribution to check for outliers" in the calculator.  This procedure
indicates 28 and 38 in this dataset are outliers (something that you
might suspect simply by looking at the values without having to use the
calculator).  
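
The difference between the two drop-down options can be illustrated with
a rough sketch of Grubbs' test applied repeatedly to the example data,
first on the raw values and then on their logarithms. The critical
values are two-sided 5% figures from standard published tables; the
significance level, the repeated application, and the function name
`grubbs_outliers` are all assumptions for illustration and may not match
what the calculator does internally.

```python
import math
import statistics

data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]

# Two-sided 5% Grubbs critical values by sample size, from standard tables.
# Assumption: the calculator may use a different significance level.
GRUBBS_CRIT = {10: 2.290, 11: 2.355, 12: 2.412}


def grubbs_outliers(values):
    """Repeatedly remove the most extreme value while it fails Grubbs' test."""
    remaining = list(values)
    removed = []
    while len(remaining) in GRUBBS_CRIT:
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)
        # The suspect value is the one furthest from the mean.
        suspect = max(remaining, key=lambda v: abs(v - mean))
        g = abs(suspect - mean) / sd
        if g <= GRUBBS_CRIT[len(remaining)]:
            break
        remaining.remove(suspect)
        removed.append(suspect)
    return removed


raw_outliers = grubbs_outliers(data)
log_outliers = [round(math.exp(v))
                for v in grubbs_outliers([math.log(x) for x in data])]

print("normal scale:", raw_outliers)
print("log scale:", log_outliers)
```

With these table values the raw-scale test flags 38 and then 28, while
the log-scale test flags nothing, which matches the behaviour described
for the two drop-down choices.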

 

Now here is the interesting bit.  If you remove these two outliers
(let's assume they represent part of another dataset) the remaining data
are normally distributed.  According to my reading of Appendix B (3.)
this means that the Grubbs' Test for outliers is appropriate (ie the use
of the calculator's "normal distribution" method which removes 28 and 38
is justified).  It also means that the one-sample t-test is used to
calculate the UCL (16.2) which is less than the critical value.  Let's
say this means the bulk of the site does not exceed the SGV but there
are potentially 2 hotspots of 28 & 38 requiring remediation. [Conclusion
2]
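
The UCL in Conclusion 2 can be checked with a minimal sketch of the
one-sided 95% t-based upper confidence limit on the mean, using the ten
values left once 28 and 38 are removed. The t value is the standard
table figure for 9 degrees of freedom; the variable names are
illustrative only.

```python
import math
import statistics

trimmed = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18]  # 28 and 38 removed
critical_value = 20  # the SGV assumed in the example

n = len(trimmed)
mean = statistics.mean(trimmed)
std_err = statistics.stdev(trimmed) / math.sqrt(n)

# One-sided 95% Student's t value for n - 1 = 9 degrees of freedom (standard table).
t_95_df9 = 1.833
ucl_t = mean + t_95_df9 * std_err

print(f"mean = {mean:.2f}, UCL = {ucl_t:.2f}")
print("below critical value:", ucl_t < critical_value)
```

This gives a UCL of about 16.2, matching the figure quoted above.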

 

So, if you follow the calculator instructions you arrive at Conclusion 1
but if you follow Appendix B of the guidance you arrive at Conclusion 2.


 

Am I doing something stupid, have I missed something, or is there an
inconsistency between the approaches in the guidance and the calculator?


 

Feedback would be welcomed.

 

 

 

Regards,

Kevin Privett.

 

Dr Kevin Privett

Geo-Environmental Associate

 

Hydrock Consultants Ltd

Over Court Barns

Over Lane

Almondsbury

Bristol

BS32 4DF

 

Tel: (01454) 619533

Fax: (01454) 614125

[log in to unmask] <mailto:[log in to unmask]> 

Cell phone: (07799) 430870

 

Offices in Bristol, Plymouth, Northampton, Stoke-on-Trent.
www.hydrock.com

 

Disclaimer


The information in this e-mail is confidential and may be read, copied
or used only by the intended recipients. If you are not the intended
recipient you are hereby notified that any perusal, use, distribution,
copying or disclosure is strictly prohibited.  If you have received this
e-mail in error please advise us immediately by return e-mail at
[log in to unmask] and delete the e-mail document without making a
copy. Whilst every effort has been made to ensure this email is virus
free, no responsibility is accepted for loss or damage arising from
viruses or changes made to this message after it was sent.

 


________________________________________________________________________
This e-mail has been scanned for all viruses by Star. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________

Scanned by MailDefender - managed email security from intY -
www.maildefender.net
