Kevin, thanks for the chat. My answer, for what it's worth.

 

My understanding is that you should consider any exceedance of Cc within the strictures of your conceptual model.

And you should consider what the data tell you about the underlying distribution, using the various tests.

If the level of data, the uncertainty and the conceptual model are acceptable, then you can accept the results of the tests and accept the exceedance. However, if you have outliers, then I think two answers are available.

 

Given sufficient data collection for the risks and uncertainties, you can choose to keep the outliers in and demonstrate that the contamination is not at such levels or extent that it jeopardizes the “clean” status of the site, hence proving that the distribution is substantially and significantly below GAC/Cc, i.e. at 95%.

 

However, if outliers are identified and removed from the data set, either because data collection is not sufficient to prevent them producing an unacceptable answer or because you choose to remove them to improve the distribution, then this raises the question of what the distribution and extent of other, unknown outliers might be, and it should fundamentally test your conceptual model and the data collection. I take your point, Kevin, but to me, once you decide you have outliers, for whatever reason, you need to really scrutinize the underlying SI and probably re-zone and redefine the problem.

 

As regards the mechanics, I think you are right that there is an issue with choosing normal or log-normal, but I would argue that the user should be aware that each set of data could have a different distribution and that the world is not normal or log-normal (these are just statistical conveniences). You should realize that failure to be normal doesn’t mean the data are log-normal. However, I think this is getting overly involved: all we are doing at this stage is estimating the distribution, and if you want a really correct answer, get expert help. I would also suggest that Paolo has done what he can to make the calculator fail-safe by providing the Chebychev test. Chebychev probably isn’t right; it is just more conservative and less wrong. And when you are operating at the upper 95th/90th, the distribution issues are probably less worrisome than in Part IIA tests, especially when the distributions can be heavily skewed.

 

Hopefully a useful response.

 

 

Rob Ivens

Scientific Officer

01306 879232

 


From: Contaminated Land Management Discussion List [mailto:[log in to unmask]] On Behalf Of Kevin Privett
Sent: 23 July 2008 12:28
To: [log in to unmask]
Subject: CIEH stats - outlier test

 

Can anyone offer clarification of the outlier test?

 

Appendix B (3.) of the CIEH guidance states that the Grubbs Test assumes the other data values in a dataset, except for the suspect observation, are normally distributed.  It tells you to check the normality of the remaining dataset using the method in Appendix C.

 

Appendix B (4.) states that if the (remaining) dataset is non-normal, consider using a log transform and check if this is normal.

 

However, when you use the Statistics Calculator spreadsheet it appears to do something different.

 

On the outlier test sheet you have a choice of drop-down “use normal distribution to check for outliers” or “use log-normal distribution to check for outliers”. 

 

This appears to be checking for outliers based on the distribution of the whole dataset, ie including any suspect values, not on the dataset once outliers are removed.

 

Here is an example: take the following dataset and assume the critical value (SGV) is 20:-

14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38

 

If you follow the instructions for the stats calculator, the summary page tells you this is a non-normal dataset, so you choose “log-normal to check for outliers” and it says there are no outliers.  As it is non-normal, the Chebychev Test is used to calculate the UCL (= 27.6), which exceeds the critical value.  Let’s say that means an exceedance of the SGV, so remediation is required for planning purposes. [Conclusion 1]
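The 27.6 figure can be checked by hand. The following is a minimal sketch of a one-sided 95% Chebychev UCL on the mean, assuming the form UCL = mean + sqrt(1/alpha - 1) * s / sqrt(n) with alpha = 0.05 and the sample standard deviation. That form is an assumption about what the calculator does, not something taken from the spreadsheet, and the function name chebychev_ucl is hypothetical.

```python
import math
import statistics

# Dataset from the example above; the critical value (SGV) is assumed to be 20.
data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]
SGV = 20

def chebychev_ucl(values, alpha=0.05):
    """One-sided Chebychev upper confidence limit on the mean (assumed form)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation (n - 1 divisor)
    k = math.sqrt(1 / alpha - 1)      # sqrt(19), about 4.36, when alpha = 0.05
    return mean + k * s / math.sqrt(n)

ucl = chebychev_ucl(data)
print(f"Chebychev UCL = {ucl:.2f}, exceeds SGV: {ucl > SGV}")
```

Under these assumptions the result is about 27.66, consistent with the 27.6 quoted above and above the critical value of 20.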

 

However, if you follow Appendix B you have to determine if the dataset is normal once suspect values have been removed.  The only way I can see to do this, apart from just visual assessment, is to choose “normal distribution to check for outliers” in the calculator.  This procedure indicates 28 and 38 in this dataset are outliers (something that you might suspect simply by looking at the values without having to use the calculator). 
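The difference between the two drop-down options can be reproduced outside the spreadsheet. The sketch below applies a Grubbs-type test repeatedly, removing the most extreme value while its statistic G = |x - mean| / s exceeds a tabulated critical value, first on the raw values and then on their natural logs. The one-sided 5% critical values and the repeated-removal loop are assumptions; the calculator may use a different table or procedure, and the names used here are hypothetical.

```python
import math
import statistics

data = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18, 28, 38]

# Tabulated one-sided 5% Grubbs critical values for n = 10, 11, 12.
# Which table the calculator actually uses is an assumption.
GRUBBS_CRIT_5PCT = {10: 2.176, 11: 2.234, 12: 2.285}

def grubbs_outliers(values, transform=None):
    """Remove the most extreme value repeatedly while G exceeds the critical value."""
    remaining = list(values)
    removed = []
    while len(remaining) in GRUBBS_CRIT_5PCT:
        # Work on the transformed scale if a transform (e.g. log) is given.
        work = [transform(v) for v in remaining] if transform else remaining
        mean = statistics.mean(work)
        s = statistics.stdev(work)
        # Index of the value furthest from the mean on the working scale.
        i = max(range(len(work)), key=lambda j: abs(work[j] - mean))
        g = abs(work[i] - mean) / s
        if g <= GRUBBS_CRIT_5PCT[len(work)]:
            break
        removed.append(remaining.pop(i))
    return removed

normal_outliers = grubbs_outliers(data)
lognormal_outliers = grubbs_outliers(data, transform=math.log)
print("normal option:", normal_outliers)
print("log-normal option:", lognormal_outliers)
```

With these assumed critical values the raw data flag 38 and then 28, while the log-transformed data flag nothing, which matches the behaviour described for the two calculator options.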

 

Now here is the interesting bit.  If you remove these two outliers (let’s assume they represent part of another dataset), the remaining data are normally distributed.  According to my reading of Appendix B (3.), this means that the Grubbs Test for outliers is appropriate (i.e. the use of the calculator’s “normal distribution” method, which removes 28 and 38, is justified).  It also means that the one-sample t-test is used to calculate the UCL (16.2), which is less than the critical value.  Let’s say this means the bulk of the site does not exceed the SGV, but there are potentially 2 hotspots of 28 and 38 requiring remediation. [Conclusion 2]
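The 16.2 figure can be checked in the same way. This is a minimal sketch of a one-sided 95% UCL on the mean based on Student's t, assuming the form UCL = mean + t * s / sqrt(n) with n - 1 degrees of freedom. The t value of 1.833 for 9 degrees of freedom is taken from standard tables; the function name t_ucl is hypothetical.

```python
import math
import statistics

# Dataset with the two suspect values (28 and 38) removed, as described above.
trimmed = [14, 9, 13, 19, 14, 14, 14, 11, 18, 18]
SGV = 20  # critical value assumed in the example

# One-sided 95% Student's t critical value for 9 degrees of freedom (tables).
T_95_DF9 = 1.833

def t_ucl(values, t_crit):
    """One-sided upper confidence limit on the mean using a supplied t value."""
    n = len(values)
    return statistics.mean(values) + t_crit * statistics.stdev(values) / math.sqrt(n)

ucl = t_ucl(trimmed, T_95_DF9)
print(f"t-based UCL = {ucl:.1f}, exceeds SGV: {ucl > SGV}")
```

Under these assumptions the UCL comes out at about 16.2, below the critical value of 20, in line with Conclusion 2.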

 

So, if you follow the calculator instructions you arrive at Conclusion 1 but if you follow Appendix B of the guidance you arrive at Conclusion 2. 

 

Am I doing something stupid, have I missed something, or is there an inconsistency between the approaches in the guidance and the calculator? 

 

Feedback would be welcomed.

 

 

 

Regards,

Kevin Privett.

 

Dr Kevin Privett

Geo-Environmental Associate

 

Hydrock Consultants Ltd

Over Court Barns

Over Lane

Almondsbury

Bristol

BS32 4DF

 

Tel: (01454) 619533

Fax: (01454) 614125

[log in to unmask]

Cell phone: (07799) 430870

 

Offices in Bristol, Plymouth, Northampton, Stoke-on-Trent.  www.hydrock.com

 

Disclaimer


The information in this e-mail is confidential and may be read, copied or used only by the intended recipients. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited.  If you have received this e-mail in error please advise us immediately by return e-mail at [log in to unmask] and delete the e-mail document without making a copy. Whilst every effort has been made to ensure this email is virus free, no responsibility is accepted for loss or damage arising from viruses or changes made to this message after it was sent.

 


________________________________________________________________________
This e-mail has been scanned for all viruses by Star. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________


Scanned by MailDefender - managed email security from intY - www.maildefender.net