Whilst this email is intended for the attention of Mr Kallner, I have posted it
here to the mailing list as someone else may also be able to answer my
question.
In recent discussions here on the mailing list, Mr Kallner provided references
to documents hosted on the ACB website. I refer specifically to: "Measurement
Verification in the Clinical Laboratory" (Khatami et al, dated "090605"), in which
reference is made to an Excel spreadsheet created by Mr Kallner, and also to
a second document: "Statistical Background Information: Estimating
Imprecision Using ANOVA. Spreadsheet A, version 6.81", also written by Mr
Kallner.
I have read through these documents with interest, and can see the benefit of
using ANOVA as a means of estimating imprecision, across or within
instruments. The "Statistical Background Information" document I found to be
particularly useful.
I quickly put the ANOVA algorithm (as described by Mr Kallner) into computer
code for future use. However, when testing the code I came across an
anomaly which I hope Mr Kallner (or the mailing list) may be able to resolve.
The function code (attached, if anyone is interested) returns results identical
to those of the Excel spreadsheet by Mr Kallner, so I assume my interpretation
is broadly correct. The problem arises when I pass the function groups of pure
Gaussian data, all sampled from a single population with constant mean and
variance: the Total SD calculated from ANOVA is approximately 90% of what it
should be. A straightforward standard deviation calculation on the combined
data set (using N-1 df) gives the SD as expected.
So my question is: assuming my code is correct (I am open to the possibility
that it is not*), does this mean that ANOVA implicitly relies on differences
between the groups to calculate an accurate Total Variance?
The results of testing my code are:
--ORIGINAL POPULATION--
mean = 100
SD = 1
--SAMPLE--
N = 25 (5 groups of 5 data points)
---ANOVA---
mean = 100.012
SD = 0.915
---STANDARD---
mean = 100.005
SD = 0.993
Note how the standard calculation for SD (with N-1 df) gives 0.993, whereas
the ANOVA calculation gives 0.915. These results are based on iterating the
calculation 100,000 times. I believe these should be close to identical.
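For anyone wishing to repeat the test without AutoIt, the simulation can be sketched in Python roughly as below. This is an illustrative sketch rather than the code that produced the figures above: the names `sample_variance`, `anova_total_sd` and `simulate`, the seed and the default iteration count are my own assumptions, and it handles equal-sized groups only. It draws 5 groups of 5 points from one Gaussian population, computes the Total SD from the ANOVA components and the ordinary SD of the combined data, and averages each over many runs.

```python
import math
import random


def sample_variance(xs):
    """Sample variance with N-1 degrees of freedom."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def anova_total_sd(groups):
    """Return (ANOVA total SD, plain SD of the combined data) for equal-sized groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    n_avg = n_total / k  # equal groups assumed, so this is the common group size
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Mean square within: pooled within-group variance.
    ms_within = sum((len(g) - 1) * sample_variance(g) for g in groups) / (n_total - k)
    # Mean square between: spread of the group means around the grand mean.
    ms_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    ) / (k - 1)

    # Between-group component, truncated at zero as in the attached AutoIt code.
    between_var = max(0.0, (ms_between - ms_within) / n_avg)
    total_sd = math.sqrt(ms_within + between_var)

    combined = [x for g in groups for x in g]
    return total_sd, math.sqrt(sample_variance(combined))


def simulate(iterations=10000, k=5, n=5, mu=100.0, sigma=1.0, seed=1):
    """Average both SD estimates over repeated draws from one Gaussian population."""
    rng = random.Random(seed)
    anova_sum = plain_sum = 0.0
    for _ in range(iterations):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for _ in range(k)]
        a, s = anova_total_sd(groups)
        anova_sum += a
        plain_sum += s
    return anova_sum / iterations, plain_sum / iterations


if __name__ == "__main__":
    mean_anova_sd, mean_plain_sd = simulate()
    print(f"ANOVA total SD: {mean_anova_sd:.3f}")
    print(f"Plain SD (N-1): {mean_plain_sd:.3f}")
```

Since the expected values of MSwithin and MSbetween are equal when there is no true between-group effect, I would expect both averages to come out close to the population SD. One thing worth checking in any such test is that the group variances use N-1 df: with N in the denominator, the within-group variance shrinks by a factor of (n-1)/n.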
Thank you!
Andy Minett
*One part of the algorithm I am slightly unsure about is the purification of
the Mean Square Between variance. It is described as "dividing the difference
between MSbetween and MSwithin, by the average number of observations in
the series". How is the average number of observations in the series defined?
Is it the total number of observations divided by the number of groups?
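On the footnote question, the two definitions I am aware of can be compared with a short Python sketch. The first is the simple average (total observations divided by the number of groups), which is what the attached AutoIt code uses; the second is the adjusted value usually written n0 in textbook treatments of one-way random-effects ANOVA with unequal group sizes. The function name `average_group_sizes` is hypothetical, and I cannot confirm from the document which definition the spreadsheet uses.

```python
def average_group_sizes(sizes):
    """Return (simple_average, n0) for a list of group sizes."""
    k = len(sizes)
    total = sum(sizes)
    # Simple average: total observations divided by the number of groups.
    simple_average = total / k
    # Adjusted value for unequal groups: n0 = (N - sum(n_i^2) / N) / (k - 1).
    n0 = (total - sum(n * n for n in sizes) / total) / (k - 1)
    return simple_average, n0


if __name__ == "__main__":
    print(average_group_sizes([5, 5, 5, 5, 5]))  # equal groups: the two values agree
    print(average_group_sizes([3, 5, 7]))        # unequal groups: n0 is the smaller value
```

For equal group sizes the two definitions coincide, so the choice only matters when the groups differ in size.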
------ACB discussion List Information--------
This is an open discussion list for the academic and clinical community working in clinical biochemistry.
Please note, archived messages are public and can be viewed via the internet. Views expressed are those of the individual and they are responsible for all message content.
ACB Web Site
http://www.acb.org.uk
Green Laboratories Work
http://www.laboratorymedicine.nhs.uk
List Archives
http://www.jiscmail.ac.uk/lists/ACB-CLIN-CHEM-GEN.html
List Instructions (How to leave etc.)
http://www.jiscmail.ac.uk/
; Code written in AutoIt v3.3.4.0
#include <Array.au3>
;----- from data points -----
Global $a1[6] = ["",2.50,2.50,2.48,2.52,2.53]
Global $a2[6] = ["",2.61,2.62,2.64,2.58,2.60]
Global $a3[6] = ["",2.51,2.53,2.56,2.54,2.52]
Global $a4[6] = ["",2.48,2.49,2.54,2.51,2.51]
Global $a5[6] = ["",2.49,2.51,2.53,2.48,2.52]
Global $a[6] = ["",$a1,$a2,$a3,$a4,$a5]
$a = _ANOVA($a,0)
_ArrayDisplay($a)
;----- from group stats -----
Global $b1[4] = ["",UBound($a1) - 1,_mean($a1),_SD($a1,True)]
Global $b2[4] = ["",UBound($a2) - 1,_mean($a2),_SD($a2,True)]
Global $b3[4] = ["",UBound($a3) - 1,_mean($a3),_SD($a3,True)]
Global $b4[4] = ["",UBound($a4) - 1,_mean($a4),_SD($a4,True)]
Global $b5[4] = ["",UBound($a5) - 1,_mean($a5),_SD($a5,True)]
Global $b[6] = ["",$b1,$b2,$b3,$b4,$b5]
$b = _ANOVA($b,1)
_ArrayDisplay($b)
Func _ANOVA(ByRef $aArray, $iType = 0)
    ;*************************************************************
    ; ANOVA : Analysis of Variance
    ;*************************************************************
    ; Arrays of data must be passed in a single array.
    ; Each nested array can be different sizes etc.
    ;
    ; e.g. $PassedArray[n] = ["",$aData1,$aData2...]
    ;
    ; $iType - 0 :actual data points passed in array
    ;        - 1 :stats passed in array (each array must be ["",n,Mean,SD])

    ;----- Local Variables -----
    Local $iGroup, $aData, $iGlobalMean, $iGlobalN, $iAverageGroupN
    Local $iMeanSquareBetweenV, $iMeanSquareWithinV, $iIntermediateV
    Local $iTotalVarianceV, $iSum

    ;----- Collect Data -----
    Local $aGroupData[UBound($aArray)][3] ;[group][n,mean,sd]
    $aGroupData[0][0] = UBound($aArray) - 1 ;Total No. of Groups
    Switch $iType
        Case 0 ;Calculate n,Mean,SD for each group
            For $iGroup = 1 to $aGroupData[0][0]
                $aData = $aArray[$iGroup]
                $aGroupData[$iGroup][0] = UBound($aData) - 1
                $aGroupData[$iGroup][1] = _Mean($aData)
                $aGroupData[$iGroup][2] = _SD($aData,True) ;sample SD (N-1 df)
            Next
        Case 1 ;Put passed stats into array (SD must be the sample SD, N-1 df)
            For $iGroup = 1 to $aGroupData[0][0]
                $aData = $aArray[$iGroup]
                $aGroupData[$iGroup][0] = $aData[1]
                $aGroupData[$iGroup][1] = $aData[2]
                $aGroupData[$iGroup][2] = $aData[3]
            Next
    EndSwitch

    ;----- Mean Square Within (Var) -----
    $iGlobalN = 0
    $iGlobalMean = 0 ;for Mean Square Between
    $iAverageGroupN = 0 ;for Intermediate
    $iMeanSquareWithinV = 0
    For $iGroup = 1 to $aGroupData[0][0]
        $iMeanSquareWithinV += ($aGroupData[$iGroup][0] - 1) * ($aGroupData[$iGroup][2] ^ 2)
        $iGlobalN += $aGroupData[$iGroup][0]
        $iAverageGroupN += $aGroupData[$iGroup][0]
        $iGlobalMean += $aGroupData[$iGroup][0] * $aGroupData[$iGroup][1] ;n-weighted sum of group means
    Next
    ;Weighted grand mean; equals the plain mean of group means when all groups are the same size
    $iGlobalMean /= $iGlobalN
    $iAverageGroupN /= $aGroupData[0][0] ;simple average: total observations / number of groups
    $iMeanSquareWithinV = $iMeanSquareWithinV / ($iGlobalN - $aGroupData[0][0])

    ;----- Mean Square Between (Var) -----
    $iSum = 0
    For $iGroup = 1 to $aGroupData[0][0]
        $iSum += $aGroupData[$iGroup][0] * ($aGroupData[$iGroup][1] - $iGlobalMean) ^ 2
    Next
    $iMeanSquareBetweenV = $iSum / ($aGroupData[0][0] - 1)

    ;----- Intermediate Variance (Var) -----
    $iIntermediateV = ($iMeanSquareBetweenV - $iMeanSquareWithinV) / $iAverageGroupN
    If $iIntermediateV < 0 Then $iIntermediateV = 0 ;negative estimates are truncated to zero

    ;----- Total Variance (Var) -----
    $iTotalVarianceV = $iMeanSquareWithinV + $iIntermediateV

    ;----- Return Data -----
    Local $aReturn[8]
    $aReturn[1] = $iGlobalN ;Total Observations
    $aReturn[2] = $iGlobalMean ;Global Mean
    $aReturn[3] = $iMeanSquareWithinV ^ 0.5 ;MSwithin (Repeatability) (SD)
    $aReturn[4] = $iMeanSquareBetweenV ^ 0.5 ;MSbetween (SD)
    $aReturn[5] = $iIntermediateV ^ 0.5 ;Intermediate Imprecision (SD)
    $aReturn[6] = $iTotalVarianceV ^ 0.5 ;Total Imprecision (SD)
    $aReturn[7] = ($aReturn[6] / $iGlobalMean) * 100 ;Total Imprecision (CV, %)
    Return $aReturn
EndFunc
;**********************************************
; SD, Mean, Sum
;**********************************************
Func _SD(ByRef $aArr, $bSample = False, $iRound = -1)
    ;$bSample - if the data set represents a sampling of a population
    ;set this to true to reduce df to N - 1
    Local $iU = _Mean($aArr), $iN = (UBound($aArr) - 1), $iE = 0, $i, $iRet
    For $i = 1 to $iN
        $iE += ($aArr[$i] - $iU) ^ 2
    Next
    If $bSample Then $iN -= 1
    $iRet = Sqrt($iE / $iN)
    If $iRound <> -1 Then $iRet = Round($iRet,$iRound)
    Return $iRet
EndFunc

Func _Mean(ByRef $aArr, $iRound = -1)
    ;Element 0 is a placeholder, so the data count is UBound - 1
    Local $iRet = _Sum($aArr) / (UBound($aArr) - 1)
    If $iRound <> -1 Then $iRet = Round($iRet,$iRound)
    Return $iRet
EndFunc

Func _Sum(ByRef $aArr)
    Local $iE = 0, $i
    For $i = 1 to (UBound($aArr) - 1)
        $iE += $aArr[$i]
    Next
    Return $iE
EndFunc