OK, sorry for the delay. Let's go through the Pearson algorithm.
THE PEARSON ALGORITHM
=====================
Holmes and Haggett developed an algorithm that, given an array of numbers, decides which of those numbers are significant. The algorithm looks like this:
* Take your original array with "N" numbers
* Sort it into descending order. Call that your "original array"
* Generate "N" arrays, each summing to the same total as the original. The first generated array has the whole total in the first number and zero in the rest; the second has the total spread evenly over the first two numbers and zero in the rest; the third has it spread evenly over the first three numbers and zero in the rest; and so on, up to the "N"th, which spreads the total evenly over all "N" numbers.
* When you have generated your arrays, select a "ruler": a statistic used to measure the goodness-of-fit between two arrays.
* Use your "ruler" to measure the goodness-of-fit between your original array and each of your generated arrays.
* Plot those goodness-of-fits on a graph
* Select the generated array that has the maximum goodness-of-fit to your original array
* The number of non-zero numbers in that selected generated array is the number of significant numbers in your original array.
So if your second generated array (which has two non-zero numbers, remember) has the maximum goodness-of-fit to your original array, then your two biggest numbers are significant and the rest are not. There are a few wrinkles if you have negative numbers, but they are easily dealt with.
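To make the steps concrete, here is a minimal Python sketch of the procedure. This is my own illustration, not Holmes and Haggett's code, and the function names are made up:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient (the "ruler")."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant array; treat as no fit
    return cov / (sx * sy)

def significant_count(values):
    """Number of significant values, per the Holmes and Haggett procedure."""
    original = sorted(values, reverse=True)  # sort into descending order
    n, total = len(original), sum(original)
    fits = []
    for k in range(1, n + 1):
        # k-th generated array: total spread evenly over the first k numbers,
        # zero in the rest
        generated = [total / k] * k + [0.0] * (n - k)
        fits.append(pearson_r(original, generated))
    # The number of non-zero numbers in the best-fitting generated array
    return 1 + fits.index(max(fits))

# The example array from the Holmes and Haggett paper:
print(significant_count([89, 46, 23, 36, 19, 131, 29, 15, 22, 12]))  # prints 2
```

On the paper's example array this sketch selects the generated array with two non-zero values, agreeing with the selection in the paper, though the goodness-of-fit value it computes depends on exactly how the coefficient is reported (see below).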
In our case, we used the sample Pearson product-moment correlation coefficient (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient) as our ruler. We could have used others. We called it the "Pearson algorithm" because the phrase "using the sample Pearson product-moment correlation coefficient with the algorithm developed by Holmes and Haggett" is too much of a mouthful.
The original Holmes and Haggett paper (http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1977.tb00591.x/pdf) has an example. It takes the array [89, 46, 23, 36, 19, 131, 29, 15, 22, 12] and selects the generated array with two non-zero values, since that has the highest goodness-of-fit of 0.866 (see Figure 1 in the paper).
We reproduced that goodness-of-fit calculation to get 0.866, and the figures are given in the methodology paper. Later in the week I'll post the calculations on RADSTATS. The maths isn't hard, but the calculation is lengthy, so please bear with me.
Disclaimer: The messages sent to this list are the views of the sender and cannot be assumed to be representative of the range of views held by subscribers to the Radical Statistics Group. To find out more about Radical Statistics and its aims and activities and read current and past issues of our newsletter you are invited to visit our web site www.radstats.org.uk.