On Wed, 27 Jun 2007, Malcolm J. Currie wrote:
>> The KAPPA HISTAT application uses a data histogram to calculate the median
>> of a data set. It says that it also calculates the mode but it actually
>> seems to calculate a fudge factor, and for my data the result is a bad value.
>
> Fudge factor? It uses the Pearson 3*median-2*mean formula as
:-) I did read the documentation. There was no reference as to where the
magic number came from.
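
For reference, the "magic number" is Pearson's empirical relation for moderately skewed, unimodal distributions, mean - mode ~ 3*(mean - median), rearranged for the mode. A minimal Python sketch of that arithmetic follows; the function name and sample values are made up for illustration and this is not HISTAT's code.

```python
from statistics import mean, median


def pearson_mode(values):
    """Pearson's empirical estimate of the mode: 3*median - 2*mean.

    Hypothetical helper for illustration only; not taken from KAPPA.
    """
    return 3.0 * median(values) - 2.0 * mean(values)


# Made-up sample, skewed by a single high outlier.
sample = [1, 2, 2, 3, 10]
print("mean   =", mean(sample))
print("median =", median(sample))
print("Pearson mode estimate =", pearson_mode(sample))
```

With this sample the estimate lands below the smallest data value, which shows how far the formula can drift from the histogram peak when the mean is dragged by outliers.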
>> Is there a reason why it can't simply write out the peak of the
>> histogram?
>
> While superficially it may seem obvious to use the peak of the
> histogram, there are questions of resolution for floating-point data
> (continuous versus discrete distributions) as we found with the scourge
> that is KPG1_FRACx. There is also the matter of multiple modes. The
> histogram used to find percentiles is likely to be too fine to select
> the mode you want, and indeed there may be many such equally but
> sparsely populated bins. Which mode do you select?
Example data file attached. Does a nice job in histogram with 64 bins.
The mean is miles out and the median is slightly lower than the peak.
>
> I could introduce a new parameter that requests that the mode be
> determined from the most populated histogram bins, but the output may be
> multi-valued. HISTAT could then optionally create a coarser histogram
> whose bin width or number of bins may be set by the user (given that the
> mode is sensitive to the exact choice), and use the peak of that
> histogram as the mode.
>
Sounds good. Does it really need multiple MODEs? If the user is doing this,
are they allowed to take the risk that it picks the wrong one? I suppose
it could return * if the peaks are about the same height.
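
To make the coarse-histogram idea concrete, here is a rough Python sketch of what such an option might do: bin the data into a user-chosen number of bins and report the centre of every bin whose count is within a tolerance of the tallest. The function name, the `tol` parameter and the tie rule are my assumptions, not an existing HISTAT or kaplibs interface, and bad values are assumed to have been filtered out beforehand.

```python
def histogram_modes(values, nbins, tol=0.0):
    """Return the centres of the most populated bins of a coarse histogram.

    Hypothetical sketch: `tol` is the fraction below the peak count at
    which another bin is still treated as an equally good mode.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the mode is that value.
        return [float(lo)]
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # The maximum value falls on the upper edge; put it in the last bin.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    threshold = max(counts) * (1.0 - tol)
    return [lo + (i + 0.5) * width
            for i, c in enumerate(counts) if c >= threshold]


# Made-up data with two clumps of similar height.
data = [1, 1, 1, 2, 5, 5, 5, 9]
print(histogram_modes(data, 4))            # strict peak only
print(histogram_modes(data, 4, tol=0.25))  # near-ties reported as well
```

A caller that wants a single answer could take the result when the list has one entry and print * otherwise.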
> Another possibility is to fit a function to a coarse histogram and
> determine the peak. A parabola is often that function. ESP:HISTPEAK
> already does this. It derives four modes with and without smoothing the
> histogram, a fitted peak, and projected using chords.
Didn't know about HISTPEAK. Tried it and it moaned at me saying that the
data didn't have enough spread. Presumably it chose too many bins.
[Aside: ESP HISTPEAK didn't understand the "xserve" device]
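
For what a parabola fit to a coarse histogram could look like in its simplest form, here is a Python sketch that puts a parabola through the tallest bin and its two neighbours and returns the vertex. This is the standard three-point interpolation under the assumption of uniform bin widths; I am not claiming it is what HISTPEAK does internally, and the function name is hypothetical.

```python
def parabolic_peak(centres, counts):
    """Refine a histogram peak with a three-point parabola.

    Hypothetical sketch assuming equally spaced bin centres. Falls back
    to the bin centre when the peak is at an edge or the three points
    are collinear.
    """
    i = max(range(len(counts)), key=counts.__getitem__)
    if i == 0 or i == len(counts) - 1:
        return centres[i]
    left, mid, right = counts[i - 1], counts[i], counts[i + 1]
    denom = left - 2.0 * mid + right
    if denom == 0:
        return centres[i]
    width = centres[i + 1] - centres[i]
    # Vertex offset from the central bin, in units of the bin width.
    return centres[i] + 0.5 * (left - right) / denom * width


# Made-up coarse histogram: bin centres and counts.
print(parabolic_peak([1.0, 2.0, 3.0, 4.0], [3, 9, 7, 1]))
```

The refined value still depends on the bin width chosen, so it does not remove the need for a user-settable number of bins.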
> So Tim what would you like me to do?
I'm happy for you to decide. If HISTPEAK can be made to work then that
would be fine. Is it a lot of work to tweak HISTAT or are there kaplibs
routines that can be used?
--
Tim Jenness
JAC software
http://www.jach.hawaii.edu/~timj