> The KAPPA HISTAT application uses a data histogram to calculate the median
> of a data set. It says that it also calculates the mode but it actually
> seems to calculate a fudge factor and in my data is a bad value.
Fudge factor? It uses the Pearson 3*median-2*mean formula as
documented. People have suggested better variants of this for
astronomical data. Of course, the mode and mean may not necessarily
bracket the median, so merely tweaking the coefficients is only part of
the story.
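
For reference, a minimal Python sketch of the Pearson estimate (not the HISTAT code; the function name and the sample values are made up for illustration). It shows how a skewed sample can push the estimate outside the range of the data:

```python
import statistics


def pearson_mode(values):
    """Estimate the mode with Pearson's empirical relation:
    mode ~ 3*median - 2*mean."""
    return 3.0 * statistics.median(values) - 2.0 * statistics.fmean(values)


# Hypothetical sample with one high outlier dragging the mean upwards.
data = [1.0, 2.0, 2.0, 3.0, 10.0]
estimate = pearson_mode(data)  # median = 2.0, mean = 3.6
print(estimate < min(data))    # the estimate lies below every data value
```
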
> Is there a reason why it can't simply write out the peak of the
> histogram?
While superficially it may seem obvious to use the peak of the
histogram, there are questions of resolution for floating-point data
(continuous versus discrete distributions) as we found with the scourge
that is KPG1_FRACx. There is also the matter of multiple modes. The
histogram used to find percentiles is likely to be too fine to select
the mode you want, and indeed there may be many such equally but
sparsely populated bins. Which mode do you select?
I could introduce a new parameter that requests that the mode be
determined from the most populated histogram bins, but the output may be
multi-valued. HISTAT could then optionally create a coarser histogram
whose bin width or number of bins may be set by the user (given that the
mode is sensitive to the exact choice), and use the peak of that
histogram as the mode.
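
To make the bin-width sensitivity concrete, here is a rough Python sketch of the peak-of-histogram approach. The function name, the `nbins` argument and the sample values are all hypothetical; nothing here is an existing HISTAT parameter. It returns every bin centre that ties for the highest count, so the result can be multi-valued:

```python
def histogram_modes(values, nbins):
    """Return the centres of all bins sharing the maximum count."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lo]  # degenerate case: all values identical
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # Clamp so that the maximum value falls in the last bin.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    peak = max(counts)
    return [lo + (i + 0.5) * width for i, c in enumerate(counts) if c == peak]


# Hypothetical floating-point sample with no repeated values.
values = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 1.3, 4.0]
coarse = histogram_modes(values, 4)    # one well-defined peak
fine = histogram_modes(values, 400)    # every occupied bin ties for the peak
```

With four bins there is a single mode; with 400 bins each value sits alone in its bin and all eight occupied bins tie for the peak, which is the "which mode do you select?" problem.
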
Another possibility is to fit a function to a coarse histogram and
determine the peak. A parabola is often that function. ESP:HISTPEAK
already does this. It derives four modes: from the raw histogram, from
the smoothed histogram, from a fitted peak, and projected using chords.
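
As an illustration of the parabola idea only, here is a generic three-point interpolation in Python. It is not a transcription of what ESP:HISTPEAK does; the function name and the bin values are assumptions, and the bins are assumed to be equally spaced:

```python
def parabolic_peak(centres, counts):
    """Fit a parabola through the highest bin and its two neighbours
    and return the position of the vertex."""
    i = max(range(len(counts)), key=counts.__getitem__)
    if i == 0 or i == len(counts) - 1:
        return centres[i]  # no neighbour on one side: cannot interpolate
    left, mid, right = counts[i - 1], counts[i], counts[i + 1]
    denom = left - 2 * mid + right
    if denom == 0:
        return centres[i]  # flat top: fall back to the bin centre
    width = centres[i + 1] - centres[i]
    return centres[i] + 0.5 * width * (left - right) / denom


# Hypothetical coarse histogram: bin centres and their populations.
centres = [0.5, 1.5, 2.5, 3.5]
counts = [1, 4, 3, 0]
peak = parabolic_peak(centres, counts)
```

The fitted peak is pulled from the centre of the most populated bin towards its fuller neighbour, so the answer is no longer quantised to the bin centres.
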
So Tim, what would you like me to do?
Malcolm