I recently asked the list which is the best estimator of the mean of a
log-normal distribution. Here are the many answers I received.
Thank you very much, everybody.
Jean-Michel Lemieux
-----------------------------------------------------------------------------
Typically, if the distribution of the data is non-normal, the median is the
best estimate of the "center" of the distribution, the next best is the
mode, and the mean comes last, since the mean is strongly influenced by
outliers in the data.
Jason Bruenning
Process Analyst
Plexus/EAC
Phone (920) 751-3219
Fax (920) 720-6701
Mailto:jason.bruennin
------------------------------------------------------------------------------
Hi
I'm not sure if this is what you're after. Apologies if any of this is
familiar to you.
You can calculate the mean of the transformed data; there are then
expressions for transforming the mean and variance from the log scale
back to the arithmetic scale.
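A minimal Python sketch of those back-transformation formulas, assuming the natural logs are exactly normal with mean mu and variance sigma2. The function name lognormal_moments is illustrative only and is not taken from the paper mentioned here:

```python
import math


def lognormal_moments(mu: float, sigma2: float) -> tuple[float, float]:
    """Arithmetic-scale mean and variance of X when log X ~ N(mu, sigma2).

    E[X]   = exp(mu + sigma2 / 2)
    Var[X] = (exp(sigma2) - 1) * exp(2 * mu + sigma2)
    """
    mean = math.exp(mu + sigma2 / 2)
    var = (math.exp(sigma2) - 1) * math.exp(2 * mu + sigma2)
    return mean, var


# Logs with mean 0 and variance 1
m, v = lognormal_moments(0.0, 1.0)
print(round(m, 4), round(v, 4))  # prints 1.6487 4.6708
```

These are population quantities; plugging sample estimates of mu and sigma2 into them gives estimators on the arithmetic scale.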
I'm attaching an extract from a paper I'm working on that involves
this, plus references. It's a Word 97 file.
Hope this is of some use to you.
Regards
Michelle
-------------------------------------------------------------------------------
The MLE is
exp(mu + var/2)
where mu and var are the ML estimates of the mean and variance of the logs.
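As a sketch (the function name lognormal_mean_mle is hypothetical), the MLE can be computed from a strictly positive sample by estimating mu and var on the natural-log scale by maximum likelihood, which for the variance means dividing by n rather than n - 1:

```python
import math


def lognormal_mean_mle(x: list[float]) -> float:
    """ML estimate of the arithmetic mean of a log-normal sample."""
    if any(xi <= 0 for xi in x):
        raise ValueError("log-normal data must be strictly positive")
    logs = [math.log(xi) for xi in x]  # natural logs
    n = len(logs)
    muhat = sum(logs) / n
    varhat = sum((y - muhat) ** 2 for y in logs) / n  # ML variance: divisor n
    return math.exp(muhat + varhat / 2)


x = [1.0, math.e, math.e ** 2]  # logs are 0, 1, 2
print(lognormal_mean_mle(x))    # exp(1 + 1/3), about 3.794
```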
Jim
-------------------------------------------------------------------------------
If the data are truly log normal then the best estimate of the mean is the
mean on the log scale antilogged. This is known as the geometric mean, and
should be very similar to the median.
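A small Python illustration with made-up data. The geometric mean is the antilog of the mean natural log; the standard-library function statistics.geometric_mean computes the same quantity:

```python
import math
import statistics

data = [1.0, 10.0, 100.0]  # illustrative values, symmetric on the log scale

# Antilog of the mean of the natural logs
gm = math.exp(statistics.fmean(math.log(v) for v in data))

print(gm, statistics.geometric_mean(data), statistics.median(data))
```

Here the geometric mean and the median both equal 10 (up to rounding) because the logs are symmetric about their mean; in a real log-normal sample the two will be close but not identical.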
Tim Cole
Phone +44(0)20 7905 2666 Fax +44(0)20 7242 2723
Epidemiology & Public Health, Institute of Child Health, London WC1N 1EH, UK
------------------------------------------------------------------------------
Hi Lemieux,
I think that if the data are lognormal, they will be normally
distributed after a log transformation. So apply a log transformation to
obtain normally distributed responses; the best estimator of the mean
in that case (which is also the MLE) is the arithmetic mean of these
transformed responses. Then transform back.
So, to conclude, I would suggest the antilog of the arithmetic mean of the
log-transformed responses as the best estimator of the mean.
Maurille FEUDJO
PhD student
Medical Statistics Unit
LSHTM
-------------------------------------------------------------------------------
The geometric mean (re-transformed mean of logged values) is the same as
the median when the data are continuous and truly lognormal. In cases
where the data are nearly lognormal it is a good statistic to report.
Suspicion of it is declining, especially when you explain its
closeness to the median.
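A quick simulation sketch of that closeness, assuming exactly log-normal data with mu = 0 and sigma = 1 (arbitrary choices) generated by Python's random.lognormvariate. The geometric mean and the sample median both settle near exp(mu), while the arithmetic mean settles near exp(mu + sigma**2/2), which is a different quantity:

```python
import math
import random
import statistics


def simulate(mu: float = 0.0, sigma: float = 1.0, n: int = 200_000,
             seed: int = 1) -> dict[str, float]:
    rng = random.Random(seed)  # fixed seed for reproducibility
    # log X ~ N(mu, sigma**2)
    x = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return {
        "geometric mean": statistics.geometric_mean(x),
        "sample median": statistics.median(x),
        "arithmetic mean": statistics.fmean(x),
        "exp(mu)": math.exp(mu),
        "exp(mu + sigma^2/2)": math.exp(mu + sigma ** 2 / 2),
    }


for name, value in simulate().items():
    print(f"{name:>20}: {value:.4f}")
```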
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
_/_/_/ _/_/ _/_/_/ _/ Ronan M Conroy
_/ _/ _/ _/ _/ _/ Lecturer in Biostatistics
_/_/_/ _/ _/_/_/ _/ Royal College of Surgeons
_/ _/ _/ _/ _/ Dublin 2, Ireland
_/ _/ _/_/ _/_/_/ _/ +353 1 402 2431 (fax 2329)
http://www.rcsi.ie
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
I'm not an outlier; I just haven't found my distribution yet
-------------------------------------------------------------------------------
The maximum likelihood estimator exp(M + SIG2/2), where M and SIG2 are
the mean and variance of the logs, has smaller asymptotic variance than
the arithmetic mean.
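One way to see this, sketched under the assumption that the comparison is with the sample (arithmetic) mean: a delta-method calculation, using Var(muhat) ~ sigma2/n and Var(sigma2hat) ~ 2*sigma2**2/n, gives the large-sample variances below. Because exp(sigma2) - 1 = sigma2 + sigma2**2/2 + sigma2**3/6 + ..., the ML variance is always the smaller one, and the gap widens as sigma2 grows. The function name is illustrative:

```python
import math


def asymptotic_variances(mu: float, sigma2: float, n: int) -> tuple[float, float]:
    """Large-sample variances of two estimators of E[X] = exp(mu + sigma2/2).

    Sample mean:  exp(2*mu + sigma2) * (exp(sigma2) - 1) / n
    ML estimator: exp(2*mu + sigma2) * (sigma2 + sigma2**2 / 2) / n
    """
    scale = math.exp(2 * mu + sigma2) / n
    var_mean = scale * (math.exp(sigma2) - 1)
    var_mle = scale * (sigma2 + sigma2 ** 2 / 2)
    return var_mean, var_mle


for s2 in (0.1, 1.0, 4.0):
    vm, vml = asymptotic_variances(0.0, s2, n=100)
    print(f"sigma2={s2}: efficiency of sample mean = {vml / vm:.3f}")
```

This prints efficiency ratios of roughly 0.998, 0.873 and 0.224: almost no loss for small sigma2, a large one for big sigma2.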
Basilio
-------------------------------------------------------------------------------
Suppose X is log-normal
==> log X is normal with mean M and variance V
then EX = E exp(log X) = exp (M + V/2)
M and V can be estimated from the mean and
variance of log X.
Be warned: the formula above is very sensitive
to the assumption, i.e., if the log-normal model
is wrong, the true mean can be very far from
the value given by the formula. Experiment a little bit.
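One such experiment, using an alternative model chosen purely for illustration: suppose the data really come from an exponential distribution with mean 1. For that distribution, log X has mean -gamma (Euler's constant) and variance pi**2/6, so even with M and V known exactly the log-normal formula overshoots the true mean:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

# Exact mean and variance of log X when X ~ Exponential(mean 1)
M = -EULER_GAMMA
V = math.pi ** 2 / 6

lognormal_formula = math.exp(M + V / 2)  # what exp(M + V/2) reports
true_mean = 1.0

print(f"formula: {lognormal_formula:.3f}, true mean: {true_mean}")
```

The formula gives about 1.278, roughly 28% too high, whereas the arithmetic average of the raw data would remain unbiased under either model.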
The (arithmetic) average of X-data is not
sensitive to the assumption (i.e., always
unbiased), but not as efficient if the data
are truly log-normal. If you have a lot of
data, it is much safer to use the arithmetic
mean.
-Yudi-
-------------------------------------------------------------------------------
It depends which mean, and what you mean by best. Usually, log-normal
distributions are summarized by the geometric mean (which is the antilog of
the mean log), not by the arithmetic mean. This is because the median of the
log-normal distribution is its geometric mean. The maximum likelihood
estimator of the population geometric mean is the sample geometric mean. If
you have a confidence interval for the mean log, then you can use antilogs
to derive a confidence interval for the geometric mean. Likewise, if you
have a confidence interval for the difference between two mean logs, then
you can use antilogs to derive a confidence interval for the ratio of the
geometric means.
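A minimal sketch of the antilog approach (the function name is hypothetical). By default it uses a normal critical value from statistics.NormalDist as a large-sample approximation; for small samples, a t quantile with n - 1 degrees of freedom (for example from scipy.stats.t.ppf) can be passed in as crit instead:

```python
import math
import statistics
from typing import Optional


def geometric_mean_ci(x: list[float], level: float = 0.95,
                      crit: Optional[float] = None) -> tuple[float, float, float]:
    """Geometric mean and CI, by antilogging a CI for the mean natural log."""
    logs = [math.log(v) for v in x]
    n = len(logs)  # needs n >= 2
    m = statistics.fmean(logs)
    se = statistics.stdev(logs) / math.sqrt(n)  # standard error of the mean log
    if crit is None:
        crit = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(m), math.exp(m - crit * se), math.exp(m + crit * se)


gm, lo, hi = geometric_mean_ci([1.0, 10.0, 100.0])
print(gm, lo, hi)
```

The interval is symmetric on the log scale, so on the original scale it is asymmetric about the geometric mean, with lo * hi equal to gm**2.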
If you want the arithmetic mean of the log-normal distribution, then it is
equal to
exp( mu + (1/2)sigma^2 )
where mu and sigma are the mean and standard deviation of the logs.
Therefore, the maximum likelihood estimator of the arithmetic mean of a
log-normal distribution is derived by inserting the sample mean and standard
deviation of the logs into the above formula, with the standard deviation
computed using divisor n (the maximum likelihood estimate).
All logs, in this context, are natural logs.
I hope this is helpful.
Regards
Roger
--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
Guy's, King's and St Thomas' School of Medicine
5th Floor, Capital House
Guy's Hospital
42 Weston Street
London SE1 3QD
United Kingdom
Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
-------------------------------------------------------------------------------
Dear Jean
First of all, you must decide by what criterion you are going to choose
your estimate. What about the loss function?
If you are looking for the best unbiased estimator under squared-error loss,
the answer may be totally different from the one you get if you are looking
for the best (minimum-risk) equivariant one.
See Lehmann's Theory of Point Estimation if you are a graduate student!
Regards
*******************************************************************************
Ahmad Parsian Phone:+98 +31 891 3007(Home)
School of Mathematical Sciences +98 +31 891 3607(Office)
Isfahan University of Technology Fax :+98 +31 891 2602
Isfahan, 84156
Iran
*******************************************************************************
--------------------------------------------------------------------------------
Jean-Michel Lemieux wrote:
> Hello,
> I have a log-normal distribution and I would like to know which is
> the best estimator of the mean. Is it the arithmetic mean (I don't
> think so),
Arithmetic average is always the best estimate of a population mean.
> the median, or the mean when the data are log-transformed?
>
> Thank you very much
>
> Jean-Michel
(a) Define what you mean by 'mean.'
(b) Go with it.
In practice, we use the average as a predictor (expectation value, etc.)
of the population mean. Sometimes we really intend to say "mode" when we
use the word "mean"; then the average won't work for a log-normal.
As I recall, in a log-normal distribution the arithmetic mean is linked
to the variance: changes in the variance of the logs move the
arithmetic mean around. Bad scene for predictive discussions :(
I found that the best thing to do for log-normal distributions, where I
was most interested in discussing predicted/expected values, was to do
the transform, work out the details, then back transform to a scale
familiar to the audience. And include lots of plots. This is the
equivalent of using a geometric average for 'mean.'
Does this help any?
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
web: http://www.a2q.com
The A2Q Method (tm). What do you want to improve today?
--------------------------------------------------------------------------------
The MVUE estimator of the population mean of the raw data is a function
of both the sample mean of the logs and the sample variance of the logs.
The classic paper is:
Finney, D. J., 1941, "On the distribution of a variate whose logarithm
is normally distributed", J. R. Stat. Soc. Suppl., 7, 155-161
In a regression context see:
Bradu, D. & Y. Mundlak, 1970, "Estimation in lognormal linear models",
JASA, 65(320), 198-211
For applications in environmental sciences:
Gilroy, E. J., et al., 1990, "Mean square error of regression-based
constituent transport estimates", WRR, 26(9), 2069-2077
(Other references can be found in this paper also.)
We discuss these and other interesting applications in our short course,
details of which can be found at:
http://www.practicalstats.com/
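For concreteness, a Python sketch of Finney's estimator as it is usually quoted: exp(ybar) * g_n(s**2/2), where ybar and s**2 are the sample mean and the (n - 1)-divisor variance of the natural logs, and g_n is the infinite series in the docstring. The series is summed using the ratio between successive terms; please check it against Finney (1941) before relying on it. Function names are illustrative:

```python
import math
import statistics


def finney_g(n: int, t: float, tol: float = 1e-15, max_terms: int = 1000) -> float:
    """Finney's series, as usually stated:

    g_n(t) = 1 + (n-1) t / n
               + sum_{k>=2} (n-1)^(2k-1) t^k / (n^k (n+1)(n+3)...(n+2k-3) k!)
    """
    term = (n - 1) * t / n  # k = 1 term
    total = 1.0 + term
    for k in range(2, max_terms):
        # ratio of the k-th term to the (k-1)-th term
        term *= (n - 1) ** 2 * t / (n * (n + 2 * k - 3) * k)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def finney_mvue(x: list[float]) -> float:
    """MVUE of the arithmetic mean of a log-normal population (n >= 2)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)  # divisor n - 1
    return math.exp(ybar) * finney_g(n, s2 / 2)
```

As n grows, g_n(t) approaches exp(t), so the estimate approaches exp(ybar + s**2/2), essentially the ML formula; for finite n and t > 0 the series is smaller than exp(t).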
--------------------------------------------------------------------------------
Jean-Michel,
I have been thinking about your question for some time.
If the data are log-normally distributed, and you wish to give some summary
statistics for the distribution, I would usually prefer to quote the median
and quartiles to give a summary of the shape of the distribution.
If the analysis (e.g. a comparison of treatments) was based on comparing the
means of the log-transformed values, then I would quote the geometric mean
rather than the median.
There are circumstances, however, where you need to estimate the arithmetic
mean. In this case, what is the best estimator of the mean of a log-normal
distribution if you have a sample of n values?
Suppose y ~ N(mu, sigma**2) and z = exp(y).
I do not know which is 'best', but two obvious estimators are:
(i) the arithmetic mean of the sample: Sum(z)/n
and
(ii) the maximum likelihood estimate exp(muhat + 0.5*sigmahat**2)
where muhat and sigmahat are the maximum likelihood estimates of the mean and
std of the log-transformed values.
muhat = Sum(y)/n
sigmahat**2 = Sum((y-muhat)**2)/n
Which of these is best?
I have found a formula for the mean square error of these two estimators.
The ML estimator has smaller MSE for sufficiently large n.
The MSE of the ML estimator is infinite if n <= 2*sigma**2.
The ML estimator is slightly biassed for finite n.
If the usual 'unbiassed' estimate of sigma**2 (i.e. Sum((y-muhat)**2)/(n-1))
is used, then the bias is worse.
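Exact MSEs for both estimators can be worked out under the normal-theory assumptions above: muhat ~ N(mu, sigma**2/n), independent of n*sigmahat**2/sigma**2, which is chi-square with n - 1 degrees of freedom, whose moment-generating function (1 - 2s)**(-(n-1)/2) is finite only for s < 1/2. The Python sketch below is one such derivation, not necessarily the formula referred to here; function names are illustrative:

```python
import math


def mse_sample_mean(mu: float, sigma2: float, n: int) -> float:
    # Sum(z)/n is unbiased, so its MSE is Var(z)/n
    return math.exp(2 * mu + sigma2) * (math.exp(sigma2) - 1) / n


def mse_ml(mu: float, sigma2: float, n: int) -> float:
    """Exact MSE of exp(muhat + 0.5*sigmahat**2), sigmahat**2 with divisor n."""
    if n <= 2 * sigma2:
        return math.inf  # E[estimate**2] diverges: chi-square MGF is infinite
    theta = math.exp(mu + sigma2 / 2)  # true arithmetic mean
    # First and second moments of the estimator
    m1 = math.exp(mu + sigma2 / (2 * n)) * (1 - sigma2 / n) ** (-(n - 1) / 2)
    m2 = math.exp(2 * mu + 2 * sigma2 / n) * (1 - 2 * sigma2 / n) ** (-(n - 1) / 2)
    return m2 - 2 * theta * m1 + theta ** 2


for n in (2, 10, 50):
    print(n, mse_sample_mean(0.0, 1.0, n), mse_ml(0.0, 1.0, n))
```

With mu = 0 and sigma**2 = 1, this gives an infinite ML MSE at n = 2, a slightly larger ML MSE than the sample mean's at n = 10 (about 0.483 vs 0.467), and a smaller one at n = 50 (about 0.084 vs 0.093).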
This suggests some more theoretical questions that the allstatters may be
able to shed some light on:
(i) Is there a 'best' estimator for the arithmetic mean?
(ii) Are maximum likelihood estimators always at least as good as other
estimators for sufficiently large n?
(iii) How large does n have to be?
Best wishes
Tim Auton
--
T R Auton PhD MSc C.Math
Head of Biomedical Statistics
Proteus Molecular Design Ltd
Beechfield House
Lyme Green Business Park
Macclesfield
Cheshire SK11 0JL
UK
--------------------------------------------------------------------------------
__________________________________________________________________
Jean-Michel Lemieux
* * * * * *
Département de géologie et génie géologique
Universite Laval
Québec, Canada
G1K 7P4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%