Hi,
A few days back I posted about an approach which
I thought would be an improved way of determining the
CI for a non-normally distributed random variable, but
after I ran a few simulations for verification it
turns out that I was overly optimistic. The results
showed that the normal-based CI formula and the
one I proposed gave almost the same CI. So it is
indeed true that the normal-based CI formula is robust.
I tried to understand this result, and my conclusion is
that the robustness does not result only from the
standard Central Limit Theorem (the normalized sum of
i.i.d. samples tends to normal as the number of samples
approaches infinity) but also from a simple variant
which I state below (see Jacod & Protter, Probability
Essentials, Theorem 21.2):
Let X_1,X_2,... be independent but NOT necessarily
identically distributed. Let E[X_i] = 0 for all i and
let Var(X_i) = o_i^2. Under the assumption that
sup_i E|X_i|^(2+eps) < inf for some eps > 0 (bounded
variances alone are not quite enough) and
sum_{i>=1} o_i^2 = inf, we have
lim_{n->inf} sum_{i=1..n} X_i / sqrt(sum_{i=1..n} o_i^2) = Z,
where Z ~ N(0,1)
(convergence is in the sense of distribution).
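A small numerical illustration of this variant may help. The sketch below is only an assumption-laden example, not part of the book: it takes independent X_i ~ Uniform(-a_i, a_i) with half-widths a_i cycling through 1, 2, 3 (so the X_i are bounded, have mean zero, and are not identically distributed), normalizes the sum by sqrt(sum(o_i^2)), and checks how often the result lands within +/-1.96, which should be close to 0.95 if the limit is N(0,1). The names normalized_sum and half_widths are made up for the example.

```python
import math
import random

def normalized_sum(rng, half_widths):
    """Draw X_i ~ Uniform(-a_i, a_i) and return sum(X_i) / sqrt(sum(o_i^2))."""
    total = 0.0
    var_sum = 0.0
    for a in half_widths:
        total += rng.uniform(-a, a)
        var_sum += a * a / 3.0  # variance of Uniform(-a, a) is a^2 / 3
    return total / math.sqrt(var_sum)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical choice: 200 terms whose half-widths cycle through 1, 2, 3.
half_widths = [1.0 + (i % 3) for i in range(200)]
trials = 4000

inside = sum(abs(normalized_sum(rng, half_widths)) <= 1.96 for _ in range(trials))
frac_inside = inside / trials
print(f"fraction within +/-1.96: {frac_inside:.3f}")
```

If the normalized sum is approximately N(0,1), the printed fraction should sit near 0.95, up to simulation noise.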
I’ll explain how this theorem can be used to argue the
robustness of the normal-based CI formula (hope most
people don’t mind if I share this):
Let N denote sample size. If the X_1,X_2,...,X_N are
i.i.d N(mu,o^2) then the following properties hold
and are crucial in deriving the CI formula:
1. Let X_bar denote sample average (=sum(X_i)/N). Then
X_bar and (X_i-X_bar) are independent for all i.
2. Let S_N^2=sum((X_i-X_bar)^2)/(N-1). Then as a
consequence of (1) X_bar and S_N^2 are independent.
3. (N-1)*S_N^2/o^2 is chi-square with N-1 degrees of
freedom.
4. sqrt(N)*(X_bar-mu)/S_N follows a t-distribution with
N-1 degrees of freedom.
In the case that X_1,X_2,...,X_N are i.i.d. but not
normal, the 4 properties above would still hold approximately
if (X_i-X_bar) is very close to normal for all i,
since we already know that X_bar is close to
normal for large samples (standard CLT). It is clear
that (X_i-X_bar) is not a sum of i.i.d. samples, but
it still tends to normal for large sample sizes by the
version of the Central Limit Theorem stated above.
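For anyone who wants to check the robustness claim numerically, here is a minimal sketch of a coverage simulation. It is not the code behind the results mentioned at the top; the sample size N = 30, the exponential distribution as the non-normal example, and the names t_interval and coverage are all assumptions made for illustration. The t critical value for 29 degrees of freedom is hardcoded from a standard t-table so that only the Python standard library is needed.

```python
import math
import random

# Two-sided 95% critical value of Student's t with 29 degrees of freedom
# (i.e. N = 30), taken from a standard t-table. Assumption: N stays at 30.
T_CRIT_DF29 = 2.045

def t_interval(sample, t_crit):
    """Normal-based CI: X_bar +/- t_crit * S_N / sqrt(N)."""
    n = len(sample)
    x_bar = sum(sample) / n
    s_n = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    half_width = t_crit * s_n / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def coverage(draw, true_mean, n=30, trials=5000, t_crit=T_CRIT_DF29, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        lo, hi = t_interval(sample, t_crit)
        if lo <= true_mean <= hi:
            hits += 1
    return hits / trials

# Normal data (the case the formula was derived for) versus skewed data.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
expo_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
print(f"normal data:      {normal_cov:.3f}")
print(f"exponential data: {expo_cov:.3f}")
```

A coverage close to the nominal 0.95 for the skewed case would be consistent with the robustness argument; how close it gets depends on N and on how heavy the skew is.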
Cheers,
Hendra.