May I first point out that I am not a statistician, so I apologise if my
question is not in the usual notation or has an obvious answer.

I am running many millions of simulations and want to calculate the mean and
variance of the population after each of the separate trials. It is not
practical to store all the results and then calculate the mean and standard
deviation from the entire population. I have derived a formula to calculate the
mean after iteration N+1, Xbar[N+1], from the new value X[N+1] and the mean
after the previous iteration, Xbar[N]:

Xbar[N+1]=(N*Xbar[N]+X[N+1])/(N+1)

I believe this is called the running mean.
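
To make the update concrete, here is a minimal Python sketch of the formula
above. The function name update_mean and the sample values are assumptions
chosen only for illustration; n is the number of values already folded into
the previous mean.

```python
def update_mean(prev_mean, n, new_value):
    """Return Xbar[N+1] given Xbar[N] (prev_mean), N (n) and X[N+1] (new_value)."""
    return (n * prev_mean + new_value) / (n + 1)


def running_means(values):
    """Yield the mean after each value without storing the earlier values."""
    mean = 0.0  # placeholder; with n == 0 it has no effect on the first update
    for n, x in enumerate(values):
        mean = update_mean(mean, n, x)
        yield mean


if __name__ == "__main__":
    # Hypothetical trial results, used only to exercise the update.
    for step, m in enumerate(running_means([2.0, 4.0, 9.0]), start=1):
        print(step, m)
```

Only the previous mean and the count need to be kept between trials, which is
what makes this practical for millions of simulations.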

I have been trying to solve the same problem for the variance of the
population, that is, to find a formula for the running variance. I have
attempted to derive it myself and have been to the medical library to consult
the statistics books, but to no avail.
Is there a method for obtaining the variance at point N+1 given that I know:

X[N+1];
Xbar[N];
Xbar[N+1];
Xvar[N];

I have thought about storing the sum of squares, etc., but this does not seem
very elegant and may not be computationally stable (rounding error).
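
For reference, the sum-of-squares approach I have in mind would look roughly
like the Python sketch below. The class name SumOfSquares and its method names
are my own assumptions, not taken from any library. It keeps only the count,
the sum and the sum of squares, and computes the population variance as
mean(X^2) - mean(X)^2.

```python
class SumOfSquares:
    """Accumulate count, sum and sum of squares; no individual results are stored."""

    def __init__(self):
        self.n = 0           # number of values seen so far
        self.total = 0.0     # running sum of X
        self.total_sq = 0.0  # running sum of X*X

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Population variance: mean of squares minus square of mean.
        # The subtraction is where rounding error can creep in when the
        # mean is large compared with the spread of the values.
        m = self.mean()
        return self.total_sq / self.n - m * m


if __name__ == "__main__":
    acc = SumOfSquares()
    for x in [4.0, 7.0, 13.0, 16.0]:  # hypothetical trial results
        acc.add(x)
    print(acc.mean(), acc.variance())
```

The concern is the final subtraction: when the two terms are both large and
nearly equal, most of the significant digits cancel, so the result may lose
accuracy in floating-point arithmetic.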

I would be grateful if someone could provide me with an answer to
this question even if it is that it has no solution.

Trevor Carpenter