Hello everyone,
I want to thank everyone who responded to my query. I appreciate it very
much.
This was the original text of my query on standard deviation:
I have two variables with very different scales (one has very low values,
the other very high values). I want to compare their variability using
standard deviation. Because of the difference in scales I am getting
numbers that do not look comparable.
I want to rescale the original variables to lie between 0 and 1, recalculate
the standard deviations and then make the comparison. Is this a reasonable
approach? Are there better ways to correct for the differences in scales in
my situation?
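
For readers who want to try the rescaling idea from the query, here is a minimal sketch. It borrows the two illustrative data sets from response 2 in the summary; the function name `min_max_scale` is just a label chosen for this example. After this rescaling each standard deviation equals the original one divided by the range (max minus min), so the result is driven by the most extreme observations.

```python
import statistics


def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative data taken from response 2 in the summary.
low = [5, 6, 7, 9, 23]
high = [5000, 6000, 7000, 9000, 23000]

# Sample standard deviations (n-1 denominator) of the rescaled values.
sd_low = statistics.stdev(min_max_scale(low))
sd_high = statistics.stdev(min_max_scale(high))

print(sd_low, sd_high)
```

Because the second set is simply the first multiplied by 1000, the two rescaled standard deviations coincide.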
Here is a summary of responses:
1) If you are happy to assume that the two samples are iid normal random
variables, then you can use an F-test on the ratio of the two sample
variances (computed with m-1 and n-1 denominators). Under the null
hypothesis of equal variances this statistic has an F-distribution with
m-1 and n-1 degrees of freedom (where the two samples are of size m and n).
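
A rough sketch of the variance-ratio statistic from response 1, using only the Python standard library. The second sample `b` is made up purely for illustration, and `variance_ratio` is a hypothetical helper name. Note that the statistic compares variances on their original scales, so it does not by itself correct for a difference in scale.

```python
import statistics


def variance_ratio(sample_a, sample_b):
    """Return (F, df_a, df_b) for the ratio of two sample variances."""
    var_a = statistics.variance(sample_a)  # uses the m-1 denominator
    var_b = statistics.variance(sample_b)  # uses the n-1 denominator
    return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1


a = [5, 6, 7, 9, 23]      # data set 1 from response 2
b = [4, 8, 10, 12, 16]    # hypothetical second sample, for illustration only

f_stat, df_a, df_b = variance_ratio(a, b)
print(f_stat, df_a, df_b)

# A p-value would come from the F distribution with (df_a, df_b) degrees of
# freedom, for example scipy.stats.f.sf(f_stat, df_a, df_b) if SciPy is
# installed (assumed here, not required by this sketch).
```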
2) Try to calculate the coefficient of variation: CV = S/Xbar*100
(stdev/average, i.e. what % of Xbar is S).
example: data set 1: 5, 6, 7, 9, 23 (Xbar=10, S=7.416)
         data set 2: 5000, 6000, 7000, 9000, 23000 (Xbar=10000, S=7416)
but CV = 74.16 for both sets.
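
The example in response 2 can be reproduced with a few lines of Python. This is only a sketch; `coefficient_of_variation` is a name chosen here, and S is taken to be the sample standard deviation (n-1 denominator), which matches the figures quoted above.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


set1 = [5, 6, 7, 9, 23]
set2 = [5000, 6000, 7000, 9000, 23000]

cv1 = coefficient_of_variation(set1)
cv2 = coefficient_of_variation(set2)

print(round(cv1, 2), round(cv2, 2))  # prints: 74.16 74.16
```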
4) Depending on the model assumptions, you might want to take a look at the
measure known as deviance. See McCullagh and Nelder (1989), Generalized
Linear Models. Deviance is defined as twice the difference between the log
likelihood of the full model, which is the maximum achievable log
likelihood, and the log likelihood under the estimates. For the Gaussian
distribution each observation's contribution to the (scaled) deviance is
the square of its standardized error, and hence deviance can easily be seen
as a generalization of the usual standardized error.
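
As a worked illustration of the Gaussian case in response 4, assuming independent observations y_i with fitted means \hat{\mu}_i and a known common variance \sigma^2 (this notation is introduced here and is not from the original reply), the scaled deviance reduces to a sum of squared standardized errors:

```latex
\ell(\mu; y) = -\frac{n}{2}\log(2\pi\sigma^2)
               - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \mu_i)^2

D^{*} = 2\left[\ell(y; y) - \ell(\hat{\mu}; y)\right]
      = \sum_{i=1}^{n} \left(\frac{y_i - \hat{\mu}_i}{\sigma}\right)^{2}
```

Here \ell(y; y) is the log likelihood of the full model, in which each fitted mean equals its observation, so the squared-error term vanishes.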
5) If you want to compare variation about the mean on a point-by-point
basis, then perhaps you would compare 'standardized' z values. This route
would also let you compare distributions, because on the z scale virtually
all values (for roughly normal data) will lie between -3 and +3.
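
A small sketch of the z-value idea from response 5, again using the illustrative data from response 2. The helper name `z_scores` is an assumption of this example, and the sample standard deviation (n-1 denominator) is used.

```python
import statistics


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


set1 = [5, 6, 7, 9, 23]
set2 = [5000, 6000, 7000, 9000, 23000]

z1 = z_scores(set1)
z2 = z_scores(set2)

# Both sets yield the same z values, so they can be compared point by point.
for a, b in zip(z1, z2):
    print(round(a, 3), round(b, 3))
```

Standardized values always have mean 0 and standard deviation 1, so they are useful for comparing individual points or distribution shapes rather than the overall spread of the two variables.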
6) A very common method is to compare the coefficients of variation (CVs),
each of which is the SD/mean. However, my experience is that this will
automatically favor the larger variable, creating another problem,
albeit usually one reduced in scale. Another problem is that CVs
aren't as easily tested for heterogeneity as SDs.
Thanks again.
Regina Malina
Statistician
R&D, Business Intelligence
The Loyalty Group
* (416) 228-2945