Hi everyone,
I need help calculating a kind of standard deviation.
Problem:
Suppose we have arrays like the following, and we'd like to express
the variance in the data as indicated
a1 = (6      5      23    )  -> large variance
a2 = (8e-6   5e-5   4e-23 )  -> large variance
a3 = (8e-216 5e-44  4e-23 )  -> large variance (even though all numbers are almost zero!)
a4 = (6      5      13    )  -> medium variance
a5 = (8e-6   5e-5   4e-12 )  -> medium variance
a6 = (8e-216 5e-144 4e-123)  -> medium variance
a7 = (6      5      3     )  -> small variance
a8 = (8e-6   5e-5   4e-6  )  -> small variance
a9 = (8e-216 5e-244 4e-223)  -> small variance
The point is that the variance should be large whenever either
the actual values or their logarithms differ widely.
However, if we use the data as is,
v(a3) will be almost zero, and the same holds for its square
root, the standard deviation. On the other hand, if we
first apply a logarithmic transformation
a_i = ( log(a_i1) log(a_i2) log(a_i3) )
and then calculate the standard deviation, the result is much
too large for a9. Is there a "middle way"?
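To make the two effects described above concrete, here is a minimal Python sketch. It assumes base-10 logarithms and the sample (n-1) standard deviation; the post does not state either choice, and the helper names raw_sd and log_sd are made up for this illustration.

```python
import math
import statistics

# Three of the example arrays from the post.
a3 = (8e-216, 5e-44, 4e-23)    # should count as "large variance"
a7 = (6, 5, 3)                 # should count as "small variance"
a9 = (8e-216, 5e-244, 4e-223)  # should count as "small variance"


def raw_sd(values):
    """Sample standard deviation of the values as given."""
    return statistics.stdev(values)


def log_sd(values):
    """Sample standard deviation after a log10 transform.

    Base 10 is an assumption; the post only says "log".
    """
    return statistics.stdev([math.log10(v) for v in values])


print("a3, raw values:", raw_sd(a3))
print("a7, log values:", log_sd(a7))
print("a9, log values:", log_sd(a9))
```

With these inputs the raw standard deviation of a3 is vanishingly small, while the log-scale standard deviation of a9 is far larger than that of a7, although both are meant to count as "small variance".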
Below are our calculations of the standard deviations of the E-values
(both raw and log-transformed) for the data above:
ID  a_i1    a_i2    a_i3    MEANe   MEANle  STDRRe  STDRRle  STDEVe  STDEVle
a1  6       5       23      1e+01   1       2e+02   1        1       1
a2  8e-6    5e-5    4e-23   2e-05   -3e+01  1e-09   1e+03    2e-05   1
a3  8e-216  5e-44   4e-23   1e-23   -3e+02  1e-45   1e+05    0       1
a4  6       5       13      8       1       4e+01   0.4      1       0.9
a5  8e-6    5e-5    4e-12   2e-05   -2e+01  1e-09   1e+02    2e-05   1
a6  8e-216  5e-144  4e-123  1e-123  -4e+02  1e-245  3e+04    0       1
a7  6       5       3       5       0.7     5       0.3      0.7     0.9
a8  8e-6    5e-5    4e-6    2e-05   -1e+01  1e-09   0.3      2e-05   0.9
a9  8e-216  5e-244  4e-123  1e-123  -4e+02  1e-245  3e+04    0       1
Note:
STDRR = standard error, STDEV = standard deviation
e = untransformed E-values, le = log-transformed E-values
Intikhab Alam
PhD student
International NRW Graduate School in Bioinformatics and Genome Research
University of Bielefeld
Germany