As promised, I'm summarising the responses to the query I
sent allstat on Monday.

The key part of my query was:

  Am I overlooking a statistically valid use of CV for
  data which may contain negative as well as positive values?
  If so, I'd be grateful to be told about it.  If not, can
  anybody point me to an authoritative statement limiting the
  use of CV?

I didn't hear of any such uses.  I'm particularly grateful to
have been sent a very useful link from UCLA about the validity
of CV as a measure of variability.  The URL

   http://www.ats.ucla.edu/stat/mult_pkg/faq/general/coefficient_of_variation.htm

contains the statement:

   "The CV of a variable or the CV of a prediction model for a
   variable can be considered as a reasonable measure if the
   variable contains only positive values.  This is a definite
   disadvantage of CVs."

That seems to me to get close to the "authoritative statement"
for which I was looking.

With two dissenters, respondents agreed with my instinctive feeling
that CV should only be used for variables taking only positive
values.  There were specific arguments linked to the example I gave
(use of logged values).  For example:

     For something like Forced Expiratory Volume in one second (FEV1),
     which is often measured in mL but also in L you could argue that
     CV gives the same answer, whichever of the two you use. However
     if you log transform first that won't happen.  Show [the medic]
     the following case for FEV1: SD=400mL, mean=4000mL or SD=0.4L, mean=4L.
     The answer on the original scale is 0.1 but on the log scale is
     -0.66 or 0.7.
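
The unit-dependence described in that response can be checked with a
minimal Python sketch.  The simulated FEV1 readings and the helper name
cv are my own assumptions for illustration, not anything supplied by
the respondent, and the sketch takes the CV of the logged readings
themselves, which may not be exactly the calculation behind the
figures quoted above.

```python
import math
import random
import statistics


def cv(values):
    """Coefficient of variation: sample SD divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


random.seed(1)

# Hypothetical FEV1 readings in mL (mean about 4000, SD about 400),
# and the same readings converted to litres.
fev1_ml = [random.gauss(4000, 400) for _ in range(1000)]
fev1_l = [v / 1000 for v in fev1_ml]

# On the original scale the CV does not depend on the unit.
cv_ml = cv(fev1_ml)
cv_l = cv(fev1_l)

# After a log transform, changing the unit shifts the mean but leaves
# the SD alone, so the "CV" depends on the unit chosen.
cv_log_ml = cv([math.log(v) for v in fev1_ml])
cv_log_l = cv([math.log(v) for v in fev1_l])

print(f"CV, raw mL:    {cv_ml:.4f}")
print(f"CV, raw L:     {cv_l:.4f}")
print(f"CV, log of mL: {cv_log_ml:.4f}")
print(f"CV, log of L:  {cv_log_l:.4f}")
```

The first two figures should agree to rounding error, while the last
two differ by a large factor purely because of the choice of unit.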

Several respondents mentioned the useful point that, for positive
variables like blood pressure, the CV is analogous to the SD of
the natural logs of the measurements.  This was in fact the
answer I had given to the medic with whom I was having the
"discussion".   (I slightly simplified the description when writing
my query: in fact, the function concerned was a weighted sum of
the logs of two hormone concentrations in blood; but the principles
are the same.)
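
As a rough check of that analogy, here is a small Python sketch which
assumes the measurements are lognormal (my assumption, chosen because
the relationship is then exact: CV = sqrt(exp(sigma^2) - 1), which is
close to sigma when sigma is small).  The value of sigma and the
sample size are arbitrary.

```python
import math
import random
import statistics

random.seed(2)

sigma = 0.1  # assumed SD of the natural logs of the measurements

# Simulated positive measurements whose natural logs are N(0, sigma^2).
x = [random.lognormvariate(0.0, sigma) for _ in range(20000)]

# CV on the original scale and SD on the natural-log scale.
cv_raw = statistics.stdev(x) / statistics.mean(x)
sd_log = statistics.stdev([math.log(v) for v in x])

# Exact CV of a lognormal variable with log-scale SD sigma.
cv_theory = math.sqrt(math.exp(sigma ** 2) - 1)

print(f"CV of raw values: {cv_raw:.4f}")
print(f"SD of log values: {sd_log:.4f}")
print(f"Theoretical CV:   {cv_theory:.4f}")
```

The agreement weakens as sigma grows, so this is only an approximation
for highly variable measurements.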

One respondent commented that the CV, being a ratio, requires ratio
scale measurements to be interpretable, which would rule out variables
which can take negative values.

The dissenting views concerned two jointly distributed random
variables, X and Y, say.  For the functions X-Y and Y-X, the view
was expressed that CV(Y-X) should equal CV(X-Y).  I was also sent
an application involving the difference between two exam scores,
before and after training.
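
To see what is at stake in the dissenting view, the following Python
sketch compares the two definitions on a difference of scores.  The
exam scores are invented for illustration, and the function names cv
and cv_abs are hypothetical; cv_abs uses the "SD divided by modulus of
mean" definition mentioned in my original query.

```python
import statistics

# Invented before/after exam scores for six candidates.
before = [52, 61, 48, 70, 55, 66]
after = [58, 65, 47, 78, 61, 69]


def cv(values):
    """SD divided by the mean; the sign follows the sign of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def cv_abs(values):
    """SD divided by the modulus of the mean; never negative."""
    return statistics.stdev(values) / abs(statistics.mean(values))


gain = [a - b for a, b in zip(after, before)]  # after minus before
loss = [b - a for a, b in zip(after, before)]  # before minus after

print(f"cv(gain)     = {cv(gain):.4f}")
print(f"cv(loss)     = {cv(loss):.4f}")
print(f"cv_abs(gain) = {cv_abs(gain):.4f}")
print(f"cv_abs(loss) = {cv_abs(loss):.4f}")
```

With SD/mean the two orderings give values of opposite sign; with
SD/|mean| they coincide.  Neither version is defined when the mean
difference is exactly zero, and both become unstable when it is close
to zero.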

Finally, it was heartening to find such solidarity in the statistical
community.   Sentences like "I'm firmly with you on this one" and
"I think that your objection makes perfect sense" are very
nice to see!

Thanks to John Bankart, Emmanouil Bagkeris, Bendix Carstensen, Tim Cole,
K Govindaraju, Kevin Kane, Roger Newson, Allan Reese, Stephen Senn and
Paul Swank for their responses.   I'm very grateful to them all.

Eryl Bassett

On Mon, September 12, 2011 16:07, I wrote:
> Dear all,
>
> I'm having a "discussion" with a medic over use of the
> coefficient of variation (CV) as a measure of variability.
>
> My instinctive feeling is that CV is really only of use when
> the range of the variable is limited to the positive real
> line.  It's often used, for example, for concentrations of
> hormones in the blood; this seems entirely appropriate,
> especially as the SD of concentrations tends to increase
> with the mean.
>
> But the medic wants to use CV as a measure of spread of
> the log of concentrations.   There are obvious objections
> to this; for example, negative or even exact zero means.
> But I haven't yet found any authoritative statement saying
> that the CV is only appropriate when the range of the
> variable is restricted to the positive real line.  Worse,
> there are some descriptions of CV which specifically mention
> its use with negative values.  For example, the Wikipedia
> entry (yes, I know it's only Wikipedia!) actually *defines*
> CV as SD divided by *modulus* of mean.
>
> My question is, therefore:
>
>    Am I overlooking a statistically valid use of CV for
>    data which may contain negative as well as positive values?
>    If so, I'd be grateful to be told about it.  If not, can
>    anybody point me to an authoritative statement limiting the
>    use of CV?
>
> Please remember that allstat policy is that replies to a query
> should go to the sender rather than to the list.   So please
> respond to me, and I'll try to summarise to the list in due
> course.
>
> Thanks
>
> Eryl Bassett
>
>
