Dear allstat,

First of all, thanks to all who replied. My question was:

*************************
If I have two variables, one of which is continuous and the other count
data with many zeros and ones, which correlation coefficient should I use
to describe their strength of association?

Should I just use Pearson's r? Or should I dichotomize the count data and
use the rpb or rb?

What about if both are count data?

Your help will be much appreciated.
******************************

Generally, those who responded said that Pearson's r would be the first
thing to try, and to dichotomize only if that really isn't workable (my
paraphrase).

But the answers didn't quite satisfy me. Let me be clearer about what I
want:

Suppose I want to compare the strength of relationship between variables:

A and B, A and C, A and D, for example.

Pearson's r would of course be a legitimate, and probably the best,
measure if A, B, C, and D were all normal. But what if my data were counts
(e.g. number of times one is hospitalized vs number of times one goes to
the cinema in a month, or pints of beer drunk in a week - something like
that), or if one is a count variable and the other normal (e.g. age)?

Is it still valid to use Pearson's r? It seems to me it may be unduly
influenced by the 'large' observations. Are there any 'robust' measures of
association out there?
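
To make the worry concrete, here is a minimal sketch in Python. The data
are made up purely for illustration, and Spearman's rank correlation is
used only as one example of a rank-based measure, not as a recommended
answer. The helper names (pearson, average_ranks, spearman) are my own.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical counts for ten people (invented, not real data).
hospital = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
cinema = [2, 0, 1, 0, 3, 1, 0, 2, 0, 1]

r_before = pearson(hospital, cinema)
rho_before = spearman(hospital, cinema)

# Add a single person with 'large' counts on both variables.
hospital_out = hospital + [12]
cinema_out = cinema + [15]

r_after = pearson(hospital_out, cinema_out)
rho_after = spearman(hospital_out, cinema_out)

print("Pearson :", round(r_before, 3), "->", round(r_after, 3))
print("Spearman:", round(rho_before, 3), "->", round(rho_after, 3))
```

With these invented numbers, the one extra observation flips Pearson's r
from moderately negative to strongly positive, while the rank-based
coefficient moves much less. Whether that makes a rank-based measure the
right tool for comparing A-B, A-C, and A-D is exactly what I am asking.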

Thanks again for any help.

Yours,
Tim Mak