A good first approach is to use pictures. Start with a scatterplot,
maybe 3-D scatterplots. Perhaps use different colors for values of
additional variables, e.g., a dichotomy or rating of how "large" or
suspicious a value is.
Does the picture change much when you rank the variables?
What do you mean by "large"?
What do you mean by "observations"? The values on a particular variable
(a cell entry in the data matrix)? A case (record, line)?
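
One way to see whether ranking changes the picture is to compute
Pearson's r on the raw values and then again on the ranks (the latter
is Spearman's rho). Here is a minimal Python sketch. The data are made
up purely for illustration (ages and monthly hospitalization counts,
with one "large" case), and the function names are hypothetical
helpers, not taken from any particular package.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Pearson's r computed on the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up example: the last case is the "large" observation.
age = [23, 31, 35, 42, 47, 51, 58, 64, 70, 88]
visits = [0, 0, 1, 0, 1, 0, 1, 1, 0, 9]

r_pearson = pearson_r(age, visits)
rho_spearman = spearman_rho(age, visits)
print("Pearson r on raw values:", round(r_pearson, 2))
print("Spearman rho on ranks:  ", round(rho_spearman, 2))
```

If the two numbers differ a lot, as they do with these invented data,
that suggests a few large values are driving Pearson's r, and the
scatterplot should show which cases they are.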
Art
Social Research Consultants
University Park, MD USA
(301) 864-5570
Timothy Mak wrote:
>Dear allstat,
>
>First of all, thanks to all who replied. My question was:
>
>*************************
>If I have two variables, one of which is continuous and the other count
>data with many zeros and ones, which correlation coefficient should I use
>to describe their strength of association?
>
>Should I just use Pearson's r? Or should I dichotomize the count data and
>use the rpb or rb?
>
>What about if both are count data?
>
>Your help will be much appreciated.
>******************************
>
>Generally those who responded said using Pearson's r would be the first
>thing to do, and only dichotomize if really not possible (my paraphrase).
>
>But I don't think the answers quite satisfied me. Maybe I'll be clearer as
>to what I want:
>
>Suppose I want to compare the strength of relationship between variables:
>
>A and B, A and C, A and D, for example.
>
>Pearson's r would of course be the legitimate, and probably the best
>measure if A, B, C, and D were all normal. But what if my data were counts
>(e.g., number of times one is hospitalized vs. number of times one goes to
>the cinema in a month, or pints of beer drunk a week - something like that),
>or if one is a count variable and the other normal (e.g., age)?
>
>Is it still valid to use Pearson's r? It seems to me it may be biased by
>the 'large' observations. Is there some sort of 'robust' estimate of
>association out there?
>
>Thanks again for any help.
>
>Yours,
>Tim Mak
>
>
>