>>> Marie-Lorraine APPERT <[log in to unmask]> 1/21/2004
4:30:11 AM >>>
<<<
I have problems with high correlated factors:
I am working with censored data and I am looking for the factors
(clinical or genetic) the most important for the survival of the
patient.
>>>

What do you mean by 'important'?  Do you mean highly correlated?

WHY are the two factors highly correlated? Is this correlation
something that is substantively expected, or is it interesting and new?


What is the study design? (Randomized trial? Observational? or what?)

What is the sample size?

<<<
What can I do when two factors are highly correlated (|Pearson
coeff|>=0.7),
>>>

The key thing is not whether the variables are correlated, but whether
they are collinear.  You can have high collinearity without high
correlation (one variable can be nearly a linear combination of several
others even though no pairwise correlation is large), and you can have
correlation without collinearity. See:
1. Belsley (1991). Conditioning Diagnostics. Wiley. or (ahem)
2. Flom (1999). Collinearity Diagnostics in Multiple Regression: A
Monte Carlo Study. Doctoral dissertation, Fordham U.

You didn't say what stat package you are using, but SAS, SPSS, and R
can all do what Belsley recommends.
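
To make the correlation/collinearity distinction concrete, here is a
rough sketch in Python with NumPy (my choice for illustration, not one
of the packages named above). It computes condition indexes and
variance-decomposition proportions in the spirit of Belsley's
diagnostics: add an intercept, scale each column to unit length without
centering, then take the singular value decomposition. The data are
simulated and the function name condition_indexes is made up for this
example. The cutoff of about 30 for a worrying condition index is the
commonly quoted rule of thumb, not a hard threshold.

```python
import numpy as np

def condition_indexes(X):
    """Condition indexes and variance-decomposition proportions.

    Illustrative helper (hypothetical name). Returns (idx, props) where
    idx[j] is the condition index of component j and props[j, k] is the
    share of the variance of coefficient k tied to component j.
    """
    X = np.asarray(X, dtype=float)
    # Include the intercept and scale columns to unit length (no centering).
    Xi = np.column_stack([np.ones(len(X)), X])
    Xs = Xi / np.linalg.norm(Xi, axis=0)
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    idx = s.max() / s
    # phi[k, j] = v_kj^2 / s_j^2, then normalize within each coefficient k.
    phi = (vt.T ** 2) / (s ** 2)
    props = phi / phi.sum(axis=1, keepdims=True)
    return idx, props.T

# Simulated data: x4 is almost exactly x1 + x2 + x3, yet its pairwise
# correlation with each of them is only moderate (roughly 0.58).
rng = np.random.default_rng(0)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
x4 = x1 + x2 + x3 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2, x3, x4])

corr = np.corrcoef(X, rowvar=False)
max_abs_corr = np.abs(corr - np.eye(4)).max()   # largest off-diagonal |r|
idx, props = condition_indexes(X)

print("largest pairwise |r|:", round(max_abs_corr, 3))
print("largest condition index:", round(idx.max(), 1))
print("variance proportions for that component:", props[idx.argmax()].round(2))
```

In this simulation no pairwise correlation reaches 0.7, so a screen on
Pearson coefficients would pass these variables, while the largest
condition index is far above 30 and flags the near-dependency.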


<<<
 and when I see that they both have an effect on survival?(in
univariate analysis like Kaplan Meier)
>>>

You cannot tell, from Kaplan Meier or any other stat analysis, that one
thing has an effect on something else (although if you have a randomized
study, it helps.....).  CORRELATION DOES NOT IMPLY CAUSATION.

Some examples:
1) Students who hire tutors have lower grades than students who do not
hire tutors.

2) The more firemen who show up at a fire, the more damage is done.

3) (my favorite) In elementary school children, there is a correlation
between astrological sign and IQ; this correlation diminishes with age,
and is very close to 0 in adults.

4) The more storks in a city, the more babies are born.

(answers on request).
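
The general pattern behind examples like these can be sketched with a
small simulation. This is purely illustrative: the variables x, y and z
are hypothetical and do not correspond to any of the examples above or
to your data. Here z drives both x and y, and x has no effect on y at
all, yet x and y are strongly correlated until z is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                   # hypothetical common cause
x = z + rng.normal(scale=0.5, size=n)    # x depends on z
y = z + rng.normal(scale=0.5, size=n)    # y depends on z only, not on x

raw_corr = np.corrcoef(x, y)[0, 1]

def residuals(v, z):
    """Residuals of v after a least-squares fit on an intercept and z."""
    A = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

# Partial correlation of x and y given z: correlate the two residual series.
partial_corr = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print("correlation of x and y:", round(raw_corr, 2))
print("partial correlation given z:", round(partial_corr, 2))
```

The raw correlation is large and the partial correlation is near zero.
Of course this only works because z was measured; with an unmeasured
common cause, no analysis of x and y alone would reveal it.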

<<<
How can I know which one is really important for survival (I mean,
which is the cause), and which one is just related to survival because
of the
correlation with the first factor (I mean, which is the consequence)?
>>>

First, as noted, you cannot infer causation from these analyses.

Second, you assume that ONE factor is CAUSATIVE and the other
CONSEQUENTIAL.  Actually, neither, either, or both factors could be
either causative or consequential.



HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)