>>> Marie-Lorraine APPERT 1/21/2004 4:30:11 AM >>>

<<< I have problems with highly correlated factors: I am working with censored data and I am looking for the factors (clinical or genetic) that are the most important for the survival of the patient.

>>> What do you mean by 'important'? Do you mean highly correlated? WHY are the two factors highly correlated? Is this correlation something that is substantively expected, or is it interesting and new? What is the study design? (Randomized trial? Observational? Or what?) What is the sample size?

<<< What can I do when two factors are highly correlated (|Pearson coeff| >= 0.7),

>>> The key thing is not whether the variables are correlated, but whether they are collinear. You can have high collinearity without high pairwise correlation (for example, when one variable is close to a linear combination of several others), and you can have correlation without collinearity. See:

1. Belsley (1991). Conditioning Diagnostics. Wiley.
or (ahem)
2. Flom (1999). Collinearity Diagnostics in Multiple Regression: A Monte Carlo Study. Doctoral dissertation, Fordham U.

You didn't say what stat package you are using, but SAS, SPSS, and R can all do what Belsley recommends.

<<< and when I see that they both have an effect on survival? (in univariate analysis like Kaplan Meier)

>>> You cannot tell, from Kaplan Meier or any other statistical analysis, that one thing has an effect on something else (although if you have a randomized study, it helps.....). CORRELATION DOES NOT IMPLY CAUSATION. Some examples:

1) Students who hire tutors have lower grades than students who do not hire tutors.
2) The more firemen who show up at a fire, the more damage is done.
3) (my favorite) In elementary school children, there is a correlation between astrological sign and IQ; this correlation diminishes with age, and is very close to 0 in adults.
4) The more storks in a city, the more babies are born.

(answers on request).
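To illustrate the earlier point that collinearity and pairwise correlation are different things, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data are simulated, the helper name condition_indexes is hypothetical, and the calculation (scale each column of the design matrix to unit length, then take ratios of singular values) is only a rough version of the condition indexes Belsley describes, not a replacement for the full diagnostics in his book. The cutoff of 30 is a commonly quoted rule of thumb, not a hard threshold.

```python
import numpy as np


def condition_indexes(X):
    """Rough Belsley-style condition indexes (hypothetical helper).

    Each column is scaled to unit length (not centered), then each
    index is the largest singular value divided by a singular value.
    """
    X = np.asarray(X, dtype=float)
    scaled = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(scaled, compute_uv=False)
    return s.max() / s


# Simulated data: five independent predictors plus a sixth that is
# almost exactly their sum. No single pair is strongly correlated,
# but the six variables together are nearly linearly dependent.
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=(n, 5))
combo = base.sum(axis=1) + rng.normal(scale=0.01, size=n)
predictors = np.column_stack([base, combo])

# Largest absolute off-diagonal Pearson correlation.
corr = np.corrcoef(predictors, rowvar=False)
max_abs_corr = np.abs(corr - np.eye(corr.shape[0])).max()

# Condition indexes of the design matrix, intercept column included.
design = np.column_stack([np.ones(n), predictors])
largest_index = condition_indexes(design).max()

print("largest |pairwise correlation|:", round(float(max_abs_corr), 3))
print("largest condition index:", round(float(largest_index), 1))
```

In this simulated case every pairwise correlation stays below the 0.7 cutoff mentioned in the question, yet the largest condition index is far above 30, so a screen based only on pairwise Pearson coefficients would miss the problem.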

<<< How can I know which one is really important for survival (I mean, which is the cause), and which one is just related to survival because of the correlation with the first factor (I mean, which is the consequence)?

>>> First, as noted, you cannot infer causation from these analyses. Second, you assume that ONE factor is CAUSATIVE and the other CONSEQUENTIAL. Actually, neither, either, or both factors could be either causative or consequential.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)