Thanks very much to those who replied to my query on regression &
correlation. A summary of some of the replies is given below.
I should say that by far the simplest way of establishing
multicollinearity is to compare the correlation between the model and the
dependent variable in a stepwise procedure. Suppose you have a linear model
Y= c + a*X + b*Z + e = f(X, Z)
In my proposal, we shall look into the alternative model,
Y = c1 + a1*X + e1 = g(X).
Then if cor(Y, f(X, Z)) = cor(Y, g(X)), Z contains no (additional) information
about Y. Concerning significance tests, one ought to remember that these
were developed in order to establish relationships in small samples. For large
samples, I should look at the absolute magnitude of R or R², which, if the
model is worth considering, should be larger than, say, 0.6.
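A minimal sketch of the comparison described above, in Python with NumPy. The data are synthetic and the helper name fitted_correlation is made up for illustration; nothing here comes from the original reply beyond the two models f(X, Z) and g(X).

```python
import numpy as np

def fitted_correlation(y, *predictors):
    """Correlation between y and the least-squares fit of y on the predictors."""
    # Design matrix: a column of ones (intercept) followed by the predictors.
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.corrcoef(y, X @ coef)[0, 1]

# Synthetic data (an assumption for illustration): Z is almost a copy of X.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = 2.0 * x + 0.01 * rng.normal(size=n)
y = 1.0 + 3.0 * x + rng.normal(size=n)

r_full = fitted_correlation(y, x, z)   # cor(Y, f(X, Z))
r_reduced = fitted_correlation(y, x)   # cor(Y, g(X))
print(f"cor(Y, f(X, Z)) = {r_full:.4f}")
print(f"cor(Y, g(X))    = {r_reduced:.4f}")
```

If the two correlations are practically equal, as they should be for data built this way, Z adds little to what X already says about Y.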
CTMO (Centrum voor Toegepast Multivariaat Onderzoek), Faculty of Social &
Political Sciences, Catholic University of Brussels, Vrijheidslaan 17, 1081
Brussels, Belgium. Voice +32(0)2 412 43 38. Fax +32(0)2 412 42 00.
You will find a treatment of multicollinearity in chapter 8 of
Montgomery and Peck's Intro. to Linear Regression Analysis (ISBN
0471533874). Two methods of diagnosing it are
(a) the variance inflation factor (VIF). While this depends on the
multiple correlation coefficient, the VIF option within the SAS command
PROC REG will give you this for each regression coefficient, without
your needing to calculate the correlation coefficient yourself. A
value over 10 indicates multicollinearity. In SPSS you can click the
option 'collinearity diagnostics' after choosing the
Statistics/Regression/Linear menu options, which will give you the VIF.
(b) the condition number of the design matrix. This is the square root
of the ratio of the largest to the smallest eigenvalue of the
product of the transpose of the design matrix and the design matrix
itself. Values exceeding 30 indicate a problem. To calculate this,
you will need a package which will calculate matrix transposes,
products and eigenvalues. Minitab and Matlab among standard packages
should do it. You may also find it with the COLLIN option in PROC REG.
I hope this is of some help.
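For anyone without SAS or SPSS to hand, both diagnostics (a) and (b) can be sketched in Python with NumPy. This is only an illustration under assumptions: the data are synthetic, the function names vif and condition_number are made up here, and the condition number is taken on the raw design matrix exactly as described above, with no rescaling of its columns.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (regressors only, without an intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)      # VIF_j = 1 / (1 - R_j^2)
    return out

def condition_number(D):
    """Square root of (largest / smallest eigenvalue) of D'D."""
    eig = np.linalg.eigvalsh(D.T @ D)  # eigenvalues in ascending order
    return np.sqrt(eig[-1] / eig[0])

# Synthetic regressors (an assumption): x2 nearly duplicates x1, x3 is unrelated.
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)
x3 = rng.normal(size=n)
regressors = np.column_stack([x1, x2, x3])
design = np.column_stack([np.ones(n), regressors])  # design matrix with intercept

vifs = vif(regressors)
kappa = condition_number(design)
print("VIF:", np.round(vifs, 1))
print("condition number:", round(float(kappa), 1))
```

With data built this way, the VIFs for x1 and x2 should come out well above 10 and the condition number above 30, while the VIF for x3 should stay close to 1.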
Department of Epidemiology and Health Sciences
University of Manchester Medical School
Principal Component Analysis will do this.
Extract all components and look at those with zero eigenvalues.
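A rough sketch of that check in Python with NumPy, assuming the components are extracted from the correlation matrix of the regressors. The data are synthetic, with x3 constructed as an exact linear combination of x1 and x2.

```python
import numpy as np

# Synthetic regressors (an assumption): x3 is exactly x1 + x2.
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2
X = np.column_stack([x1, x2, x3])

# Principal components of the correlation matrix, all of them extracted.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order

print("eigenvalues:", eigvals)
# The loadings of the component with the (near-)zero eigenvalue show
# which variables take part in the linear dependence.
print("loadings of smallest component:", eigvecs[:, 0])
```

In real data the dependence is rarely exact, so in practice one would look for eigenvalues that are very small rather than exactly zero.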
Colin Chalmers B.Sc.,A.K.C.,P.G.C.E.,M.Sc.,CStat
Senior Lecturer in Applied Statistics
University of Westminster & DataStat
tel: 0171 911 5000 X3040 (UW)
0181 965 4303 (DStat)
fax: 0181 933 0759