A great deal depends on the substantive nature of your data and the
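reason you are clustering. Cluster methods usually treat the variables as
fairly independent, so highly correlated variables in effect give extra
weight to whatever they share. In many contexts, it is common to cluster
on some form of factor scores instead of the raw variables. I know of no
rule of thumb for what is "highly correlated".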
Clustering has a great deal of art to it. You would want to try several
approaches to see if the results are very different. You might also use
something like discriminant function analysis to see which variables
separate the clusters, remembering that the significance tests lose most
of their meaning when you use the same variables as those in the
clustering. Much of the output listing from a package like SPSS can help
give you insight into the different results.
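
If it helps to make "see if the results are very different" concrete, here
is a rough sketch in plain Python (not SPSS). It assumes you have already
saved the cluster membership from two runs, say one on all the variables
and one on factor scores. The function names, the variable names, and the
six-case data are all made up for illustration. The Rand index is only one
of several ways to compare two solutions, and the correlation function is
there just to look at a pair of variables, not to apply any cutoff.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rand_index(labels_a, labels_b):
    """Fraction of case pairs on which two cluster solutions agree:
    both put the pair in the same cluster, or both put it in different ones.
    Cluster numbers themselves do not have to match between solutions."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical data: two variables and two cluster solutions for six cases.
income = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spending = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
solution_all_vars = [0, 0, 0, 1, 1, 1]       # membership from run 1
solution_factor_scores = [1, 1, 0, 0, 0, 0]  # membership from run 2

print("correlation:", round(pearson(income, spending), 3))
print("agreement:  ", round(rand_index(solution_all_vars, solution_factor_scores), 3))
```

An agreement near 1 would suggest that dropping or combining the correlated
variables made little difference to who ends up with whom. A much lower
value means the choice matters for your data and deserves a closer look.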
Art
Social Research Consultants
University Park, MD USA
(301) 864-5570
Regina Malina wrote:
>Hi everyone,
>I have a question on variable selection when clustering.
>Should I remove highly-correlated variables when I perform clustering
>analysis? Does it make a difference if I do not? Is there a rule of thumb on
>how high the correlation should be in order to remove the variable?
>Thank you in advance. Regina