Hi,

I'm hoping that someone can point me in the direction of some good
references (preferably online) for cluster analysis - I need to look at
clustering together countries in a large study (19000 subjects, 38
countries).  I have found lots of info on how to do the analysis but I can't
seem to find the answers to the following questions:

1.  Is there a limit to the number of variables you should use to define
the clusters (I have 38 countries)?
2.  Should I be weighting countries in some way, given that the number of
subjects per country ranges from 1 to 2500?

There is also a lot of information on what the different distance measures
and linkage methods are - but no good description of when each is most
appropriate.  I am expecting to have quite a few variables - I am hoping all
will be binary (so each country will be described by the proportion of
subjects with each variable), so there is unlikely to be a problem with mixed
variable types - but what if the variables are correlated, as seems likely?
Should I be using principal components on the variables first?
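
To make the setup concrete, here is a minimal sketch of the kind of analysis being described: subject-level binary variables are collapsed to one row of proportions per country, and the countries are then clustered. Everything in it is an assumption made for illustration: the country codes, the two variables and their values are invented, and Euclidean distance with average linkage is just one possible choice, not a recommendation. It applies no weighting and no principal components step, so it does not answer either question; it only shows where those choices would enter.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical subject-level records: (country, binary var A, binary var B).
# All values are invented purely to illustrate the mechanics.
subjects = [
    ("AA", 1, 0), ("AA", 1, 0), ("AA", 1, 1), ("AA", 0, 0),
    ("BB", 1, 0), ("BB", 1, 0), ("BB", 1, 0), ("BB", 0, 1),
    ("CC", 0, 1), ("CC", 0, 1), ("CC", 1, 1), ("CC", 0, 1),
    ("DD", 0, 1),  # a country with a single subject
]


def country_proportions(records):
    """Collapse subject rows to one row of proportions per country."""
    sums = {}
    counts = defaultdict(int)
    for country, *values in records:
        totals = sums.setdefault(country, [0] * len(values))
        for i, v in enumerate(values):
            totals[i] += v
        counts[country] += 1
    props = {c: [t / counts[c] for t in sums[c]] for c in sums}
    return props, dict(counts)


def euclidean(p, q):
    """Euclidean distance between two proportion profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance between their members."""
    def mean_dist(pair):
        a, b = pair
        total = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=mean_dist)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


profiles, sizes = country_proportions(subjects)
clusters = average_linkage(profiles, n_clusters=2)
print(sizes)     # subjects per country
print(clusters)  # country groupings
```

Note that the single-subject country "DD" can only have proportions of exactly 0 or 1, yet it counts as much as any other country in the distance calculation; that is the situation the weighting question is about. In practice a library routine (for example `scipy.cluster.hierarchy.linkage`) would replace the hand-written loop.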

I have a lot of questions and haven't yet come across any decent answers -
can anyone help?

Thanks in advance

Nancy Barker

Principal Statistician

Oxford Pharmaceutical Sciences Ltd.



phone: +44 (0) 1491 833338

fax: +44 (0) 1491 833334

mobile: +44 (0) 7941 037042

mail: The Stables, 114 Preston Crowmarsh, Wallingford, Oxon, OX10 6SL
