Dear Allstat
Like many statisticians, I am frequently called upon to rescue a project which has been unable to answer its original question due to poor design. In this case, I have data from 2000 consumers who have been asked to rate a chocolate product. So as to increase the chances of drawing meaningful conclusions from the project, I want to see if the consumers can be segmented (into about 3 to 6 clusters) based on their answers to some screening questions.
I have demographic data, but the more interesting segmentation is based on prior product preference data. All 2000 consumers were asked 2 questions for each of 30 other well-known chocolate products.
1. Is this product one of your favourites?
2. Have you eaten this product in the last 3 months?
The responses are binary with 1=Yes and 0=No. Therefore for Question 1 alone, I would have a 2000x30 binary data matrix.
It seemed to me that cluster analysis would be the best technique, but I am struggling to get any meaningful clusters and I suspect the binary nature of the data is to blame. For a start, one third of respondents chose only 1 product as a favourite, and only 5% chose more than 4 favourites out of the 30 products. Therefore there are a large number of zeros in the data.
My questions are
1. Is cluster analysis appropriate for binary data? If not, what other techniques could I use?
2. Should I use a similarity or a dissimilarity metric? I suspect dissimilarity, but every pair of consumers will have at least 25 zeros in common, based on the observation about the number of favourites chosen.
3. Which metric should I use given the answer to Q1? Initially I used Euclidean distance, but given the binary nature of the data I think city-block distance is more appropriate.
4. Which linkage method is best for binary data? I have no preference at the moment but I have not given it much thought.
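To make questions 2 and 3 concrete, here is a minimal sketch of how the shared zeros affect the usual binary measures. The two respondents are invented for illustration and the function names are hypothetical; nothing here comes from the survey data. It compares a simple matching coefficient (joint zeros count as agreement) with the Jaccard coefficient (joint zeros are ignored), and shows that on 0/1 data city-block distance is just the number of mismatches while Euclidean distance is its square root, so the two rank pairs of respondents identically.

```python
import math

def simple_matching(a, b):
    """Share of products on which two respondents agree (joint 0s count)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Share of products chosen by both, among products chosen by either."""
    both = sum(x and y for x, y in zip(a, b))
    either = sum(x or y for x, y in zip(a, b))
    return both / either if either else 0.0

def city_block(a, b):
    """On 0/1 data this is the number of products where the answers differ."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """On 0/1 data this is the square root of the city-block distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical respondents over 30 products, with no favourite in common.
resp_a = [1, 1] + [0] * 28
resp_b = [0, 0, 1, 1] + [0] * 26

sm = simple_matching(resp_a, resp_b)   # dominated by the 26 joint zeros
jac = jaccard(resp_a, resp_b)          # no shared favourites
cb = city_block(resp_a, resp_b)
eu = euclidean(resp_a, resp_b)
print(round(sm, 3), jac, cb, eu)
```

In this made-up pair the respondents share no favourites, yet simple matching rates them as highly similar because of the joint zeros, whereas Jaccard rates them as completely dissimilar. Whether joint absences should count as agreement is a judgment about the data rather than something the sketch can settle.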
Finally I have also explored using another scale 0, 1, 2 where
0 = Product is not a favourite and has not been eaten in the last 3 months.
1 = Product is either a favourite or has been eaten in the last 3 months, but not both.
2 = Product is both a favourite and has been eaten in the last 3 months.
Is such a scale likely to give better results?
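For reference, the 0/1/2 scale described above is simply the sum of the two binary answers for each product. A minimal sketch, assuming the two answer matrices are held as lists of rows (one row per consumer, one column per product); the function name and the tiny example matrices are hypothetical:

```python
def combined_scale(favourite, eaten):
    """Add the two 0/1 answers product by product to get a 0/1/2 score."""
    return [
        [f + e for f, e in zip(fav_row, eat_row)]
        for fav_row, eat_row in zip(favourite, eaten)
    ]

# Invented answers for 2 consumers and 3 products.
favourite = [[1, 0, 0],
             [0, 0, 1]]
eaten = [[1, 1, 0],
         [0, 0, 1]]

scale = combined_scale(favourite, eaten)
print(scale)
```

Note that under this coding a score of 1 does not distinguish "favourite but not eaten recently" from "eaten recently but not a favourite".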
Regards
Nigel Marriott
Senior R&D Statistician
Masterfoods Europe