To Allstaters
I recently posted a query regarding cluster analysis of binary data for a project I had been called in to rescue. I subsequently concluded that no informative clustering of consumers could be achieved, so I went on to analyse the data further with all consumers grouped together in one cluster.
I think all of us at some point in our statistical careers experience moments when we suddenly understand something we were taught years ago. For me, the distinction between Type I and Type III SS was always one I could understand theoretically, but I never had an intuitive feel for it. With this project I think I have reached a point where the distinction will become completely clear, if you can help me with this problem.
First, let me explain more about the background to the project. It involves a confectionery product whose consumer liking, we know from past studies, drops off well before the end of its shelf life. The aim is to see whether liking can be maintained over the shelf life by changing the product's ingredients and/or the process by which it is made. To this end, 3 products were set up for consumer testing as follows:
Std - Standard Ingredients, Standard Process.
AltI - Alternative Ingredients, Std Process.
AltP - Std Ingredients, Alternative Process.
Remember that I was called in afterwards to rescue a poor design. As you can see from the above, there should have been a 4th product with both alternative ingredients and alternative process.
Each of these products was then consumer tested at 3 different ages, namely 8, 16 and 24 weeks, which should have given 9 tests in all. Unfortunately, owing to a misunderstanding, one of the tests was not done. Consumers were asked how much they liked the product on a 1-7 scale, along with a number of other questions. The average liking scores were as follows:
         @8wks  @16wks  @24wks
Std       5.38    5.39    5.20
AltI      5.40    5.43    5.15
AltP      5.50    5.35     N/A
The mean square error was 1.50 on 1600 df.
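To make the effect of the missing cell concrete, here is a small Python sketch that works only from the cell means in the table above. It assumes, hypothetically, that every tested cell had the same number of consumers, n = 201 (1600 error df plus 8 fitted cell means suggests roughly 1608 responses); the real cell sizes are not given. It also assumes Age was entered first, in which case the Type I SS for Age is just the between-age SS of the raw age means, and the 24-week mean is averaged over Std and AltI only.

```python
# Cell means from the table; AltP at 24 weeks was never tested.
MEANS = {
    "Std":  {8: 5.38, 16: 5.39, 24: 5.20},
    "AltI": {8: 5.40, 16: 5.43, 24: 5.15},
    "AltP": {8: 5.50, 16: 5.35},
}
AGES = (8, 16, 24)
MSE = 1.50        # mean square error reported for the ANOVA
N_PER_CELL = 201  # HYPOTHETICAL: equal cell sizes assumed; real sizes not reported


def age_means(products):
    """Average each age over those listed products that were actually tested."""
    out = {}
    for age in AGES:
        vals = [MEANS[p][age] for p in products if age in MEANS[p]]
        out[age] = sum(vals) / len(vals)
    return out


def type1_age_f(n=N_PER_CELL, mse=MSE):
    """F for Age fitted first (Type I SS), assuming n consumers in every tested cell."""
    cells = [(age, m[age]) for m in MEANS.values() for age in m]
    grand = sum(v for _, v in cells) / len(cells)  # equal n: grand mean = mean of cells
    ss = 0.0
    for age in AGES:
        vals = [v for a, v in cells if a == age]
        mean = sum(vals) / len(vals)
        ss += n * len(vals) * (mean - grand) ** 2  # between-age sum of squares
    df = len(AGES) - 1
    return (ss / df) / mse


if __name__ == "__main__":
    print("All products:     ", age_means(["Std", "AltI", "AltP"]))
    print("Std and AltI only:", age_means(["Std", "AltI"]))
    print("Type I F for Age (equal-n sketch): %.2f" % type1_age_f())
```

Under these assumptions the sketch gives an F of roughly 5.6, in the same region as the reported Type I value; an exact match is not expected because the true cell sizes are unknown and the means are rounded. The second line shows the age means restricted to the two products tested at every age.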
As we can see, liking drops off at 24 weeks, but the AltP product is missing at that age. In the ANOVA, for both Type I and Type III SS, the Age x Product interaction P-values were not significant. The Product factor was not significant either (P-values around 0.36).
However, and this is the point of this query, the P-values for the Age factor were very different depending on whether Type I or Type III SS was used:
               F-stat   P-value
Type I SS       5.26     0.005
Type III SS     0.77     0.465
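As a quick consistency check, both P-values are simply the upper tail of an F distribution with 2 numerator df (Age has 3 levels) and 1600 error df. For 2 numerator df the tail has a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), so this small Python sketch needs no statistics library. The function name is my own, chosen for illustration.

```python
def f_pvalue_df1_2(f, df2):
    """Upper-tail P-value of an F(2, df2) statistic.

    With numerator df = 2 the survival function has the closed form
    P(F > f) = (1 + 2f/df2) ** (-df2/2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


if __name__ == "__main__":
    # Age: 3 levels -> 2 numerator df; error df = 1600 from the ANOVA.
    for label, f in (("Type I", 5.26), ("Type III", 0.77)):
        print(label, round(f_pvalue_df1_2(f, 1600), 3))
```

This gives about 0.005 and 0.463, matching the table up to rounding of the F-statistics. So the discrepancy lies entirely in the sums of squares each SS type assigns to Age, not in the degrees of freedom.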
As you can see, very different conclusions are reached about the significance of the Age factor depending on which SS type is used. I suspect the missing data point is responsible for this. As I said before, I know the theoretical difference between Type I and Type III, but I don't have an intuitive understanding of it. Any help you can give me on the reasons for the difference in results could lead me to say "AH-HA!".
Regards
Nigel Marriott
Senior Statistician R&D