To ALLSTAT
Recently my company carried out a trial involving 4 products, 25 samples per product, making 100 samples in all. The 4 products were the combinations of 2 slightly different sources of raw materials and 2 slightly different manufacturing methods. For each sample, measurements were made of 15 product attributes.
In common with many other statisticians I was only asked for my opinions after the trial was conducted. Fortunately I was able to confirm through MANOVA their main conclusion that raw materials had no effect but that manufacturing process did have an effect.
However, I became very suspicious when one of my colleagues, who showed me a PCA plot of the data, claimed that the 1st component accounted for 99% of the total variance! On looking at the data myself I could only get a 1st component accounting for 20% of the total variance. It transpired that my colleague had done the following.
1. She had done an ANOVA on each attribute separately and discarded any attribute where the raw material and process effect was not significant. This reduced the 15 attributes to 6 attributes.
2. She then performed a PCA using only the mean of each attribute for each product i.e. a 4x6 data matrix, rather than a 100x6 data matrix.
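To illustrate why the two approaches can give such different figures, here is a minimal Python sketch on simulated data (not the trial data). The effect size, the unit noise level, the random seed and the helper name first_pc_share are all assumptions made for illustration, and the attribute screening in step 1 is not reproduced. It compares the share of variance carried by the 1st component for a 100x6 matrix and for the corresponding 4x6 matrix of product means.

```python
import numpy as np

def first_pc_share(X):
    """Fraction of total variance carried by the first principal component
    (PCA on the covariance matrix, computed via SVD of the centred data)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var[0] / var.sum()

rng = np.random.default_rng(0)           # assumed seed, for reproducibility
n_per_product, n_attr = 25, 6

# 2x2 design: (raw material, process). Assumption: only process has an effect.
design = [(rm, proc) for rm in (0, 1) for proc in (0, 1)]
process_shift = np.ones(n_attr)          # assumed effect: +1 on every attribute

blocks, labels = [], []
for g, (rm, proc) in enumerate(design):
    # Independent unit-variance noise plus the process shift (hypothetical model)
    blocks.append(rng.normal(size=(n_per_product, n_attr)) + proc * process_shift)
    labels += [g] * n_per_product

X_full = np.vstack(blocks)               # 100 x 6 matrix of individual samples
labels = np.array(labels)
X_means = np.vstack([X_full[labels == g].mean(axis=0) for g in range(4)])  # 4 x 6

share_full = first_pc_share(X_full)
share_means = first_pc_share(X_means)

# Four points span at most three dimensions once centred
s_means = np.linalg.svd(X_means - X_means.mean(axis=0), compute_uv=False)
n_components_means = int(np.sum(s_means > 1e-8))

print(f"1st component share, full data:   {share_full:.2f}")
print(f"1st component share, group means: {share_means:.2f}")
print(f"non-trivial components from means: {n_components_means}")
```

Averaging removes most of the within-product variation, and four means can support at most three components, so in this simulated setting the means-only PCA largely reflects the between-product differences alone.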
On hearing this I reflected first on the statement "A little learning is a dangerous thing" before deciding that my colleague needed some more statistical coaching. However, before I proceed, I wanted to ask whether you can think of any circumstances in which it might be correct to do what she did. In particular, is it ever legitimate to do PCA on group means only rather than on the full data set?
Thank you in advance
Nigel Marriott
R&D Senior Statistician
Masterfoods Europe
-----------------------------------------
Email provided by http://www.ntlhome.com/
-----------------------------------------