I'm posting this message on behalf of a colleague,
so please send replies to [log in to unmask]
---------
Hi all,
I'm reviewing somebody else's work on a huge economic database. The final aim of
the study is to extract and study some causal relationships, and what they do is
perform a very complex and apparently senseless data-analytic procedure that
I'm not sure about. Note that no inferential arguments seem to be required,
since the conclusions refer only to the population under study. My main doubts
concern:
a) The use of scores assigned to some categorical variates to perform PCA;
b) The use of some of the aforementioned PCs (accounting for 80% of total
variability!) to perform hierarchical cluster analysis with Euclidean distance;
c) The use of such clusters to define dummy variables to use within a linear
model;
d) The use of these dummy variables to alter the relation with the covariate
that appears to be associated with the most significant regression coefficient
(I assume the tests of significance are performed on the model without dummy
variables).
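
To make steps a) to d) concrete, here is a minimal sketch of the pipeline as I
understand it, not the authors' actual code. Everything in it is an assumption
made for illustration: the simulated data, the variable names, the integer
scores given to the categories, the choice of complete linkage, the number of
clusters (3), and the reading of d) as cluster-specific slopes on a single
covariate x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60

# Hypothetical data: three categorical variates coded with arbitrary integer
# scores (step a), one covariate x, and a response y.
scores = rng.integers(1, 5, size=(n, 3)).astype(float)
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# (a) PCA on the standardized scores, computed through the SVD.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)

# (b) Keep the smallest number of PCs reaching 80% of total variability.
k = int(np.searchsorted(np.cumsum(explained), 0.80)) + 1
pcs = u[:, :k] * s[:k]


def agglomerative_labels(points, n_clusters):
    """Naive complete-linkage agglomerative clustering, Euclidean distance."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


labels = agglomerative_labels(pcs, n_clusters=3)

# (c) Dummy variables for clusters 1 and 2; cluster 0 is the reference level.
dummies = (labels[:, None] == np.arange(1, 3)[None, :]).astype(float)

# (d) Linear model in which the dummies shift both the intercept and the
# slope of x (dummy-by-covariate interaction columns).
design = np.column_stack([np.ones(n), x, dummies, dummies * x[:, None]])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print("PCs kept:", k)
print("coefficients:", np.round(beta, 3))
```

In this sketch the clusters are built without any reference to y, so the
dummy and interaction columns are essentially noise relative to the
regression; whether that also holds in the study depends on details I do not
have.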
I would be very grateful for any hints or bibliographic suggestions.
Thank you in advance,
Francesco Campobasso
--
Alessio Pollice
Dipartimento di Scienze Statistiche
Università degli Studi di Bari
Via C.Rosalba n.53, 70124 Bari - ITALY
Tel.: ++39 080 5049243
Fax: ++39 080 5049147