Hello all,
I am fitting a logistic regression using the events/trials syntax, i.e. for
each covariate pattern we have the total number of 'successes' and the total
number of trials, so that each covariate pattern corresponds to a group of
cases rather than an individual case.
When calculating diagnostics, I suppose that an outlying/influential point
in this scenario would mean that the *full group* corresponding to the
covariate pattern is unusual and should be investigated. If, after
examination, the group appeared suspect, we would then delete the full group.
For the same data, if we use the 'ungrouped' method of modelling (where
n_i=1) then I have found, for my example, that we get identical model
coefficients and model diagnostic values etc. (however, of course, now each
individual case has its own value for a diagnostic, rather than a value
assigned to a full group of cases).
Therefore, am I correct in thinking that the two methods of data entry
(i.e. ungrouped or grouped) always give exactly the same results, and that
the events/trials syntax (i.e. the 'grouped' case) is simply the 'quicker'
one as regards data entry?
[I also know that deviance cannot be used as a measure of goodness of fit
for the ungrouped case; instead the Hosmer-Lemeshow test is recommended,
see http://www.ms.unimelb.edu.au/~rayw/ms372/37201sw8.pdf]
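To make the comparison concrete, here is a minimal sketch in Python of what I mean. The data are made up, there is only one covariate, and the fit is a hand-written Newton-Raphson rather than any package's routine; the names fit_logistic and deviance are just my own for illustration.

```python
import math

# Hypothetical grouped data: (covariate x, events, trials) per covariate pattern.
grouped = [(0.0, 2, 10), (1.0, 4, 10), (2.0, 7, 10), (3.0, 9, 10)]

# The same data ungrouped: one row per case, so n_i = 1 and events is 0 or 1.
ungrouped = []
for x, events, trials in grouped:
    ungrouped += [(x, 1, 1)] * events + [(x, 0, 1)] * (trials - events)


def fit_logistic(rows, iterations=25):
    """Newton-Raphson for logit(p) = b0 + b1*x on (x, events, trials) rows."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)      # binomial variance weight
            g0 += y - n * p            # score for the intercept
            g1 += (y - n * p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = score.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def deviance(rows, b0, b1):
    """Residual deviance relative to the saturated model for these rows."""
    total = 0.0
    for x, y, n in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        # A zero count contributes nothing (limit of t*log(t) as t -> 0).
        if y > 0:
            total += y * math.log(y / (n * p))
        if n - y > 0:
            total += (n - y) * math.log((n - y) / (n * (1.0 - p)))
    return 2.0 * total


coef_grouped = fit_logistic(grouped)
coef_ungrouped = fit_logistic(ungrouped)
print("grouped coefficients:  ", coef_grouped)
print("ungrouped coefficients:", coef_ungrouped)
print("grouped deviance:  ", deviance(grouped, *coef_grouped))
print("ungrouped deviance:", deviance(ungrouped, *coef_ungrouped))
```

In this sketch the two sets of coefficients agree up to rounding error, while the two deviances differ because the saturated model is not the same in the grouped and ungrouped layouts.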
Many thanks in advance,
Kim.