On 29/06/07, Kathryn Jane Gardner <[log in to unmask]> wrote:
> Thanks for your response, Jeremy.  As with most stats issues, it seems there is more than one correct answer.
>
> Re: residuals, I assume that "actual" and "predicted" are the same as "observed" and
> "expected" (the latter 2 being the terminology when exploring boxplots of the saved
> ANOVA residuals via SPSS's EXPLORE function)?
>

Yes.

> By "large" residuals are we talking about a) standardized residuals, and b) a size of about 3.3 or greater?

I'm reluctant to give a specific value, because in part it depends on
your sample size.  If you have a sample of 1,000,000, then you'd
expect 1,000 to be outliers, by your definition, when they're not.
And if you have a sample that large, you don't mind outliers anyway.

Also, you define outliers by the mean and standard deviation, and
these are sample characteristics.  But in a small sample, these vary a
lot.
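To make the arithmetic behind this concrete, here is a quick Python sketch (the helper name is mine, not from the thread): under perfect normality, a cut-off of about |z| > 3.3 still flags roughly one case per thousand, so a huge sample will produce many "outliers" that are nothing of the sort.

```python
import math

def expected_outliers(n, z=3.3):
    """Expected count of |z|-scores beyond +/-z in a perfectly normal sample of size n."""
    p_two_tailed = math.erfc(z / math.sqrt(2))  # P(|Z| > z) under the standard normal
    return n * p_two_tailed

# In a normal sample of 1,000,000, roughly a thousand cases
# exceed |z| = 3.3 by chance alone.
print(round(expected_outliers(1_000_000)))
```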

Also (again), be aware that "standardized" is a very tricky word when
it comes to residuals. (What most regression texts call standardized
residuals, SPSS calls studentized residuals.  I wrote an entry in the
Encyclopedia of Behavioral Statistics on this, which I think I have
somewhere, and could send you.)
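A minimal numpy sketch of that terminology trap, on made-up data (the labels reflect my reading of the SPSS convention mentioned above): dividing the raw residual by the root mean squared error gives one quantity; additionally dividing by sqrt(1 - leverage) gives another, and the two camps attach the word "standardized" to different ones.

```python
import numpy as np

# Hypothetical toy data: one predictor, six cases.
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.3, 5.6])

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit
e = y - X @ beta                               # raw residuals
n, k = X.shape
s2 = e @ e / (n - k)                           # residual variance (MSE)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages (hat values)

zresid = e / np.sqrt(s2)            # residual / RMSE
sresid = e / np.sqrt(s2 * (1 - h))  # also corrected for leverage; many
                                    # regression texts call this "standardized",
                                    # others "(internally) studentized"
```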

I would do an influence analysis, to see if the outliers make any
difference, before thinking about discarding.
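As a sketch of what such an influence analysis might look like, here is Cook's distance computed by hand in numpy, on invented data with one deliberately influential case (this is an illustration, not SPSS output; SPSS and most regression software will save Cook's distance for you):

```python
import numpy as np

# Hypothetical toy data: the last case has high leverage and sits off-trend.
x = np.array([1., 2., 3., 4., 5., 10.])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 2.0])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
n, k = X.shape
s2 = e @ e / (n - k)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Cook's distance: how much the fitted values shift if case i is dropped.
cooks_d = (e**2 * h) / (k * s2 * (1 - h)**2)

# A common rough flag is D > 4/n; the influential case should stand out.
flagged = np.where(cooks_d > 4 / n)[0]
print(flagged)
```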

Jeremy



>
> ___________
>
> On a different issue, I followed your advice Jeremy about using the C statistic for classification purposes, relative to classification tables (which are limited by arbitrary cut-off points) in logistic regression. I calculated C by:
>
> 1) Running a binary logistic regression model with the default .5 classification cut-off changed to .3, since the prevalence of the disorder I am looking at is 30%. I also saved the predicted probabilities.
> 2) Used the saved predicted probabilities for entry into a ROC curve analysis (the area under the ROC is equivalent to C).
> 3) Identified the point where the curve rises rapidly for the new classification cut-off threshold value.
> 4) Re-ran the logistic regression with the classification cut-off threshold value identified with the ROC curve e.g., .33.
>
> My problem is that the above approach doesn't seem appropriate for ordinal regression, for a couple of reasons: 1) there is no option for cut-off points in ordinal regression, although I assume the choice of link function compensates for this?; and 2) ordinal regression has an option to save the predicted/estimated response category, which I thought was the same as the predicted probabilities option in binary logistic regression, but now I'm not so sure, as the ROC analysis using these saved values doesn't seem right. Can you advise, Jeremy? Is this approach correct, or is there another way to calculate the C statistic for ordinal outcomes?
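An aside on step 2 above: the equivalence between the area under the ROC curve and C can be seen directly, because C is just the proportion of case/non-case pairs in which the case receives the higher predicted probability (ties counting half). A small pure-Python sketch on invented probabilities:

```python
# Hypothetical predicted probabilities saved from a logistic model,
# paired with observed outcomes (1 = disorder present).
probs   = [0.10, 0.25, 0.32, 0.20, 0.45, 0.60, 0.35, 0.15, 0.70, 0.28]
outcome = [0,    0,    1,    0,    1,    1,    0,    0,    1,    0]

def c_statistic(p, y):
    """C = proportion of case/non-case pairs where the case gets the higher
    predicted probability (ties count half) -- identical to the ROC area."""
    cases    = [pi for pi, yi in zip(p, y) if yi == 1]
    controls = [pi for pi, yi in zip(p, y) if yi == 0]
    pairs = [(c, d) for c in cases for d in controls]
    concordant = sum(c > d for c, d in pairs) + 0.5 * sum(c == d for c, d in pairs)
    return concordant / len(pairs)

print(c_statistic(probs, outcome))
```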
>
> Thanks
> Kathryn
>
>
> Kathryn
>
> >>> Jeremy Miles <[log in to unmask]> 06/28/07 3:48 pm >>>
> Hi Kathryn,
>
> They are both right, because they are saying the same thing, but in a
> different way.
>
> First, you could look at each group individually. You would look for
> outliers, which means you would be looking for individuals who are far
> from the mean.
>
> Second, you could look at residuals.  The residual is calculated as
> (actual score) - (predicted score).
>
> In a one way anova, your predicted score for each group is the group
> mean. So you would be looking for high residuals, which means people
> are far from the mean.
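To spell out the equivalence Jeremy describes, a toy Python sketch with made-up scores (group labels and numbers are invented): the ANOVA residual for each person is literally their score minus their group's mean.

```python
# Hypothetical one-way ANOVA data: scores grouped by condition.
groups = {
    "control":   [4.0, 5.0, 6.0],
    "treatment": [7.0, 9.0, 8.0],
}

# In a one-way ANOVA the predicted score for each person is the group
# mean, so the residual is simply (actual score) - (group mean).
residuals = []
for label, scores in groups.items():
    mean = sum(scores) / len(scores)
    residuals.extend(s - mean for s in scores)

print(residuals)  # residuals within each group sum to zero
```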
>
> Looking at groups no longer works as soon as you have a continuous
> predictor in there, and do an ANCOVA, or a regression.
>
> ANOVA, ANCOVA and regression are all the same thing really, so there's
> always equivalence between them.  (I tend never to use ANOVA/ANCOVA, I
> do everything with regression, because then I'm always using the same
> approach, and always doing the same thing.)
>
> The slight advantage of ANOVA is that it's easier to deal with
> non-homogeneity of variance, because you can do the Brown-Forsythe or
> Welch (i.e., Welch-Satterthwaite) correction.  However, there's a lot
> of nonsense talked about that assumption, because if you have equal
> sample sizes it's largely irrelevant. You can also correct by using
> what's sometimes called robust estimation, or a sandwich estimator,
> but that's fiddly in SPSS.
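For the curious, the "sandwich" idea can be sketched in a few lines of numpy on simulated heteroscedastic data (an HC0-style illustration on invented data, not SPSS syntax): the usual OLS covariance assumes constant error variance, while the sandwich version replaces that assumption with the squared residuals themselves.

```python
import numpy as np

# Toy regression with heteroscedastic errors (hypothetical data).
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, x / 5)       # error spread grows with x

X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
n, k = X.shape

# Conventional OLS covariance assumes one common error variance:
cov_ols = (e @ e / (n - k)) * XtX_inv

# HC0 "sandwich" covariance drops that assumption:
meat = X.T @ np.diag(e**2) @ X
cov_hc0 = XtX_inv @ meat @ XtX_inv

se_ols = np.sqrt(np.diag(cov_ols))
se_hc0 = np.sqrt(np.diag(cov_hc0))
```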
>
> Jeremy
>
> On 28/06/07, Kathryn Jane Gardner <[log in to unmask]> wrote:
> > Dear list,
> >
> > I am hoping that someone can clarify the following for me regarding one-way ANOVAs. In all the stats books I have read, the reader is advised to pre-screen the DV for outliers and normality using boxplots and histograms separately for each group of the IV. However, I have read/heard elsewhere that the assumption is normality of the residuals (as in regression). If this is the case, then it would appear that all of these stats books are not providing correct advice and it is pointless pre-screening the data that way.
> >
> > If the assumption is normality of the residuals, I assume that this applies in one-way ANOVA also? However, there is no option to save residuals in SPSS's one-way ANOVA (I'm not sure why). The alternative would be to run the one-way ANOVA via the Multivariate ANOVA option and save the residuals, then plot histograms, boxplots and normality plots etc based on these residuals.
> >
> > Thanks
> > Kathryn
> >
>
>
> --
> Jeremy Miles
> Learning statistics blog: www.jeremymiles.co.uk/learningstats
> Psychology Research Methods Wiki: www.researchmethodsinpsychology.com
>


-- 
Jeremy Miles
Learning statistics blog: www.jeremymiles.co.uk/learningstats
Psychology Research Methods Wiki: www.researchmethodsinpsychology.com