JiscMail - Email discussion lists for the UK Education and Research communities

ALLSTAT Archives - allstat@JISCMAIL.AC.UK

Subject:      Why do GLMs? - SUMMARY (long)
From:         "R. Allan Reese" <[log in to unmask]>
Reply-To:     R. Allan Reese
Date:         Thu, 19 Sep 2002 17:39:10 +0100
Content-Type: TEXT/PLAIN (839 lines)

In early August I posed a general question about the motives for
preferring GLMs for modelling data. Thanks to all who replied. I offered
to post a summary, but went on holiday before doing it. Here is a repeat
of the question, followed by a very short reaction and a selection of the
responses grouped in themes. My insertions are marked with ***.

------------------------------------------------------------------------------
Date: Thu, 1 Aug 2002 17:08:29 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>
To: Allstat UK list <[log in to unmask]>
Subject: RFI: why do GLMs?

I responded to a question on RADSTATS asking about transformations, and
have now been faced with the type of basic question that a good student
should raise and a good teacher should be able to answer. I offer my
response below, but would be grateful for comments, additions or
references (send to me). Since I have not asked his permission, I
have removed the questioner's name, but will circulate a summary and also
make sure my interrogator is informed.

--On 30 June 2002 <[log in to unmask]> wrote (personal email):
>> ... various books and papers on generalized linear models, which use
>> error distributions from the exponential family, of which Gaussian is
>> just one. The manual for GLIM (Payne et al) is perhaps as good as any
>> primer, or McCullagh and Nelder's book for a more theoretical approach.

--response from X was:
> Thanks, I'm aware of these, but I haven't explicitly seen a debate
> between using power transformations and using techniques based on
> non-gaussian distributions. Does it exist?

RAR's response: The question makes me realise that for twenty years I have
taken it as axiomatic that modelling with stated assumptions was "better"
than using an approximation based on other assumptions plus normality. One
argument has been that the older methods were employed simply because the
mathematics was more tractable with hand calculations and tables.
Another has been the elegance of GLMs in dispelling the morass of
terminology surrounding all the alternative specific techniques.
Thirdly, there has been the assumption that modelling gives better insight
into the generating mechanisms, getting away from the idea of data analysis
as no more than an exercise in arithmetic (towards the "research cycle").
But I cannot point to specific evidence that we get "better" answers, only
to worked examples, which might be considered anecdotal.

-----------------------------------------------------------------------------
Date: Tue, 6 Aug 2002 13:26:21 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>

The sensible answer to the original query seems to be that it is not "one
or the other" but that either approach has advantages and traps, so a
competent data analyst should be aware of both and use either or both as
appropriate. In particular, the strength of belief in the inferences
relates to the strength of belief in the assumptions. The "best" model
is not necessarily the one that gives some mathematical optimum statistic
but the one that is most useful in studying the phenomenon being measured.

Nick Cox and others highlight the point that data may have some "natural
scale" which the analysis and interpretation should reflect. This is
another aspect of using modelling to give insight into processes, rather
than imposing a model because the mathematics is nice (cf Plato's
celestial circles).

Warren Gilchrist and Roger Newson make the link to median regression,
which brings "parametric" and "non-parametric" methods into the debate,
another area where introductory courses too often leave the impression
that in each situation one is "right" and the other "wrong".

Allan

------------------------------------------------------------------------------
*** Firstly, here are two responses querying the premise that GLMs are better

------------------------------------------------------------------------------
Date: Thu, 1 Aug 2002 11:23:18 -0500 (CDT)
From: Jim Hodges <[log in to unmask]>
To: "R. Allan Reese" <[log in to unmask]>
Subject: Re: RFI: why do GLMs?

Brave of you to raise this point. Please don't take anything I say as
implying criticism.

The following may be peculiar to my line of work: all-purpose PhD
statistician in a dental school. I analyze data from a great variety of
scientific studies, lab and clinical mainly but with some epidemiologic
and occasional administrative problems. My collaborators are rarely
interested in parameters per se, or in the data-generating mechanism as we
statisticians think of it. They are, instead, generally interested in
substantive questions like: Does this group's outcome tend to be higher
than that one's? Is there a trend in this response with respect to this
predictor? I use transformations and normal errors because when
applicable (almost all the time) they give the same answers as the more
laborious (for me, at least) GLMs, plus I find the procedures more
transparent, in particular, easier to check for errors, and to do all the
usual diagnostic stuff. Exceptions are when I have, say, count data and
the counts are all small, but in my practice such instances are rare.

When I have some doubt about the correct transformation, I do the analysis
with a few different ones that seem appropriate (after consulting Box-Cox,
usually), and if they give the same answer to the subject-matter question,
I advise the researcher to report the one on the scale most familiar to
the intended audience.

Issues regarding the data-generating process usually seem to involve
whether some subset of points has a different (defective, usually)
data-generating process from the bulk of the points, and I treat this by
deleting the suspect points and seeing whether the substantive conclusion
changes. If not, the issue is moot; if so, the investigators need to
convince me there's a good reason to suspect these points, or else I
advise them to report the analysis including the suspect points, and to
mention that they seem suspect and that the result changes if they're
deleted.

I'd be interested to see the responses you get to this one!

Best wishes,

Jim Hodges

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
&& Jim Hodges Phone: (612) 626-9626 &&
&& Division of Biostatistics Fax: (612) 626-8892 &&
&& School of Public Health e-mail: hodges@ &&
&& University of Minnesota ccbr.umn.edu &&
&& 2221 University Ave SE, Suite 200 &&
&& Minneapolis, Minnesota 55414 &&
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
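
*** A minimal sketch of the workflow Jim describes: consult a Box-Cox
profile, try a few interpretable powers near its maximum, and check that
the substantive answer is stable. Python with scipy and statsmodels is
assumed; the data and settings are invented for illustration.

    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    y = rng.gamma(shape=2.0, scale=3.0, size=100)   # positive, skewed response
    group = np.repeat([0, 1], 50)                   # hypothetical two-group design
    X = sm.add_constant(group)

    # Box-Cox maximum-likelihood estimate of the power parameter
    _, lmbda = stats.boxcox(y)
    print(f"Box-Cox MLE of lambda: {lmbda:.2f}")

    # Try familiar powers near the MLE; report on the most familiar scale
    # if the subject-matter answer agrees across them.
    for lam, label in [(0.0, "log"), (0.5, "sqrt"), (1.0, "identity")]:
        z = np.log(y) if lam == 0 else y**lam
        fit = sm.OLS(z, X).fit()
        print(f"{label:8s}: group effect p-value = {fit.pvalues[1]:.3f}")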

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 08:56:30 -0500 (CDT)
From: Jim Hodges <[log in to unmask]>
To: "R. Allan Reese" <[log in to unmask]>
Subject: Re: RFI: why do GLMs?

[RAR]
> Yours is a minority view but not unique, and I'm particularly delighted
> to get such comments. I have a deep-rooted distrust of anyone who says,
> "There is no alternative".
>
> It is going to be a serious piece of work to summarise the answers!

But the result will be interesting, and I'll bet it would be suitable for
a publication like The American Statistician, in their Teacher's Corner.
No, I am not on its editorial board, but this is the kind of discussion I
wish I'd heard when I was a student!

Best wishes,
JH

------------------------------------------------------------------------------
From: "Ruth M Pickering" <[log in to unmask]>
To: "R. Allan Reese" <[log in to unmask]>
Subject: Re: why do GLMs?
Date: Fri, 2 Aug 2002 09:23:31 +0100

Dear Allan

Some years ago I heard someone (a lady, but I can't remember who) give a
talk at a conference (I can't remember which, but probably either the
Biometric Society or the Society for Clinical Biostatistics) about just this
issue. She came to the conclusion that transformations and Normal models
were a much better option and had been sadly overlooked in all the excitement
concerning GLMs.

Ruth Pickering

*** Does anyone else recognise this reference?

------------------------------------------------------------------------------
*** Most respondents agreed with my interpretation that transformations
and normal models were used because they could be solved analytically.
Other models may just require more arithmetic, which is no obstacle
when a computer is available.

------------------------------------------------------------------------------
From: "David W. Smith" <[log in to unmask]>
To: "R. Allan Reese" <[log in to unmask]>
Subject: Re: why do GLMs?
Date: Thu, 1 Aug 2002 14:26:18 -0700

Clearly, the use of transformations with least squares and inference using
normal theory was originally motivated, in large part, by the need for
practical methods of analysis. Sometimes this may still be true.

The issues appear to be the correctness of the model, accuracy of the tests,
eg, alpha levels of the approximate tests that correctly represent what is
happening, and the power to identify a model that is accurate.

A better model should give both better power and more accurate alpha levels.

I think George Box said something like: All models are wrong, some models
are useful. I usually interpret this to mean that a pretty good model might
be quite informative. It may also mean not to worry a great deal about
nit-picking details until the major issues have been squared away.

The stuff about good tests has been with us a long time. I'm sure there are
corresponding Bayesian interpretations about decision rules. The point is,
that the norms of good analysis are fairly permanent. They have to do with
coming to good conclusions. The actual models and tests that work well vary
with the statistical technology available, both theory and computing. They
also vary with the actual data and with our knowledge about the data. What
I mean here, is that if one is uncertain about some of the assumptions one
is making, one might choose a method that is more robust but less powerful
if the assumptions are met, eg, using a Wilcoxon test rather than a t-test.
One might also give up power by using a split-half cross-validation method.
Et cetera.

David Smith
University at Albany-State University of New York.
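
*** A toy illustration of the robustness trade-off David mentions, on
heavy-tailed data where the t-test's assumptions fail; scipy is assumed
and the data are simulated.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a = rng.standard_t(df=2, size=40)           # heavy-tailed noise
    b = rng.standard_t(df=2, size=40) + 1.0     # shifted by 1

    # The rank-based test gives up some power under normality but
    # is far less disturbed by the heavy tails.
    print("t-test p-value  :", round(stats.ttest_ind(a, b).pvalue, 4))
    print("rank-sum p-value:", round(stats.mannwhitneyu(a, b).pvalue, 4))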

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 11:10:12 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>
To: "David W. Smith" <[log in to unmask]>
Subject: Re: why do GLMs?

On Thu, 1 Aug 2002, David W. Smith wrote:
> A better model should give both better power and more accurate alpha levels.

That's the conventional view (with which I don't disagree) but seems to me
to emerge as a matter of faith. As you mention, traditional normal-based
models were used not least because the solutions were algebraic, tractable
and closed. Having a computer to do the slog dispenses with that
argument. What is coming out of the exchanges is that there may be
*mathematical* arguments for certain models, but "better" ought to be
defined in wider scientific terms - the better model gives more insight
into the nature of the problem, as well as giving inferences from the
specific data.

Allan

------------------------------------------------------------------------------
From: "Andrew Dunning" <[log in to unmask]>
To: "R. Allan Reese" <[log in to unmask]>
Cc: "Andrew Dunning" <[log in to unmask]>, <[log in to unmask]>
Subject: Re: why do GLMs?
Date: Thu, 1 Aug 2002 21:23:40 -0400

I'm not sure I'm completely clear on your question, but it reminded me of an
exchange of messages I had with Prof. Brian Leroux of the University of
Washington. I was a student at the time taking a course on generalized
linear models which he was teaching. I hope he won't mind if I paste his
reply in below.

Andrew Dunning

On Sat, 23 May 1998, Andrew Dunning wrote:
>
> Brian:
>
> I was reviewing my Biostat571 take home exam, and thinking about the
> Applied Exam, and was wondering where to go to find an answer to the
> following question ...
>
> When should one model log E[Y] = X \beta, and when use E[log(Y)] = X \beta?
>
> The classical model is more flexible, the mean and variance can be modeled
> separately, and in some ways is easier to interpret. They have different
> theoretical interpretations, in particular the Poisson process for the
> log-linear model, but in practice one usually has overdispersion anyhow,
> so the theoretical model is usually not an ideal fit. The Poisson
> model may be more appropriate for pure count data, but the log-transformed
> classical model seems to be a well-accepted model for concentrations, and
> in practice, one often has "quasi-count" data, eg, number of colonies of
> white blood cells per ml of serum.
>
> If you know of anywhere I could read about this, it would be something I
> would really like to find out about before the 18th.
>
> Thanks.
>
> Andy Dunning

Date: Tue, 26 May 1998 11:55:45 -0700 (PDT)
From: Brian Leroux <[log in to unmask]>
To: Andrew Dunning <[log in to unmask]>

This is a good question but too practical to have received much attention
in the literature!

McCullagh and Nelder must have some comments on this somewhere. It is
important to separate modeling the mean right from modeling the variance
right and to remember that modeling the mean right is more important. But
in many cases these 2 models are not different enough to make much of a
difference and you can use whichever is easier to interpret.

One thing to watch out for is small counts, for which the Poisson model
may be the most meaningful one in terms of both mean and variance. I have
done some simulations related to this question that show the mean has to
be quite small (less than 5) for there to be much difference between the 2
models.

Brian
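
*** The contrast Brian discusses can be seen by fitting both models to the
same simulated counts: a Poisson GLM for log E[Y] = Xb versus ordinary
least squares for E[log Y] = Xb. A sketch assuming statsmodels, with
invented settings; with means well above 5 the two slopes agree closely.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 2, size=200)
    X = sm.add_constant(x)
    y = rng.poisson(np.exp(0.5 + 0.8 * x))

    # Model 1: log link models the mean on the original count scale
    glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()

    # Model 2: classical model on log(Y); add 0.5 to cope with zero counts
    ols = sm.OLS(np.log(y + 0.5), X).fit()

    print("Poisson GLM slope  :", round(glm.params[1], 3))
    print("log-scale OLS slope:", round(ols.params[1], 3))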

------------------------------------------------------------------------------
From: Brian G Miller <[log in to unmask]>
To: "'R. Allan Reese'" <[log in to unmask]>
Subject: RE: why do GLMs?
Date: Fri, 2 Aug 2002 09:25:18 +0100

Interesting discussion topic. Some general thoughts:

To me, the great advantage of the GLM approach was that it rationalised and
systematised variants on the standard linear-Normal model in two directions:
non-linear link functions for the structural part, and non-Normal error
distributions for the remainder. While it emphasised the canonical links
for some well-known distributions, e.g. logistic for binomial, log for
Poisson, it freed us to think separately about the forms of the model and
the uncertainty surrounding it. So I don't see a conflict between power
transformations, which by their nature relate strongly to the shape of the
link function, and "non-Gaussian distributions" which would appear to be
more focussed on error. Any modelling should be looking at them both, and
if we choose transformation followed by linear-Normal modelling, we need to
think about the effects of the transformation on each.

It is my experience that, while scientists can often see clearly the
distinction between signal and noise, they are less familiar with how those
concepts transfer into statistical modelling. There is an implied (and
often stated) requirement for us to explain why we use the models we choose
in terms the client understands. But, at the end of the day, all models are
wrong, some are just more useful than others. And often several different
models will fit the same data just as well (or as badly) in a global sense
but will show strong local patterns of systematic misfit. So it is crucial
to apply techniques of model criticism such as the various residual plots,
but also to know which ones are sensitive to model mis-specification, and
which to wrong choice of error distribution.

Finally, it is often overlooked that choice of error distribution can be
seen as different ways of weighting the observations (which is seen clearly
in the computation of GLMs by iteratively reweighted least squares). So it
is (or should be) easy to imagine fitting the same underlying linear model
with different weighting schemes corresponding to different error
distributions, which may help to keep the two halves separate.

Hope this helps

Dr Brian G Miller, PhD, CStat
Director of Research Operations
Institute of Occupational Medicine
8 Roxburgh Place
Edinburgh EH8 9SU
Tel +44 (0) 131 667 5131
Fax +44 (0) 131 667 0136

Visit IOM at www.iom-world.org.uk
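
*** Brian's remark that error distributions act as weighting schemes can be
made concrete with a bare-bones IRLS loop. This is a sketch of the standard
algorithm for a Poisson GLM with its canonical log link (for which the
working weights are simply the fitted means); only numpy is assumed.

    import numpy as np

    def irls_poisson(X, y, n_iter=25, tol=1e-8):
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            eta = X @ beta                  # linear predictor
            mu = np.exp(eta)                # inverse of the log link
            W = mu                          # Poisson working weights
            z = eta + (y - mu) / mu         # working (adjusted) response
            XtW = X.T * W                   # X'W without forming diag(W)
            beta_new = np.linalg.solve(XtW @ X, XtW @ z)
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

    rng = np.random.default_rng(0)
    x = rng.uniform(size=200)
    X = np.column_stack([np.ones_like(x), x])
    y = rng.poisson(np.exp(1.0 + 0.5 * x))
    print(irls_poisson(X, y))               # approximately [1.0, 0.5]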

--------------------------------------------------------------------------
From: [log in to unmask]
To: [log in to unmask]
Subject: Re: why do GLMs?
Date: Fri, 2 Aug 2002 08:48:08 +0100

I have given talks from time to time, comparing the approach using
transformations with the approach using GLMs. I finally wrote and
published a paper on this subject, which came out this year:
        P.W. Lane (2002). Generalized linear models in soil science.
        European Journal of Soil Science, 53, 241-251.
In the paper, I explicitly compare alternative analyses of a set of data
from Rothamsted, and conclude with the following five advantages of GLMs:

1. From an intuitive viewpoint, it is better to adjust the model for the
   data that have been observed rather than to adjust the data to suit a
   pre-defined model.
2. GLMs offer flexibility, providing scope to deal with a wide range of
   patterns for the relationship between variance and mean.
3. The two generalizations within GLMs provide the ability to separate
   the choice of scale on which effects are to be linear and additive
   from the need to model the variance behaviour of the response.
4. With GLMs there is no problem with interpreting means on the natural
   scale.
5. With GLMs there is much less difficulty with extreme observations for
   constrained data, such as zero counts.

The paper provides "debate" on the issue, but perhaps not "evidence" in
the sense that you refer to in your reply. I can't really see how such
evidence could be provided without just cataloguing a series of
"anecdotal" examples.

Best wishes

Peter Lane
Research Statistics Unit, GlaxoSmithKline

------------------------------------------------------------------------------
From: "Nick Sofroniou" <[log in to unmask]>
To: <[log in to unmask]>
Subject: RFI: why do GLMs?
Date: Fri, 2 Aug 2002 09:50:54 +0100

Another reason that I like to use GLMs is that one is modelling the
arithmetic mean on the original scale via a link function to the linear
predictor, rather than transforming the response with something like a power
transform and modelling the transformed scale, which may or may not have a
clear interpretation on the original scale (e.g., the median, or geometric
mean).

Jim Lindsey's books often use an information criterion, the AIC, to compare
how well alternative non-nested models fit the data, pointing out the need
to incorporate the Jacobian of any transformation of y in the likelihood to
make the AICs comparable.

Nick

Dr. Nick Sofroniou
Educational Research Centre
Saint Patrick's College
Drumcondra
Dublin 9
Republic of Ireland
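
*** A sketch of the Jacobian adjustment Nick mentions. If Z = log(Y), the
change of variables gives log f_Y(y) = log f_Z(log y) - log y, so the
log-likelihood of a normal model fitted to log(Y) must be reduced by
sum(log y) (equivalently, its AIC increased by 2*sum(log y)) before it can
be compared with a GLM fitted to Y directly. statsmodels is assumed, and
the gamma-versus-lognormal comparison is invented for illustration.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    x = rng.uniform(size=150)
    X = sm.add_constant(x)
    y = rng.gamma(shape=4.0, scale=np.exp(0.2 + 0.6 * x) / 4.0)

    # Gamma GLM fitted to Y on its original scale
    gamma_fit = sm.GLM(
        y, X, family=sm.families.Gamma(link=sm.families.links.Log())
    ).fit()

    # Normal model for log(Y), with the Jacobian folded into its AIC
    lognorm_fit = sm.OLS(np.log(y), X).fit()
    aic_adjusted = lognorm_fit.aic + 2 * np.sum(np.log(y))

    print("gamma GLM AIC           :", round(gamma_fit.aic, 1))
    print("lognormal AIC (adjusted):", round(aic_adjusted, 1))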

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 10:48:05 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>
To: Nick Sofroniou <[log in to unmask]>
Subject: Re: RFI: why do GLMs?

That's a nice succinct statement of the technical arguments, which would
shut up most clients! Thanks for linking it to the AIC; I read a couple
of excellent papers recently explaining the virtues of AIC in comparing
models, so will include those references in the feedback.

David R Anderson & Kenneth P Burnham
Understanding information criteria for selection among capture-recapture
or ring recovery models (ppS14-21)

David R Anderson & Kenneth P Burnham
General strategies for the analysis of ringing data (ppS261-270)

both in Bird Study Volume 46 Supplement 1999,
Large-scale studies of marked birds, Proceedings of the EURING97 Conference.
Available from the British Trust for Ornithology, The Nunnery, Thetford,
Norfolk, UK.

Allan

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 10:36:39 +0100 (BST)
From: Ian White <[log in to unmask]>
Subject: RFI: why do GLMs?
To: "R. Allan Reese" <[log in to unmask]>

I think it's a very good question. In the 1-sample problem, any GLM estimates
the population mean by the sample mean, whereas a log normal model gives a
different estimate. There has been discussion of this in the health economics
literature. Ian

Ian White
MRC Biostatistics Unit
Institute of Public Health
Robinson Way
Cambridge CB2 2SR

Tel: 01223 330399
Fax: 01223 330388
http://www.mrc-bsu.cam.ac.uk/People/ian.shtml
*********************************************
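
*** Ian's one-sample point in numbers: a GLM with only an intercept returns
the sample mean whatever the link, while a lognormal model estimates E[Y]
as exp(m + s^2/2) from the mean m and variance s^2 of log(Y). A sketch on
simulated data, assuming numpy.

    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.lognormal(mean=1.0, sigma=0.8, size=50)

    sample_mean = y.mean()                      # what any GLM delivers here
    m, s2 = np.log(y).mean(), np.log(y).var(ddof=1)
    lognormal_mean = np.exp(m + s2 / 2)         # lognormal plug-in estimate

    # The estimates differ; which is better depends on how far you trust
    # the lognormal assumption -- exactly the trade-off under discussion.
    print(f"sample mean         : {sample_mean:.3f}")
    print(f"lognormal-based mean: {lognormal_mean:.3f}")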

------------------------------------------------------------------------------
From: [log in to unmask]
Subject: Re: RFI: why do GLMs?
To: "R. Allan Reese" <[log in to unmask]>
Date: Fri, 2 Aug 2002 10:42:15 +0100

Your questioner asks a question that bothered me for several years. When
all is said and done we are really talking about modelling uncertainty.
I've seen many papers with new methods that purport to give better results,
but I rarely understand what better really means. Sure we can change the
objective function hence a new method is better from the new perspective
but does changing the perspective make a real difference. Are predictions
with method A significantly different from predictions with method B? One
of the main justifications for new methodology is the way they expand our
thinking about the process we are looking at. Are complex models with
"sound theoretical bases" just over interpreting the data? All of these
thoughts were summed up several years ago in a paper on model uncertainty
in one of the RSS journals - I've forgotten the details.

When push comes to shove I stand firm on the principle of Occam's razor -
keep it simple. Generalised LMs are a useful extension of General LMs when
the error distribution is known to be a specific non-Gaussian distribution.
But two caveats: do not guess the error distribution; and a non-linear link
function changes your hypotheses. If the error distribution is truly
unknown, remember that the General LM is robust and will not lead you too far
astray. Finally, if the asymptotic Gaussian argument is unpalatable and the
error distribution is unknown, there are always non-parametric tests.

Dave.

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 15:05:55 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>
To: <[log in to unmask]>
Subject: Re: RFI: why do GLMs?

If it was by Chris Chatfield, it will be good. He had a paper in The Statistician
earlier this year that I expect to refer to often.

Chatfield C (1995) "Model uncertainty, data mining and statistical
inference", JRSS-A 158:3.

Chatfield C (2002) "Confessions of a pragmatic statistician", JRSS-D 51:1.

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 11:51:04 +0100 (BST)
From: Malcolm Farrow <[log in to unmask]>
Subject: Re: RFI: why do GLMs?
To: [log in to unmask]

It occurs to me that one factor here is separating the linearity, or
otherwise, of the systematic effect from the mean-variance relationship.
If we transform the data then we change both of these things together,
which we might not want to do. In a GLM (or a nonlinear regression) they
are different things and can be modelled separately.

Of course, all of this assumes that we are talking about continuous data.
If we have discrete data (e.g. Poisson-like) then there are clearly other
arguments in favour of trying to model the actual distribution.

Malcolm Farrow
Sunderland (and many years ago, at Hull)
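
*** Malcolm's separation in code: a Gaussian GLM with a log link changes
only the systematic part (the log of the mean), whereas regressing log(Y)
changes the mean-variance relationship at the same time. A sketch assuming
statsmodels, with data simulated to have additive noise on the Y scale.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 2, size=150)
    X = sm.add_constant(x)
    y = np.exp(0.4 + 0.6 * x) + rng.normal(scale=0.3, size=150)

    # Link changed, constant variance on the Y scale retained
    glm = sm.GLM(
        y, X, family=sm.families.Gaussian(link=sm.families.links.Log())
    ).fit()

    # Transformation changes link and variance together
    ols = sm.OLS(np.log(y), X).fit()

    print("Gaussian GLM, log link:", round(glm.params[1], 3))
    print("OLS on log(Y)         :", round(ols.params[1], 3))
    # The point estimates may be close, but the two fits make different
    # assumptions about how the variance behaves, and their residual
    # diagnostics should be judged on different scales.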

------------------------------------------------------------------------------
Date: Fri, 2 Aug 2002 11:40:41 +0100 (BST)
From: "R. Allan Reese" <[log in to unmask]>
To: <[log in to unmask]>
Subject: Re: why do GLMs?

On Fri, 2 Aug 2002 [log in to unmask] wrote:
> ...; but what
> statistical journal would be likely to accept a paper that simply presents
> arguments for using one method rather than another, when most
> statisticians seem now to accept the first method anyway?

I hope an RSS journal would accept a well-argued paper, whether it
bolstered or attacked the orthodox view. I have a strong prejudice now
against anyone who says, "There is no alternative." ;-) One way to get
this question out of my system may be to take the several replies and try
to produce a review paper.

...
Even for models defined a priori, applied statistics consists of
fitting the model while looking for patterns and exceptions, because if
you don't allow for deviations in *this* data set compared with any
previous data, you are not "analysing" the data.

> -- and this is what most statisticians do (though not in the analysis
> of clinical trials, I have discovered).

Quite so! And doesn't that give you reassurance that "clinical tests"
will show up unexpected side effects? I was once brought in to analyse
data for a large pharmaceutical company, and realised afterwards I was the
fall guy because no insider wanted to tell the company the experiment had
gone wrong.

Best wishes
Allan

------------------------------------------------------------------------------
From: Paul Hewson <[log in to unmask]>
To: "'R. Allan Reese'" <[log in to unmask]>
Subject: RE: why do GLMs?
Date: Fri, 2 Aug 2002 12:34:26 +0100

Look forward to seeing the summary. There was an authoritative and concise
letter in RSS News about 5 years ago (+/- 1 year) from John Nelder setting
this out very well.

From my point of view, I'd rather have data on the actual scale and funny
parameters than data on a funny scale and simple parameters. Being quite a
simple person, I also find the error in glms much easier to understand than
the highly complex errors you get when transforming data.

There's another strand to this entirely, along the lines of why jiggle
around with your data to make it fit something that is mathematically
convenient to deal with. After all, computers these days are electronic
boxes that can cope with the extra arithmetic - not an option open to early
20th century computers. This means that many more appropriate models can
be used (not just glms) depending on the context.

Paul

Paul Hewson tel. (01392) 382773
Data Analyst and Research Officer

Road Safety Team, Environment Directorate,
Devon County Council,
1st Floor, Lucombe House,
County Hall
Topsham Road
Exeter EX2 4QW

tel (01392) 382773 fax (01392) 382135
email [log in to unmask]

------------------------------------------------------------------------------
From: "Nick Cox" <[log in to unmask]>
To: <[log in to unmask]>
Subject: Re: why do GLMs?
Date: Fri, 2 Aug 2002 12:49:59 +0100

There is a considerable danger here of several false antitheses.

On one level, a simple but important message is that some transformed
scales are often just as natural as, and way more convenient than, the
form in which the data arrive. Some groups of scientists appreciate this
quite as well as statisticians, and scales which are actually transforms
(in at least one sense) abound: pH, decibels, octaves, the Richter scale,
etc. The reciprocal of a ratio can be as natural as that ratio: miles per
gallon vs gallon per mile, price-earnings ratio vs earnings-price ratio,
people per unit area vs area per person, etc. Economists also are happy
with thinking about money on a logarithmic scale, just as most of us do
much of the time, e.g. in thinking about house prices, wage and salary
increases, etc.

In all these cases and many more, if the response data, or more precisely
the residuals from a model, are more nearly Gaussian, or at least
symmetric, on a transformed scale, this helps to persuade scientist and
statistician to carry out analyses on the transformed scale _and to stay
working with that scale_.

Emphasis on this seems to me totally consistent with enthusiasm for
Generalised LMs.

Conversely, and this is perhaps the key point, there are many situations
in which scientists really do want results on the scale in which data
come. For this situation, GLMs do have one major advantage in returning
results on the scale of the response as originally measured whatever the
link function used, without the need for the extra labour of back
transforming, bias corrections, etc. Yet there can be still be need to
transform covariates.

Having said all that, there is a flavour in what you quote of focusing on
just this one issue. The form of the distribution of the response is often
secondary, yet in some fields discussed much more often than issues such
as independence assumptions which are arguably more fundamental. It is
still common to find that this assumption is discussed more frequently in
some applied literatures than all other assumptions combined,

Again, one merit of Generalised LMs has been to make it easier in many
cases to vary the family of distribution and see how much impact that
assumption has on the analysis.

I don't know how many people use GLIM any more. It seems that now every
serious statistical environment has its own module for Generalised LM.
(That's a partial definition of "serious ...".)

McCullagh and Nelder is a work of genius, but a trifle scary as an
introduction. Annette Dobson's introductory text might better fit the
bill. (An Introduction to Statistical Modelling, Chapman & Hall, 1983)

Nick
[log in to unmask]

------------------------------------------------------------------------------
Date: Fri, 02 Aug 2002 13:07:05 +0100
Subject: Re: RFI: why do GLMs?
To: [log in to unmask]
From: "Warren GILCHRIST(CMS)" <[log in to unmask]>

Just to note that an alternative to both transformations + Normal and GLM
is to use median based regression with quantile functions for non-Normal
models. This avoids, for example, the problems generated by reversing the
transformation after the fitting. See my book "Statistical Modelling with
Quantile Functions", Warren Gilchrist, 2000, CRC/Chapman and Hall, for
some examples.

Best Wishes
Warren
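
*** For anyone wanting to try the median-based route Warren describes, a
minimal sketch using the quantile-regression routine in statsmodels (the
data are invented): the conditional median is modelled directly on the
original scale, so no back-transformation is needed.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 3, size=200)
    X = sm.add_constant(x)
    y = 1.0 + 2.0 * x + rng.gamma(shape=2.0, scale=1.0, size=200)  # skewed errors

    median_fit = sm.QuantReg(y, X).fit(q=0.5)   # 0.5-quantile = median
    print(median_fit.params)                    # slope close to 2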

------------------------------------------------------------------------------
Date: Fri, 02 Aug 2002 14:53:32 +0100
To: "R. Allan Reese" <[log in to unmask]>
From: Roger Newson <[log in to unmask]>
Subject: Re: why do GLMs?

I saw your thread on Allstat re GLMs and transformations. In fact, a lot of
these normalising/variance stabilising transformations are powers and logs,
and fit nicely into the GLM framework, and can be used to define CIs for
algebraic means and their differences or ratios. In general, given a power
p, the p'th power algebraic mean of Y is equal to

AM_p(Y) = (E(Y^p))^(1/p)

where E(.) denotes expectation, and a^b denotes a raised to the power of b.
The arithmetic mean is the 1st power algebraic mean. The harmonic mean is
the algebraic mean of power -1. By convention, the geometric mean is
considered to be the zeroth power algebraic mean, and is equal to

GM(Y) = exp(E(log(Y))).

Therefore, if you raise Y to the power p, and use the resulting transformed
variable Y^p as the outcome variable in a generalised linear model with
link function power(1/p), then the parameters beta_j fitted (with
confidence limits) will be algebraic means and their differences. (In the
special case p=1, the fitted parameters beta_j are of course arithmetic
means and their differences.)

Alternatively, you can estimate algebraic means and their ratios. To do
this, use Y^p as the outcome variable in a generalised linear model with a
log link function, and derive parameters beta_j, with a covariance matrix
V(beta), and then multiply beta_j by 1/p to get new parameters gamma_j, and
multiply V(beta) by 1/(p*p) to get V(gamma), the covariance matrix of
the gamma_j. You can then use the diagonal of V(gamma) to calculate
standard errors and confidence limits gammamin_j and gammamax_j for each
gamma_j. Then you can transform the confidence intervals to get

theta_j=exp(gamma_j)
thetamin_j=exp(gammamin_j)
thetamax_j=exp(gammamax_j)

and the theta_j will be algebraic means and their ratios, with confidence
limits thetamin_j and thetamax_j.

In the case of the log transform, you can use log(Y) as the outcome
variable in a GLM with an identity link function, and estimate parameters
beta_j, with confidence limits betamin_j and betamax_j, and then transform
the confidence limits to get

theta_j=exp(beta_j)
thetamin_j=exp(betamin_j)
thetamax_j=exp(betamax_j)

and the theta_j will be geometric means and their ratios. In general, these
differences (or ratios) may either be differences (or ratios) between
groups, or differences (or ratios) associated with a given increment in a
quantitative X-variable.

Usually, we use the normal (ie constant) variance function, whatever the
link. However, that is not compulsory, at least if you use the Stata
statistical package (as I do). You may use the gamma variance function, if
you think that Y^p has a constant coefficient of variation rather than a
constant variance.

Therefore, power and log transforms, at least, are not an alternative to
GLMs, but part of the GLM framework. Which particular transformation and
link function you use depends whether you want to estimate arithmetic
means, harmonic means, algebraic means or geometric means, and whether you
want to know their differences or their ratios.

Often, these assorted means are really proxies for the median. If the
conditional distribution of Y, given a value for the X-variates, is
expected to be skewed, then the algebraic, harmonic or geometric mean may
be a better proxy for the median than the arithmetic mean. It is possible
to derive confidence intervals for median differences between groups,
median ratios between groups, or even median slopes of Y with respect to X
(Newson, 2002). However, it is not so easy to derive confidence intervals
for median differences, ratios or slopes adjusted for multiple X-variates.
Therefore, we often must use a proxy for the median. Whether you choose the
arithmetic, geometric, harmonic or algebraic mean depends on which is the
best proxy for the median. For instance, if I want to estimate ratios
between median values for groups of viral loads estimated by polymerase
chain reaction (PCR), then I usually use geometric means and their ratios,
because the median is approximately the geometric mean. The choice of
variance function depends on how the conditional variance of Y given X is
expected to be related to the conditional mean of Y given X.

A good up-to-date book on GLMs is Hardin and Hilbe (2001), which I
reviewed for The Stata Journal (Newson, 2001).

I hope this helps.

Best wishes

Roger

References

Hardin J, Hilbe J. Generalized linear models and extensions. 2001; College
Station, TX: Stata Press.

Newson R. Review of Generalized Linear Models and Extensions by Hardin and
Hilbe. The Stata Journal 2001; 1: 98-100.

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau,
Somers' D and median differences. The Stata Journal 2002; 2: 45-64.

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
   or 020 7848 6605 International +44 20 7848 6605
Email: [log in to unmask]

Opinions expressed are those of the author, not the institution.
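
*** Roger's log-transform recipe in code: fit a normal-errors model
(identity link) to log(Y), then exponentiate the estimate and its
confidence limits to obtain a geometric mean ratio. A sketch with invented
two-group data, assuming statsmodels.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    group = np.repeat([0, 1], 60)
    y = rng.lognormal(mean=1.0 + 0.4 * group, sigma=0.7)

    X = sm.add_constant(group)
    fit = sm.OLS(np.log(y), X).fit()

    beta = fit.params[1]
    lo, hi = fit.conf_int()[1]
    print(f"GM ratio (group 1 / group 0): {np.exp(beta):.2f} "
          f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")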

------------------------------------------------------------------------------
From: "Langton, Steve D (ESD)" <[log in to unmask]>
To: "'R. Allan Reese'" <[log in to unmask]>
Subject: RE: why do GLMs?
Date: Tue, 6 Aug 2002 10:19:41 +0100

I've just noticed this discussion and thought I'd add my thoughts.

I'm all for using GLMs where there is an appropriate distribution to use.
Hence I'll always use logistic regression for binomial data, making
allowance for overdispersion if necessary. However, we mustn't be fooled
into making simplistic judgements about the appropriate distribution. The
best example of this is the myth that all count data should follow a Poisson
distribution. In biology there are frequently (perhaps usually) good
scientific reasons why processes are much more complex than Poisson, and
hence simplistic application of a Poisson GLM is not sensible. With such
data I tend to prefer to treat the data as log-normal, although some
simulations I ran a while back suggested that a Poisson GLM with appropriate
correction for overdispersion worked quite well, even when the data wasn't
really Poisson.

Best wishes,
Steve

Steve Langton
DEFRA Statistics (C&S) Division Tel: 01904 455100
Foss House Fax: 01904 455254
Kings Pool GTN: 5137
1-2 Peasholme Green
York YO1 2PX
E-mail: [log in to unmask]
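
*** A sketch of the overdispersion correction Steve mentions: fit the
Poisson GLM but estimate the dispersion from the Pearson chi-square
(quasi-Poisson), which widens the standard errors without changing the
fitted means. statsmodels is assumed; the counts are simulated to be
deliberately overdispersed.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    x = rng.uniform(size=200)
    X = sm.add_constant(x)
    mu = np.exp(1.0 + 0.7 * x)
    y = rng.negative_binomial(n=2, p=2 / (2 + mu))  # overdispersed, mean mu

    plain = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")

    print("slope SE, Poisson      :", round(plain.bse[1], 3))
    print("slope SE, quasi-Poisson:", round(quasi.bse[1], 3))  # wider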


*****************************************************
End of summary



R. Allan Reese Email: [log in to unmask]
Associate Manager Direct voice: +44 1482 466845
Graduate Research Institute Voice messages: +44 1482 466844
Hull University, Hull HU6 7RX, UK. Fax: +44 1482 466436
====================================================================
Hull University: one of the "Access elite" (THES 18/1/02)
Widened access / Low drop out / Excellent teaching / Excellent research
...
Hull University: one of the hardest funding cuts by HEFCE (THES 3/3/02)
