Moving or choosing the origin - adjusting the intercept in models
On allstat, I wrote that a regression predictor x could be expressed as
x-z, thus moving the origin, and asked for advice or references as to
when and how z could be legitimately and usefully chosen as a part of
the analysis and interpretation. The equations f(x) and f(x-z) would be
equivalent in goodness of fit, and if multiplied out would give the same
coefficients for terms in x. However, a model expressed in x-z might
have a direct relevance to the domain. The example that had
precipitated this was that of regressing UK government health spending
on time; it nicely fitted a straight line with the origin in 1947.
I received several helpful and stimulating replies from allstat, which
are appended. However, I eventually realized that what I am referring
to is simply an equivalent *re-parameterization* of a model, and
related to (but different from) the OFFSET concept in Glim (where the
offset is added to the linear predictor). Since the extra parameter z
does not change the *algebraic* fit, justifying or optimizing its value
must be based on another logic.
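A minimal sketch of this equivalence (Python/numpy; the years, coefficients and noise are invented for illustration): fitting a quadratic in x and in x-z gives identical fitted values, and multiplying out a + b(x-z) + c(x-z)^2 recovers the raw-x coefficients.

```python
import numpy as np

# Invented quadratic trend in (year - 1900), centred at 1981 (z = 81).
rng = np.random.default_rng(0)
x = np.arange(1960, 2003, dtype=float) - 1900
z = 81.0
y = 10 + 3 * (x - z) + 1.5 * (x - z) ** 2 + rng.normal(0, 5, x.size)

def quad_fit(t, y):
    """Least-squares quadratic fit; returns (coefficients, fitted values)."""
    X = np.column_stack([np.ones_like(t), t, t ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

beta_raw, fit_raw = quad_fit(x, y)          # regress on x
beta_shift, fit_shift = quad_fit(x - z, y)  # regress on x - z

# Goodness of fit is identical: the fitted values agree.
print(np.max(np.abs(fit_raw - fit_shift)))

# Expanding a + b(x-z) + c(x-z)^2 recovers the raw-x coefficients:
# intercept a - b*z + c*z^2, linear b - 2*c*z, quadratic c.
a, b, c = beta_shift
expanded = np.array([a - b * z + c * z ** 2, b - 2 * c * z, c])
print(np.max(np.abs(expanded - beta_raw)))
```

The extra parameter z changes the coefficients, not the fit, which is why choosing it must rest on interpretation rather than on any fit criterion.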
A web search on "intercept correction" located another interpretation in
econometrics. Of 146 hits, at least 141 mentioned the work of Michael
Clements & David Hendry, and it was their books that I had
serendipitously found in our library. Forecasting Economic Time Series
(CUP 1998) p180 states that "published [economic] forecasts reflect in
varying degree the properties of the models and the skills of the
models' proprietors. ... adjustments (often extensive) are often made to
the model-based predictions in arriving at a final forecast, typically
to the constant terms or intercepts in the models' equations." Rothman
(J Econ Lit), reviewing their follow-up book Forecasting Non-stationary
Economic Time Series (MIT 1999), comments, "[as a student] I initially
thought that this corrective procedure was an ad hoc device used to
cover up and mask poor model performance" but goes on to commend the
book.
My understanding is that Clements & Hendry are describing attempts to
incorporate into a mathematical model the qualitative information or
opinion that reflects the wider economic and social effects. One
unfortunate example they give, however, is to correct a moving average
process by treating it as autoregressive: instead of forecasting
y(t+1) = mu, the mean, forecast y(t+1) = y(t). That is Deming's classic
case of management by interference. It also worries me that C&H use the
term non-stationary to describe structural breaks in the generating
system; I think it is misleading to talk of a model with a structural
break when what may be implied is a totally new situation. Forecasting
generally fails for just that reason, which is why a model based on
data pre-Black Wednesday / 9/11 / Hurricane Isabel may need more than a
tweak to its parameters to continue to be useful. I would distinguish
broken-stick models from models in which the parameters are functions
of time.
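The Deming point can be put numerically. A sketch (Python; simulated data, parameters invented): for a series that is pure noise about a fixed mean, chasing the last observation with y(t+1) = y(t) roughly doubles the squared forecast error relative to forecasting the mean.

```python
import numpy as np

# Simulated series: iid noise about a fixed mean mu, so the best
# one-step forecast of y(t+1) is simply mu.
rng = np.random.default_rng(1)
mu, sigma = 50.0, 1.0
y = mu + rng.normal(0, sigma, 100_000)

mse_mean = np.mean((y[1:] - mu) ** 2)       # forecast the mean
mse_naive = np.mean((y[1:] - y[:-1]) ** 2)  # forecast y(t+1) = y(t)

# E[(e(t+1) - e(t))^2] = 2*sigma^2 versus E[e(t+1)^2] = sigma^2:
# the "autoregressive correction" doubles the mean squared error.
print(mse_mean, mse_naive)
```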
The processes described above are what I would term *heuristic*, and
justify the health-spending example at least to the extent that it is
common practice to make adjustments so that the model more faithfully
reflects external information. The Glim OFFSET directive reflects a
stronger imperative to make a model conform to fixed data points, or
*prior knowledge*. One case of this is to fit a model with / without an
intercept, the latter imposing the strong condition that y=0 when x=0.
Those two models do not, however, generally have the same goodness of
fit. Nelder's concept (see emails) of Functional Marginality must be a
consideration. Arbitrarily insisting on a line through the origin when
x=0 is well outside the data range may not be sensible.
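The with/without-intercept point can be seen directly. A sketch (numpy; invented data with a genuinely nonzero intercept): since the no-intercept model is a restriction of the full model, its residual sum of squares can only be worse.

```python
import numpy as np

# Invented straight-line data with true intercept 5, i.e. y != 0 at x = 0.
rng = np.random.default_rng(2)
x = np.linspace(1, 10, 50)
y = 5 + 2 * x + rng.normal(0, 1, x.size)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[1][0]

rss_with = rss(np.column_stack([np.ones_like(x), x]), y)  # y = a + b*x
rss_origin = rss(x[:, None], y)                           # y = b*x, forces y=0 at x=0
print(rss_with, rss_origin)
```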
Other reasons for moving the origin are computational. Computational
imperatives should be handled within the software, but their use may
still depend on the user's understanding. Humphrey, Newson and Swank
below all suggest centring on the range of the predictor or near to the
point of inflexion when fitting a quadratic. Their reason is that
otherwise x and x^2 will be very collinear. Hopkins
(www.sportsci.org/resource/stats/polynomial.html) adds the heuristic, "I
find it easier to interpret [the quadratic coefficient] if I transform
the X values so they range from -1 to +1," but this is exceptional.
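The collinearity that centring removes is easy to check. A sketch (numpy) using the years from the original query, plus Swank's x = 1 to 10 figure quoted below:

```python
import numpy as np

year = np.arange(1960, 2003, dtype=float)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Raw years: year and year^2 are almost perfectly collinear.
print(corr(year, year ** 2))

# Centred on the midpoint (1981): the symmetric values make the
# correlation between (year-z) and (year-z)^2 essentially zero.
c = year - year.mean()
print(corr(c, c ** 2))

# Swank's example below: for x = 1 to 10 the correlation is about .97.
print(corr(np.arange(1., 11), np.arange(1., 11) ** 2))
```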
Websites promoting software do not acknowledge the problem. One
striking example shows a regression based on salary, salary^2 and
salary^3 (The Data Mining Group,
www.dmg.org/v1-1/polynomialregression.htm). SPSS, using both
REGRESSION and CURVEFIT, made no adjustment, and hence the quadratic
fit was not found; there was one figure labelled "collinearity" but no
assistance in interpreting it. Genstat output included a clear
statement that year and year^2 were collinear, so implying a poor fit.
I would expect software that fitted polynomial terms to do so using
orthogonal polynomials, but this is not promoted, and apparently SPSS
CURVEFIT does not. One of the oldest textbooks on my shelf, the
excellent "Statistics Manual" by Edwin Crow et al (originally a US
Naval Ordnance document but reprinted 1960 by Dover) states, "We note
that x, x**2 and x**3 are certainly not statistically independent [as
predictors], but independence is not necessary in multiple regression
analysis."
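One standard construction of an orthogonal basis (a sketch in numpy using a QR decomposition; this is not how SPSS or Genstat implement it) shows the property that matters: with orthonormal columns, adding a higher-order term leaves the lower-order coefficients unchanged.

```python
import numpy as np

# Invented quadratic-plus-noise series over the years in the query.
rng = np.random.default_rng(3)
x = np.arange(1960, 2003, dtype=float)
y = 0.01 * (x - 1980) ** 2 + rng.normal(0, 1, x.size)

# QR of the Vandermonde matrix: the columns of Q are orthonormal
# polynomials in x evaluated at the data points.
V = np.vander(x, 4, increasing=True)  # columns: 1, x, x^2, x^3
Q, _ = np.linalg.qr(V)

# With an orthonormal basis each coefficient is just a dot product,
# so adding the cubic column does not disturb the lower orders.
coef_quad = Q[:, :3].T @ y
coef_cubic = Q[:, :4].T @ y
print(np.max(np.abs(coef_cubic[:3] - coef_quad)))
```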
In view of the prevalence of statistical models using polynomial
fits, it worries me that I have been unaware of such problems for so
long, and that the problem may generally not be addressed in courses.
Note that
there are both statistical and computational issues - most users assume
that the computer will deliver the correct answer.
Further comments or references from allstat members are invited,
especially from anyone who can offer a *short* defence of econometrics!
Allan
----- Initial query
Date: Mon, 8 Sep 2003 16:43:44 +0100 (BST)
From: R. Allan Reese <[log in to unmask]>
Subject: Question: where's time zero?
I am working on time series and removing obvious trends. Something
which has me intrigued is that by choosing different times as the
origin I can fit equivalent but different models, but I do not recall
seeing this discussed in my training or in the literature.
For example, my data run from 1960 to 2002. A linear trend will have
the same slope regardless of whether it is regressed on year or
(year-1900) or (year-1960), but will have different intercepts. As
these are economic data, it was of interest to note that "solving" the
regression suggested an intercept of zero in the year before the
programme began, so there was a logic in choosing that as the origin
year.
Some of the series, however, require a curved fit, and a quadratic was
used as a first approximation. Fitting year^2 may be equivalent to
fitting (year-z)^2, but the first caused a numerical failure in the
algorithm (tolerance exceeded). I had a pragmatic reason for choosing a
value for z in the centre of the distribution. Different z's give the
same overall fit (of course) but strongly influence the coefficients on
lower powers.
It seems to me therefore that the choice of z ought to be a
consideration in the analysis, maybe using a pragmatic or theory-based
value. If z is considered another parameter, which criterion should be
"optimized", given that all models fit equally? How would you define
the "simplest" model? There may be a connection with fitting
orthogonal polynomials, so that adding the kth order does not change
the coefficients on k-1 etc, but this seems to me an extra topic.
Comments or references to existing literature, sent to me, would be
welcomed.
R. Allan Reese Email: [log in to unmask]
Associate Manager GRI Direct voice: +44 1482 466845
Graduate School Voice messages: +44 1482 466844
Hull University, Hull HU6 7RX, UK. Fax: +44 1482 466436
---------------------------------------------------------------------
Date: Tue, 9 Sep 2003 10:47:33 +0100
From: [log in to unmask]
To: R. Allan Reese <[log in to unmask]>
Subject: Re: Query: where's time zero?
I don't think choices of z should have influenced the lower order
terms, as long as the numerical accuracy of all the coefficients is
adequate. Note that in (year-z)^2 you have a term 2*z*year which
should be counted in the linear term. The total contribution of the
linear term should be the same.
Jason
GlaxoSmithKline
---------------------------------------------------------------------
Date: Tue, 9 Sep 2003 12:23:20 +0100 (BST)
To: [log in to unmask]
The point you make is exactly what I mean by being equivalent models,
but it makes a big difference to the interpretation to say that
spending = b1 * year + c
or
spending = b2 * (year - z)
where z is now a meaningful figure and the "constant" is not
significant.
In the quadratic term, I can see why there are numerical instabilities
in trying to fit 1960^2 to 1990^2, rather than fixing the origin at
1979 to make the range -20 to +10.
I'll post a summary of replies. The question overlaps with
"identifiability", except that is usually applied to a constraint that
makes a system mathematically soluble.
Allan
---------------------------------------------------------------------
Date: Tue, 09 Sep 2003 10:54:04 +0100
From: Roger W Humphry <[log in to unmask]>
Can't really help except to say I was recommended to use the midpoint
for z and, as you say, it made little difference to the estimates for
the turning point etc.
yours,
Roger
---------------------------------------------------------------------
Date: Tue, 9 Sep 2003 13:37:27 +0100 (BST)
To: Roger W Humphry <[log in to unmask]>
As usual, I'm relieved not to be immediately deluged with "Didn't you
know that!!!!" No solid leads yet, but I will summarize to the list.
Allan
---------------------------------------------------------------------
Date: Tue, 09 Sep 2003 13:44:32 +0100
From: Roger Newson <[log in to unmask]>
Most people would centre the time axis, choosing a zero time and
substituting t-t_0 for t in the model. In this case, the intercept
parameter is the value of the quadratic at t_0, the linear parameter is
the rate of change at t_0, and the quadratic parameter is the constant
acceleration rate (the second derivative), which is the same at all
times.
The time t_0 is usually chosen to be central, or at least inside the
range of the data.
An alternative approach might be to treat the quadratic as a special
case of a quadratic spline, and to parameterise it by the values of
the spline at the beginning, middle and end of the range. The method
for doing this (in the Stata statistical package) is in my paper
(Newson, 2003). I have used splines extensively in my work on time
series of asthma-related hospital admissions.
Newson R. B-splines and splines parameterized by their values at
reference points. Downloadable as from 10 June 2003 from my website at
http://www.kcl-phs.org.uk/rogernewson/
Roger Newson
Lecturer in Medical Statistics
King's College London, London SE1 3QD
---------------------------------------------------------------------
Date: Tue, 9 Sep 2003 08:47:57 -0500
From: Paul R Swank <[log in to unmask]>
With a quadratic model with positive values for the predictor X, X
and X^2 are typically highly correlated (for example, if x = 1 to 10,
the correlation of X and X^2 is about .97). Thus, by centring X (using
z, for instance), you reduce the collinearity in the model and allow a
solution.
Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Medical School, UT Health Science Center at Houston
---------------------------------------------------------------------
Date: Tue, 9 Sep 2003 17:48:40 +0100 (BST)
To: Paul R Swank <[log in to unmask]>
Thanks for that advice, which will go in the summary. Does that
suggest one might choose z so that the correlation of t and (t-z)^2 is
minimized?
Allan
---------------------------------------------------------------------
Date: Tue, 09 Sep 2003 22:26:04 -0500
From: Jay Warner <[log in to unmask]>
I believe that, mathematically speaking, your choice of z is not an
issue. Yes, of course the coefficients will change, dramatically. And
your software may not like to handle ind. vars with sizes of near 2000
(the probable source of your error msg.). However, since the
regression _assumes_ the x values were measured exactly, it doesn't
matter whether you use z = 0, z = 1900, or z = 1981. It does matter if
your software rounds those itty bitty digits near the end of the
string :) It does matter if your software does not internally adjust
the x' = 0 to the center of your indep. variable. Which it should, if
it is a self-respecting software. Reason: only if the x's are centered
will the correlation between the x^2 coef and the x coef be (near)
zero. Your equation may be able to explain your data, but unless you
do the regression properly, you may not be able to predict anything in
the future with it.
Cheers, and hope you find a real expert at this,
Jay
Jay Warner
Principal Scientist
Warner Consulting, Inc. Racine, WI 53404-1216, USA
---------------------------------------------------------------------
Date: Wed, 10 Sep 2003 16:35:35 +0100 (BST)
To: Jay Warner <[log in to unmask]>
Thanks, but the choice of z precisely *is* the issue. The numerical
stability is a minor issue, but introducing z as an additional
parameter raises the problem of identification. But I'm getting other
good thoughts from others.
Allan
----------------------------------------------------------------------
Date: Wed, 10 Sep 2003 14:41:42 +0100
From: "Nelder, John A" <[log in to unmask]>
The idea you need is that of functional marginality. I have a paper in
J.Appl.Stats. but cannot give you exact reference because I am at home
prior to going to S.Africa. Let me know then if you can't find it.
John Nelder.
---------------------------------------------------------------------
Date: Wed, 10 Sep 2003 16:55:45 +0100
To: "Nelder, John A" <[log in to unmask]>
Thanks. I had no problem tracing the reference and we have an online
subscription so it has printed at my desk.
Nelder JA Functional Marginality and response-surface fitting. J Applied
Statistics (2000) Vol 27 No 1 pp109-112
--- Functional Marginality is important. Letter+discussion. Appl
Statistics (1997) Vol 46 No 2 pp281-286
Functional marginality is a consideration, but, as you state, y = a +
b(x-z) + c(x-z)^2 gives the same goodness of fit for all z. It is a
question of identifiability to fix z. FM applies if the criterion for
choosing z is that lower order terms can be dropped, given that z=0 has
no a priori logic. That is essentially what I did, without putting
the name FM to it. The example was UK health service spending. It was
striking that spending extrapolated as zero in 1947, but I suspect it
was a fluke.
Have a good trip,
Allan
--------------------------------------end of summary