I am not an expert in T&R and don't have much context, but it looks like the linear-regression assumption of independent errors is violated. You could try Prais–Winsten regression instead (https://en.m.wikipedia.org/wiki/Prais–Winsten_estimation), or something similar such as ARMAX. Like your “log(t) - log(t-1)” approach, the Prais–Winsten model controls for the time effect, but it does so in the error term.
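For anyone who wants to experiment, here is a minimal sketch of an iterated Prais-Winsten fit for a single regressor, written in plain Python. It assumes the errors follow an AR(1) process, and the function names (prais_winsten, _ols2), the clipping of rho and the stopping rule are my own illustrative choices rather than anything from a particular package.

```python
import math


def _ols2(c, x, y):
    """Least squares of y on two columns, c and x, with no extra intercept."""
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    a = (scy * sxx - scx * sxy) / det
    b = (scc * sxy - scx * scy) / det
    return a, b


def prais_winsten(x, y, max_iter=50, tol=1e-8):
    """Fit y_t = a + b*x_t + u_t with u_t = rho*u_{t-1} + e_t.

    Returns (a, b, rho). Assumes AR(1) errors; illustrative only.
    """
    n = len(y)
    a, b = _ols2([1.0] * n, x, y)  # plain OLS as the starting point
    rho = 0.0
    for _ in range(max_iter):
        # Residuals on the original (untransformed) scale
        u = [yi - a - b * xi for xi, yi in zip(x, y)]
        num = sum(u[t] * u[t - 1] for t in range(1, n))
        den = sum(u[t - 1] ** 2 for t in range(1, n))
        new_rho = max(-0.99, min(0.99, num / den))  # keep |rho| < 1
        # Prais-Winsten keeps the first observation, scaled by sqrt(1 - rho^2);
        # later observations are quasi-differenced.
        s = math.sqrt(1.0 - new_rho ** 2)
        c_star = [s] + [1.0 - new_rho] * (n - 1)
        x_star = [s * x[0]] + [x[t] - new_rho * x[t - 1] for t in range(1, n)]
        y_star = [s * y[0]] + [y[t] - new_rho * y[t - 1] for t in range(1, n)]
        a, b = _ols2(c_star, x_star, y_star)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho
```

Because the constant column is transformed along with x and y, the returned a is the intercept on the original scale. With rho close to 1 the transformation approaches first-differencing, which is the link to the growth-rate approach.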
Queen Mary University of London
PS: This is my second post; the first was sent from the wrong email address, which caused an error at UTSG.
From: Rob Bain <[log in to unmask]>
Sent: Friday, November 24, 2017 4:30 pm
Subject: [UTSG] (Mis)Use of Econometric Models??
To: <[log in to unmask]>
I'm very interested in any list-subscriber's views on the following...
In my reviews of toll road traffic and revenue (T&R) forecasts I'm frequently presented with econometric models. Typical uses are for forecasting traffic in a simple, brownfield corridor with limited alternatives or to grow matrices in the context of a broader network model. The econometric models are invariably cast as log-log formulations (giving all the usual good stuff, e.g. coefficients = elasticities) and the goodness-of-fit stats that result from model estimation are typically spectacular (R-sq > 0.9). The T&R consultants use this to trumpet their modelling capabilities and to give clients comfort. Often unguarded (and unwise) comments follow about predictive ability - but let's ignore that rubbish for now. The point is that, for me, the R-sq is being misrepresented. The statistical significance of such regressions is used as if it supported very strong (generally causal) relationships. However, the high significance typically arises only because the variables (e.g. AADT and GDP or whatever) have upward trends - and that is what is being picked up.
For me the appropriate statistical treatment of trending variables is to estimate the relationship between the changes or growth rates of the dependent and independent variables. Inconveniently, this shows much noisier relationships than the log-log approach (lower R-sq's), but it is much better for model identification (selecting the right macro variables to use).
To test this, in a spreadsheet I created 20 years of annual random data for two variables A and B. I let the random variables vary between -5% and +5%. When plotted, these random variables (of course) suggest no relationship.
I then introduced two growth trends: I added +3% to all of the A values and +5% to all of the B values.
I then presented them as indices. Starting values = 100 in both cases. And lo and behold, a very strong (yet entirely spurious) correlation appeared!
I then logged both variables (log-log) and the spurious relationship remained (with a great-looking, client-pleasing R-sq of 0.95!)! However this is only 'capturing' the internal trends (both of which are upward).
I finally turned to growth rates and looked at log(t) - log(t-1) and the spurious relationship simply disappeared.
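The spreadsheet experiment above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: uniform random shocks, indices built by compounding from a base of 100 (so 20 years of growth gives 21 index points), and hypothetical function names (r_squared, simulate). The exact R-sq values depend on the random seed.

```python
import math
import random


def r_squared(x, y):
    """R-squared of a simple regression of y on x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(years=20, trend_a=0.03, trend_b=0.05, seed=1):
    """Return (R-sq of log levels, R-sq of log differences) for A and B."""
    rng = random.Random(seed)
    # Independent annual shocks between -5% and +5% (no true relationship)
    shocks_a = [rng.uniform(-0.05, 0.05) for _ in range(years)]
    shocks_b = [rng.uniform(-0.05, 0.05) for _ in range(years)]
    # Add the growth trends and present both series as indices from 100
    index_a, index_b = [100.0], [100.0]
    for ea, eb in zip(shocks_a, shocks_b):
        index_a.append(index_a[-1] * (1.0 + trend_a + ea))
        index_b.append(index_b[-1] * (1.0 + trend_b + eb))
    # Log-log formulation on levels
    log_a = [math.log(v) for v in index_a]
    log_b = [math.log(v) for v in index_b]
    # Growth rates: log(t) - log(t-1)
    dlog_a = [log_a[t] - log_a[t - 1] for t in range(1, len(log_a))]
    dlog_b = [log_b[t] - log_b[t - 1] for t in range(1, len(log_b))]
    return r_squared(log_b, log_a), r_squared(dlog_b, dlog_a)


if __name__ == "__main__":
    levels, diffs = simulate()
    print(f"R-sq, log levels:      {levels:.2f}")
    print(f"R-sq, log differences: {diffs:.2f}")
```

Running simulate over a range of seeds and comparing the two R-sq values gives a feel for how consistently the levels regression flatters two unrelated series.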
Any thoughts? I'm simply interested in improving modelling (and forecasting) practice. Few clients appear to pick up on any of this.
Investor Support Services
+44 1732 463314