> Dear Allstaters
>
> Thank you once again for your generous and informative help.
>
> The background was my use of regression analysis (Statistica GLM module) to investigate relationships between various blood components. An example might be the glycated subfraction of haemoglobin (HbA1c) and mean blood glucose. One would expect some sort of 'correlation' between the two. Age and gender might be added to the regression model as possible confounding factors. The problem is that all laboratory tests have an associated analytical error, which we know in advance. For example, the CVs of these two tests might be 5% and 3% respectively. The CV might be higher at lower values, especially with some chromatographic and immunoassay techniques. Some tests may be more imprecise, others less.
>
> I submitted some work using this sort of analysis, but on this occasion the reviewer suggested that some of the conclusions were invalid because of measurement error. I therefore was looking for a way to incorporate analytical error into the regression models.
>
> I have appended some of the (anonymised) replies below, but it is obvious from the first two that my suggestion of 'fuzzy regression' is a non-starter. A Bayesian approach has been suggested, and indeed specific mention of measurement error is found in section 9.6 of the BUGS 0.5 manual. I am going to take this and Congdon (2003) away for holiday reading!
>
> The responses:
>
> Fuzzy is hopeless and logically unsound. Why don't you set your model in a Bayesian framework, in which case you will have (I think) a hierarchical model. See Bayesian Statistics: An Introduction, Peter M Lee, Arnold Publishers, or any other good textbook on Bayesian analysis (there is also lots of software, such as WinBUGS). If you don't like Bayes, why not set up your hierarchical model so that you can use generalized linear models, which allow variances and means to be related.
>
> ****************
> Your problem is soluble, but use of the term 'fuzzy' does not help. Fuzzy logic is a popular but unsound approach to a variety of inference problems. I strongly suggest you stay away from anything that calls itself fuzzy.
>
> From a Bayesian approach, the problem formulation seems fairly straightforward. You have a linear model:
>
> Y = alpha + beta1 * X1 + beta2 * X2
>
> In the Bayesian formulation Y, beta1, beta2, ... are all random variables. You have to specify their distributions. You could set them to normal distributions if you want. Then you could have further equations to set the means and variances of the distributions.
>
> Y ~ N(alpha + beta1 * X1 + beta2 * X2, var0)
>
> alpha ~ N(0, var1)
> beta1 ~ N(0, var2)
> beta2 ~ N(0, var3)
>
> The variances can be specified as specific numbers or by further formulas like
>
> var0 = alpha2 + beta3 * X1
>
> You would have to specify prior distributions for alpha2, beta3, such as
>
> alpha2 ~ N(0,10000)
> beta3 ~ N(0,10000)
>
> Such a model could be estimated using the WinBUGS Bayesian package available free at the Medical Research Council's BUGS site.
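> As a rough illustration (this is not WinBUGS code): the fixed-variance version of the model above can be fitted with a hand-rolled random-walk Metropolis sampler. Everything here - the simulated data, the 'true' coefficients, the prior variance of 100, the residual SD of 1 - is an arbitrary choice made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = alpha + beta1*X1 + beta2*X2 + noise,
# with hypothetical true values alpha=1.0, beta1=2.0, beta2=-0.5
n = 200
X1 = rng.normal(0, 1, n)
X2 = rng.normal(0, 1, n)
y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(0, 1.0, n)

def log_posterior(theta):
    """Log posterior for (alpha, beta1, beta2) with N(0, 100) priors
    and a residual SD fixed at 1.0 (assumed known, for simplicity)."""
    alpha, b1, b2 = theta
    mu = alpha + b1 * X1 + b2 * X2
    log_lik = -0.5 * np.sum((y - mu) ** 2)         # N(mu, 1) likelihood
    log_prior = -0.5 * np.sum(theta ** 2) / 100.0  # N(0, 100) priors
    return log_lik + log_prior

# Random-walk Metropolis: propose a small jitter, accept with the
# usual min(1, posterior ratio) rule
theta = np.zeros(3)
lp = log_posterior(theta)
samples = []
for i in range(20000):
    proposal = theta + rng.normal(0, 0.05, 3)
    lp_new = log_posterior(proposal)
    if np.log(rng.uniform()) < lp_new - lp:
        theta, lp = proposal, lp_new
    if i >= 5000:                 # discard burn-in
        samples.append(theta.copy())

post_mean = np.asarray(samples).mean(axis=0)
print("posterior means (alpha, beta1, beta2):", post_mean)
```

> WinBUGS builds a far better sampler automatically; the point of the sketch is only that the model statement in the reply maps directly onto a likelihood plus priors.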
>
> If you are not familiar with Bayesian methods see the excellent book Bayesian Data Analysis by Gelman et al.
>
> ******************
>
> If you are seeking the relationship between two variables and both have error then you need to fit what is called a measurement error model (there is a book by Cheng and Van Ness, Statistical Regression with Measurement Error, 1999); these are also called errors-in-variables models. However, you need to know the variance of the errors. If you have repeated observations you can estimate these.
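> A sketch of why the error variance matters, assuming a simple linear errors-in-variables setup with known error variance (all numbers hypothetical): naive OLS on the noisy x is attenuated towards zero, and the classical method-of-moments correction recovers the slope:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical truth: y = 2*x_true + noise, but we only observe
# x_obs = x_true + measurement error with KNOWN variance sigma_u2
n = 10000
sigma_u2 = 0.5
x_true = rng.normal(0, 1, n)
x_obs = x_true + rng.normal(0, np.sqrt(sigma_u2), n)
y = 2.0 * x_true + rng.normal(0, 0.5, n)

# Naive OLS slope is shrunk by the reliability ratio
# var(x_true) / (var(x_true) + sigma_u2): here 1/1.5, so about 1.33
b_naive = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Method-of-moments correction, valid when sigma_u2 is known:
b_corrected = np.cov(x_obs, y)[0, 1] / (np.var(x_obs, ddof=1) - sigma_u2)

print(f"naive slope: {b_naive:.2f}, corrected slope: {b_corrected:.2f}")
```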
>
> If you cannot estimate the error variances then I recommend that you use the geometric mean functional relationship. The name comes from the fact that the slope is the geometric mean of the slope from regressing y on x and the reciprocal of the slope from regressing x on y, i.e. this slope is intermediate between the two regression slopes. Fortunately there is a simple formula for the resulting slope:
> b = sigma(y) / sigma(x), with the sign of b given by the sign of the correlation.
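> A minimal numerical check of this formula on simulated data (the coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two variables with a linear relationship; hypothetical slope 1.5
x = rng.normal(10, 2, 500)
y = 1.5 * x + rng.normal(0, 1, 500)

b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # slope of y on x
b_xy = np.cov(x, y)[0, 1] / np.var(y, ddof=1)   # slope of x on y

# GMFR slope: geometric mean of b_yx and 1/b_xy, which simplifies
# algebraically to sd(y)/sd(x); sign taken from the correlation
r = np.corrcoef(x, y)[0, 1]
b_gmfr = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

assert np.isclose(b_gmfr, np.sign(r) * np.sqrt(b_yx / b_xy))
print(f"b(y on x)={b_yx:.3f}, GMFR slope={b_gmfr:.3f}")
```

> Note the GMFR slope (about sqrt(10)/2 here) sits above the attenuated y-on-x slope, as the reply says.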
>
> Beware of people who recommend using orthogonal regression; it's not units invariant i.e. if you change the units of measurement you get a different relationship!!
>
> ************************
>
> You could try running an initial regression model and calculating the error term for each observation from that (i.e. the actual dependent value minus the fitted value). Then you could re-run the regression model as before, but this time including, as an extra independent variable, the analytical errors as calculated from the first model. That way, you end up with a standard multiple regression model and can derive all the diagnostics in the usual way.
>
> *************************
>
> So the magnitude of the error is proportional to the magnitude of the response variable, i.e. the errors are multiplicative rather than additive? The conventional approach would then be to take logs and then use normal linear regression?
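> A sketch of the log-transform approach, under an assumed multiplicative model y = a * x^b * exp(error) (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Multiplicative error: y = a * x^b * exp(noise).  Taking logs gives
# log(y) = log(a) + b*log(x) + noise, an ordinary linear model.
n = 300
x = rng.uniform(1, 10, n)
y = 2.0 * x ** 1.5 * np.exp(rng.normal(0, 0.1, n))

# Ordinary least squares on the log scale (polyfit returns the slope
# first, then the intercept, for a degree-1 fit)
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
print(f"estimated exponent b={b:.2f}, multiplier a={np.exp(log_a):.2f}")
```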
>
> You may also wish to consider the jack-knife or bootstrap approach of non-parametric regression, and/or iteratively re-weighted least squares, to obtain robust estimates of your parameters.
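> A minimal case-resampling bootstrap for a regression slope might look like this (simulated data; 2,000 resamples is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data with hypothetical slope 2.0
n = 100
x = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

def slope(xs, ys):
    return np.polyfit(xs, ys, 1)[0]

# Resample (x, y) PAIRS with replacement; the spread of the
# resampled slopes estimates the sampling error without
# distributional assumptions about the residuals
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(slope(x[idx], y[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"slope={slope(x, y):.2f}, 95% bootstrap CI=({lo:.2f}, {hi:.2f})")
```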
>
> Once you have fitted a model [via regressions or otherwise] the sum of the squares of the differences between the actual values and the fitted values is a basic measure of the goodness of fit that could be used to compare different models.
>
> A simple non-parametric measure allowing for different magnitude responses at different levels of input variables would be to look at the absolute differences [fitted vs actual] relative to the actual value - and then to look at the distribution [maybe the average?] of these ratios.
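> Both summaries - the sum of squared differences and the relative absolute differences - are a few lines of code (the actual and fitted values below are toy numbers):

```python
import numpy as np

# Toy actual and fitted values, purely for illustration
actual = np.array([10.0, 52.0, 98.0, 151.0, 202.0])
fitted = np.array([12.0, 50.0, 100.0, 148.0, 205.0])

# Sum of squared differences: the basic goodness-of-fit measure
# for comparing different models on the same data
sse = np.sum((actual - fitted) ** 2)

# Scale-free alternative: absolute differences relative to the actual
# value; the mean is one summary, but the whole distribution of these
# ratios can be inspected
ratios = np.abs(fitted - actual) / actual
print(f"SSE={sse:.1f}, mean relative error={ratios.mean():.3f}")
```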
>
> ***************************
>
> 1) In Ordinary Least Squares (OLS - what Gauss did), the assumption is that the residuals of the fit are Normally distributed and have equal variance. If your residuals do not show a pattern ('information'), then OLS will work fine for you.
>
> 2) As I recall, OLS can make good point estimate predictions without assumptions on the variance structure of residuals. It is only when you ask for confidence intervals (analytical error?) that you have to assume residuals are 'well behaved.'
>
> 3) OLS presumes that the x variables are error free - perfectly measured. To include such variation, I think you will want to explore PLS, PCA, and other 'next step' regression methods.
>
> PLS = Partial Least Squares
> PCA = Principal components analysis
> etc.
>
> ****************************
>
> Many, many thanks again
>
> Peter
>
> Peter Hudson
> Principal Biochemist
> Wrexham Maelor Hospital
> Wales, UK. LL13 7TD
> [log in to unmask]
>