I come before you with a question about out-of-sample prediction and your help would be very much appreciated.
The original dataset covers 46 countries, each with a treatment cost (from a systematic review) and gross national income (GNI). The aim is to predict treatment cost in a separate sample of 124 countries for which only GNI is available. I fitted a simple OLS regression of log cost on log GNI to the n=46 sample and used it to predict log cost for the 124 countries. I then exponentiated the predictions to obtain point estimates on the level scale (adjusted for the bias that arises from exponentiating the error term) and calculated 95% prediction intervals (first on the log scale, then exponentiated to the level scale). These predicted costs and their ranges will then enter a simulation model, where values will be sampled over thousands of iterations.
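To make the procedure concrete, here is a minimal sketch of roughly what I mean, in Python with NumPy and SciPy. The data are made up purely for illustration, the function names are my own, and I have assumed Duan's smearing estimator as the bias adjustment (exp(s²/2) would be the alternative under normal errors), so please read it as an outline rather than my exact analysis.

```python
import numpy as np
from scipy import stats

def fit_loglog(gni, cost):
    """OLS of log(cost) on log(GNI); returns the pieces needed for prediction."""
    x, y = np.log(gni), np.log(cost)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)   # highest power first
    resid = y - (intercept + slope * x)
    s2 = resid @ resid / (n - 2)             # residual variance on the log scale
    return dict(intercept=intercept, slope=slope, s2=s2, n=n,
                xbar=x.mean(), sxx=((x - x.mean()) ** 2).sum(),
                smear=np.exp(resid).mean())  # smearing factor (assumed adjustment)

def predict_cost(fit, gni_new, level=0.95):
    """Point estimate and prediction interval on the level (cost) scale."""
    x0 = np.log(np.asarray(gni_new, dtype=float))
    log_pred = fit["intercept"] + fit["slope"] * x0
    # Standard error for a NEW country: the leading "1 +" is the residual
    # variance that a confidence interval for the mean would leave out.
    se_pred = np.sqrt(fit["s2"] * (1 + 1 / fit["n"]
                                   + (x0 - fit["xbar"]) ** 2 / fit["sxx"]))
    tcrit = stats.t.ppf(0.5 + level / 2, df=fit["n"] - 2)
    return dict(point=np.exp(log_pred) * fit["smear"],
                lower=np.exp(log_pred - tcrit * se_pred),
                upper=np.exp(log_pred + tcrit * se_pred),
                log_pred=log_pred, se_pred=se_pred)

# Made-up data standing in for the 46 countries from the review.
rng = np.random.default_rng(0)
gni_obs = rng.uniform(500, 50_000, size=46)
cost_obs = np.exp(1.0 + 0.8 * np.log(gni_obs) + rng.normal(0, 0.5, size=46))

fit = fit_loglog(gni_obs, cost_obs)
out = predict_cost(fit, [1_000, 10_000, 40_000])  # hypothetical GNI values
```

The interval is built on the log scale and only then exponentiated, so it is asymmetric around the point estimate on the cost scale.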
I have two questions:
1. Is it correct to use prediction intervals as opposed to confidence intervals to estimate treatment cost in out-of-sample countries?
2. For the sampling exercise, what sort of distribution is it appropriate to sample predicted cost values from? I would have thought log-normal, but I am not sure.
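For question 2, the log-normal option I have in mind would look something like the sketch below for a single country. The predicted log cost and its prediction standard error are made-up numbers, and the sketch assumes the predictive distribution is normal on the log scale (strictly it would be a t distribution with 44 degrees of freedom), so it is only meant to show what I am asking about.

```python
import random
import statistics

# Hypothetical values for one out-of-sample country (not real results).
log_pred = 6.2   # predicted log cost from the regression
se_pred = 0.55   # prediction standard error on the log scale

rng = random.Random(42)  # fixed seed so the draws are reproducible

# Each draw is exp(N(log_pred, se_pred^2)), i.e. a log-normal cost.
draws = [rng.lognormvariate(log_pred, se_pred) for _ in range(10_000)]

median_cost = statistics.median(draws)  # should sit near exp(log_pred)
mean_cost = statistics.fmean(draws)     # pulled above the median by the right tail
```

Drawn this way, every sampled cost is positive and the spread matches the prediction interval on the log scale, which is why log-normal seemed the natural choice to me.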
Thank you very much.