I would be interested in a statistician's response to the following. Because it
is hard to measure or estimate otherwise, I want to predict the weight (biomass)
of trees based on simple measurements (e.g., trunk diameter, height). I choose
(not randomly) a group of trees in a range of sizes, measure them, cut them
down, weigh them, and perform a linear regression, using log(height x
diameter) as the independent variable and log(biomass) as the response, with a
sample size of 11. I get a high R^2 and a highly significant regression: the
slope is highly significant, although the intercept term is marginally not
significant. I now want to use the equation to predict the biomass of trees given
their height and diameter. What statistical problems, logical pitfalls,
deathtraps, etc. do I need to be aware of, resulting from or in addition to the
fact that sampling was not random?
Also, I would like to estimate prediction error prior to generating any
predictions from data (I want to know what kind of error I can expect in making
predictions, and in what ranges of the independent variables).
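
To make the second question concrete, here is a minimal sketch of the kind of calculation I have in mind. It assumes the usual simple-regression prediction interval on the log scale and a leave-one-out check. The data are synthetic placeholders, not my measurements, and the function names and the hard-coded t value (0.975 quantile, 9 degrees of freedom, for n = 11) are my own illustrative choices.

```python
import numpy as np

def fit_loglog(x, y):
    """Ordinary least squares of log(y) on log(x)."""
    lx, ly = np.log(x), np.log(y)
    n = lx.size
    slope, intercept = np.polyfit(lx, ly, 1)
    resid = ly - (intercept + slope * lx)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error
    return {"a": intercept, "b": slope, "s": s, "n": n,
            "xbar": lx.mean(), "sxx": np.sum((lx - lx.mean()) ** 2)}

def prediction_half_width(fit, x0, t_crit):
    """Half-width of the prediction interval for log(y) at predictor value x0."""
    lx0 = np.log(x0)
    return t_crit * fit["s"] * np.sqrt(
        1 + 1 / fit["n"] + (lx0 - fit["xbar"]) ** 2 / fit["sxx"])

def loo_rmse(x, y):
    """Leave-one-out root-mean-square prediction error on the log scale."""
    lx, ly = np.log(x), np.log(y)
    errs = []
    for i in range(lx.size):
        keep = np.arange(lx.size) != i
        b, a = np.polyfit(lx[keep], ly[keep], 1)  # refit without tree i
        errs.append(ly[i] - (a + b * lx[i]))
    return float(np.sqrt(np.mean(np.square(errs))))

# Synthetic stand-in data (hypothetical): 11 trees, predictor = height x diameter.
rng = np.random.default_rng(0)
hd = np.linspace(2.0, 60.0, 11)
biomass = np.exp(0.5 + 0.9 * np.log(hd) + rng.normal(0.0, 0.15, 11))

fit = fit_loglog(hd, biomass)
T_CRIT = 2.262  # t quantile (0.975, 9 df); assumes normal errors on the log scale

for x0 in (2.0, 20.0, 60.0, 120.0):  # 120 lies outside the sampled range
    hw = prediction_half_width(fit, x0, T_CRIT)
    # exp(hw) is the multiplicative error factor on the original biomass scale
    print(f"x0={x0:6.1f}  half-width(log)={hw:.3f}  factor={np.exp(hw):.2f}")

print("leave-one-out RMSE (log scale):", round(loo_rmse(hd, biomass), 3))
```

The half-width is smallest at the mean of log(height x diameter) and grows as the new tree moves away from it, which is one way to see in which ranges of the independent variable the predictions are least reliable. Whether these intervals can be trusted when the sample trees were not chosen at random is part of what I am asking.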