Dear all,
I am an MSc OR student and need some advice on how to
handle sampling bias.
My data set, which will be used to develop the model,
is not a good representation of the population. There
are huge differences between the population and the
sample in terms of the explanatory variables
(categorical).
For example, one level of a variable makes up 30% of
the sample, whereas its true proportion in the
population is known to be about 10%. Due to time
constraints and limited resources, it is impossible to
re-collect the data according to the population
proportions, so I have to use the existing data to
build the model. Since the sample is not very
representative, my questions are:
1) Is there any method to adjust the data so that it
looks similar to the population?
2) Is the adjustment necessary? Do I have to deal
with the bias? Because the data are highly skewed, I
am going to use logistic regression to build my model.
To what degree would the bias distort my final
findings?
Thanks for your help
Regards,