Dear all,

I am an MSc OR student and need some advice on how to
handle sampling bias.

My data set, which will be used to develop the model,
is not a good representation of the population. There
are huge differences between the population and the
sample in the distribution of the (categorical)
explanatory variables.

For example, one level of a variable makes up 30% of
the data, whereas its true proportion in the population
is known to be about 10%. Due to time constraints and
limited resources, it is impossible to re-collect the
data according to the population proportions, so I have
to build the model with the data I have. Since the
sample is not very representative, my questions are:



1) Is there any method to adjust the data so that it
looks more like the population?

2) Is the adjustment necessary? Do I have to deal with
the bias? Because the data are highly skewed, I am
going to use logistic regression to build my model.
To what degree would the bias distort my final
findings?

Thanks for your help

Regards,


