Hi,
I have received some very interesting
remarks concerning my email of yesterday.
Thanks to all the stats users for
their collaboration. Here is the
list of responses:
This is the response from Stas:
> 1/ Can I use all my variables in the method,
> X1, X2, X3, X4, without the assumption of a Gaussian
> (normal) distribution?
You don't need any assumptions on your explanatory variables, at least
as long as you treat them as fixed covariates. If they are random and, say,
measured with error, then you would have to use some of the special methods
econometricians are keen on that control for endogeneity.
> 2/ What happens if some of my explanatory variables are correlated?
Just as it would in linear regression. Your estimates will
be somewhat inefficient, and you won't really be able to tell much about the
effect of any one variable on the outcome. You can try out the
shrinkage methods (principal component regression was mentioned, and
there are a number of others), but mainly as an exercise out of
curiosity.
The views expressed in the original response are way too strong.
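To make this point concrete, here is a small simulated sketch (pure NumPy, entirely hypothetical data, not from anyone's actual analysis): a logistic model is fit by Newton-Raphson with two predictors that are either uncorrelated or strongly correlated. The coefficient estimates stay close to the truth in both cases, but the standard errors inflate when the predictors are correlated, which is exactly the "inefficiency" described above.

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns the coefficient vector and its estimated covariance
    (inverse of the observed information matrix)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)

rng = np.random.default_rng(0)
n = 5000
b_true = np.array([-0.5, 1.0, 1.0])  # intercept, x1, x2 (hypothetical values)

def simulate(rho):
    """Draw y from a logistic model whose two predictors have correlation rho."""
    z = rng.standard_normal((n, 2))
    x2 = rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1]
    X = np.column_stack([np.ones(n), z[:, 0], x2])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ b_true))).astype(float)
    return X, y

results = {}
for rho in (0.0, 0.9):
    X, y = simulate(rho)
    beta, cov = fit_logit(X, y)
    results[rho] = (beta, np.sqrt(np.diag(cov)))
    print(f"rho={rho}: beta_hat={np.round(beta, 2)}, se={np.round(results[rho][1], 3)}")
```

The estimates remain roughly unbiased either way; only their precision suffers, so nothing here is "wrong", just noisier.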
> 3/ What is the importance of the intercept?
It allows the model to get the right shift to match the observed
proportion of 1s. Again, as in linear regression: you don't
really interpret the constant unless the levels (or the means) of your
explanatory variables have some special meaning (such as being centered at
zero, or something of that kind).
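One way to see the intercept's "right shift" role (a minimal sketch on simulated data): in an intercept-only logistic model, the ML estimate of the intercept is exactly the logit of the observed proportion of 1s, so the fitted probability reproduces that proportion.

```python
import numpy as np

rng = np.random.default_rng(1)
y = (rng.random(2000) < 0.3).astype(float)  # simulated 0/1 outcome, ~30% ones

# Newton-Raphson for the intercept-only logistic model.
b0 = 0.0
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-b0))                    # current fitted probability
    b0 += (y - p).sum() / (len(y) * p * (1.0 - p))   # score / information

p_hat = 1.0 / (1.0 + np.exp(-b0))
print(f"intercept={b0:.4f}, fitted prob={p_hat:.4f}, observed proportion={y.mean():.4f}")
```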
> 4/ In SAS, what levels must I use for the method?
> Will alpha = 0.05 or 0.1 for the Entry and Stay options be OK?
> 5/ If my model is fitted, can I obtain the confidence
> interval of the probability of the event (Y=1)
> for a new observation or individual?
Sorry, I am a Stata person, not a SAS one. For 5), however, you would
need full information on the values of the explanatory variables to proceed
with the prediction.
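For question 5, one common approach (sketched here on simulated data, not tied to any particular SAS procedure) is the delta method: build a Wald interval on the linear predictor x'beta, then map the endpoints through the inverse logit so the interval stays inside (0, 1).

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """ML logistic fit; returns coefficients and their covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

# Simulated training data (hypothetical coefficients -1 and 2).
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = (rng.random(n) < expit(X @ np.array([-1.0, 2.0]))).astype(float)
beta, cov = fit_logit(X, y)

# New observation: interval built on the linear predictor, then back-transformed.
x_new = np.array([1.0, 0.5])
eta = x_new @ beta
se = np.sqrt(x_new @ cov @ x_new)
lo, hi = expit(eta - 1.96 * se), expit(eta + 1.96 * se)
print(f"p_hat={expit(eta):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Note that, as Stas says, this requires full values of the explanatory variables for the new observation (the x_new vector above).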
> 6/ Do any interesting graphical presentations
> exist to show the relation between the
> appropriate variables and the probability?
I'd simply plot the predicted probability versus the explanatory
variable (keep in mind the effect of other variables though!). You can also
indicate 0/1 outcomes on bivariate plots of two explanatory variables
(i.e., one of the regressors vs. another one, plus the labels of
observations -- 0/1 according to the observed value of the outcome).
This will give you some idea of the strength of the bivariate relations, as
well as the presence of outliers (a lonely 1 surrounded by zeros).
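A minimal matplotlib sketch of the first suggestion, on simulated data (the variable names and the output file name are just illustrative): scatter the 0/1 outcomes against one explanatory variable and overlay the probability curve. In a real analysis you would overlay the model's predicted probabilities rather than the true curve used here, and remember the caveat above about the other variables.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = np.sort(rng.standard_normal(300))
p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))  # probability curve (hypothetical)
y = (rng.random(300) < p).astype(float)     # observed 0/1 outcomes

plt.figure()
plt.scatter(x, y, s=12, alpha=0.4, label="observed 0/1 outcomes")
plt.plot(x, p, color="red", label="predicted probability")
plt.xlabel("explanatory variable")
plt.ylabel("P(Y = 1)")
plt.legend()
plt.savefig("logit_curve.png")
```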
And yes, Agresti is a good book written with health applications in
mind.
This is what Scott thinks:
Blaise's responses 1 and 2 are, I believe, incorrect.
1. In classical logistic regression, you condition on the explanatory
variables. That is, they are assumed to be fixed, not random. Thus,
you needn't worry about their distribution. In particular, I have
never before seen the argument that the explanatory variables should be
normally distributed in order for inferences to be valid! If this
claim is true - and I think it is not - it would be nice to know a reference
to support it.
2. Including correlated explanatory variables in a regression model
does NOT produce "wrong" results! Using PCA to get uncorrelated (but
probably difficult to interpret) predictors may sometimes be a good
idea, but it is not necessary. Leaving correlated explanatory
variables in a model is NOT necessarily a bad idea. You just need to be clear
about interpretation of all the parameters.
I agree generally with the rest of Blaise's comments, and especially
with his recommendation of Alan Agresti's book.
This is Paul's response:
Logistic regression makes no distributional assumptions about the
predictor variables.
Thank you again, and I hope the discussion will continue.
Regards
Adel