Hi everyone,
I posted 2 queries to the list a short while ago. Here are the replies:
************************
QUERY 1
A small query regarding interactions between dummy variables and their
interpretation in a regression model.
I have 2 binary variables. One has categories "male" (1), "female" (0);
the other has categories ">35 years" (1) and "<35 years" (0).
Say, in a hypothetical example, we have the following data
Sex age
1 0
1 1
0 0
1 1
1 1
0 1
0 0
I form an interaction like so:
Sex*age
0
1
0
1
1
0
0
Now, for the sex*age interaction, a '1' is formed only when a person
is 'male' and >35 years. Hence the interaction term contributes to the
fitted value in a model only for males over 35.
How can we evaluate the contribution to a model from the individuals who
are 'female and >35', 'male and <35' and 'female and <35' when, as we
can see, their contribution to the model (for the interaction) using this
coding scheme is zero?
As an alternative, would it be correct to generate a categorical
variable with the following codes:
<35 and male 1
>35 and male 2
<35 and female 3
>35 and female 4
And then for modelling create 3 dummy variables like so:
<35 and male 1 0 0
>35 and male 0 1 0
<35 and female 0 0 1
>35 and female 0 0 0
Many thanks,
Kim.
********************************************
ANSWER
Here is a succinct version of the general consensus:
In the data we were looking at, we were performing regression
modelling. We had 4 possible categories for the independent variables:
<35 and male
<35 and female
>35 and male
>35 and female
Thus only four parameters can be fitted. These can be a 'constant' plus
3 dummy variables ('age', 'sex' and 'age*sex'), where the 3 dummy
variables are:
Age sex age*sex
1 -1 -1
1 1 1
-1 -1 1
-1 1 -1
Or
Age sex age*sex
1 0 0
1 1 1
0 0 0
0 1 0
Or we could generate the 3 dummy variables in the model by producing a
categorical variable with the following codes:
<35 and male 1
>35 and male 2
<35 and female 3
>35 and female 4
And then for modelling, create the 3 dummy variables like so (for
example):
<35 and male 1 0 0
>35 and male 0 1 0
<35 and female 0 0 1
>35 and female 0 0 0
Or
<35 and male 1 0 0
>35 and male 0 1 0
<35 and female 0 0 1
>35 and female -1 -1 -1
For this final coding, however, you lose the ability to separate the
three degrees of freedom into one for sex, one for age and one for the
interaction.
The resulting models will be equivalent as regards 'fit' but will show
differing 'significance' for the various parameters depending on the
choice of coding. A model which fits all 4 parameters is said to be
saturated. There are two different conventions regarding when a model is
saturated. Under the first, a model is saturated when all observations
would be perfectly predicted. This requires as many parameters as
observations and is not the case in this example. Under the second, a
model is saturated when it has as many parameters as can be fitted given
the predictor structure. This is the case in this example. However, by
adding more covariates you could add more parameters.
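
As a rough check on the claim that the codings are equivalent as regards
'fit', the sketch below (not part of the original replies) builds the
4 x 4 design matrix, constant plus 3 dummy variables, for each of the
four codings listed above and confirms that each has a nonzero
determinant. A nonzero determinant means the coding can reproduce any
four cell means exactly, so all four give the same fitted values. The
names `determinant`, `CODINGS` and `design_matrix` are made up for this
illustration.

```python
from fractions import Fraction


def determinant(rows):
    """Determinant by Gaussian elimination, using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


# One row per age/sex cell, in the order the rows appear in the post.
CODINGS = {
    "+1/-1 coding of age, sex, age*sex":
        [(1, -1, -1), (1, 1, 1), (-1, -1, 1), (-1, 1, -1)],
    "0/1 coding of age, sex, age*sex":
        [(1, 0, 0), (1, 1, 1), (0, 0, 0), (0, 1, 0)],
    "cell dummies, last cell coded 0 0 0":
        [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)],
    "cell dummies, last cell coded -1 -1 -1":
        [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)],
}


def design_matrix(coding):
    """Prepend the constant column to the 3 dummy variables."""
    return [(1,) + tuple(row) for row in coding]


dets = {name: determinant(design_matrix(c)) for name, c in CODINGS.items()}
for name, d in dets.items():
    print(f"{name}: determinant = {d}")
```

This only addresses the fitted values; the individual coefficients, and
hence their 'significance', still depend on which coding is used.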
***************************************
QUERY 2
Hello everyone,
I am about to embark on a Cox Regression in SPSS. However, I have some
questions about the output. If we take the hypothetical data set below
Age sex time death
1.00 1.00 10.00 1.00
.00 1.00 12.00 .00
.00 .00 12.00 1.00
1.00 .00 21.00 .00
.00 1.00 45.00 1.00
.00 1.00 21.00 .00
.00 1.00 .00 .00
1.00 .00 2.00 .00
1.00 .00 1.00 .00
1.00 .00 4.00 1.00
1.00 .00 3.00 1.00
.00 .00 67.00 1.00
.00 1.00 33.00 1.00
.00 .00 21.00 1.00
.00 1.00 2.00 .00
1.00 .00 22.00 1.00
1.00 1.00 1.00 .00
1.00 .00 6.00 .00
1.00 .00 5.00 1.00
.00 .00 3.00 .00
.00 .00 2.00 1.00
.00 .00 7.00 .00
.00 1.00 55.00 1.00
1.00 1.00 3.00 1.00
1.00 1.00 1.00 .00
In the dialogue box I have entered Sex and Age as the covariates,
'time' as the 'time' variable, and the variable named death as the
'status' variable (with '1' as the event).
I find in the output that 4 cases have been omitted. There is 1 case
omitted with 'non positive' time. What does this mean? There are 3
cases which have been omitted which are 'censored cases before the
earliest event in a stratum'...what does this mean? Hence the means
printed for sex and age are 0.381 and 0.429 respectively. I would be
grateful if you could tell me how these values have been evaluated.
Also, for another data set where I ran a Kaplan-Meier analysis, the
following message is printed:
>"A negative or missing value of dependent variable has been
>encountered. KM will exclude such cases from analysis".
What does this mean?
The SPSS manuals do not seem to be very enlightening on this point.
******************
ANSWER:
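The case with time = 0 has been omitted; this is the case with 'non
positive' time.
Also, the earliest event occurs in month 2, so the 3 patients who are
'censored' before month 2 (i.e. in month 1) are omitted.
This leaves 21 of the 25 cases, and the means are taken over those 21:
8 of them have sex = 1 (8/21 = 0.381) and 9 have age = 1 (9/21 = 0.429).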
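
The exclusions described in this answer can be reproduced by hand. The
sketch below is plain Python, not SPSS, and only mimics the two rules
stated above (drop non-positive times, then drop cases censored before
the earliest event); it does not claim to be how SPSS implements them.
The rows are copied from the hypothetical data set in Query 2, and the
variable names are made up for this illustration.

```python
# (age, sex, time, death) rows copied from the data set in Query 2.
DATA = [
    (1, 1, 10, 1), (0, 1, 12, 0), (0, 0, 12, 1), (1, 0, 21, 0),
    (0, 1, 45, 1), (0, 1, 21, 0), (0, 1, 0, 0), (1, 0, 2, 0),
    (1, 0, 1, 0), (1, 0, 4, 1), (1, 0, 3, 1), (0, 0, 67, 1),
    (0, 1, 33, 1), (0, 0, 21, 1), (0, 1, 2, 0), (1, 0, 22, 1),
    (1, 1, 1, 0), (1, 0, 6, 0), (1, 0, 5, 1), (0, 0, 3, 0),
    (0, 0, 2, 1), (0, 0, 7, 0), (0, 1, 55, 1), (1, 1, 3, 1),
    (1, 1, 1, 0),
]

# Rule 1: drop cases whose time is not positive.
positive = [row for row in DATA if row[2] > 0]

# Rule 2: drop cases censored (death == 0) before the earliest event.
first_event = min(row[2] for row in positive if row[3] == 1)
kept = [row for row in positive if row[3] == 1 or row[2] >= first_event]

# Covariate means over the cases that remain.
mean_age = sum(row[0] for row in kept) / len(kept)
mean_sex = sum(row[1] for row in kept) / len(kept)

print("cases kept:", len(kept), "of", len(DATA))
print("earliest event time:", first_event)
print("mean sex: %.3f, mean age: %.3f" % (mean_sex, mean_age))
```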
Many thanks, everyone for your help,
All the Best,
Kim.