I have a dataset of close to 20,000 surgical patients. The goal is to develop a risk algorithm for a particular post-operative complication so that by plugging in a number of pre-operative risk factors, the surgeon can have the risk calculated. Based on a risk cut-off, prophylactic treatment would be started at the end of surgery. This complication rate is about 20%+ so we are talking about a significant number of cases. When I get this working, we are planning a clinical trial to determine if using a risk algorithm will actually reduce the incidence.
What I have done:
a) split patients into a test group for developing the algorithm and then a confirmation group to run it against. The ratio is 2 test: 1 confirm.
b) ran multiple univariate chi-squares to see which parameters were associated with developing the condition. Most of them were ones we thought would correlate.
c) built logistic regression (manually manipulating parameters in the model) using various combinations. There are 9 factors which always show up as highly significant (based on type 3 effects and odds ratios). The Hosmer Lemeshow p-value is >0.7 while the 'c' is also > 0.71
d) probabilities run from 0 to about 0.8
e) when I combine the predicted probabilities into groups centered about a value ± 0.025 or ± 0.5, and then plot the predicted values against the observed values for these patients, I get a fairly good straight line. Lower values seem to lie right on the line, but higher values scatter somewhat, although with no obvious pattern.
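For anyone who wants to reproduce the grouping in step (e) outside SAS, here is a minimal sketch in Python. It assumes a bin width of 0.05 (that is, a centre ± 0.025); the function name calibration_table and the toy data are made up for illustration and are not from my dataset.

```python
from collections import defaultdict


def calibration_table(predicted, observed, width=0.05):
    """Group predicted probabilities into bins of the given width.

    Returns one row per non-empty bin:
    (bin centre, mean predicted probability, observed event rate, count).
    """
    top_bin = int(round(1.0 / width)) - 1
    bins = defaultdict(list)
    for p, y in zip(predicted, observed):
        # Keep a probability of exactly 1.0 in the top bin.
        index = min(int(p / width), top_bin)
        bins[index].append((p, y))

    rows = []
    for index in sorted(bins):
        pairs = bins[index]
        n = len(pairs)
        mean_pred = sum(p for p, _ in pairs) / n
        obs_rate = sum(y for _, y in pairs) / n
        rows.append(((index + 0.5) * width, mean_pred, obs_rate, n))
    return rows


# Toy data for illustration only (1 = complication occurred, 0 = it did not).
predicted = [0.02, 0.04, 0.11, 0.13, 0.52, 0.54]
observed = [0, 0, 0, 1, 1, 0]

for centre, mean_pred, obs_rate, n in calibration_table(predicted, observed):
    print(f"centre={centre:.3f}  mean predicted={mean_pred:.3f}  "
          f"observed={obs_rate:.3f}  n={n}")
```

Plotting the mean predicted value against the observed rate for each row gives the kind of line described above; bins with few patients will naturally scatter more.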
The Problem:
The surgeons want this on an internal web site so that they can plug in the values and get a probability. I am trying to set up the code to do the calculations. I am doing something wrong, because the prob I calculate is not the same as the ip_yes value that SAS calculates.
I take the coefficients of the parameters I am using (the 'estimates' from the maximum likelihood estimates table), multiply each one by 0 or 1 depending on absence/presence for categorical variables, or by the actual value (e.g. age in years) for continuous variables, add them up, and add the intercept (let's call this value 'tot'). Then I use the equation:
prob = exp(tot)/(1 + exp(tot))
The calculated values are always larger than SAS scores by about 10-20%. I have done the hand calculation for several patients.
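To make the hand calculation concrete, here is a minimal sketch of it in Python. The variable names and coefficient values are hypothetical placeholders, not the estimates from my model, and the sketch assumes 0/1 (reference) coding for the categorical factors.

```python
import math

# Hypothetical values for illustration only; NOT the fitted estimates.
INTERCEPT = -3.0
COEFFICIENTS = {
    "diabetes": 0.7,    # categorical: 1 = present, 0 = absent (assumed coding)
    "smoker": 0.4,      # categorical: 1 = present, 0 = absent (assumed coding)
    "age_years": 0.02,  # continuous: multiplied by the actual value
}


def linear_predictor(patient, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Return 'tot': the intercept plus the sum of coefficient * value."""
    return intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )


def predicted_probability(tot):
    """Inverse logit: exp(tot) / (1 + exp(tot))."""
    return math.exp(tot) / (1.0 + math.exp(tot))


patient = {"diabetes": 1, "smoker": 0, "age_years": 65}
tot = linear_predictor(patient)
prob = predicted_probability(tot)
print(f"tot = {tot:.3f}, prob = {prob:.3f}")
```

One thing that may be worth checking against this sketch: as far as I know, PROC LOGISTIC uses effect coding by default for variables listed in a CLASS statement, in which case the multiplier for a two-level factor would be +1 or -1 rather than 1 or 0. If that applies here, a 0/1 hand calculation would not match the SAS predicted values.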
Any ideas?
Thanks in advance for the help. Sorry for the lengthy posting.
Morley Herbert