I am carrying out a competing-risks regression analysis that relies on pseudo-likelihood for parameter estimation, with a set of 50 regressors and 5K+ observations.
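
For concreteness, the baseline fit looks roughly like this (ftime, fstatus and X are placeholder names for my follow-up times, event codes and regressor matrix):

  library(cmprsk)

  # fstatus: 0 = censored, 1 = event of interest, 2 = competing event
  # X: numeric matrix with one column per regressor (50 in my case)
  fit <- crr(ftime, fstatus, cov1 = X, failcode = 1, cencode = 0)
  summary(fit)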

Unsurprisingly, a good many of them are not significant. I need a predictive model with a good trade-off between simplicity and predictive precision, and I am wary of simply dropping the non-significant regressors.

A classic stepwise selection procedure cannot be applied because the pseudo-likelihood is not indicative of the information captured by the model, so likelihood-based criteria such as AIC or BIC cannot be used to compare fits.

In order of complexity, I could:

1) Try to adapt my tool (R package cmprsk) to implement a penalized regression procedure (see the first sketch below), but I lack the time and expertise to produce a reliable implementation; or

2) Bootstrap the precision of the estimates for all the candidate combinations of regressors, in stepwise fashion, and select the best performer (see the second sketch below). The drawback is the time and processing power this requires; or

3) Use an FDR/FWER procedure: run the regression with each predictor in turn, use FDR/FWER algorithms to chuck out the weakest suits - as it were - and run the final model with the survivors (see the third sketch below). The drawback is that this may be too conservative, lose some explanatory power, and is not generally regarded as good practice.
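
To make option 1 concrete: the kind of interface I would want is what I understand the crrp package provides (penalized Fine-Gray regression with LASSO/SCAD/MCP penalties). I have not used it myself, so the call below is a sketch from its documentation as I recall it, not tested code:

  library(crrp)

  # Penalized competing-risks regression over a grid of lambda values
  pfit <- crrp(time = ftime, fstatus = fstatus, X = X,
               failcode = 1, cencode = 0, penalty = "LASSO")
  # Coefficients along the penalty path; lambda would be chosen by
  # the package's BIC-type criterion
  pfit$beta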
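
What I mean in option 2, for a single candidate subset, is roughly the following (vars is a placeholder for the subset's column names; repeating this over every subset visited by the search is what blows up the computing time):

  # Bootstrap the precision of the estimates for one candidate subset
  B <- 200
  boot_coef <- replicate(B, {
    idx <- sample(length(ftime), replace = TRUE)
    crr(ftime[idx], fstatus[idx],
        cov1 = X[idx, vars, drop = FALSE])$coef
  })
  apply(boot_coef, 1, sd)  # bootstrap standard error per coefficient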
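
And option 3 would amount to roughly this: screen each regressor univariately, then apply the Benjamini-Hochberg adjustment from base R (the 5% FDR threshold is arbitrary):

  # Univariate Wald p-value for each regressor in turn
  pvals <- apply(X, 2, function(x) {
    f <- crr(ftime, fstatus, cov1 = matrix(x, ncol = 1))
    z <- f$coef / sqrt(diag(as.matrix(f$var)))
    2 * pnorm(-abs(z))
  })
  keep <- colnames(X)[p.adjust(pvals, method = "BH") < 0.05]
  final <- crr(ftime, fstatus, cov1 = X[, keep, drop = FALSE])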

Is anyone able to point me towards a reasonable compromise for this problem?

Regards

Giulio Flore

