Hi All,


I have a question about computing variable importance for a single new observation in a random forest. Suppose I fit a random forest model to some training data and obtain a list of variables and their importances. I can then use the model to predict the response value for a new observation.


My question is: how would I determine the variable importance specifically for that new observation?


My thoughts:


[1] Treat the variable importance values as weights, multiply them by the new observation vector, and then rank the variables by the resulting products.


[2] Compute the predicted value for the new observation, y_hat say, and then remove each variable in turn and compute a new predicted value, y_hat(-i) say, with the ith variable removed. Then compute d(i) = |y_hat - y_hat(-i)| for each variable and rank the variables by d(i). The variables with the largest d(i) are the most important for that observation.
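To make idea [2] concrete, here is a rough Python sketch. A fitted forest cannot predict with a column missing, so the sketch assumes "removing" variable i means replacing its value with each training value in turn and averaging the resulting predictions. That substitution is my own assumption, not an established recipe. The names local_importance and toy_predict are hypothetical, and toy_predict is only a stand-in for a fitted model's predict function.

```python
from statistics import mean


def local_importance(predict, X_train, x_new):
    """Return d(i) = |y_hat - y_hat(-i)| for a single observation.

    predict : callable taking a list of rows, returning one prediction per row
    X_train : list of training rows, used as the source of replacement values
    x_new   : the new observation, as a list of feature values

    Assumption: variable i is "removed" by substituting every training
    value of that variable and averaging the predictions.
    """
    y_hat = float(predict([x_new])[0])
    d = []
    for i in range(len(x_new)):
        # Copies of x_new in which only feature i varies over the training values
        rows = []
        for train_row in X_train:
            row = list(x_new)
            row[i] = train_row[i]
            rows.append(row)
        y_hat_minus_i = mean(float(p) for p in predict(rows))
        d.append(abs(y_hat - y_hat_minus_i))
    return d


# Stand-in for a fitted model: depends on the first variable only.
def toy_predict(rows):
    return [3.0 * r[0] + 0.0 * r[1] for r in rows]


X_train = [[0.0, 5.0], [1.0, 6.0], [2.0, 7.0]]
x_new = [2.0, 100.0]

d = local_importance(toy_predict, X_train, x_new)
ranking = sorted(range(len(d)), key=lambda i: -d[i])
print(d)        # [3.0, 0.0]
print(ranking)  # [0, 1] : variable 0 matters most for this observation
```

In practice the predict argument would be the fitted forest's own predict function. Note that d(i) measured this way depends on the training data used for the substitution as well as on the model.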


Or is there some other standard way?


Any help would be most appreciated.


Best Regards,



Richard
