JISCMail - ALLSTAT Archives

Hi All,

I have a question with regard to computing variable importance for a single new observation in a random forest. Suppose I fit a random forest model to some training data and obtain a list of variables and their importance scores. I can then use the model to predict the response value for a new observation.

My question is: how would I determine the variable importance specifically for that new observation?

My thoughts:

[1] Treat the variable importance values as weights, multiply them element-wise by the new observation vector, and then rank the variables by the resulting products.

[2] Compute the predicted value of the new observation, say y_hat, then remove each variable in turn and compute a new predicted value, say y_hat(-i), with the ith variable removed. Then compute d(i) = |y_hat - y_hat(-i)| for each variable and rank the variables by d(i). The variables with the largest d(i) are the most important for that observation.
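As a rough sketch of how options [1] and [2] could look in practice, the Python below implements both for any model exposed as a function mapping one observation to one prediction. Everything here is illustrative: `toy_forest_predict` is a hypothetical three-stump "forest" (the average of three one-split trees) standing in for a fitted random forest, and the importances and observation are made-up numbers. Two assumptions are worth flagging. For [1], the products are ranked by absolute value, and the comparison only makes sense if the variables are on comparable scales (e.g. standardised). For [2], a fitted forest cannot be evaluated with a variable literally missing, so "removing" variable i is approximated here by setting it to a baseline value such as its training-set mean; refitting the forest without variable i would be the other reading of the idea.

```python
from typing import Callable, List, Sequence, Tuple

# Any fitted model exposed as "one observation in, one prediction out".
Predictor = Callable[[Sequence[float]], float]

# Hypothetical stand-in for a random forest: three one-split trees given as
# (feature index, threshold, value if x[f] <= threshold, value otherwise).
STUMPS = [(0, 0.5, 1.0, 4.0), (0, 2.0, 2.0, 6.0), (1, 3.0, 0.0, 6.0)]


def toy_forest_predict(x: Sequence[float]) -> float:
    """Average the stump predictions, as a regression forest averages its trees."""
    votes = [left if x[f] <= t else right for f, t, left, right in STUMPS]
    return sum(votes) / len(votes)


def weighted_value_ranking(
    importances: Sequence[float], x: Sequence[float], names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Option [1]: score_i = importance_i * x_i, ranked by |score_i|, largest first.

    Assumes the entries of x are on comparable scales (e.g. standardised);
    otherwise the ranking largely reflects each variable's units.
    """
    scores = [imp * xi for imp, xi in zip(importances, x)]
    return sorted(zip(names, scores), key=lambda pair: abs(pair[1]), reverse=True)


def drop_one_ranking(
    predict: Predictor,
    x: Sequence[float],
    baseline: Sequence[float],
    names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Option [2]: d(i) = |y_hat - y_hat(-i)|, ranked largest first.

    "Removing" variable i is approximated by replacing x[i] with baseline[i]
    (assumed here to be something like the training-set mean).
    """
    y_hat = predict(x)
    d = []
    for i, name in enumerate(names):
        x_minus_i = list(x)
        x_minus_i[i] = baseline[i]
        d.append((name, abs(y_hat - predict(x_minus_i))))
    return sorted(d, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = ["x1", "x2", "x3"]
    importances = [0.6, 0.3, 0.1]  # made-up global importances
    x_new = [1.0, 4.0, 2.0]        # the new observation
    baseline = [0.0, 0.0, 0.0]     # assumed training-set means
    print(weighted_value_ranking(importances, x_new, names))
    # [('x2', 1.2), ('x1', 0.6), ('x3', 0.2)]
    print(drop_one_ranking(toy_forest_predict, x_new, baseline, names))
    # [('x2', 2.0), ('x1', 1.0), ('x3', 0.0)]
```

In this toy case x3 gets d(3) = 0 under [2] because no stump splits on it, whereas [1] still credits it through its global importance, which is one way the two ideas can diverge. With scikit-learn, for example, one could pass `rf.feature_importances_` as the importances and `lambda row: rf.predict([row])[0]` as `predict`, bearing in mind that the choice of baseline in [2] can change the ranking.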

Or is there some other standard way?

Any help is most appreciated.

Best Regards,

Richard