ALLSTAT Archives

Subject: Boosting Trees
From: Lars Chi
Date: Sun, 28 Feb 2010 19:04:19 -0500

Dear Experts,

I'm trying to understand how correlated predictors affect the Relative
Importance measure in Stochastic Boosting Trees (J. Friedman). As Friedman
described: "...with single decision trees [referring to Breiman's CART
algorithm], the relative importance measure is augmented by a strategy
involving surrogate splits intended to uncover the masking of influential
variables by others highly associated with them. This strategy is most
helpful with single decision trees where the opportunity for variables to
participate in splitting is limited by the size of the tree. In the context
of Boosting, however, the number of splitting opportunities is vastly
increased, and surrogate unmasking is less essential".

Based on the results from the simulated example below (in R), if I have, say,
two highly correlated variables, then the relative importance measure derived
from Boosting tends to be high for one of the predictors and low for the
other. I'm trying to reconcile this observation with Friedman's description
above, from which I understood that these two variables should receive about
the same measure of importance. I'd appreciate your comments. Thanks in
advance!
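
For reference, this is the measure I am asking about, written out as I understand Friedman's definition. The notation is my own shorthand and may differ from the paper: T is a tree with J terminal nodes, v_t is the variable used to split at internal node t, \hat{i}_t^2 is the empirical improvement in squared error from that split, and M is the number of boosted trees.

```latex
% Squared relative importance of predictor x_j in a single tree T:
% sum the squared-error improvements over the internal nodes that split on x_j.
\hat{I}_j^2(T) = \sum_{t=1}^{J-1} \hat{i}_t^2 \, \mathbf{1}(v_t = j)

% For the boosted ensemble T_1, ..., T_M, average over the trees:
\hat{I}_j^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{I}_j^2(T_m)
```

Under this definition a predictor is credited only for splits that actually use it, which is why I am unsure how the credit ends up being shared between two highly correlated predictors.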


library(gbm)
library(MASS)

# Generate multivariate normal data such that X1 is moderately correlated
# with X2, strongly correlated with X3, and not correlated with X4 or X5.
cov.m <- matrix(c(1,   0.5, 0.9, 0, 0,
                  0.5, 1,   0.2, 0, 0,
                  0.9, 0.2, 1,   0, 0,
                  0,   0,   0,   1, 0,
                  0,   0,   0,   0, 1), 5, 5, byrow = TRUE)
n <- 2000 # number of observations
X <- mvrnorm(n, rep(0, 5), cov.m)
Y <- rowSums(X) # signal: unweighted sum of the five predictors
SNR <- 10 # signal-to-noise ratio
sigma <- sqrt(var(Y) / SNR)
Y <- Y + rnorm(n, 0, sigma)
mydata <- data.frame(X, Y) # columns are named X1, ..., X5, Y

# Fit model (should take less than 20 seconds on an average modern computer)
gbm1 <- gbm(formula = Y ~ X1 + X2 + X3 + X4 + X5,
            data = mydata,
            distribution = "gaussian",
            n.trees = 500,
            interaction.depth = 2,
            n.minobsinnode = 10,
            shrinkage = 0.1,
            bag.fraction = 0.5,
            train.fraction = 1,
            cv.folds = 5,
            keep.data = TRUE,
            verbose = TRUE)

# Plot variable influence, based on the estimated best number of trees
best.iter <- gbm.perf(gbm1, plot.it = TRUE, method = "cv")
print(best.iter)
summary(gbm1, n.trees = best.iter)
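
As a sanity check on the simulated data, separate from the boosting fit itself, here is a minimal sketch that prints the empirical correlations among the predictors and the marginal correlation of each predictor with Y. The seed and the names `emp.cor` and `marg.cor` are additions for illustration and are not part of the script above; it needs only MASS, not gbm.

```r
library(MASS)

set.seed(1)  # arbitrary seed, added here only so the check is repeatable

# Same covariance structure as in the script above
cov.m <- matrix(c(1,   0.5, 0.9, 0, 0,
                  0.5, 1,   0.2, 0, 0,
                  0.9, 0.2, 1,   0, 0,
                  0,   0,   0,   1, 0,
                  0,   0,   0,   0, 1), 5, 5, byrow = TRUE)
n <- 2000
X <- mvrnorm(n, rep(0, 5), cov.m)
colnames(X) <- paste0("X", 1:5)

# Same response: unweighted sum of predictors plus noise at SNR = 10
Y <- rowSums(X)
Y <- Y + rnorm(n, 0, sqrt(var(Y) / 10))

# Empirical correlation among the predictors
emp.cor <- cor(X)
print(round(emp.cor, 2))

# Marginal correlation of each predictor with the response
marg.cor <- cor(X, Y)[, 1]
print(round(marg.cor, 2))
```

Comparing `marg.cor` with the relative influence reported by `summary(gbm1, ...)` may help show whether the two correlated predictors are equally associated with Y to begin with, since X1 is also correlated with X2 while X3 is only weakly so.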
