
Sunday, June 12, 2011

Gradient Boosted Decision Tree (GBDT)

Gradient Boosted Decision Trees (GBDT) are an additive classification or regression model: an ensemble of trees, each fitted in a forward stage-wise manner to the current residuals, that is, the negative gradients of the loss function with respect to the current prediction. In the traditional boosting framework, the weak learners are generally shallow decision trees with only a few leaf nodes. GBDT ensembles are found to work well when there are hundreds of such decision trees. Gradient boosting with decision trees was introduced by Jerome Friedman in 1999. Salford Systems' TreeNet and the gbm package in R are implementations of GBDT.
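To make the "fit each tree to the current residuals" idea concrete, here is a minimal sketch in plain Python. It assumes squared-error loss (so the negative gradient is simply the residual y - F(x)) and uses depth-1 trees (stumps) as the weak learners. The names Stump, fit_stump, fit_gbdt and predict_gbdt are made up for this illustration and are not taken from TreeNet or gbm.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 regression tree: one split and two leaf values."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x):
        if x[self.feature] <= self.threshold:
            return self.left_value
        return self.right_value


def fit_stump(X, residuals):
    """Choose the single split that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    # Fallback when no valid split exists: a constant prediction.
    best_err, best = float("inf"), Stump(0, float("inf"), mean, mean)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if err < best_err:
                best_err, best = err, Stump(f, t, lv, rv)
    return best


def fit_gbdt(X, y, n_trees=100, learning_rate=0.1):
    """Forward stage-wise fitting: each stump is trained on current residuals."""
    base = sum(y) / len(y)              # initial constant model
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is y - F(x).
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        trees.append(stump)
        preds = [p + learning_rate * stump.predict(row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, trees


def predict_gbdt(model, x):
    """Sum the base value and the shrunken outputs of all trees."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(t.predict(x) for t in trees)


# Toy data (invented for this example): a step function of one feature.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 0.0, 1.0, 1.0]
model = fit_gbdt(X, y)
```

With other loss functions the residual line would be replaced by the corresponding negative gradient; the rest of the loop stays the same.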

In machine learning, understanding how a model succeeds or fails on a particular dataset or sample is important for building better models. For Naive Bayes classifiers, a list of frequency counts and a confusion matrix are generally informative enough to understand what went wrong. For traditional decision trees, following a sample's path through the tree shows which features caused it to be classified a particular way.
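As a small illustration of why a single decision tree is easy to inspect, the sketch below records every test a sample passes on its way to a leaf. The nested-dict tree layout, the feature names and the function trace_path are assumptions made for this example, not the format of any particular library.

```python
def trace_path(node, sample):
    """Walk one sample down the tree, recording each decision taken."""
    path = []
    while "leaf" not in node:
        feature, threshold = node["feature"], node["threshold"]
        go_left = sample[feature] <= threshold
        path.append((feature, "<=" if go_left else ">", threshold))
        node = node["left"] if go_left else node["right"]
    return path, node["leaf"]


# Hypothetical two-level tree with made-up features and labels.
tree = {
    "feature": "age", "threshold": 30,
    "left": {"leaf": "no"},
    "right": {
        "feature": "income", "threshold": 50000,
        "left": {"leaf": "no"},
        "right": {"leaf": "yes"},
    },
}

path, label = trace_path(tree, {"age": 42, "income": 65000})
for feature, op, threshold in path:
    print(f"{feature} {op} {threshold}")
print("predicted:", label)
```

Reading the printed path top to bottom gives the exact chain of feature tests behind the prediction.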

For GBDT models, since there are hundreds of decision trees, we cannot realistically expect a researcher to inspect every tree and the individual path a sample takes through each one. Our proposal is to work on visualizations of GBDT ensembles for error analysis. The particular focus will be on generating a single representative tree from the ensemble, and on visualizing the paths that a particular sample traverses and which features affected the final result the most.
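One rough way to get the raw data for such a visualization is to trace a sample through every tree and aggregate what its paths touch. The sketch below is only a crude proxy, not the method proposed here: it counts how many trees test each feature along the sample's path and collects each tree's leaf output. The tree layout, the three-tree ensemble and the names trace and summarize_sample are all hypothetical.

```python
from collections import Counter


def trace(node, sample):
    """Return the features tested on the sample's path and the leaf value."""
    features = []
    while "leaf" not in node:
        features.append(node["feature"])
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return features, node["leaf"]


def summarize_sample(trees, sample):
    """Aggregate one sample's paths over the whole ensemble."""
    usage = Counter()       # number of trees whose path tests each feature
    contributions = []      # leaf output of each tree, in ensemble order
    for tree in trees:
        features, leaf = trace(tree, sample)
        usage.update(set(features))
        contributions.append(leaf)
    return usage, contributions


# Hypothetical ensemble of three small regression trees.
trees = [
    {"feature": "x0", "threshold": 0.5,
     "left": {"leaf": -1.0}, "right": {"leaf": 1.0}},
    {"feature": "x1", "threshold": 2.0,
     "left": {"feature": "x0", "threshold": 0.2,
              "left": {"leaf": -0.5}, "right": {"leaf": 0.1}},
     "right": {"leaf": 0.4}},
    {"feature": "x0", "threshold": 0.8,
     "left": {"leaf": -0.2}, "right": {"leaf": 0.3}},
]

usage, contributions = summarize_sample(trees, {"x0": 0.6, "x1": 1.0})
print(usage.most_common())   # features ranked by how many trees tested them
print(contributions)         # per-tree outputs that add up to the raw score
```

Counts like these could feed a bar chart per sample, and the per-tree outputs show which trees pushed the score up or down.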



From: http://vis.berkeley.edu/courses/cs294-10-fa07/wiki/index.php/FP-Jerry_Ye_and_Jimmy_Chen
