Original Article
Boosted classification trees result in minor to modest improvement in the accuracy in classifying cardiovascular outcomes compared to conventional classification trees
Peter C. Austin, Douglas S. Lee
Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada; Department of Health Management, Policy and Evaluation, University of Toronto; Dalla Lana School of Public Health, University of Toronto; Department of Medicine, University Health Network and Faculty of Medicine, University of Toronto, Toronto, Canada
Received March 22, 2011; accepted April 20, 2011; Epub April 23, 2011; published June 1, 2011
Abstract: Purpose: Classification trees are increasingly being used to classify patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared with conventional classification trees. Minor to modest improvements in sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. (AJCD1103001).
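The reweighting and weighted-majority-vote procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific boosting variant or software, so several things here are assumptions made for illustration only: the discrete AdaBoost form of the tree weight, `alpha = 0.5 * log((1 - err) / err)`; the use of single-split trees (stumps) in place of full classification trees; the function names `fit_stump`, `adaboost`, `predict` and `metrics`; and the ten-subject toy dataset, which is invented and is not the study data. The sketch also reports the three measures discussed in the Results: misclassification rate, sensitivity and specificity.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best one-split tree.
    X: list of feature rows; y: labels in {-1, +1}; w: subject weights summing to 1."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost(X, y, n_rounds=3):
    """Grow a sequence of stumps on reweighted data (assumed discrete AdaBoost)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with equal subject weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)      # this tree's vote weight
        ensemble.append((alpha, stump))
        # Misclassified subjects are weighted up, correctly classified ones down.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote over all trees in the sequence."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def metrics(y_true, y_pred):
    """Misclassification rate, sensitivity and specificity (+1 = event, -1 = no event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    return {
        "misclassification": (fp + fn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Invented toy data: one covariate, ten subjects, outcome coded -1/+1.
X = [[float(v)] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(X, y, n_rounds=rounds)
    preds = [predict(model, row) for row in X]
    print(rounds, metrics(y, preds))
```

On this toy dataset a single stump cannot separate the two classes, whereas three boosting rounds classify every training subject correctly. That behavior is a property of the invented data and says nothing about the size of the gains in the cardiovascular cohorts, which the abstract reports as minor to modest.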