A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
Authors: Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan
Affiliation: (1) Department of Statistics, University of Wisconsin, Madison, WI 53706, USA; (2) Department of Statistics, University of Wisconsin, Madison, WI 53706, USA; (3) Department of Mathematics, National Chung Cheng University, Chiayi, 621, Taiwan, R.O.C.
Abstract: Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called POLYCLASS at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. POLYCLASS, for example, is third from last in terms of median training time. It often requires hours of training, compared to seconds for other algorithms. The QUEST and logistic regression algorithms are substantially faster. Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed. However, C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
Keywords: classification tree; decision tree; neural net; statistical classifier
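
The abstract names two accuracy criteria, mean error rate and mean rank of error rate, without spelling out how they are computed. The sketch below shows one plausible reading: average each algorithm's error rate over the datasets, and separately rank the algorithms within each dataset (1 = lowest error) and average those ranks. The function names, the toy numbers, and the choice to give tied algorithms the average of their ranks are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = lowest); tied values share the average rank.

    The tie rule is an assumption; the abstract does not say how ties are handled.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def summarize(error_rates):
    """Map each algorithm to (mean error rate, mean rank of error rate).

    error_rates: {algorithm name: [error rate on dataset 0, dataset 1, ...]}
    """
    names = list(error_rates)
    n_datasets = len(next(iter(error_rates.values())))
    rank_sums = {name: 0.0 for name in names}
    for d in range(n_datasets):
        # Rank the algorithms against each other on this one dataset.
        ranks = average_ranks([error_rates[name][d] for name in names])
        for name, r in zip(names, ranks):
            rank_sums[name] += r
    return {
        name: (mean(error_rates[name]), rank_sums[name] / n_datasets)
        for name in names
    }


# Made-up error rates for three hypothetical algorithms on three datasets.
toy = {
    "A": [0.10, 0.20, 0.30],
    "B": [0.20, 0.10, 0.30],
    "C": [0.30, 0.30, 0.10],
}
summary = summarize(toy)
for name, (err, rank) in summary.items():
    print(f"{name}: mean error = {err:.3f}, mean rank = {rank:.3f}")
```

The two criteria can disagree: mean error rate is sensitive to a few datasets with large error differences, while mean rank weights every dataset equally, which is presumably why the abstract reports both.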
This article is indexed in SpringerLink and other databases.