Optimal convex error estimators for classification
Authors: Chao Sima
Affiliations:
a. Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
b. Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
Abstract: A cross-validation error estimator is obtained by repeatedly leaving out some data points, designing classifiers on the remaining points, computing the errors of these classifiers on the left-out points, and averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of this "zero bootstrap" error with the resubstitution error of the designed classifier. This yields a convex combination of the low-biased resubstitution estimator and the high-biased zero-bootstrap estimator. Another convex error estimator suggested in the literature is the unweighted average of resubstitution and cross-validation. This paper treats the following question: given a feature-label distribution and a classification rule, what is the optimal convex combination of two error estimators, i.e., what are the optimal weights for the convex combination? The problem is addressed by finding the weights that minimize the mean-square error (MSE) of the convex estimator. Optimality is also considered under the constraint that the resulting estimator be unbiased. Owing to the large number of results arising from the various feature-label models and error estimators, only a portion of the results is presented herein; the main body of results appears on a companion website. In the tabulated results, each table treats the classification rules considered for the model, various Bayes errors, and various sample sizes. Each table includes the optimal weights, the mean errors and standard deviations for the relevant error measures, and the MSE and mean-absolute error (MAE) of the optimal convex estimator. Many observations can be made by considering the full set of experiments, and some general trends are outlined in the paper. The general conclusion is that optimizing the weights of a convex estimator can provide substantial improvement, depending on the classification rule, data model, sample size, and component estimators.
Optimal convex bootstrap estimators are applied to feature-set ranking to illustrate their potential advantage over non-optimized convex estimators.
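The weight-optimization idea described in the abstract can be illustrated with a small simulation. The sketch below is hypothetical and not the paper's experimental setup: it models a low-biased estimator (standing in for resubstitution) and a high-biased one (standing in for the zero bootstrap) as noisy draws around a true error, then searches for the weight w minimizing the MSE of the convex combination w·eps_A + (1 − w)·eps_B. The bias and noise levels are invented for illustration.

```python
import numpy as np

# Hypothetical sketch of MSE-optimal weighting for a convex error estimator.
# eps_A plays the role of resubstitution (low-biased), eps_B the role of the
# zero bootstrap (high-biased); the distributions below are illustrative only.
rng = np.random.default_rng(0)

n_trials = 100_000
eps_true = 0.20 + 0.02 * rng.standard_normal(n_trials)          # true error per sample
eps_A = eps_true - 0.05 + 0.03 * rng.standard_normal(n_trials)  # low-biased estimator
eps_B = eps_true + 0.04 + 0.02 * rng.standard_normal(n_trials)  # high-biased estimator

def mse(w):
    """MSE of the convex estimator w*eps_A + (1-w)*eps_B against eps_true."""
    est = w * eps_A + (1 - w) * eps_B
    return np.mean((est - eps_true) ** 2)

# MSE is quadratic in w, so a fine grid search over [0, 1] suffices here.
grid = np.linspace(0.0, 1.0, 1001)
w_opt = grid[np.argmin([mse(w) for w in grid])]

print(f"optimal weight w* = {w_opt:.3f}, MSE = {mse(w_opt):.6f}")
# The 0.632 bootstrap fixes the weights at 0.368/0.632 regardless of the model:
print(f"fixed 0.632-style weights (w = 0.368): MSE = {mse(0.368):.6f}")
```

By construction the optimized weight can do no worse on this data than any fixed choice, including the 0.368/0.632 split; the paper's point is that the best weight depends on the classification rule, data model, and sample size. Under the unbiasedness constraint mentioned in the abstract, the weight would instead be chosen so that the component biases cancel.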
Keywords: Bootstrap; Classification; Cross-validation; Error estimation; Feature-set ranking; Optimal estimation; Resubstitution
Companion website: http://ee.tamu.edu/~edward/convex/
This article is indexed in ScienceDirect and other databases.