首页 | 本学科首页   官方微博 | 高级检索  
     


Direct estimation of class membership probabilities for multiclass classification using multiple scores
Authors:Kazuko Takahashi  Hiroya Takamura  Manabu Okumura
Affiliation:(1) Faculty of International Studies, Keiai University, Sakura, Chiba, Japan;(2) Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
Abstract:Accurate estimation of class membership probability is needed for many applications in data mining and decision-making, to which multiclass classification is often applied. Since existing methods for estimation of class membership probability are designed for binary classification, in which only a single score outputted from a classifier can be used, an approach for multiclass classification requires both a decomposition of a multiclass classifier into binary classifiers and a combination of estimates obtained from each binary classifier to a target estimate. We propose a simple and general method for directly estimating class membership probability for any class in multiclass classification without decomposition and combination, using multiple scores not only for a predicted class but also for other proper classes. To make it possible to use multiple scores, we propose to modify or extend representative existing methods. As a non-parametric method, which refers to the idea of a binning method as proposed by Zadrozny et al., we create an “accuracy table” by a different method. Moreover we smooth accuracies on the table with methods such as the moving average to yield reliable probabilities (accuracies). As a parametric method, we extend Platt’s method to apply a multiple logistic regression. On two different datasets (open-ended data from Japanese social surveys and the 20 Newsgroups) both with Support Vector Machines and naive Bayes classifiers, we empirically show that the use of multiple scores is effective in the estimation of class membership probabilities in multiclass classification in terms of cross entropy, the reliability diagram, the ROC curve and AUC (area under the ROC curve), and that the proposed smoothing method for the accuracy table works quite well. Finally, we show empirically that in terms of MSE (mean squared error), our best proposed method is superior to an expansion for multiclass classification of a PAV method proposed by Zadrozny et al., in both the 20 Newsgroups dataset and the Pendigits dataset, but is slightly worse than the state-of-the-art method, which is an expansion for multiclass classification of a combination of boosting and a PAV method, on the Pendigits dataset.
Contact Information Manabu OkumuraEmail:
Keywords:Multiclass classification  Class membership probabilities  Accuarcy table  Logistic regression  Direct estimation  Multiple classification scores
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号