Cost-sensitive classification with inadequate labeled data
Authors: Tao Wang, Zhenxing Qin, Shichao Zhang, Chengqi Zhang
Affiliation: 1. Faculty of Engineering and Information Technology, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia; 2. College of Computer Science and Information Technology, Guangxi Normal University, Guilin, China
Abstract: Learning cost-sensitive models from datasets with few labeled examples and plentiful unlabeled examples is a practical and challenging problem, because labeled data are sometimes very difficult, time-consuming, and/or expensive to obtain. To address this problem, we propose two classification strategies, both based on Expectation Maximization (EM), for learning a cost-sensitive classifier from training datasets that contain both labeled and unlabeled data. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example from the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating misclassification costs into the probability estimation process. We conducted extensive experiments to evaluate the two methods. The results show that, when only a small number of labeled training examples are used, CS-EM outperforms the other competing methods on the majority of the selected UCI datasets across different cost ratios, especially when the cost ratio is high.
Keywords: Cost-sensitive learning; Classification; Semi-supervised learning; Expectation maximization
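
The abstract does not spell out how Direct-EM turns class probabilities into an "optimal class label". A common reading, assumed here, is the standard minimum-expected-cost rule: predict the class whose expected misclassification cost is lowest. The sketch below illustrates only that decision step. The function name, the cost matrix values, and the example probabilities are hypothetical, and the EM-trained model that would supply the probabilities is not shown.

```python
def min_expected_cost_label(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[i]   : estimated probability that the true class is i
                 (assumed to come from some probabilistic classifier)
    cost[i][j] : cost of predicting class j when the true class is i
                 (hypothetical values; zero on the diagonal by convention)
    """
    n_classes = len(probs)
    # Expected cost of predicting j is the sum over true classes i of P(i|x) * cost[i][j].
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    # Ties are broken in favor of the lowest class index.
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class setting with a cost ratio of 10:
# missing class 1 (true 1, predicted 0) costs 10, a false alarm costs 1.
cost = [[0.0, 1.0],
        [10.0, 0.0]]

label_a = min_expected_cost_label([0.80, 0.20], cost)  # expected costs: 2.0 vs 0.8
label_b = min_expected_cost_label([0.95, 0.05], cost)  # expected costs: 0.5 vs 0.95
```

In the first call the example is assigned to class 1 even though class 0 is more probable, because the high cost of missing class 1 dominates. This is the kind of shift a high cost ratio produces under this rule.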
This article is indexed in ScienceDirect and other databases.