
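A Comparison of Sampling Methods and Cost-Sensitive Methods
Author affiliation: School of Computer Science, China University of Geosciences, Wuhan, Hubei 430000
Abstract: Traditionally, standard classifiers are designed to maximize accuracy, but in many practical applications different classes carry different misclassification costs, and the minority-class samples are often the ones that deserve more attention. For imbalanced datasets, the most direct approach is to modify the learning algorithm itself so that it becomes cost-sensitive, although this is somewhat harder to implement than changing the structure of the dataset. Changing the distribution of the dataset is another common approach; the techniques adopted in this paper are oversampling and undersampling. This paper compares the performance of these three methods on different datasets in order to understand the characteristics of each strategy and the settings in which it is applicable.

Keywords: Cost-sensitive  Sampling  Imbalanced dataset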

The Comparison of Sampling and Cost-sensitive Learning
Authors:LONG Ying  CAI Zhi-hua
Abstract:Standard classifiers are traditionally designed to minimize the classification error. In many application areas, however, different misclassifications have different costs, and the minority class of the dataset usually carries the higher cost. For unbalanced datasets, making the learning algorithm cost-sensitive is the most direct approach, but it is harder to implement than changing the structure of the dataset directly, which is a popular alternative. In this paper, we adopt oversampling and undersampling as the structure-changing methods. The three methods are compared on several datasets so that we can analyze their characteristics and the settings to which each is suited.
Keywords:Cost sensitive  Sampling  Unbalanced dataset
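
The three strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `random_oversample`, `random_undersample`, and `min_cost_label` are hypothetical, the sampling is plain random duplication and random removal, and the cost-sensitive part is a minimum-expected-cost decision rule applied to class probability estimates, used here as a simple stand-in for modifying the learning algorithm itself.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples until every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def min_cost_label(class_probs, cost):
    """Return the label with the lowest expected misclassification cost.

    class_probs maps each label to its estimated probability;
    cost[(true_label, predicted_label)] is the cost of that outcome (assumed layout).
    """
    def expected_cost(predicted):
        return sum(p * cost[(true, predicted)] for true, p in class_probs.items())

    return min(class_probs, key=expected_cost)


# Toy imbalanced data: 2 minority samples (label 1), 8 majority samples (label 0).
X = list(range(10))
y = [1, 1] + [0] * 8

_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print(Counter(y_over))   # both classes now have 8 samples
print(Counter(y_under))  # both classes now have 2 samples

# Missing a minority sample is assumed to cost 10x a false alarm.
cost = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 10}
print(min_cost_label({0: 0.8, 1: 0.2}, cost))  # predicts 1 despite the lower probability
```

In the last call, predicting class 0 has expected cost 0.2 × 10 = 2.0, while predicting class 1 has expected cost 0.8 × 1 = 0.8, so the rule chooses the minority class even though it is less probable. The sampling functions reach a similar effect indirectly by changing the class distribution the learner sees.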
This article is indexed in CNKI and other databases.