Novel mislabeled training data detection algorithm
Authors: Yuan Weiwei, Guan Donghai, Zhu Qi, Ma Tinghuai
Affiliation: 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, Jiangsu, China
2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu, 210023, China
3. Jiangsu Engineering Centre of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China
Abstract:

As a kind of noise, mislabeled training data occur in many applications. Because they degrade learning, many filtering techniques have been proposed to identify and eliminate them. The ensemble learning-based filter (EnFilter), which employs an ensemble of classifiers, is the most widely used. EnFilter first divides the noisy training dataset into several subsets. Each subset is then checked by multiple classifiers trained on the other subsets. Because those classifiers are themselves trained on noisy data, their quality cannot be guaranteed, which may lead to poor noise identification. The problem becomes more serious as the noise ratio of the training dataset rises. To address it, this work proposes a straightforward but effective approach: instead of training the classifiers on noisy data, it trains them on nearly noise-free (NNF) data, which are expected to yield more reliable classifiers. To this end, a novel NNF data extraction approach is also proposed. Experimental results on a set of benchmark datasets illustrate the utility of the proposed approach.
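
The EnFilter procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three base learners (nearest centroid, 1-NN, 3-NN), the interleaved split into subsets, the majority-vote flagging rule, and all function names are assumptions made for the example. The abstract does not describe how NNF data are extracted, so that step is not shown.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_centroid(train):
    """Fit on (features, label) pairs; predict the label of the closest class mean."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def knn(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: sq_dist(x, p[0]))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit

def en_filter(data, learners, n_subsets=3):
    """Return sorted indices of samples that a majority of classifiers misclassify.

    Each subset is checked by classifiers trained on the remaining subsets,
    which may themselves contain mislabeled samples.
    """
    idx = list(range(len(data)))
    # Assumption: a simple interleaved split; a random split works equally well.
    subsets = [idx[i::n_subsets] for i in range(n_subsets)]
    flagged = []
    for subset in subsets:
        held = set(subset)
        train = [data[i] for i in idx if i not in held]
        models = [fit(train) for fit in learners]
        for i in subset:
            x, y = data[i]
            wrong = sum(model(x) != y for model in models)
            if wrong * 2 > len(models):  # assumed rule: majority vote
                flagged.append(i)
    return sorted(flagged)

# Toy data: two well-separated clusters, with the label of sample 4 flipped.
data = [((float(i), 0.0), 0) for i in range(9)]
data += [((100.0 + i, 0.0), 1) for i in range(9)]
data[4] = (data[4][0], 1)

learners = [nearest_centroid, knn(1), knn(3)]
print(en_filter(data, learners))  # prints [4]
```

In this toy case the flipped sample is caught because the classifiers that check it happen to be trained on clean subsets. With a higher noise ratio the training subsets contain more mislabeled samples, which is the weakness the abstract attributes to EnFilter.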

Keywords:
This article is indexed in SpringerLink and other databases.