Novel mislabeled training data detection algorithm期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Novel mislabeled training data detection algorithm

Authors:	Yuan Weiwei Guan Donghai Zhu Qi Ma Tinghuai

Affiliation:	1.College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211-106, Jiangsu, China ;2.Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, Jiangsu, 210-023, China ;3.Jiangsu Engineering Centre of Network Monitoring, Nanjing University of Information Science and Technology, Nanjing, 210-044, Jiangsu, China ;

Abstract:	As a kind of noise, mislabeled training data exist in many applications. Because of their negative effects on learning, many filter techniques have been proposed to identify and eliminate them. Ensemble learning-based filter (EnFilter) is the most widely used filter which employs ensemble classifiers. In EnFilter, first the noisy training dataset is divided into several subsets. Each noisy subset is then checked by the multiple classifiers which are trained based on other noisy subsets. It is noted that since the training data used to train multiple classifiers are noisy, the quality of these classifiers cannot be guaranteed, which might generate poor noise identification result. This problem is more serious when the noise ratio in the training dataset is high. To solve this problem, a straightforward but effective approach is proposed in this work. Instead of using noisy data to train the classifiers, nearly noise-free (NNF) data are used since they are supposed to train more reliable classifiers. To this end, a novel NNF data extraction approach is also proposed. Experimental results on a set of benchmark datasets illustrate the utility of our proposed approach.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏