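Noisy Label Relabeling Method
Cite this article: YU Meng-chi, MU Jia-peng, CAI Jian, XU Jian. Noisy Label Relabeling Method [J]. Computer Science, 2020, 47(6): 79-84.
Authors: YU Meng-chi (余孟池)  MU Jia-peng (牟甲鹏)  CAI Jian (蔡剑)  XU Jian (徐建)
Affiliation (all authors): School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Abstract: The integrity of sample labels significantly affects classification accuracy in supervised learning. In real data, however, factors such as the randomness of the labeling process and the inexperience of annotators mean that labels are inevitably contaminated by noise; that is, a sample's observed label differs from its true label. To reduce the negative impact of noisy labels on classifier accuracy, this paper proposes a noisy label correction method. The method uses a base classifier to classify the observed samples and estimate the noise rate in order to identify samples with noisy labels, then uses the base classifier's classification results to relabel those samples, yielding a dataset in which the noisy labels have been corrected. Experiments on synthetic and real datasets show that the relabeling algorithm improves classification results to some degree across different base classifiers and different noise rates. On synthetic datasets, its accuracy is about 5% higher than that of the algorithm without denoising; on the CIFAR and MNIST datasets under high noise rates, its F1 score is on average more than 7% higher than those of Elk08 and Nat13, and 53% higher than that of the algorithm without denoising.

Keywords: Noisy label learning  Relabeling  Logistic regression  Naive Bayes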

Noisy Label Classification Learning Based on Relabeling Method
YU Meng-chi, MU Jia-peng, CAI Jian, XU Jian. Noisy Label Classification Learning Based on Relabeling Method [J]. Computer Science, 2020, 47(6): 79-84.
Authors: YU Meng-chi  MU Jia-peng  CAI Jian  XU Jian
Affiliation: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract: The integrity of sample labels has a significant impact on the accuracy of supervised learning algorithms. In real data, however, the labeling process is partly random and annotators are often not experts, so dataset labels are inevitably polluted by noise, i.e., the observed label of a sample differs from its true label. To reduce the negative impact of noisy labels on classification accuracy, this paper proposes a noisy label correction approach. It first applies a base classifier to classify the samples and estimate the noise rate, and uses both to identify samples with noisy labels; it then relabels those samples with the base classifier's predictions, producing a dataset in which the noisy labels are corrected. Experiments on synthetic and real datasets show that the relabeling algorithm improves classification results to some degree under different base classifiers and different noise rates. Compared with the base classifier alone, the relabeling algorithm improves accuracy by about 5% on the synthetic datasets; in the high-noise setting on the CIFAR and MNIST datasets, its F1 score is on average more than 7% higher than those of Elk08 and Nat13, and 53% higher than that of the base classifier.
Keywords:Noisy label learning  Relabeling label  Logistic Regression  Naive Bayes
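
The abstract outlines the pipeline only at a high level: classify with a base classifier, estimate the noise rate, flag suspect samples, relabel them with the classifier's output. The sketch below is one possible reading of those steps, not the authors' implementation. It assumes a small hand-written Gaussian naive Bayes as the base classifier, and it assumes that a sample is flagged when the classifier disagrees with the observed label with posterior probability of at least `min_confidence`; the flagged fraction then serves as the noise-rate estimate. The paper's actual estimator is not given in the abstract, and every function and parameter name here (`fit_gaussian_nb`, `posterior`, `relabel`, `make_noisy_data`, `agreement`, `min_confidence`) is hypothetical.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Fit Gaussian naive Bayes; returns {label: (log_prior, means, variances)}."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9  # avoid zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def posterior(model, x):
    """Return {label: P(label | x)} under the fitted model."""
    log_scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        log_scores[label] = log_prior + log_lik
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {k: s / total for k, s in exp_scores.items()}


def relabel(X, y_observed, min_confidence=0.6):
    """Return (corrected labels, estimated noise rate).

    Assumption: a label is treated as noisy when the base classifier,
    trained on the observed labels, predicts a different class with
    posterior probability >= min_confidence.
    """
    model = fit_gaussian_nb(X, y_observed)
    y_clean = list(y_observed)
    flagged = 0
    for i, x in enumerate(X):
        probs = posterior(model, x)
        predicted = max(probs, key=probs.get)
        if predicted != y_observed[i] and probs[predicted] >= min_confidence:
            y_clean[i] = predicted  # relabel with the classifier's output
            flagged += 1
    return y_clean, flagged / len(X)


def make_noisy_data(n_per_class=200, flip_rate=0.2, seed=0):
    """Two 1-D Gaussian classes with a fraction of labels flipped at random."""
    rng = random.Random(seed)
    X, y_true = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0)])
            y_true.append(label)
    y_noisy = [1 - t if rng.random() < flip_rate else t for t in y_true]
    return X, y_true, y_noisy


def agreement(a, b):
    """Fraction of positions where two label lists agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    X, y_true, y_noisy = make_noisy_data()
    y_clean, rate = relabel(X, y_noisy)
    print("estimated noise rate:", round(rate, 3))
    print("label agreement before:", agreement(y_noisy, y_true))
    print("label agreement after:", agreement(y_clean, y_true))
```

On this toy data the two classes are well separated, so the classifier trained on noisy labels still recovers most flipped labels. On harder data the threshold trades missed noisy labels against clean labels that are wrongly overwritten, and a logistic regression base classifier (the other one named in the keywords) could be swapped in wherever `posterior` is called.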
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).