
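A Self-training-based Label Noise Correction Algorithm for Crowdsourcing (一种基于自训练的众包标记噪声纠正算法)
Cite this article:Yang Yi, Jiang Liang-Xiao, Li Chao-Qun. A self-training-based label noise correction algorithm for crowdsourcing. Acta Automatica Sinica (自动化学报), 2023, 49(4): 830−844 doi: 10.16383/j.aas.c210051
Authors:Yang Yi (杨艺)  Jiang Liang-Xiao (蒋良孝)  Li Chao-Qun (李超群)
Affiliation:1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074;2. Hubei Key Laboratory of Intelligent Geo-Information Processing (China University of Geosciences (Wuhan)), Wuhan 430074;3. School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan 430074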
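Funding:Supported by the Joint Funds of the National Natural Science Foundation of China (U1711267) and the Fundamental Research Funds for the Central Universities (CUGGC03)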
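Abstract:To address the noise that remains in crowdsourced labels after label integration, this paper proposes a self-training-based label noise correction (STLNC) algorithm. STLNC consists of three stages. In the first stage, a filter splits the crowdsourced dataset with integrated labels into a noisy set and a clean set. In the second stage, a weighted density peak clustering algorithm builds a spatial structure in which low-density instances point to high-density instances. In the third stage, a noisy-instance selection strategy is first designed from the discovered spatial structure; an ensemble classifier trained on the clean set then corrects the selected noisy instances according to a designed instance correction strategy, the corrected instances are added to the clean set, and the ensemble classifier is retrained. Instance selection and correction repeat until every instance in the noisy set has been corrected. Finally, the ensemble classifier from the last round of training corrects all instances. Experiments on simulated benchmark datasets and real-world crowdsourced datasets show that STLNC outperforms five other state-of-the-art noise correction algorithms on two metrics, noise ratio and model quality.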

Keywords:crowdsourcing learning   self-training   integrated labels   label noise   noise correction
Received:2021-01-18

A Self-training-based Label Noise Correction Algorithm for Crowdsourcing
Yang Yi, Jiang Liang-Xiao, Li Chao-Qun. A self-training-based label noise correction algorithm for crowdsourcing. Acta Automatica Sinica, 2023, 49(4): 830−844 doi: 10.16383/j.aas.c210051
Authors:YANG Yi  JIANG Liang-Xiao  LI Chao-Qun
Affiliation:1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074;2. Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences (Wuhan), Wuhan 430074;3. School of Mathematics and Physics, China University of Geosciences (Wuhan), Wuhan 430074
Abstract:Integrated labels produced by label integration algorithms still contain a certain level of label noise. To address this problem, this paper proposes a self-training-based label noise correction (STLNC) algorithm for crowdsourcing. STLNC has three stages. In the first stage, a filter splits the crowdsourced dataset into a clean set and a noisy set. In the second stage, a weighted density peak clustering algorithm constructs the spatial structure relationship between low-density and high-density instances in the dataset. In the third stage, a noisy-instance selection strategy is first designed according to the discovered spatial structure relationship. The selected noisy instances are then corrected by an ensemble classifier trained on the clean set, following a designed instance correction strategy; the corrected instances are added to the clean set and the ensemble classifier is retrained. Instance selection and correction are repeated until all noisy instances have been corrected. Finally, the ensemble classifier from the last round is used to correct all instances. Experimental results on both simulated benchmark datasets and real-world crowdsourced datasets show that STLNC significantly outperforms five other state-of-the-art noise correction algorithms in terms of noise ratio and model quality.
Keywords:Crowdsourcing learning  self-training  integrated labels  label noise  noise correction
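
The three stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filter, the weighting used in density peak clustering, the selection strategy, or the correction strategy, so every concrete choice below is an assumption. A leave-one-out k-NN vote stands in for the filter, the same k-NN vote stands in for the ensemble classifier, an unweighted Gaussian-kernel density stands in for the weighted density peak clustering, and the selection rule (correct a noisy instance once its higher-density parent is trusted) is a guess at how the spatial structure might be used. The names `knn_predict`, `filter_noise`, `density_parents`, and `stlnc_sketch` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote of the k nearest training instances (stand-in classifier)."""
    nearest = sorted(range(len(train_X)), key=lambda j: math.dist(train_X[j], x))[:k]
    return Counter(train_y[j] for j in nearest).most_common(1)[0][0]


def filter_noise(X, y, k=3):
    """Stage 1 (assumed filter): flag i as noisy if a leave-one-out k-NN vote disagrees with y[i]."""
    noisy = set()
    for i in range(len(X)):
        rest = [j for j in range(len(X)) if j != i]
        pred = knn_predict([X[j] for j in rest], [y[j] for j in rest], X[i], k)
        if pred != y[i]:
            noisy.add(i)
    return noisy


def density_parents(X, dc=1.0):
    """Stage 2 (unweighted stand-in): link each instance to its nearest higher-density instance."""
    n = len(X)
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Gaussian-kernel local density with cutoff dc (an assumed parameter).
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    parent = []
    for i in range(n):
        # Ties in density are broken by index so the ordering is strict.
        higher = [j for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        parent.append(min(higher, key=lambda j: dist[i][j]) if higher else None)
    return rho, parent


def stlnc_sketch(X, y, k=3, dc=1.0):
    """Stage 3 (assumed strategies): self-training loop over the noisy set."""
    labels = list(y)
    noisy = filter_noise(X, labels, k)
    clean = set(range(len(X))) - noisy
    if not clean:
        return labels  # nothing trustworthy to train on
    _, parent = density_parents(X, dc)
    while noisy:
        # Assumed selection rule: noisy instances whose parent is already in the clean set.
        batch = [i for i in sorted(noisy) if parent[i] is None or parent[i] in clean]
        idx = sorted(clean)
        cx, cy = [X[j] for j in idx], [labels[j] for j in idx]
        for i in batch:
            labels[i] = knn_predict(cx, cy, X[i], k)  # assumed correction: take the prediction
        noisy -= set(batch)
        clean |= set(batch)  # the next round "retrains" on the enlarged clean set
    idx = sorted(clean)
    cx, cy = [X[j] for j in idx], [labels[j] for j in idx]
    # Final pass: the last-round classifier relabels every instance.
    return [knn_predict(cx, cy, x, k) for x in X]


# Two well-separated groups; instance 2 carries a wrong integrated label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 1, 0, 1, 1, 1, 1]
print(stlnc_sketch(X, y))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The loop always makes progress because the noisy instance with the highest density either has no parent or has a parent outside the noisy set, so each round selects at least one instance.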