
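Multi-label crowd answer aggregation for digital cultural heritage based on adaptive graph regularization and joint low-rank matrix factorization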
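Cite this article:Wang Chunxue (王春雪). Multi-label crowd answer aggregation for digital cultural heritage based on adaptive graph regularization and joint low-rank matrix factorization[J]. Application Research of Computers (计算机应用研究), 2023, 40(4): 1119-1129.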
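Author:Wang Chunxue (王春雪)
Affiliation:Dunhuang Academy
Funding:Open Project of the Gansu Provincial Research Center for Conservation of Dunhuang Cultural Heritage (GDW2021YB05); Longyuan Youth Innovation and Entrepreneurship Talent (Individual) Project (2022LQGR40); National Key Research and Development Program of China (2020YFC1522701, 2020YFC1522705)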
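Abstract:Multi-label answer aggregation estimates the true labels of samples by fusing the large number of non-expert annotations collected through crowdsourcing. Digital cultural heritage data are costly to annotate, span many sample categories, and are unevenly distributed, which makes multi-label answer aggregation on such datasets highly challenging. Earlier methods focus mainly on single-label tasks and ignore the label correlations of multi-label tasks; most multi-label aggregation methods account for label correlations to some extent but are sensitive to noise and outliers. To address these problems, this paper proposes AGR-JMF, a multi-label answer aggregation method based on adaptive graph regularization and joint low-rank matrix factorization. First, the annotation matrix is decomposed into a pure-annotation part and a noise-annotation part. Next, adaptive graph regularization is applied to the pure annotations to construct the correlation matrix between labels. Finally, annotation quality, label correlations, and the similarity of annotators' behavioral attributes guide the low-rank matrix factorization that aggregates the multi-label answers. Experiments on real-world datasets and a Mogao Grottoes mural dataset show that AGR-JMF clearly outperforms existing algorithms in aggregation accuracy and in identifying spammers.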

Keywords:multi-label crowd answer aggregation  pure annotation data  adaptive graph regularization  low-rank matrix factorization
Received:2022/9/18
Revised:2023/3/11

Multi-label crowd answer aggregation of digital cultural heritage based on adaptive graph regularization and joint low-rank matrix factorization
Affiliation:Dunhuang Academy
Abstract:Multi-label answer aggregation aims to estimate the ground-truth labels of samples by aggregating a large number of non-expert annotations collected through crowdsourcing. Digital cultural heritage data have a high annotation cost, many sample categories, and an uneven distribution, which poses great challenges for multi-label answer aggregation on such datasets. Previous methods mainly focus on single-label problems and ignore the label correlations of multi-label tasks. Most multi-label aggregation methods consider label correlations to some extent but are sensitive to noise and outliers. To solve these problems, this paper proposes AGR-JMF, a multi-label answer aggregation method based on adaptive graph regularization and joint low-rank matrix factorization. First, it divides the input annotation matrix into two parts: pure annotations and noise annotations. Then, it constructs the association matrix between labels by applying adaptive graph regularization to the pure annotations. Finally, it uses labeling quality, label correlations, and the similarity of annotators' behavior attributes to guide the low-rank matrix factorization that aggregates the multi-label answers. Experiments on real-world datasets and the MGF (Mogao Grottoes mural) dataset show that AGR-JMF has clear advantages over existing algorithms in aggregation accuracy and in identifying unreliable annotators.
Keywords:multi-label crowd answer aggregation  pure annotations  adaptive graph regularization  low-rank matrix factorization
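
The first step named in the abstract, splitting the annotation matrix into pure and noise parts, can be illustrated with a generic low-rank plus sparse decomposition. The sketch below is not the AGR-JMF optimization, which the abstract does not spell out. It assumes a binary matrix with one row per annotator and one column per flattened (sample, label) pair, and alternates a truncated SVD with entrywise soft-thresholding. The function names, the `rank`, `lam`, and `threshold` parameters, and the toy data are all hypothetical, and the sketch includes no graph regularization or annotator-similarity terms.

```python
import numpy as np

def soft_threshold(M, lam):
    """Shrink every entry of M toward zero by lam; entries within lam become 0."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def split_clean_noise(X, rank=1, lam=0.3, n_iter=20):
    """Split X into a low-rank part L and a sparse part S by alternating updates.

    Assumption: rows are annotators, columns are flattened (sample, label) pairs.
    """
    X = np.asarray(X, dtype=float)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of X - S (truncated SVD).
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: what the low-rank part cannot explain, shrunk by lam.
        S = soft_threshold(X - L, lam)
    return L, S

def aggregate(L, threshold=0.5):
    """Average the low-rank part over annotators and binarize the result."""
    return (L.mean(axis=0) >= threshold).astype(int)

# Toy data invented for illustration: four annotators who agree with a
# hypothetical ground truth and one who flips every label.
truth = np.array([1, 0, 1, 1, 0, 0])
X = np.vstack([truth] * 4 + [1 - truth])
L, S = split_clean_noise(X)
print(aggregate(L))
```

In this toy setting the dissenting annotator's row is pushed into the sparse part `S`, so the average of `L` reflects only the agreeing annotators. Real crowdsourced data are far less clean, which is where the label-correlation and annotator-similarity terms described in the abstract would come in.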