A Label Cleaning Method of ECG Data Based on Abnormality-Feature Patterns
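Cite this article:Han Jingyu, Chen Wei, Zhao Jing, Lang Hang, Mao Yi. A Label Cleaning Method of ECG Data Based on Abnormality-Feature Patterns[J]. Journal of Computer Research and Development, 2023, 60(11): 2594-2610. DOI: 10.7544/issn1000-1239.202220334
Authors:Han Jingyu  Chen Wei  Zhao Jing  Lang Hang  Mao Yi
Affiliation:1.School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023;2.Jiangsu Key Laboratory of Big Data Security and Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023
Funding:National Natural Science Foundation of China (62002174)
Abstract: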

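Automatic detection of electrocardiogram (ECG) abnormalities is a typical multi-label classification problem, and training a classifier requires a large number of samples with high-quality labels. However, the abnormality labels in ECG datasets are often missing or wrong, so cleaning weak labels to obtain a clean ECG dataset is a pressing problem. With the help of an example dataset whose labels are complete and accurate, we propose a method based on abnormality-feature patterns (AFP) that cleans the labels of weakly labelled ECG data to recover all the correct abnormality labels. Cleaning proceeds in two stages: clustering-based rule construction and iteration-based label cleaning. In the first stage, clustering with a Dirichlet process mixture model (DPMM) identifies the different feature patterns associated with each abnormality label, from which abnormality inclusion rules, exclusion rules, and a set of binary classifiers are built. In the second stage, the inclusion and exclusion rules identify an initial set of relevant labels, and the binary classifiers then iteratively extend the relevant labels and exclude irrelevant ones. The AFP method captures the feature patterns shared by the example dataset and the weakly labelled dataset, applying human knowledge while making full use of the labels that are already correct; at the same time, it removes wrong labels and fills in missing ones progressively, which keeps the cleaning reliable. Experiments on real and synthetic datasets demonstrate the effectiveness of the AFP method.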



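Keywords:electrocardiogram (ECG)  multi-label classification  abnormality labels  abnormality-feature pattern (AFP)  binary classifier  label cleaning
Received:2022-04-25
Revised:2022-12-09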

A Label Cleaning Method of ECG Data Based on Abnormality-Feature Patterns
Han Jingyu, Chen Wei, Zhao Jing, Lang Hang, Mao Yi. A Label Cleaning Method of ECG Data Based on Abnormality-Feature Patterns[J]. Journal of Computer Research and Development, 2023, 60(11): 2594-2610. DOI: 10.7544/issn1000-1239.202220334
Authors:Han Jingyu  Chen Wei  Zhao Jing  Lang Hang  Mao Yi
Affiliation:1.School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023;2.Jiangsu Key Laboratory of Big Data Security and Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023
Abstract:Automatic detection of electrocardiogram (ECG) abnormalities is a typical multi-label classification problem, and model training relies heavily on sufficient samples with high-quality abnormality labels. Unfortunately, ECG datasets often carry partial or incorrect labels, so cleaning weakly-labelled datasets to obtain clean datasets with all the correct abnormality labels has become a pressing concern. Assuming that a small example dataset with full and correct labels is available, we propose an abnormality-feature pattern (AFP) based method that automatically cleans weakly-labelled datasets and recovers all the correct abnormality labels. The cleaning process has two stages: clustering-based rule construction and iteration-based label cleaning. In the first stage, we identify the different abnormality-feature patterns through Dirichlet process mixture model (DPMM) clustering and use them to construct a set of label inclusion and exclusion rules and a set of binary discriminators. In the second stage, we first identify the relevant abnormalities according to the label inclusion and exclusion rules, and then iteratively refine them. The AFP method exploits the abnormality-feature patterns shared by the example dataset and the weakly-labelled dataset, drawing on both human knowledge and the correct label information in the weakly-labelled dataset. Further, the method removes incorrect labels and fills in missing ones step by step through iteration, which keeps the cleaning process reliable. Experiments on real and synthetic datasets demonstrate the effectiveness of our method.
Keywords:electrocardiogram (ECG)  multi-label classification  abnormality labels  abnormality-feature pattern (AFP)  binary discriminator  label cleaning
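
The second stage described in the abstract (rule-based initialisation followed by iterative refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `clean_labels`, the two thresholds, the choice to keep non-excluded weak labels as tentative candidates, and the toy features and rules are all assumptions made for illustration only. The abstract does not specify how the rules or discriminators are represented.

```python
from typing import Callable, Dict, Set

Features = Dict[str, float]
Rule = Callable[[Features], bool]                # fires on a feature pattern
Scorer = Callable[[Features, Set[str]], float]   # hypothetical binary discriminator, returns a score in [0, 1]


def clean_labels(
    features: Features,
    weak_labels: Set[str],
    inclusion_rules: Dict[str, Rule],
    exclusion_rules: Dict[str, Rule],
    scorers: Dict[str, Scorer],
    add_threshold: float = 0.7,   # assumed value, not from the paper
    drop_threshold: float = 0.3,  # assumed value, not from the paper
    max_iters: int = 10,
) -> Set[str]:
    """Return a cleaned label set for one record (illustrative sketch)."""
    # Initialisation: labels ruled out, and labels confirmed, by the rules.
    excluded = {lab for lab, rule in exclusion_rules.items() if rule(features)}
    confirmed = {lab for lab, rule in inclusion_rules.items() if rule(features)} - excluded
    # Assumption: weak labels that no rule excludes stay as tentative candidates.
    relevant = confirmed | (set(weak_labels) - excluded)

    # Iteration: discriminators add missing labels and drop doubtful ones
    # until the label set stops changing (or the iteration cap is reached).
    for _ in range(max_iters):
        changed = False
        for lab, score_fn in scorers.items():
            if lab in excluded or lab in confirmed:
                continue  # rule decisions are treated as final in this sketch
            score = score_fn(features, relevant)
            if lab not in relevant and score >= add_threshold:
                relevant.add(lab)
                changed = True
            elif lab in relevant and score <= drop_threshold:
                relevant.remove(lab)
                changed = True
        if not changed:
            break
    return relevant


# Toy record: values, rules and scores are made up for the example.
features = {"heart_rate": 120.0, "qrs_ms": 95.0, "rr_irregularity": 0.8}
weak_labels = {"bradycardia", "LBBB"}  # both wrong here; "AF" and "tachycardia" are missing
inclusion_rules = {"tachycardia": lambda f: f["heart_rate"] > 100}
exclusion_rules = {"bradycardia": lambda f: f["heart_rate"] >= 60}
scorers = {
    "AF": lambda f, rel: f["rr_irregularity"],
    "LBBB": lambda f, rel: 0.9 if f["qrs_ms"] > 120 else 0.1,
}

cleaned = clean_labels(features, weak_labels, inclusion_rules, exclusion_rules, scorers)
print(sorted(cleaned))  # ['AF', 'tachycardia']
```

In this toy run the exclusion rule removes the wrong "bradycardia" label, the inclusion rule supplies "tachycardia", and the iteration adds the missing "AF" label while dropping the low-scoring "LBBB" label. Passing the current label set to each scorer is one possible way to let earlier decisions inform later ones; the toy scorers here ignore it.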