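Adaptive Active Learning for Semi-supervised Learning
Cite this article:LI Yan-Chao, XIAO Fu, CHEN Zhi, LI Bo. Adaptive Active Learning for Semi-supervised Learning[J]. Journal of Software, 2020, 31(12): 3808-3822
Authors:LI Yan-Chao  XIAO Fu  CHEN Zhi  LI Bo
Affiliation:School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
Funding:National Natural Science Foundation of China (61932013); Natural Science Foundation of Jiangsu Province (BK20200739); Jiangsu Province 333 High-level Talent Training Project (BRA2020065)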
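Abstract:Active learning selects samples from a large pool of unlabeled samples and hands them to an expert for labeling. Existing batch-mode active learning algorithms are subject to three main limitations: (1) some active learning methods rely on a single selection criterion or impose assumptions on the data or the model, which makes it hard to find unlabeled samples that are both uncertain and representative; (2) the performance of existing batch-mode active learning methods depends heavily on the accuracy of the similarity measure between samples, such as a predefined function or a dissimilarity measure; (3) noisy labels have persistently degraded the performance of batch-mode active learning algorithms. This paper proposes a batch-mode active learning method based on deep learning. A deep neural network generates learned representations of labeled and unlabeled samples, and a label cycle mode is adopted in which a labeled sample is linked to unlabeled samples and then back to a labeled sample with the same label. This accounts for both the uncertainty and the representativeness of samples and makes the algorithm robust to noisy labels. In the proposed batch-mode active learning method, a submodular function ensures that the selected sample set is diverse. In addition, optimizing the adaptive parameters lets the active learning algorithm balance uncertainty and representativeness automatically. The proposed method is applied to semi-supervised classification and semi-supervised clustering, and the experimental results show that it outperforms several existing state-of-the-art methods.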

Keywords:active learning  semi-supervised learning  classification  clustering
Received:2019-07-07
Revised:2019-07-28
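
The label cycle described in the abstract (from a labeled example to unlabeled examples and back to a labeled example of the same class) can be illustrated with a small sketch. The abstract does not give the formulation, so the code below assumes dot-product similarity between embeddings, softmax transition probabilities, and a negative log-likelihood loss on the round trip. The function names `round_trip_same_class` and `cycle_loss` are hypothetical, and this is not presented as the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def round_trip_same_class(labeled, labels, unlabeled):
    """For each labeled embedding, return the probability that a
    labeled -> unlabeled -> labeled walk ends on an example of the same class.

    Assumption: transition probabilities are softmaxed dot products.
    """
    sim = [[dot(a, b) for b in unlabeled] for a in labeled]
    p_ab = [softmax(row) for row in sim]                  # labeled -> unlabeled
    p_ba = [softmax(list(col)) for col in zip(*sim)]      # unlabeled -> labeled
    probs = []
    for i in range(len(labeled)):
        p = 0.0
        for j in range(len(labeled)):
            if labels[j] == labels[i]:
                # Sum over every intermediate unlabeled example k.
                p += sum(p_ab[i][k] * p_ba[k][j] for k in range(len(unlabeled)))
        probs.append(p)
    return probs


def cycle_loss(labeled, labels, unlabeled):
    """Mean negative log-probability of returning to the starting class."""
    probs = round_trip_same_class(labeled, labels, unlabeled)
    return -sum(math.log(p) for p in probs) / len(probs)
```

Under these assumptions, embeddings that separate the classes well push the round-trip probability toward 1 and the loss toward 0, while an unlabeled example that is equally similar to two classes splits the return probability between them.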

Adaptive Active Learning for Semi-supervised Learning
LI Yan-Chao,XIAO Fu,CHEN Zhi,LI Bo. Adaptive Active Learning for Semi-supervised Learning[J]. Journal of Software, 2020, 31(12): 3808-3822
Authors:LI Yan-Chao  XIAO Fu  CHEN Zhi  LI Bo
Affiliation:School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Abstract:Active learning algorithms attempt to overcome the labeling bottleneck by querying labels for examples drawn from a large collection of unlabeled examples. Existing batch-mode active learning algorithms suffer from three limitations: (1) models that make assumptions about the data have difficulty finding examples that are both informative and representative; (2) methods based on a similarity function or on optimizing a particular diversity measure may perform suboptimally and select a set containing redundant examples; (3) noisy labels remain an obstacle for active learning algorithms. This study proposes a novel batch-mode active learning method based on deep learning. A deep neural network generates representations (embeddings) of labeled and unlabeled examples, and a label cycle mode connects the embeddings of labeled examples to those of unlabeled examples and back to labeled examples of the same class. This accounts for both the informativeness and the representativeness of examples and is robust to noisy labels. A submodular function is designed to reduce redundancy among the selected examples. Moreover, the loss weights of the query criteria are optimized during active learning, which automatically balances informative and representative examples. The proposed method is applied to semi-supervised classification and clustering. Specifically, the batch-mode active scheme is incorporated into classification approaches to improve their generalization ability. For semi-supervised clustering, the proposed active scheme for constraints facilitates fast convergence and performs better than unsupervised clustering. To validate the effectiveness of the proposed algorithms, extensive experiments are conducted on diverse benchmark datasets for different tasks, and the experimental results demonstrate consistent and substantial improvements over state-of-the-art approaches.
Keywords:active learning  semi-supervised learning  classification  clustering
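
The abstract states that a submodular function reduces redundancy in the selected batch and that the query criteria are weighted to balance informative and representative examples. One standard way to sketch this idea is greedy maximization of a facility-location coverage term plus a per-example uncertainty term. Everything below is an assumption for illustration: the facility-location objective, the nonnegative similarity matrix `sim`, the `uncertainty` scores, and the fixed trade-off weight `lam` (the paper adapts its weights, which this sketch does not attempt). The names `greedy_batch` and `facility_location_gain` are hypothetical.

```python
def facility_location_gain(sim, covered, j):
    """Marginal coverage gain from adding candidate j.

    covered[i] is the best similarity of point i to the already selected set.
    """
    return sum(max(sim[i][j] - covered[i], 0.0) for i in range(len(sim)))


def greedy_batch(sim, uncertainty, batch_size, lam=0.5):
    """Greedily pick a batch that trades off uncertainty and coverage.

    sim:          n x n nonnegative similarity matrix (assumed given)
    uncertainty:  n nonnegative informativeness scores (assumed given)
    lam:          hypothetical fixed weight in [0, 1]; 1.0 ignores coverage
    """
    n = len(sim)
    covered = [0.0] * n
    selected = []
    for _ in range(min(batch_size, n)):
        best_j, best_gain = None, float("-inf")
        for j in range(n):
            if j in selected:
                continue
            gain = (lam * uncertainty[j]
                    + (1.0 - lam) * facility_location_gain(sim, covered, j))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        # Update how well each point is represented by the selected set.
        covered = [max(covered[i], sim[i][best_j]) for i in range(n)]
    return selected
```

With nonnegative inputs this objective is monotone submodular, so the greedy loop carries the usual (1 - 1/e) approximation guarantee under a cardinality constraint. Because a near-duplicate of an already selected example adds almost no coverage, the second pick tends to come from a different region of the data.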
This article is indexed in Wanfang Data and other databases.