18 similar documents found.
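Semi-supervised learning combines supervised and unsupervised learning: it uses unlabeled data to improve a model built from labeled data, with the aim of reducing the need for large amounts of annotated data in traditional machine learning tasks and lowering labor costs. In entity recognition for Chinese electronic medical records, annotated data are scarce, medical text is highly specialized, and manual annotation is expensive, so semi-supervised methods can be used to improve the training effect of a small amount of annotated data. This paper introduces the research background of entity recognition in Chinese electronic medical records and related work on semi-supervised learning, and applies an improved Tri-Training algorithm to improve the performance of a Chinese electronic medical record entity recognition model.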
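
The abstract above names Tri-Training but does not describe the improvement it applies. As a rough illustration only, the sketch below shows the agreement rule of the standard Tri-Training scheme: an unlabeled sample becomes a pseudo-labeled training example for one classifier when the other two classifiers agree on its label. The function names and the entity tags in the usage note are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tri_training_pseudo_labels(preds_a, preds_b, preds_c):
    """For each of three classifiers, collect the unlabeled samples on which
    the OTHER two classifiers agree, together with the agreed label.

    Each argument is a list of predicted labels for the same unlabeled samples.
    Returns three lists of (sample_index, pseudo_label) pairs, one per classifier.
    """
    all_preds = [preds_a, preds_b, preds_c]
    n = len(preds_a)
    if any(len(p) != n for p in all_preds):
        raise ValueError("all prediction lists must have the same length")

    new_sets = []
    for target in range(3):
        first, second = [all_preds[i] for i in range(3) if i != target]
        agreed = [(i, first[i]) for i in range(n) if first[i] == second[i]]
        new_sets.append(agreed)
    return new_sets


def majority_vote(preds_a, preds_b, preds_c):
    """Final prediction: the most common label among the three classifiers.
    If all three disagree, the first classifier's label is returned."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(preds_a, preds_b, preds_c)]
```

With hypothetical tag predictions `["B-DIS", "O", "B-SYM"]`, `["B-DIS", "O", "O"]` and `["O", "O", "O"]`, the third classifier would receive samples 0 and 1 as new training examples, because the first two classifiers agree on them. In a full loop each classifier is retrained on its enlarged set and the step repeats until no set changes.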

Li Weipeng, Yang Xiaogang, Li Chuanxiang, Lu Ruitao, Huang Pan. Infrared and Laser Engineering (《红外与激光工程》), 2021, 50(3): 20200511-1 to 20200511-8
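To address the small size of infrared datasets and the scarcity of labeled samples, a semi-supervised transfer learning method for infrared object detection networks is proposed. It is mainly intended to improve the training efficiency and generalization ability of object detection networks on small-sample infrared datasets, and to improve the adaptability of deep learning models in scenarios with few training samples, such as infrared object detection. The paper first explains the role of unlabeled samples in improving generalization and suppressing overfitting when labeled samples are few. It then proposes a semi-supervised transfer learning procedure for infrared object detection networks: a model is pretrained on a large RGB image dataset, and the network is then fine-tuned by semi-supervised learning on a small number of labeled infrared images together with unlabeled infrared images. In addition, the paper proposes a feature-similarity-weighted pseudo-supervised loss function that uses the predictions for samples in the same batch as labels for one another, so as to make full use of the feature distribution information of similar targets in unlabeled images. To reduce the computational cost of semi-supervised training, each target uses only the predicted targets within the neighborhood of its feature vector as pseudo-labels when the pseudo-supervised loss is computed. Experimental results show that detection networks trained with the proposed method achieve higher test accuracy than those obtained by supervised transfer learning, with an improvement of 1.1% on Faster R-CNN and a notable 4.8% on YOLO-v3, which verifies the effectiveness of the proposed method.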

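Supervised learning algorithms for communication signal classification require large numbers of labeled training samples, a requirement that usually cannot be met in practice. To address this, a data-driven semi-supervised learning method is proposed that combines unsupervised pretraining by contrastive predictive coding with supervised learning, and uses a joint neural network of LSTM (long short-term memory) and ResNet (residual network) to extract features automatically from small samples, improving signal recognition accuracy under small-sample conditions. Experiments on a real communication modulation signal set show that, compared with previous methods, the semi-supervised joint network improves recognition accuracy by 3% to 20% and improves performance under small-sample conditions by 60%. It also performs strongly at low signal-to-noise ratios: at 0 dB, the average recognition accuracy over 11 modulation types reaches 92%, a clear advantage.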

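Information extraction is one of the key technologies of natural language processing, and its main task is event element extraction. This paper presents an in-depth study of information extraction using deep learning network models. The training data come from the annotated CEC corpus built by Shanghai University. Compared with recognition based on hand-crafted rules and with a BiLSTM network model, this paper preprocesses the data and builds a BERT-BiLSTM-CRF deep network model that is trained on the text data to perform annotation, improving recognition accuracy for time, report time, and participants.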

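To address the drop in classification accuracy caused by incomplete sample labels and imbalanced class distributions in specific emitter identification (SEI), this paper proposes a specific emitter classification method based on cost-sensitive learning and a semi-supervised generative adversarial network (GAN). The method optimizes the network parameters of the generator and discriminator through semi-supervised training, adds a multi-scale topological module to the residual network to fuse multi-dimensional resolution features of the time-domain signal, and assigns an extra label to generated samples so that the discriminator can perform classification directly. A cost-sensitive loss is also designed to mitigate the gradient propagation imbalance caused by dominant samples and to improve the classifier's recognition performance on class-imbalanced datasets. Experimental results on 4 types of imbalanced simulated datasets show that, with 40% unlabeled samples, the method's average recognition rate for 5 emitters is 5.34% and 2.69% higher than that obtained with cross-entropy loss and focal loss, respectively, providing a new approach to specific emitter identification under missing annotations and imbalanced class distributions.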
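
The abstract above compares a cost-sensitive loss with cross-entropy and focal loss but does not give the form of its own loss. The sketch below therefore shows only the two named baselines plus one common cost-sensitive variant, class-weighted cross-entropy with inverse-frequency weights. The weighting scheme and all function names are assumptions for illustration, not the paper's design.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample: -log p(target)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log p_t.
    Well-classified samples (p_t near 1) are down-weighted."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def inverse_frequency_weights(class_counts):
    """Assumed weighting scheme: weight_c = N / (K * n_c), so rare classes
    receive larger weights and a balanced dataset gives weight 1 everywhere."""
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * count) for count in class_counts]


def cost_sensitive_cross_entropy(probs, target, class_weights):
    """Cross-entropy scaled by the misclassification cost of the true class."""
    return class_weights[target] * cross_entropy(probs, target)
```

For a two-class set with 90 and 10 samples, `inverse_frequency_weights([90, 10])` gives the minority class a weight of 5, so an error on a minority sample contributes five times the unweighted loss to the gradient.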

Nie Wenhai, Li Li, Yang Liang, Li Xiaoxi. Information Technology (《信息技术》), 2022, (12): 143-148
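To address the low accuracy of semi-supervised classification of basic data caused by weak correlation between data items and labels, a semi-supervised classification method for basic data in integrated distribution network planning is designed. A distribution network basic data aggregation model is built, the data are encrypted for transmission and stored on a blockchain, association rules in the basic data are mined, the degree of correlation between basic data items and labels is computed, and semi-supervised classification is performed on the aggregated distribution network data. Experimental results show that the method improves semi-supervised classification accuracy for basic data with multiple labels and with single labels, shortens the semi-supervised classification time for the datasets, and improves both the precision and the efficiency of semi-supervised classification.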

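Tongue color is one of the diagnostic features of greatest concern in inspection diagnosis in traditional Chinese medicine (TCM). In practice, when a tongue color classification model trained on tongue images collected by one device is applied to another device, classification performance often drops sharply because the distributions of the tongue image data differ. This paper therefore proposes a few-shot domain-adaptive TCM tongue color classification method based on two-stage meta-learning. First, a two-stage meta-learning training strategy is designed that extracts domain-invariant features from labeled source-domain samples and fine-tunes the network on a small amount of labeled target-domain data, so that the model adapts quickly to the characteristics of new target-domain samples, improving generalization and overcoming overfitting. Next, a progressive high-quality pseudo-label generation method is proposed: the trained model predicts the unlabeled target-domain samples, and high-confidence predictions are selected as pseudo-labels, so that high-quality pseudo-labels are generated step by step. Finally, these high-quality pseudo-labels are combined with the labeled target-domain data to train the model and obtain the tongue color classifier. Because the pseudo-labels contain noise, a contrastive regularization function is used, which effectively suppresses the negative influence of noisy samples during training and improves tongue color classification accuracy in the target domain. Experimental results on two self-built TCM tongue color classification datasets show that, with only 20 labeled target-domain samples, tongue color classification accuracy reaches 91.3%, only 2.05% below the performance of supervised training on the target domain.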
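
The pseudo-label step described above (predict the unlabeled target-domain samples, keep only high-confidence predictions, repeat) can be sketched as follows. The threshold value, the function name, and the idea of passing in the set of already labeled indices are assumptions made for illustration; the abstract does not state how confidence is measured or how the rounds are scheduled.

```python
def select_pseudo_labels(prob_rows, threshold, already_labeled=frozenset()):
    """Pick confident predictions as pseudo-labels.

    prob_rows[i] is the list of class probabilities the current model assigns
    to unlabeled sample i. A sample is selected when its largest probability
    reaches the threshold and it has not been pseudo-labeled in an earlier round.
    Returns a list of (sample_index, class_index) pairs.
    """
    selected = []
    for idx, probs in enumerate(prob_rows):
        if idx in already_labeled:
            continue
        best = max(range(len(probs)), key=lambda k: probs[k])
        if probs[best] >= threshold:
            selected.append((idx, best))
    return selected
```

In a progressive scheme, the model is retrained on the labeled data plus the selected samples, the remaining unlabeled samples are predicted again, and the function is called once more with the updated `already_labeled` set, so samples that were uncertain in an early round can be picked up later.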

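Facial beauty prediction, which studies how to make computers judge the beauty of faces, is a frontier topic, and steady progress in deep learning has brought some success. However, deep-learning-based facial beauty prediction requires large amounts of training data and costly facial beauty annotation, so how to achieve good results under few-shot conditions still needs further study. Self-supervised learning can learn good features from unlabeled data in an upstream task and thereby reduce the dependence on labeled data in downstream tasks. This paper therefore applies self-supervised learning to facial beauty prediction, using in-batch object recognition and multi-view feature clustering. In-batch object recognition learns features by assigning an independent pseudo-label to each distinct sample in a batch, giving the network the ability to distinguish individual samples. Multi-view feature clustering first applies data augmentation to a face image several times, then passes the results through an encoder to obtain facial attribute features, and finally uses a self-supervised constraint to pull these features together. Experimental results on the Large Scale Asia Facial Beauty Database (LSAFBD) and the SCUT-FBP5500 database show that the proposed method reduces the model's dependence on labeled data and improves prediction accuracy. Under few-shot conditions it outperforms supervised learning with a Resnet18 baseline and is more accurate than conventional self-supervised methods, and it can be widely applied in areas such as object detection and image classification.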

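Radar deception jamming recognition currently faces two problems: conventional feature-based recognition methods have limited applicability, and the large numbers of labeled samples needed to train high-performance deep learning models are hard to obtain efficiently. This paper proposes a radar deception jamming recognition method based on an adversarial domain adaptation network to ease the labeling constraint, and incorporates an attention-based residual module to further improve recognition accuracy. First, after a time-frequency transform of the received radar signal, a domain adaptation technique based on the adversarial network idea is applied to transfer recognition from labeled source-domain samples to unlabeled target-domain samples. Second, the designed spatial-channel attention residual module focuses network training on the global spatial features of the time-frequency map and on high-response channels, so that regions of the time-frequency image with low transferability are ignored and negative transfer is suppressed. Experimental results on radar deception jamming datasets with different source and target domains demonstrate the feasibility and effectiveness of the method.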

Ying Zilu, Wang Wenqi, Xu Ying, Li Wenba. Journal of Signal Processing (《信号处理》), 2023, (11): 2080-2090
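Existing self-supervised learning algorithms have insufficient representation ability for small-sample synthetic aperture radar (SAR) images and cannot fully meet the performance requirements of automatic target recognition (ATR). This paper therefore proposes a SAR ATR method based on Siamese self-supervised learning. First, positive and negative sample pairs are built from unlabeled SAR data through the data augmentation in the Siamese feature extraction network module. Second, the contrastive learning head and the feature redundancy reduction head in the Siamese self-supervised learning module are jointly optimized under an unsupervised contrastive loss and a feature information redundancy loss, yielding a pretrained network with good representation ability. Finally, the self-supervised pretrained weights are loaded into the downstream network, which is trained with a cross-entropy loss for supervised recognition of small-sample SAR images. Experimental results show that on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, the method reaches 82.95% accuracy using only 3.13% of the training data. The proposed method learns good representations from unlabeled data and effectively mitigates overfitting in small-sample SAR image recognition.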

Network traffic classification, which identifies the type of network traffic, is the most fundamental step toward network service improvement and modern network management. Classic machine learning and deep learning methods have been widely adopted in this field. However, there are two major challenges in practice. One is the user privacy concern in sharing cross-domain traffic data to train a global classification model, and the other is the difficulty of obtaining a large amount of labeled data for training. In this paper, we propose a novel approach that uses federated semi-supervised learning for network traffic classification, in which the federated server and clients from different domains work together to train a global classification model. Unlabeled data are used on the client side, and labeled data are used on the server side. Experimental results on a public dataset show that the accuracy of the proposed approach can reach 97.81%, and that the accuracy gap between the federated learning approach and centralized training is minimal.
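
The abstract above does not say how the server combines the client models. A common choice in federated learning is federated averaging, in which the server averages client parameters weighted by each client's local sample count. The sketch below assumes that rule and represents a model as a flat list of floats; both are illustrative assumptions rather than details from the paper.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (federated averaging).

    client_weights: list of parameter vectors, one per client (equal lengths).
    client_sizes:   number of local training samples on each client.
    Returns the aggregated global parameter vector.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for j in range(dim):
            global_weights[j] += weights[j] * size / total
    return global_weights
```

Only parameter vectors leave the clients in this scheme, which is what lets domains cooperate without sharing raw traffic data. In the setting the abstract describes, the server could additionally fine-tune the aggregated model on its own labeled data before sending it back.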

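Semi-supervised Gaussian process classification algorithm for class-imbalanced data   Total citations: 1 (self-citations: 0, citations by others: 1)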
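Traditional supervised learning methods struggle with real datasets that have little label information and class-imbalanced training sets. To address this, a semi-supervised Gaussian process classification algorithm for class imbalance is proposed. The algorithm introduces the self-training idea from semi-supervised learning, uses Gaussian process classification to compute posterior probabilities, and assigns class labels to unlabeled data to obtain more accurate and reliable labeled data, so that the class distribution of the training samples becomes relatively balanced and the classifier is adaptively optimized for better classification. Experimental results show that, with class-imbalanced training samples and very little label information, the algorithm obtains effective labels by self-training the classifier and effectively improves classification accuracy, providing a new approach to classifying class-imbalanced data.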
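
One way to read the self-training step above is that confident predictions are added to the training set in a manner that moves the class distribution toward balance. The sketch below implements that reading with a simple per-class quota. The posteriors are taken as given (in the paper they would come from a Gaussian process classifier), and the quota rule, threshold, and function name are assumptions, since the abstract does not specify them.

```python
def balanced_self_training_step(posteriors, class_counts, threshold=0.9):
    """Choose unlabeled samples to pseudo-label so that classes move toward balance.

    posteriors[i]: per-class posterior probabilities for unlabeled sample i.
    class_counts:  current number of labeled samples in each class.
    Each class may receive at most (largest class size - its own size) new
    samples, so the largest class receives none in this step.
    Returns (sample_index, class_index) pairs, most confident first.
    """
    largest = max(class_counts)
    quota = [largest - count for count in class_counts]

    candidates = []
    for idx, probs in enumerate(posteriors):
        label = max(range(len(probs)), key=lambda k: probs[k])
        if probs[label] >= threshold:
            candidates.append((probs[label], idx, label))
    # Most confident candidates are considered first.
    candidates.sort(key=lambda item: -item[0])

    picks = []
    for _, idx, label in candidates:
        if quota[label] > 0:
            picks.append((idx, label))
            quota[label] -= 1
    return picks
```

After each step the classifier would be refit on the enlarged labeled set and the posteriors recomputed, which is the self-training loop the abstract refers to.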

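Reducing network-related complaints has always been a priority for operators. At present, early-warning schemes for users likely to file network complaints rely mainly on the experience of network optimization engineers, and both accuracy and efficiency are low. Through a comprehensive, in-depth analysis of historical data on users who filed network complaints, this paper builds a feature model of complaining users based on the XGBoost algorithm and predicts which users will file network complaints. The method achieves high prediction accuracy, and once connected to other network optimization systems it can locate the causes of poor service quality for users, enabling the network department to optimize the network in advance and improve user satisfaction.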

Significant challenges remain despite the impressive recent advances in machine learning techniques, particularly in multimedia data understanding. One of the main challenges in real-world scenarios is the nature of, and the relation between, training and test datasets. Very often, only small sets of coarse-grained labeled data are available to train models, which are expected to be applied to large datasets and fine-grained tasks. Weakly supervised learning approaches handle such constraints by maximizing the useful training information in labeled and unlabeled data. In this research direction, we propose a weakly supervised approach that analyzes the dataset manifold to expand the available labeled set. A hypergraph manifold ranking algorithm is exploited to represent the contextual similarity information encoded in the unlabeled data and to identify strong similarity relations, which are taken as a path to label expansion. The expanded labeled set is subsequently exploited for a more comprehensive and accurate training process. The proposed model was evaluated jointly with supervised and semi-supervised classifiers, including Graph Convolutional Networks. The experimental results on image and video datasets demonstrate significant gains and accurate results for different classifiers in diverse scenarios.

Canonical correlation analysis (CCA) is an efficient method for dimensionality reduction on two-view data. However, as an unsupervised learning method, CCA cannot use partially given label information in multi-view semi-supervised scenarios. In this paper, we propose a novel two-view semi-supervised learning method, called semi-supervised canonical correlation analysis based on label propagation (LPbSCCA). LPbSCCA incorporates a new label propagation algorithm based on sparse representation to infer label information for unlabeled data. Specifically, it first constructs dictionaries consisting of all labeled samples, then obtains reconstruction coefficients of the unlabeled samples using sparse representation, and finally estimates label information for the unlabeled samples by combining the given labels of the labeled samples. After that, it constructs soft label matrices of all samples and probabilistic within-class scatter matrices in each view. Finally, to enhance the discriminative power of the features, it is formulated to maximize the correlations between samples of the same class across views while simultaneously minimizing within-class variations in the low-dimensional feature space of each view. Furthermore, we extend the method to a general model, called LPbSMCCA, that handles data from multiple (more than two) views. Extensive experimental results on several well-known datasets demonstrate that the proposed methods achieve better recognition performance and robustness than existing related methods.

Photos are becoming spontaneous, objective, and universal sources of information. This paper explores evolving situation recognition using photo streams from disparate sources combined with advances in deep learning. Using visual concepts in photos together with space and time information, we formulate situation detection as a semi-supervised learning problem and propose new graph-based models to solve it. To extend the method to unknown situations, we introduce a soft label method that enables the traditional semi-supervised learning framework to accurately predict predefined labels as well as effectively form new clusters. To cope with noisy data, which degrade graph quality and lead to poor recognition results, we use two kinds of noise-robust norms that can eliminate the adverse effects of outliers in visual concepts and improve the accuracy of situation recognition. Finally, we demonstrate the idea and the effectiveness of the proposed models on the Yahoo Flickr Creative Commons 100 Million dataset.
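
The abstract above does not detail its graph-based models. As general background only, the sketch below shows classic iterative label propagation on a graph: labeled nodes keep their labels fixed, and every other node repeatedly takes the weighted average of its neighbors' class scores. The resulting score rows are soft labels. The adjacency-matrix representation and all names are assumptions; this is not the paper's model.

```python
def propagate_labels(adjacency, seed_labels, num_classes, iterations=50):
    """Iterative label propagation with clamped seeds.

    adjacency:   symmetric matrix (list of lists) of non-negative edge weights.
    seed_labels: dict mapping node index -> class index for labeled nodes.
    Returns one row of soft class scores per node.
    """
    n = len(adjacency)
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, label in seed_labels.items():
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = []
        for i in range(n):
            degree = sum(adjacency[i])
            if i in seed_labels or degree == 0:
                # Labeled nodes stay fixed; isolated nodes cannot receive labels.
                updated.append(scores[i][:])
                continue
            updated.append([
                sum(adjacency[i][j] * scores[j][c] for j in range(n)) / degree
                for c in range(num_classes)
            ])
        scores = updated
    return scores
```

On a four-node chain with the two end nodes labeled as different classes, the two inner nodes converge to scores of about 2/3 and 1/3 for the class of the nearer end, so each is assigned to the label it is closer to in the graph.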

Because of their computational complexity, deep neural networks (DNNs) for embedded devices are usually trained on high-performance computers or graphics processing units (GPUs), and only the inference phase runs on the embedded device. Data processed by embedded devices such as smartphones and wearables are usually personalized, so a DNN model trained on public datasets may have poor accuracy on personalized data. Retraining the DNN with personalized data collected locally on the device is therefore necessary. However, retraining needs labeled data, while locally collected data are unlabeled, so how to retrain a DNN with unlabeled data is a problem to be solved. This paper demonstrates the necessity of retraining a DNN model with personalized data collected on embedded devices after it has been trained on public datasets. It also proposes a label generation method in which a fake label is generated for each unlabeled training case according to the user's feedback, so that retraining can be performed with unlabeled data collected on embedded devices. The experimental results show that the fake label generation method has both good training effects and wide applicability. Advanced neural networks can be trained with unlabeled data from embedded devices, and the personalized accuracy of the DNN model improves gradually with personal use.

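To address the high false alarm rate of clustering-based intrusion detection algorithms, an active-learning semi-supervised clustering intrusion detection algorithm is proposed. An active learning strategy is applied during semi-supervised clustering to actively query the constraint relationships between unlabeled and labeled data in the network, and a small amount of labeled data is used to generate a correct sample model that guides the clustering of a large amount of unlabeled data. For data that still cannot be labeled after clustering, an improved K-nearest-neighbor method is used to further determine their type, enabling detection of new attack types. Experimental results demonstrate the feasibility and effectiveness of the algorithm.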
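
The abstract above mentions an improved K-nearest-neighbor step for records that clustering leaves unlabeled, without saying what the improvement is. For orientation, the sketch below shows only the plain K-nearest-neighbor vote with Euclidean distance; the feature vectors and the "normal"/"attack" labels in the usage note are hypothetical.

```python
import math
from collections import Counter


def knn_label(point, labeled_points, k=3):
    """Assign a label by majority vote among the k nearest labeled points.

    point:          feature vector (sequence of floats).
    labeled_points: list of (feature_vector, label) pairs.
    Ties are resolved in favor of the label whose member is nearest.
    """
    if k <= 0 or not labeled_points:
        raise ValueError("k must be positive and labeled_points non-empty")
    ranked = sorted(labeled_points, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Given labeled records clustered around (0, 0) as "normal" and around (5, 5) as "attack", a leftover record at (0.2, 0.1) would be voted "normal". Detecting a new attack type needs more than this vote, for example a distance cutoff beyond which a record matches no known class, and the abstract does not say how the paper handles that.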
