Similar literature
 Found 20 similar documents; search took 287 ms
1.
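Network anomaly detection based on one-class support vector machines and active learning   Total citations: 1 (self-citations: 0, citations by others: 1)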
刘敬  谷利泽  钮心忻  杨义先 《通信学报》2015,36(11):136-146
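This paper studies anomaly detection based on support vector machines and active learning. A one-class support vector machine model is first built from the raw data in an unsupervised manner. Active learning then selects the samples most valuable for improving detection performance so that they can be labeled manually, and the labeled and unlabeled data are used together, in a semi-supervised manner, to extend the anomaly detection model based on the one-class support vector machine. Experimental results show that the proposed method gains performance from a small amount of labeled data and reduces manual labeling cost through active learning, which makes it better suited to real network environments.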
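
The query-selection step of this abstract can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that the "most valuable" samples are those closest to the decision boundary of the one-class model (smallest absolute decision score). The function name `select_queries` and the example scores are hypothetical.

```python
def select_queries(scores, k):
    """Return indices of the k samples whose decision score is closest to 0.

    `scores` stands in for signed decision values of a one-class SVM
    (positive = normal, negative = anomalous). Closeness to the boundary
    is an assumed uncertainty criterion, not the paper's stated rule.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return ranked[:k]


# Hypothetical decision scores for five unlabeled traffic records.
scores = [2.1, -0.05, 0.4, -1.7, 0.1]
queries = select_queries(scores, 2)  # indices to send to a human annotator
```

Records far from the boundary are left unlabeled under this criterion, since the model is already confident about them.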

2.
Predicting protein function from protein interaction networks has been challenging because of the complexity of functional relationships among proteins. Most previous function prediction methods depend on the neighborhood of, or the connected paths to, known proteins. However, their accuracy has been limited by the functional inconsistency of interacting proteins. In this paper, we propose a novel approach for function prediction by identifying frequent patterns of functional associations in a protein interaction network. The set of functions that a protein performs is assigned to the corresponding node as a label. A functional association pattern is then represented as a labeled subgraph. Our frequent labeled subgraph mining algorithm efficiently searches for the functional association patterns that occur frequently in the network. It iteratively grows frequent patterns one node at a time by selective joining, and simplifies the network by a priori pruning. Using the yeast protein interaction network, our algorithm found more than 1400 frequent functional association patterns. Function prediction is performed by matching the subgraph that includes the unknown protein against the frequent patterns analogous to it. By leave-one-out cross-validation, we show that our approach outperforms previous link-based methods in prediction accuracy. The frequent functional association patterns generated in this study might become the foundation for advanced analysis of the functional behavior of proteins at the system level.

3.
Protein molecules interact with each other in protein complexes to perform many vital functions, and different computational techniques have been developed to identify protein complexes in protein-protein interaction (PPI) networks. These techniques search for subgraphs of high connectivity in PPI networks under the assumption that the proteins in a protein complex are highly interconnected. While these techniques have been shown to be quite effective, the matching rate between the protein complexes they discover and those previously determined experimentally can be relatively low, and the "false-alarm" rate can be relatively high. This is especially the case when the assumption that proteins in a complex are more highly interconnected does not hold. To increase the matching rate and reduce the false-alarm rate, we have developed a technique that works effectively without making this assumption. The technique, called protein complex identification by discovering functional interdependence (PCIFI), searches for protein complexes in PPI networks by taking into consideration both the functional interdependence between protein molecules and the topology of the network. PCIFI works in several steps. The first step is to construct a multiple-function protein network graph by labeling each vertex with one or more of the molecular functions it performs. The second step is to filter out interactions between protein pairs that are not functionally interdependent in the statistical sense. The third step is to use an information-theoretic measure to determine the strength of the functional interdependence between all remaining interacting protein pairs. The last step is to form protein complexes based on the strength of functional interdependence and the connectivity between proteins.
For performance evaluation, PCIFI was used to identify protein complexes in real PPI network data, and the protein complexes it found were matched against those previously known in MIPS. The results show that PCIFI can be an effective technique for identifying protein complexes. The protein complexes it found match more known protein complexes with a smaller false-alarm rate, and they provide useful insights into the functional interdependence relationships between proteins in protein complexes.

4.
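The Tri-Training algorithm in semi-supervised learning removes the requirement of earlier algorithms for sufficient and redundant views, and it improves labeling efficiency by using three classifiers to handle labeling confidence and sample prediction. To further increase the diversity among classifiers during co-training and thereby improve performance, this paper builds on that theory and proposes a diversity-enhanced semi-supervised collaborative classification algorithm. The algorithm learns with three different classifiers. Because random sampling may degrade performance while the classification models are being updated, the algorithm samples the labeled set by stratified sampling based on class labels, and it integrates the classifiers by weighted voting based on classification accuracy, which improves prediction accuracy. Experiments compare the proposed algorithm with Tri-Training; the results show that the proposed method performs well on classification problems, verifying its effectiveness and feasibility.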
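
The two ingredients named in this abstract, class-stratified sampling and accuracy-weighted voting, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the sampling fraction, and the example accuracies are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, rng):
    """Sample roughly the same fraction of indices from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        n = max(1, round(fraction * len(members)))  # at least one per class
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)


def weighted_vote(predictions, accuracies):
    """Combine one prediction per classifier, weighting each by its accuracy."""
    totals = defaultdict(float)
    for label, acc in zip(predictions, accuracies):
        totals[label] += acc
    # Ties go to the smallest label so the result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])


# Three hypothetical classifiers: one accurate, two weak.
vote = weighted_vote(["a", "b", "b"], [0.9, 0.4, 0.4])
subset = stratified_sample([0, 0, 0, 0, 1, 1], 0.5, random.Random(0))
```

In the vote above the single accurate classifier outweighs the two weaker ones, which a plain majority vote would not allow.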

5.
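Semi-supervised Gaussian process classification algorithm for class-imbalanced data   Total citations: 1 (self-citations: 0, citations by others: 1)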
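Traditional supervised learning methods struggle with real datasets that carry little label information and whose training sets are class-imbalanced. To address this, a semi-supervised Gaussian process classification algorithm for class-imbalanced data is proposed. The algorithm adopts the self-training idea from semi-supervised learning: it computes posterior probabilities with Gaussian process classification and injects class labels into the unlabeled data to obtain more accurate and reliable labeled data, so that the class distribution of the training samples becomes relatively balanced and the classifier is adaptively optimized to classify better. Experimental results show that, with class-imbalanced training samples and very little label information, the algorithm obtains effective labels through the self-trained classifier and improves classification accuracy, offering a new approach to classifying class-imbalanced data.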

6.
Canonical correlation analysis (CCA) is an efficient method for dimensionality reduction on two-view data. However, as an unsupervised learning method, CCA cannot use partially given label information in multi-view semi-supervised scenarios. In this paper, we propose a novel two-view semi-supervised learning method, called semi-supervised canonical correlation analysis based on label propagation (LPbSCCA). LPbSCCA incorporates a new sparse-representation-based label propagation algorithm to infer label information for unlabeled data. Specifically, it first constructs dictionaries consisting of all labeled samples, then obtains reconstruction coefficients of the unlabeled samples using sparse representation, and finally, by combining the given labels of the labeled samples, estimates label information for the unlabeled ones. After that, it constructs soft label matrices of all samples and probabilistic within-class scatter matrices in each view. Finally, to enhance the discriminative power of the features, it is formulated to maximize the correlations between samples of the same class across views while simultaneously minimizing within-class variations in the low-dimensional feature space of each view. Furthermore, we extend the method to a general model called LPbSMCCA to handle data from multiple (more than two) views. Extensive experimental results on several well-known datasets demonstrate that the proposed methods achieve better recognition performance and robustness than existing related methods.

7.
余游  冯林  王格格  徐其凤 《电子学报》2019,47(11):2284-2291
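How to transfer a knowledge model from a source domain with abundant labeled data to a target domain with few labeled data is a hot topic in few-shot learning research. Existing few-shot learning algorithms generalize poorly when the feature distributions of the source-domain and target-domain data differ greatly. To address this, a pseudo-label-based semi-supervised few-shot learning model, FSLSS (Few-Shot Learning based on Semi-Supervised), is proposed. First, a relation-based deep learning network is built with the pytorch deep learning framework and pre-trained on the source-domain data. Then the network classifies the target-domain data, and the class label with the highest predicted probability is taken as each sample's pseudo-label. Finally, the network is trained on a mix of pseudo-labeled target-domain data and truly labeled source-domain data, and the pseudo-labeling and mixed-training steps are repeated. Experimental results show that, compared with existing mainstream few-shot learning algorithms, FSLSS generalizes better and transfers knowledge more effectively.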
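
The pseudo-labeling and data-mixing steps described in this abstract can be sketched in a few lines. This is a framework-free illustration under stated assumptions: the class probabilities are taken as given (in the paper they come from the pre-trained network), and the names `pseudo_label` and `mixed_training_set` are hypothetical.

```python
def pseudo_label(probabilities):
    """Assign each sample the class index with the highest probability."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda c: row[c])
        labels.append(best)
    return labels


def mixed_training_set(source_pairs, target_samples, probabilities):
    """Combine true-labeled source pairs with pseudo-labeled target pairs."""
    pseudo = pseudo_label(probabilities)
    return list(source_pairs) + list(zip(target_samples, pseudo))


# Hypothetical predicted class probabilities for three target-domain samples.
target_probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
pseudo = pseudo_label(target_probs)
```

In the full procedure the network would be retrained on the mixed set and the probabilities recomputed, repeating both steps for several rounds.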

8.
邓小龙  温颖 《电子学报》2016,44(9):2114-2120
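Community structure partitioning is very important for analyzing the statistical properties of complex networks. In information propagation over heterogeneous social networks it is a widely studied topic, and related work often focuses on the key influence that tightly connected community structures have on information propagation. Traditional community partitioning methods are mostly built on properties of nodes and edges. For example, the label propagation algorithm LPA (Label Propagation Algorithm) is a semi-supervised machine learning method that partitions communities through the exchange of node labels and a community merging process, but it runs inefficiently. The aim of this paper is to speed up LPA-type algorithms so that they converge quickly and to improve partitioning accuracy, especially for overlapping communities. To address the low running efficiency and slow merging convergence of LPA, the paper starts from the network connection matrix that underlies label propagation, combines the largest nonzero eigenvalue of that matrix with the threshold for label information propagation in the network, and proposes a new community partitioning method based on an epidemic spreading model (ESLPA, Epidemic Spreading LPA). Validation on the classic LFR Benchmark synthetic networks, random networks, and real social network data shows that the algorithm's time complexity is substantially better than that of classic LPA. On overlapping communities its accuracy is better than that of the classic LPA-based COPRA algorithm. In particular, when overlapping communities are pronounced, its accuracy approaches that of the higher-accuracy GA, N-cut, and A-cut algorithms and is clearly better than that of classic algorithms such as GN, FastGN, and CPM.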
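
For reference, the baseline LPA that this abstract sets out to improve can be sketched as below. This is plain label propagation, not ESLPA. To keep the example reproducible it makes two simplifying assumptions that differ from the usual randomized algorithm: nodes are visited in sorted order, and ties between equally frequent labels go to the largest label. The function name and the toy graph are hypothetical.

```python
from collections import Counter


def label_propagation(adjacency, max_iters=100):
    """Asynchronous label propagation on an undirected graph.

    `adjacency` maps each node to a list of its neighbors. Every node
    starts in its own community and repeatedly adopts the label that is
    most frequent among its neighbors until no label changes.
    """
    labels = {node: node for node in adjacency}
    for _ in range(max_iters):
        changed = False
        for node in sorted(adjacency):
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            top = max(counts.values())
            best = max(label for label, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(graph)
```

On this toy graph each triangle ends up with its own label. With a different tie-breaking rule the same graph can collapse into a single community, which illustrates the sensitivity of LPA to tie-breaking and update order.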

9.
Automatic image annotation has emerged as a hot research topic in the last two decades due to its application in organizing social images. Most studies treat image annotation as a typical multi-label classification problem; the shortcoming of this approach is that learning a reliable model for label prediction requires a sufficient number of training images with accurate annotations. Being aware of this, we develop a novel graph-regularized low-rank feature mapping for image annotation under a semi-supervised multi-label learning framework. Specifically, the proposed method concatenates the prediction models for different tags into a matrix and introduces the matrix trace norm to capture the correlations among different labels and control the model complexity. In addition, by using graph Laplacian regularization as a smoothing operator, the proposed approach can explicitly take into account the local geometric structure of both labeled and unlabeled images. Moreover, considering that the tags of labeled images tend to be missing or noisy, we introduce a supplementary ideal label matrix to automatically fill in missing tags and correct noisy tags for the given training images. Extensive experiments conducted on five different multi-label image datasets demonstrate the effectiveness of the proposed approach.

10.
余立  李哲  高飞  袁向阳  杨永 《电信科学》2021,37(10):136-142
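Identifying users who experience poor service quality is an important step in lowering complaint rates and raising user satisfaction. In current telecom network systems, the large volume of structured and unstructured data related to service perception is hard to annotate effectively, labels for poor-quality users are incomplete, and the training samples of existing supervised models are imbalanced, all of which lead to low identification rates. To address this, an improved self-training semi-supervised learning model is adopted: a small number of users with low satisfaction scores or complaints serve as poor-quality labels for annotating the network data, and label transfer is used to train on the large amount of unlabeled data and identify poor-quality users. Experiments show that, compared with fully supervised learning (high identification accuracy but high training cost) and unsupervised learning (low identification accuracy), semi-supervised learning makes full use of unlabeled samples for effective training and significantly improves identification accuracy while keeping training cost low.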

11.
吴莹  罗明 《信号处理》2018,34(6):661-667
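To address the shortage of training samples in radar signal classification and recognition, this paper proposes a classification algorithm that combines active learning with semi-supervised learning and improves it by iteratively verifying the pseudo-labeled samples. To address the low recognition rate of radar signals in complex electromagnetic environments, radially Gaussian kernel time-frequency analysis is applied to the radar signals, singular value decomposition is performed on the time-frequency distribution, and the singular vectors are extracted as the feature parameters for recognition. To overcome the shortcomings of traditional semi-supervised active learning algorithms, the classifier is built with an improved semi-supervised active learning algorithm that iteratively verifies the pseudo-labeled samples to improve the accuracy of the pseudo-label information. This improves the final classification performance and achieves high-probability recognition of radar signals when only a few labeled samples are available. Simulation results show that the proposed feature recognition method achieves a high recognition rate.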

12.
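Deep-learning-based ship target detection in synthetic aperture radar (SAR) imagery has developed rapidly in recent years. However, traditional supervised learning needs a large number of labeled samples to train the network. To address this, this paper proposes a semi-supervised SAR ship target detection method based on a graph attention network (GAT). First, a symmetric convolutional neural network is designed for sea-land segmentation. Next, superpixel segmentation is performed, the superpixel blocks are modeled as GAT nodes, and a region-of-interest pooling layer extracts multi-scale node features. The GAT uses an attention mechanism to adaptively aggregate the features of adjacent nodes and classify the unlabeled nodes. Finally, the superpixel blocks predicted to be ship targets are located in the SAR image to obtain refined detection results. The method is validated on a measured high-resolution SAR image dataset. The results show that it reliably detects ship targets at a low false-alarm rate with few labeled samples.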

13.
It is time-consuming and expensive to gather and label the growing volume of multimedia data that has become easily accessible with the rapid development of Internet technology and digital sensors. Hence, it is essential to develop a technique that can be used efficiently on large-scale multimedia data, especially when labeled data is rare. Active learning has been shown to be a useful approach: it greedily chooses queries from unlabeled data to be labeled for further learning so as to minimize the estimated expected learning error. However, most active learning methods take only the labeled data into account when training the classifier. In this paper, we introduce a semi-supervised algorithm to learn the classifier and then perform active learning on top of the semi-supervised scheme. In particular, we incorporate Hessian regularization into the support vector machine to boost the classifier. Hessian regularization exploits the underlying geometric structure of the data space (including labeled and unlabeled data) and significantly improves performance in each round. To evaluate the proposed algorithm, we conduct extensive experiments on image segmentation and human activity recognition using popular datasets. The experimental results demonstrate that our method achieves better performance than traditional active learning methods.

14.
Machine learning techniques require an enormous amount of high-quality labeled data to simulate human comprehension more naturally. Recently, mobile crowdsensing, as a new paradigm, has made it possible to label a large number of instances quickly and at low cost. Existing works focus only on single labeling for the supervised learning problems of traditional machine learning, where each instance is associated with only one label. However, in many real-world applications, an instance may have more than one label. To this end, in this paper we explore an incremental multi-labeling problem in which crowd users are incentivized to label instances under a budget constraint and each instance carries multiple labels. Considering both the uncertainty and the diversity of the number of labels of each instance, this paper proposes two mechanisms for incremental multi-labeling crowdsensing. Through extensive simulations, we validate their theoretical properties and evaluate their performance.

15.
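To address route convergence when a link fails in a software-defined network (SDN), a Q-Learning sub-topology convergence technique (QL-STCT) is proposed to achieve intelligent route convergence on SDN link failure. First, some nodes in the network are selected as hub nodes, and hub domains are partitioned according to them. Then, regional features are built for each hub domain and used to define an exploration strategy for the reinforcement learning agent that speeds up reinforcement learning convergence. Finally, a sub-topology network is built through reinforcement learning to plan backup paths and to guarantee backup-path performance within the periodic window. Simulation results show that the proposed method effectively improves the convergence speed and performance of networks with link failures.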
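
The reinforcement-learning core that this abstract builds on is the standard tabular Q-learning update, sketched below. This is the generic textbook rule, not the QL-STCT exploration strategy, which the abstract does not detail. The state and action names, the reward, and the values of `alpha` and `gamma` are hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # A next state with no recorded actions contributes a value of 0.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    old = q[state][action]
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]


# Hypothetical toy table: states are switches, actions are next hops.
q = {"s1": {"s2": 0.0, "s3": 0.0}, "s2": {"dst": 1.0}, "s3": {"dst": 0.0}}
new_value = q_update(q, "s1", "s2", reward=-1.0, next_state="s2")
```

A negative per-hop reward, as assumed here, is one common way to make shorter backup paths score higher.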

16.
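To address the difficulty of obtaining labels for individual communication emitters, semi-supervised machine learning theory is introduced and a semi-supervised learning algorithm that iterates on prediction confidence is proposed (Improved Transductive Support Vector Machine Iterative Algorithm Based on the Confidence of Prediction, CP-TSVM). Building on the TSVM algorithm, the method makes full use of unlabeled samples and iterates according to the confidence of the prediction results, which greatly reduces the classifier's computation. Computer simulations show that, when labeled samples make up 2% of all samples, CP-TSVM maintains recognition accuracy while shortening model training time by nearly 60 s compared with TSVM.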

17.
Nonnegative matrix factorization (NMF) is a popular method for low-rank approximation of a nonnegative matrix, providing a useful tool for representation learning that is valuable for clustering and classification. When a portion of the data is labeled, the performance of clustering or classification is improved if the information on class labels is incorporated into NMF. To this end, we present semi-supervised NMF (SSNMF), where we jointly incorporate the data matrix and the (partial) class label matrix into NMF. We develop multiplicative updates for SSNMF to minimize a sum of weighted residuals, each of which involves the nonnegative 2-factor decomposition of the data matrix or the label matrix, sharing a common factor matrix. Experiments on document datasets and EEG datasets from the BCI competition confirm that our method improves clustering as well as classification performance compared to standard NMF, showing that semi-supervised NMF yields semi-supervised feature extraction.
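
One way to write down the objective described in this abstract is sketched below. The notation is an assumption, since the abstract gives no symbols: $X$ is the data matrix, $Y$ the partial label matrix, $L$ a binary indicator matrix that is 1 where a label is known, $A$ and $B$ the two basis matrices, $S$ the shared factor matrix, and $\lambda$ a trade-off weight. The updates follow from the usual multiplicative-update argument for this objective, with $\odot$ denoting the element-wise product and the fractions taken element-wise.

```latex
\min_{A,\,B,\,S \,\ge\, 0} \;
  \lVert X - AS \rVert_F^2 \;+\; \lambda \,\lVert L \odot (Y - BS) \rVert_F^2

A \leftarrow A \odot \frac{X S^{\top}}{A S S^{\top}}, \qquad
B \leftarrow B \odot \frac{(L \odot Y)\, S^{\top}}{(L \odot BS)\, S^{\top}}, \qquad
S \leftarrow S \odot
  \frac{A^{\top} X + \lambda\, B^{\top} (L \odot Y)}
       {A^{\top} A S + \lambda\, B^{\top} (L \odot BS)}
```

Setting $\lambda = 0$ removes the label term and recovers the standard NMF updates for $A$ and $S$.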

18.
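Recent studies on semi-supervised bearing fault diagnosis based on graph neural networks (GNN) still suffer from insufficient mining of label information and overly idealized diagnosis scenarios. In engineering practice, bearings often run under time-varying speed conditions such as start-up and shutdown, and labeled fault samples are increasingly expensive to obtain. To meet these challenges, this paper proposes a new semi-supervised bearing fault diagnosis method for time-varying speed based on an improved graph attention network (GAT). A pseudo-label propagation strategy is designed from the K-nearest neighbor (KNN) algorithm and the smoothness assumption (SA): label information is propagated along edges to neighboring samples with similar distributions, which makes full use of the label information of the limited samples. Each vibration spectrum sample is treated as a node, and a semi-supervised learning model based on a node-level graph attention network is built, with the attention mechanism further mining representative bearing fault features. The method is applied to two sets of experimental bearing fault data recorded under time-varying speed. The results show that it accurately diagnoses the different bearing fault modes at label rates of no more than 2% and outperforms other commonly used GNN semi-supervised learning methods.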

19.
To address the strong randomness of traditional label-propagation-based overlapping community detection methods and their need for preset thresholds, overlapping community detection in complex networks based on multi-kernel label propagation (OMKLP) was proposed. An evaluation model for kernel nodes was proposed after analyzing the degree and local covering density of nodes and their neighbors, and on this basis a detection method for local kernel nodes was presented. Based on the local kernel nodes, a new asynchronous label propagation strategy oriented to overlapping communities was proposed, which can rapidly distinguish the inner and outer nodes of communities so as to obtain the overlapping community structure. An analysis method for overlapping nodes was proposed to increase the accuracy of detecting them. Without any prior knowledge, using only basic network information (nodes and links), the algorithm can detect the structure of overlapping communities accurately, and thus effectively overcomes the defects of the traditional label propagation algorithm. The algorithm was tested on benchmark networks and real-world networks and compared with several classic algorithms. The experimental results verify the validity and feasibility of OMKLP.

20.
万建武  杨明  陈银娟 《电子学报》2012,40(7):1410-1415
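Cost-sensitive learning is a research hotspot in machine learning. In practical applications, datasets are often imbalanced, contain many unlabeled samples and only a few labeled ones, and include noise. Although research on cost-sensitive learning for this setting has made some progress, further study is needed. This paper therefore proposes a cost-sensitive semi-supervised Laplacian support vector machine. Building on an expansion strategy for unlabeled samples, the model incorporates misclassification costs that account for data imbalance into both the empirical loss and the Laplacian regularization term of the Laplacian support vector machine. To account for the influence of noisy samples on the decision plane, the paper defines a sample-dependent cost that assigns lower weights to noisy samples. Experimental results on 7 UCI datasets and 8 NASA software datasets demonstrate the effectiveness of the proposed algorithm.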
