首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
传统转导支持向量机有效地利用了未标记样本,具有较高的分类准确率,但是计算复杂度较高。针对该不足,论文提出了一种基于核聚类的启发式转导支持向量机学习算法。首先将未标记样本利用核聚类算法进行划分,然后对划分后的每一簇样本标记为同一类别,最后根据传统的转导支持向量机算法进行新样本集合上的分类学习。所提方法通过对核聚类后同一簇未标记样本赋予同样的类别,极大地降低了传统转导支持向量机算法的计算复杂度。在MNIST手写阿拉伯数字识别数据集上的实验表明,所提算法较好地保持了传统转导支持向量机分类精度高的优势。  相似文献   

2.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2005,40(12):3430-3436
大多基于图的半监督学习方法,在样本间相似性度量时没有用到已有的和标签传播过程中得到的标签信息,同时,其度量方式相对固定,不能有效度量出分布结构复杂多样的数据样本间的相似性。针对上述问题,提出了基于标签进行度量学习的图半监督学习算法。首先,给定样本间相似性的度量方式,从而构建相似度矩阵。然后,基于相似度矩阵进行标签传播,筛选出k个低熵样本作为新确定的标签信息。最后,充分利用所有标签信息更新相似性度量方式,重复迭代优化直至学出所有标签信息。所提算法不仅利用标签信息改进了样本间相似性的度量方式,而且充分利用中间结果降低了半监督学习对标签数据的需求量。在6个真实数据集上的实验结果表明,该算法在超过95%的情况下相较三种传统的基于图的半监督学习算法取得了更高的分类准确率。  相似文献   

3.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2020,40(12):3430-3436
大多基于图的半监督学习方法,在样本间相似性度量时没有用到已有的和标签传播过程中得到的标签信息,同时,其度量方式相对固定,不能有效度量出分布结构复杂多样的数据样本间的相似性。针对上述问题,提出了基于标签进行度量学习的图半监督学习算法。首先,给定样本间相似性的度量方式,从而构建相似度矩阵。然后,基于相似度矩阵进行标签传播,筛选出k个低熵样本作为新确定的标签信息。最后,充分利用所有标签信息更新相似性度量方式,重复迭代优化直至学出所有标签信息。所提算法不仅利用标签信息改进了样本间相似性的度量方式,而且充分利用中间结果降低了半监督学习对标签数据的需求量。在6个真实数据集上的实验结果表明,该算法在超过95%的情况下相较三种传统的基于图的半监督学习算法取得了更高的分类准确率。  相似文献   

4.
We study the problem of image retrieval based on semi-supervised learning. Semi-supervised learning has attracted a lot of attention in recent years. Different from traditional supervised learning. Semi-supervised learning makes use of both labeled and unlabeled data. In image retrieval, collecting labeled examples costs human efforts, while vast amounts of unlabeled data are often readily available and offer some additional information. In this paper, based on support vector machine (SVM), we introduce a semi-supervised learning method for image retrieval. The basic consideration of the method is that, if two data points are close to each, they should share the same label. Therefore, it is reasonable to search a projection with maximal margin and locality preserving property. We compare our method to standard SVM and transductive SVM. Experimental results show efficiency and effectiveness of our method.  相似文献   

5.
李远航  刘波  唐侨 《计算机科学》2014,41(11):260-264
主动学习已经广泛应用于图数据的研究,但应用于多标签图数据的分类较为少见。结合基于误差界最小化的主动学习,给出了一种多标签图数据的分类方法,即通过多标签分类与局部和全局的一致性学习(LLGC)得到一系列目标方程,并将其用于最小化直推式的拉德马赫复杂度,得到最小泛化误差上界,从而在图上获取少量的但蕴含巨大信息量的节点。实验证明,应用该方法的多标签分类器的输出有很高的精确度。  相似文献   

6.
Transduction is an inference mechanism adopted from several classification algorithms capable of exploiting both labeled and unlabeled data and making the prediction for the given set of unlabeled data only. Several transductive learning methods have been proposed in the literature to learn transductive classifiers from examples represented as rows of a classical double-entry table (or relational table). In this work we consider the case of examples represented as a set of multiple tables of a relational database and we propose a new relational classification algorithm, named TRANSC, that works in a transductive setting and employs a probabilistic approach to classification. Knowledge on the data model, i.e., foreign keys, is used to guide the search process. The transductive learning strategy iterates on a k-NN based re-classification of labeled and unlabeled examples, in order to identify borderline examples, and uses the relational probabilistic classifier Mr-SBC to bootstrap the transductive algorithm. Experimental results confirm that TRANSC outperforms its inductive counterpart (Mr-SBC).  相似文献   

7.
特征选择旨在降低高维度特征空间,进而简化问题和优化学习方法。已有的研究显示特征提取方法能够有效降低监督学习的情感分类中的特征维度空间。同以往研究不一样的是,该文首次探讨半监督情感分类中的特征提取方法,提出一种基于二部图的特征选择方法。该方法首先借助二部图模型来表述文档与单词间的关系;然后,结合小规模标注样本的标签信息和二部图模型,利用标签传播(LP)算法计算每个特征的情感概率;最后,按照特征的情感概率进行排序进而实现特征选择。多个领域的实验结果表明,在半监督情感分类任务中,基于二部图的特征选择方法明显优于随机特征选择,在保证分类效果不下降(甚至提高)的前提下有效降低了特征空间维度。  相似文献   

8.
针对直推式支持向量机(TSVM)学习模型求解难度大的问题,提出了一种基于k均值聚类的直推式支持向量机学习算法——TSVMKMC。该算法利用k均值聚类算法,将无标签样本分为若干簇,对每一簇样本赋予相同的类别标签,将无标签样本和有标签样本合并进行直推式学习。由于TSVMKMC算法有效地降低了状态空间的规模,因此运行速度较传统算法有了很大的提高。实验结果表明,TSVMSC算法能够以较快的速度达到较高的分类准确率。  相似文献   

9.
A general graph-based semi-supervised learning with novel class discovery   总被引:1,自引:0,他引:1  
In this paper, we propose a general graph-based semi-supervised learning algorithm. The core idea of our algorithm is to not only achieve the goal of semi-supervised learning, but also to discover the latent novel class in the data, which may be unlabeled by the user. Based on the normalized weights evaluated on data graph, our algorithm is able to output the probabilities of data points belonging to the labeled classes or the novel class. We also give the theoretical interpretations for the algorithm from three viewpoints on graph, i.e., regularization framework, label propagation, and Markov random walks. Experiments on toy examples and several benchmark datasets illustrate the effectiveness of our algorithm.  相似文献   

10.
提出一种基于图的半指导学习算法用于网页分类.采用k近邻算法构建一个带权图,图中节点为已标志或未标志的网页,连接边的权重表示类的传播概率,将网页分类问题形式化为图中类的概率传播.为有效利用图中未标志节点辅助分类,结合网页的内容信息和链接信息计算网页间的链接权重,通过已标志节点,类别信息以一定概率从已标志节点推向未标志节点.实验表明,本文提出的算法能有效改进网页分类结果.  相似文献   

11.
刁树民  王永利 《计算机应用》2009,29(6):1578-1581
在进行组合决策时,已有的组合分类方法需要对多个组合分类器均有效的公共已知标签训练样本。为了解决在没有已知标签样本的情况下数据流组合分类决策问题,提出一种基于约束学习的数据流组合分类器的融合策略。在判定测试样本上的决策时,根据直推学习理论设计满足每一个局部分类器约束度量的方法,保证了约束的可行性,解决了分布式分类聚集时最大熵的直推扩展问题。测试数据集上的实验证明,与已有的直推学习方法相比,此方法可以获得更好的决策精度,可以应用于数据流组合分类的融合。  相似文献   

12.
雷蕾  王晓丹  周进登 《计算机科学》2012,39(12):245-248
情感分类任务旨在自动识别文本所表达的情感色彩信息(例如,褒或者贬、支持或者反对)。提出一种基于情 绪词与情感词协作学习的情感分类方法:在基于传统情感词资源的基础上,引入少量情绪词辅助学习,只利用大规模 未标注数据实现情感分类。具体来讲,基于文档一单词二部图的标签传播算法框架,利用情绪词与情感词构建两个视 图,通过协作学习的方法从大规模未标注语料中抽取高正确率的自动标注样本作为训练数据,然后训练分类器进行情 感分类。实验表明,该方法在多个领域的情感分类任务中都取得了较好的分类效果。  相似文献   

13.
Graph-based learning algorithms including label propagation and spectral clustering are known as the effective state-of-the-art algorithms for a variety of tasks in machine learning applications. Given input data, i.e. feature vectors, graph-based methods typically proceed with the following three steps: (1) generating graph edges, (2) estimating edge weights and (3) running a graph based algorithm. The first and second steps are difficult, especially when there are only a few (or no) labeled instances, while they are important because the performance of graph-based methods heavily depends on the quality of the input graph. For the second step of the three-step procedure, we propose a new method, which optimizes edge weights through a local linear reconstruction error minimization under a constraint that edges are parameterized by a similarity function of node pairs. As a result our generated graph can capture the manifold structure of the input data, where each edge represents similarity of each node pair. To further justify this approach, we also provide analytical considerations for our formulation such as an interpretation as a cross-validation of a propagation model in the feature space, and an error analysis based on a low dimensional manifold model. Experimental results demonstrated the effectiveness of our adaptive edge weighting strategy both in synthetic and real datasets.  相似文献   

14.
Graph carries out a key role in graph-based semi-supervised label propagation, as it clarifies the structure of the data manifold. The performance of label propagation methods depends on the adopted graph and can be enhanced by merging different graphs that are obtained from multiple sources of information. While there exist algorithms that perform graph fusion they have several weaknesses. Most of these algorithms define graph fusion and label propagation as two separate tasks. Moreover, when the number of data expands, these strategies are not well-suited due to the use of transductive learning in the label propagation phase which makes the label prediction for unseen samples difficult. Furthermore, very few algorithms extract the information contained in the label space. Additionally, most of the graph fusion techniques adopt equal or static weights for different views, which is not the best choice as distinctive features (hence different graphs) contain various information. To overcome these shortcomings, we propose an Auto-weighted Multi-view Semi-Supervised Learning method (AMSSL), which is based on an inductive learning algorithm (i.e., Flexible Manifold Embedding) and profited a projection matrix for predicting the labels of out-of-sample data. The proposed AMSSL method represents a unified framework that dynamically fuses various information obtained from different features and also from the label space and adaptively designates appropriate weights according to the usefulness of each view. Our experimental results on seven small and large image datasets demonstrate the superiority of the proposed method compared to the use of one single feature and other state-of-the-art graph fusion methods.  相似文献   

15.
Transductive support vector machine (TSVM) is a well-known algorithm that realizes transductive learning in the field of support vector classification. This paper constructs a bi-fuzzy progressive transductive support vector machine (BFPTSVM) algorithm by combining the proposed notation of bi-fuzzy memberships for the temporary labeled sample appeared in progressive learning process and the sample-pruning strategy, which decreases the computation complexity and store memory of algorithm. Simulation experiments show that the BFPTSVM algorithm derives better classification performance and converges rapidly with better stability compared to the other learning algorithms.  相似文献   

16.
As a recently proposed machine learning method, active learning of Gaussian processes can effectively use a small number of labeled examples to train a classifier, which in turn is used to select the most informative examples from unlabeled data for manual labeling. However, in the process of example selection, active learning usually need consider all the unlabeled data without exploiting the structural space connectivity among them. This will decrease the classification accuracy to some extent since the selected points may not be the most informative. To overcome this shortcoming, in this paper, we present a method which applies the manifold-preserving graph reduction (MPGR) algorithm to the traditional active learning method of Gaussian processes. MPGR is a simple and efficient example sparsification algorithm which can construct a subset to represent the global structure and simultaneously eliminate the influence of noisy points and outliers. Thereby, when actively selecting examples to label, we just choose from the subset constructed by MPGR instead of the whole unlabeled data. We report experimental results on multiple data sets which demonstrate that our method obtains better classification performance compared with the original active learning method of Gaussian processes.  相似文献   

17.
In this paper, we propose a novel semi-supervised learning approach based on nearest neighbor rule and cut edges. In the first step of our approach, a relative neighborhood graph based on all training samples is constructed for each unlabeled sample, and the unlabeled samples whose edges are all connected to training samples from the same class are labeled. These newly labeled samples are then added into the training samples. In the second step, standard self-training algorithm using nearest neighbor rule is applied for classification until a predetermined stopping criterion is met. In the third step, a statistical test is applied for label modification, and in the last step, the remaining unlabeled samples are classified using standard nearest neighbor rule. The main advantages of the proposed method are: (1) it reduces the error reinforcement by using relative neighborhood graph for classification in the initial stages of semi-supervised learning; (2) it introduces a label modification mechanism for better classification performance. Experimental results show the effectiveness of the proposed approach.  相似文献   

18.
标记传播是使用最广泛的半监督分类方法之一。基于共识率的标记传播算法(Consensus Rate-based Label Propagation,CRLP)通过汇总多个聚类方法以合并数据各种属性得到的共识率来构造图。然而,CRLP算法与大多数基于图的半监督分类方法一样,在图中将每个标记样本视为同等重要,它们主要通过优化图的结构来提高算法的性能。事实上,样本不一定是均匀分布的,不同的样本在算法中的重要性也是不同的,并且CRLP算法容易受聚类数目和聚类方法的影响,对低维数据的适应性不足。针对这些问题,文中提出了一种基于加权样本和共识率的标记传播算法(Label Propagation Algorithm Based on Weighted Samples and Consensus-Rate,WSCRLP)。WSCRLP算法首先对数据集进行多次聚类,以探索样本的结构,并结合共识率和样本的局部信息构造图;然后为不同分布的标记样本分配不同的权重;最后基于构造的图和加权样本进行半监督分类。在真实数据集上的实验表明,WSCRLP算法对标记样本进行加权和构造图的方法可以显著提高分类准确率,在84%的实验中都优于对比方法。相比CRLP算法,WSCRLP算法不仅具有更好的性能,而且对输入参数具有鲁棒性。  相似文献   

19.
There has been recently a growing interest in the use of transductive inference for learning. We expand here the scope of transductive inference to active learning in a stream-based setting. Towards that end this paper proposes Query-by-Transduction (QBT) as a novel active learning algorithm. QBT queries the label of an example based on the p-values obtained using transduction. We show that QBT is closely related to Query-by-Committee (QBC) using relations between transduction, Bayesian statistical testing, Kullback-Leibler divergence, and Shannon information. The feasibility and utility of QBT is shown on both binary and multi-class classification tasks using SVM as the choice classifier. Our experimental results show that QBT compares favorably, in terms of mean generalization, against random sampling, committee-based active learning, margin-based active learning, and QBC in the stream-based setting.  相似文献   

20.
李明  杨艳屏  占惠融 《自动化学报》2010,36(12):1655-1660
基于图的算法已经成为半监督学习中的一种流行方法, 该方法把数据定义为图的节点, 用图的边表示数据之间的关系, 在各种数据分布情况下都具有很高的分类准确度. 然而图方法的计算复杂度比较高, 当图的规模比较大时, 计算所需要的时间和存储都非常大, 这在一定程度上限制了图方法的使用. 因此, 如何控制图的大小是基于图的半监督学习算法中的一个重要问题. 本文提出了一种基于密度估计的快速聚类方法, 可以在局部范围对数据点进行聚类, 以聚类形成的子集作为构图的节点, 从而大大降低了图的复杂度. 新的聚类方法计算量较小, 通过推导得到的距离函数能较好地保持原有数据分布. 实验结果表明, 通过局部聚类后构建的小图在分类效果上与在原图上的结果相当, 同时在计算速度上有极大的提高.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号