Similar articles
 20 similar articles found (search time: 46 ms)
1.
Recent years have witnessed a surge of interest in graph-based semi-supervised learning. However, two of the major problems in graph-based semi-supervised learning are: (1) how to set the hyperparameter in the Gaussian similarity; and (2) how to make the algorithm scalable. In this article, we introduce a general framework for graph-based learning. First, we propose a method called linear neighborhood propagation, which can automatically construct the optimal graph. Then we introduce a novel multilevel scheme to make our algorithm scalable for large data sets. The applications of our algorithm to various real-world problems are also demonstrated.

2.
Image retrieval based on augmented relational graph representation   Total citations: 1 (self-citations: 1; citations by others: 0)
The “semantic gap” problem is one of the main difficulties in image retrieval tasks. Semi-supervised learning, typically integrated with relevance feedback techniques, is an effective method to narrow the semantic gap. However, in semi-supervised learning, the amount of unlabeled data is usually much greater than that of labeled data. Therefore, the performance of a semi-supervised learning algorithm relies heavily on how effectively it uses the relationships between the labeled and unlabeled data. This paper proposes a novel algorithm to better explore those relationships by augmenting the relational graph representation built on the entire data set, which is expected to increase the intra-class weights while decreasing the inter-class weights and linking the potential intra-class data. The augmented relational matrix can be directly used in any semi-supervised learning algorithm. The experimental results in a range of feedback-based image retrieval tasks show that the proposed algorithm not only achieves good generality, but also outperforms other algorithms in the same semi-supervised learning framework.

3.
张亮  杜子平  李杨  张俊 《计算机工程》2011,37(8):202-203
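Exploiting the structural information of data points can improve the performance of semi-supervised learning. To this end, a graph-based semi-supervised learning method is proposed. Local scaling is used to assign different scale parameters to the edge weights in regions of different density, and on this basis a graph Laplacian kernel classifier is constructed for classification. Experiments on several data sets show that the method outperforms other kernel-based semi-supervised classification methods.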
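The abstract above does not give its weighting formula. As a rough illustration of density-dependent scale parameters, the sketch below uses the commonly cited local-scaling form w_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance from point i to its k-th nearest neighbor. The function name, the choice of k, and the sample points are assumptions made for this example, not details taken from the paper.

```python
import math

def local_scaling_weights(points, k=2):
    """Edge weights w_ij = exp(-d_ij^2 / (sigma_i * sigma_j)) (illustrative form, not the paper's)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sigma_i: distance from point i to its k-th nearest neighbor.
    # Index 0 of each sorted row is the point itself (distance 0).
    sigma = [sorted(row)[k] for row in dist]
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return weights

# Hypothetical data: a tight cluster and the same shape stretched by a factor of 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 12), (12, 10)]
W = local_scaling_weights(points, k=2)
```

Because each sigma tracks the local neighbor distance, corresponding edges in the dense and the sparse cluster receive the same weight, which a single global Gaussian width would not give. The sketch assumes no duplicate points, since a zero sigma would cause a division by zero.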

4.
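Feature selection aims to reduce a high-dimensional feature space, thereby simplifying the problem and improving the learning method. Previous studies have shown that feature extraction methods can effectively reduce the dimensionality of the feature space in supervised sentiment classification. Unlike previous work, this paper is the first to investigate feature extraction methods for semi-supervised sentiment classification, and it proposes a feature selection method based on a bipartite graph. The method first uses a bipartite graph model to represent the relationships between documents and words; then, combining the label information of a small set of labeled samples with the bipartite graph model, it uses the label propagation (LP) algorithm to compute the sentiment probability of each feature; finally, it ranks the features by sentiment probability to perform feature selection. Experimental results in several domains show that, for semi-supervised sentiment classification, the bipartite-graph-based feature selection method clearly outperforms random feature selection and effectively reduces the dimensionality of the feature space without degrading (and sometimes while improving) classification performance.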
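The three steps in the abstract above (bipartite graph, label propagation, ranking) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: propagation is done by plain averaging across the document-word graph with labeled documents clamped, and features are ranked by how far their score lies from the neutral value 0.5. The function names, the update rule, and the toy documents are hypothetical and not taken from the paper.

```python
def bipartite_feature_scores(doc_words, doc_labels, iterations=20):
    """doc_words: one set of words per document.
    doc_labels: {doc index: 1.0 (positive) or 0.0 (negative)} for the labeled documents.
    Returns {word: estimated probability of positive sentiment}."""
    vocab = sorted(set().union(*doc_words))
    doc_score = [doc_labels.get(d, 0.5) for d in range(len(doc_words))]
    word_score = {w: 0.5 for w in vocab}
    for _ in range(iterations):
        # Documents -> words: a word averages the scores of the documents containing it.
        for w in vocab:
            linked = [doc_score[d] for d, words in enumerate(doc_words) if w in words]
            word_score[w] = sum(linked) / len(linked)
        # Words -> documents: unlabeled documents average their words; labeled ones stay clamped.
        for d, words in enumerate(doc_words):
            if d not in doc_labels and words:
                doc_score[d] = sum(word_score[w] for w in words) / len(words)
    return word_score

def select_features(word_score, top_n):
    # Rank words by distance from the neutral score 0.5 (assumed ranking criterion).
    return sorted(word_score, key=lambda w: abs(word_score[w] - 0.5), reverse=True)[:top_n]

# Hypothetical toy corpus: documents 0 and 1 are labeled, 2 and 3 are not.
docs = [{"great", "film"}, {"awful", "film"}, {"great", "plot"}, {"awful", "plot"}]
scores = bipartite_feature_scores(docs, {0: 1.0, 1: 0.0})
selected = select_features(scores, 2)
```

In this toy corpus the sentiment-bearing words move away from 0.5 while the topical words stay neutral, so the two selected features are the sentiment words.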

5.
In the class (cluster) formation process of machine learning techniques, data instances are usually assumed to have equal relevance. However, this is frequently not true. Such a situation is more typical in semi-supervised learning, since we have to understand the data structure of both labeled and unlabeled data at the same time. In this paper, we investigate the organizational heterogeneity of data in semi-supervised learning using graph representation. A graph is a natural choice to characterize the relationship between any pair of nodes or any pair of groups of nodes; consequently, the strategic location of each node or each group of nodes can be determined by graph measures. Specifically, two issues are addressed: (1) We propose an adaptive graph construction method, which we call AdaRadius, that considers the heterogeneity of the local interacting structure among nodes. As a result, it presents several interesting properties, namely adaptability to data density variations, low dependency on parameter settings, and reasonable computational cost, for both pool-based and incremental data. (2) Moreover, we present heuristic criteria for selecting representative data samples to be labeled. An experimental study shows that selective labeling usually gets better classification results than random labeling. To our knowledge, both issues have so far lacked investigation; therefore, our approach presents an important step toward the characterization of data heterogeneity, not only in semi-supervised learning but also in general machine learning.

6.
Manifold-ranking is a powerful method in semi-supervised learning, and its performance heavily depends on the quality of the constructed graph. In this paper, we propose a novel graph structure named the k-regular nearest neighbor (k-RNN) graph, together with its construction algorithm, and apply the new graph structure in the framework of manifold-ranking-based retrieval. We show that the manifold-ranking algorithm based on our proposed graph structure performs better than those based on existing graph structures, such as the k-nearest neighbor (k-NN) graph and the connected graph, in image retrieval, 2D data clustering, and 3D model retrieval. In addition, automatic sample reweighting and graph updating algorithms are presented for the relevance feedback of our algorithm. Experiments demonstrate that the proposed algorithm outperforms the state-of-the-art algorithms.

7.
Recently, newly invented features (e.g. Fisher vector, VLAD) have achieved state-of-the-art performance in large-scale video analysis systems that aim to understand the contents of videos, such as concept recognition and event detection. However, these features are high-dimensional representations, which markedly increases computation costs and correspondingly deteriorates the performance of subsequent learning tasks. Notably, the situation becomes even worse when dealing with large-scale video data where the number of class labels is limited. To address this problem, we propose a novel algorithm to compactly represent huge amounts of unconstrained video data. Specifically, redundant feature dimensions are removed by using our proposed feature selection algorithm. Considering that unlabeled videos are easy to obtain on the web, we apply this feature selection algorithm in a semi-supervised framework to cope with the shortage of class information. Different from most of the existing semi-supervised feature selection algorithms, our proposed algorithm does not rely on manifold approximation, i.e. the graph Laplacian, which is quite expensive for large amounts of data. Thus, it is possible to apply the proposed algorithm to a real large-scale video analysis system. Besides, due to the difficulty of solving the non-smooth objective function, we develop an efficient iterative approach to seek the global optimum. Extensive experiments are conducted on several real-world video datasets, including KTH, CCV, and HMDB. The experimental results demonstrate the effectiveness of the proposed algorithm.

8.
Graph structure is vital to graph-based semi-supervised learning. However, the problem of constructing a graph that reflects the underlying data distribution has seldom been investigated in semi-supervised learning, especially for high-dimensional data. In this paper, we focus on graph construction for semi-supervised learning and propose a novel method called Semi-Supervised Classification based on Random Subspace Dimensionality Reduction, SSC-RSDR for short. Different from traditional methods that perform graph-based dimensionality reduction and classification in the original space, SSC-RSDR performs these tasks in subspaces. More specifically, SSC-RSDR generates several random subspaces of the original space and applies graph-based semi-supervised dimensionality reduction in these random subspaces. It then constructs graphs in these processed random subspaces and trains semi-supervised classifiers on the graphs. Finally, it combines the resulting base classifiers into an ensemble classifier. Experimental results on face recognition tasks demonstrate that SSC-RSDR not only has superior recognition performance with respect to competitive methods, but also is robust against a wide range of values of input parameters.

9.
In this paper, we propose a novel semi-supervised learning approach based on the nearest neighbor rule and cut edges. In the first step of our approach, a relative neighborhood graph based on all training samples is constructed for each unlabeled sample, and the unlabeled samples whose edges are all connected to training samples from the same class are labeled. These newly labeled samples are then added to the training samples. In the second step, a standard self-training algorithm using the nearest neighbor rule is applied for classification until a predetermined stopping criterion is met. In the third step, a statistical test is applied for label modification, and in the last step, the remaining unlabeled samples are classified using the standard nearest neighbor rule. The main advantages of the proposed method are: (1) it reduces error reinforcement by using a relative neighborhood graph for classification in the initial stages of semi-supervised learning; (2) it introduces a label modification mechanism for better classification performance. Experimental results show the effectiveness of the proposed approach.
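The first step described in the abstract above can be sketched roughly as follows. The sketch uses the standard definition of a relative neighborhood graph (two points are linked unless some third point is closer to both than they are to each other) and builds it on the training samples plus one unlabeled sample at a time; that reading of the abstract, the function names, and the sample data are assumptions. The later steps (self-training, statistical test, final nearest neighbor pass) are not shown.

```python
import math

def rng_neighbors(x, train_points):
    """Indices of training points linked to x in the relative neighborhood graph
    built on train_points plus x."""
    neighbors = []
    for i, p in enumerate(train_points):
        d_xp = math.dist(x, p)
        # x and p are linked unless some third point r is closer to both of them
        # than they are to each other.
        blocked = any(
            max(math.dist(x, r), math.dist(p, r)) < d_xp
            for j, r in enumerate(train_points) if j != i
        )
        if not blocked:
            neighbors.append(i)
    return neighbors

def label_by_cut_edges(train_points, train_labels, unlabeled):
    """Label an unlabeled sample only when all of its graph neighbors share one class."""
    newly_labeled = {}
    for u, x in enumerate(unlabeled):
        classes = {train_labels[i] for i in rng_neighbors(x, train_points)}
        if len(classes) == 1:
            newly_labeled[u] = classes.pop()
    return newly_labeled

# Hypothetical data: two classes on a line, one unlabeled point inside class "a"
# and one midway between the classes.
train = [(0, 0), (1, 0), (5, 0), (6, 0)]
labels = ["a", "a", "b", "b"]
result = label_by_cut_edges(train, labels, [(0.5, 0.5), (3, 0)])
```

The point midway between the classes has neighbors from both classes, so it is left unlabeled for the later steps, while the point inside class "a" is labeled immediately.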

10.
Graph-based label propagation algorithms are popular in state-of-the-art semi-supervised learning research. The key idea underlying this algorithmic family is to enforce labeling consistency between any two examples with a positive similarity. However, negative similarities or dissimilarities are equally valuable in practice. To this end, we simultaneously leverage similarities and dissimilarities in our proposed semi-supervised learning algorithm, which we term Bidirectional Label Propagation (BLP). Different from previous label propagation mechanisms that proceed along a single direction of graph edges, the BLP algorithm can propagate labels along not only positive but also negative edge directions. By using an initial neighborhood graph and class assignment constraints inherent among the labeled examples, a set of class-specific graphs is learned, which include both positive and negative edges and thus reveal discriminative cues. Over the learned graphs, a convex propagation criterion is carried out to ensure consistent labelings along the positive edges and inconsistent labelings along the negative edges. Experimental evidence obtained on synthetic and real-world datasets validates the excellent performance of the proposed BLP algorithm.

11.
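Based on the cluster assumption, a new graph-based semi-supervised learning algorithm, called density-sensitive semi-supervised clustering, is proposed. The algorithm introduces a density-sensitive distance measure that reflects the cluster assumption well and fully exploits the complex intrinsic structure of the data set; combined with graph-based semi-supervised learning, it significantly improves clustering performance. Simulation experiments further show that the algorithm is superior in specific image applications.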

12.
Dimension reduction (DR) is an efficient and effective preprocessing step for hyperspectral image (HSI) classification. Graph embedding is a frequently used model for DR, which preserves some geometric or statistical properties of the original data set. Embedding with a simple graph only considers the relationship between two data points, while in real-world applications the complex relationship among several data points is more important. To overcome this problem, we present a linear semi-supervised DR method based on hypergraph embedding (SHGE), which is an improvement of semi-supervised graph learning (SEGL). The proposed SHGE method aims to find a projection matrix by building a semi-supervised hypergraph that can preserve the complex relationship of the data and the class discrimination for DR. Experimental results demonstrate that our method achieves better performance than some existing DR methods for HSI classification and saves time compared with the existing SEGL method, which uses a simple graph.

13.
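Spectral clustering is a clustering algorithm based on spectral graph partitioning theory. Traditional spectral clustering is an unsupervised learning algorithm and can only use a single kind of data for clustering. To address this, a semi-supervised spectral clustering algorithm based on a density-adaptive neighborhood similarity graph (DAN-SSC) is proposed. DAN-SSC combines traditional spectral clustering with the idea of semi-supervised learning, which resolves the problem that traditional spectral clustering cannot make full use of all the data and has to discard some labeled data; it spreads a small amount of pairwise-constraint prior information over the whole space so that this information can better guide the clustering process. Experimental results show that the DAN-SSC algorithm is feasible and effective.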

14.
Semi-supervised graph clustering: a kernel approach   Total citations: 6 (self-citations: 0; citations by others: 6)
Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the weighted kernel k-means objective (Dhillon et al., in Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining, 2004a). A recent theoretical connection between weighted kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data given either as vectors or as a graph. For graph data, this result leads to algorithms for optimizing several new semi-supervised graph clustering objectives. For vector data, the kernel approach also enables us to find clusters with non-linear boundaries in the input data space. Furthermore, we show that recent work on spectral learning (Kamvar et al., in Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2003) may be viewed as a special case of our formulation. We empirically show that our algorithm is able to outperform current state-of-the-art semi-supervised algorithms on both vector-based and graph-based data sets.

15.
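Sentiment classification is currently a hot research topic in natural language processing. This paper focuses on semi-supervised learning for sentiment classification (that is, learning from a small number of labeled samples and a large number of unlabeled samples) and proposes a new semi-supervised learning method based on dynamic random feature subspaces. First, multiple random feature subspaces are generated dynamically; then, based on co-training, unlabeled samples with high confidence are selected in each feature subspace; finally, the selected samples are used to update the training model. Experimental results show that our method clearly outperforms the traditional static generation approach and other existing semi-supervised methods. The paper also explores the question of how many feature subspaces to use.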

16.
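Gaussian process classification combined with a semi-supervised kernel   Total citations: 1 (self-citations: 0; citations by others: 1)
A semi-supervised algorithm is proposed for learning a Gaussian process classifier, which supplies information about the unlabeled data to the classifier through a nonparametric semi-supervised kernel. The algorithm consists of the following parts: 1) a kernel matrix that combines information from labeled and unlabeled data is obtained through the spectral decomposition of the graph Laplacian; 2) convex optimization is used to learn the optimal weights of the kernel matrix eigenvectors, yielding a nonparametric semi-supervised kernel; 3) the semi-supervised kernel is integrated into the Gaussian process model to build the proposed semi-supervised learning algorithm. The main feature of the algorithm is that it applies a nonparametric semi-supervised kernel based on the entire data set to a Gaussian process model, which has an explicit probabilistic description, makes it convenient to model the uncertainty between data, and can handle complex inference problems. Experimental results show that the algorithm is more reliable than other methods.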
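Parts 1) and 2) of the abstract above can be written out in general form as follows. The abstract does not state its objective function or constraints, so this only sketches the usual spectral construction; the symbols W, D, L, λ_i, φ_i, μ_i and K are notation assumed here for illustration.

```latex
% Graph Laplacian over all n labeled and unlabeled points,
% with similarity matrix W and degree matrix D:
L = D - W, \qquad D_{ii} = \sum_{j} W_{ij}

% Spectral decomposition of L:
L = \sum_{i=1}^{n} \lambda_i \, \phi_i \phi_i^{\top}

% Nonparametric semi-supervised kernel: keep the eigenvectors and learn
% nonnegative weights \mu_i by convex optimization on the labeled data:
K = \sum_{i=1}^{n} \mu_i \, \phi_i \phi_i^{\top}, \qquad \mu_i \ge 0
```

Because every μ_i is nonnegative, K is positive semidefinite and can serve as the covariance function of the Gaussian process in part 3).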

17.
胡聪  吴小俊  舒振球  陈素根 《软件学报》2020,31(5):1525-1535
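The ladder network is not only a deep-learning-based feature extractor but can also be applied to semi-supervised learning. Deep learning achieves the approximation of complex functions while also alleviating the tendency of multilayer neural networks to become trapped in local minima. Traditional methods such as autoencoders and Boltzmann machines tend to ignore the low-dimensional manifold structure of high-dimensional data; using them often yields meaningless feature representations that cannot be embedded effectively into subsequent prediction or recognition tasks. From the perspective of manifold learning, a deep representation learning method based on the ladder network is proposed, namely the Laplacian ladder network (LLN). During training, the LLN not only injects noise into each encoding layer and reconstructs it, but also introduces a graph Laplacian constraint at each reconstruction layer, embedding the manifold structure into multilayer feature learning to improve the robustness and discriminability of feature extraction. With limited labeled data, the LLN fuses the supervised loss and the unsupervised loss into a unified framework for semi-supervised learning. Experiments on the standard handwritten digit data set MNIST and the object recognition data set CIFAR-10 show that the LLN achieves better classification results than the ladder network and other semi-supervised methods, and is an effective semi-supervised learning algorithm.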

18.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2020,40(12):3430-3436
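Most graph-based semi-supervised learning methods do not use the label information that is already available, or that is obtained during label propagation, when measuring the similarity between samples; moreover, their similarity measures are relatively fixed and cannot effectively measure the similarity between samples whose distribution structure is complex and varied. To address these problems, a graph-based semi-supervised learning algorithm that performs metric learning based on labels is proposed. First, a similarity measure between samples is specified and used to build a similarity matrix. Then, label propagation is performed on the similarity matrix, and the k samples with the lowest entropy are selected as newly determined label information. Finally, all the label information is used to update the similarity measure, and the process is iterated until all labels have been learned. The proposed algorithm not only uses label information to improve the similarity measure between samples, but also makes full use of intermediate results to reduce the amount of labeled data that semi-supervised learning requires. Experimental results on six real data sets show that, in more than 95% of cases, the algorithm achieves higher classification accuracy than three traditional graph-based semi-supervised learning algorithms.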
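The low-entropy selection step in the abstract above can be sketched roughly as follows. The sketch assumes that label propagation has already produced one class-probability row per sample and uses Shannon entropy as the confidence measure; the function names and the sample probabilities are hypothetical, and the metric update and the outer iteration are not shown.

```python
import math

def entropy(probs):
    """Shannon entropy of one class-probability row (zero entries are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_low_entropy(label_probs, labeled, k):
    """Pick the k unlabeled samples whose propagated label distributions have the
    lowest entropy, and return {sample index: predicted class index}."""
    candidates = [i for i in range(len(label_probs)) if i not in labeled]
    candidates.sort(key=lambda i: entropy(label_probs[i]))
    return {
        i: max(range(len(label_probs[i])), key=label_probs[i].__getitem__)
        for i in candidates[:k]
    }

# Hypothetical propagation output for five samples and two classes;
# sample 0 is already labeled.
label_probs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.05, 0.95]]
new_labels = select_low_entropy(label_probs, labeled={0}, k=2)
```

Samples with nearly uniform rows, such as index 2 here, are the ones left for later iterations, after the similarity measure has been updated with the newly fixed labels.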

19.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2020,40(12):3430-3436
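Most graph-based semi-supervised learning methods do not use the label information that is already available, or that is obtained during label propagation, when measuring the similarity between samples; moreover, their similarity measures are relatively fixed and cannot effectively measure the similarity between samples whose distribution structure is complex and varied. To address these problems, a graph-based semi-supervised learning algorithm that performs metric learning based on labels is proposed. First, a similarity measure between samples is specified and used to build a similarity matrix. Then, label propagation is performed on the similarity matrix, and the k samples with the lowest entropy are selected as newly determined label information. Finally, all the label information is used to update the similarity measure, and the process is iterated until all labels have been learned. The proposed algorithm not only uses label information to improve the similarity measure between samples, but also makes full use of intermediate results to reduce the amount of labeled data that semi-supervised learning requires. Experimental results on six real data sets show that, in more than 95% of cases, the algorithm achieves higher classification accuracy than three traditional graph-based semi-supervised learning algorithms.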

20.
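To address the shortcomings of existing semi-supervised algorithms based on graph embedding, a semi-supervised graph embedding algorithm with local differences is proposed. The idea is to maximize the difference information of the samples while preserving the geometric structure of the data, which effectively prevents overfitting. To handle the small-sample-size problem, an objective function in difference form is adopted, and a parameter adjusts the relative contribution of the two parts of the samples. Finally, experiments on the ORL and UMIST face databases give results clearly better than those of two existing classical algorithms, with the recognition rate improved by 2.25% and 2.23% in the best case.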
