Similar literature
 20 similar documents found (search time: 31 ms)
1.
A Kernel Approach for Semisupervised Metric Learning   (Cited by: 1 total; self-citations: 0; citations by others: 1)
While distance function learning for supervised learning tasks has a long history, extending it to learning tasks with weaker supervisory information has only been studied recently. In particular, some methods have been proposed for semisupervised metric learning based on pairwise similarity or dissimilarity information. In this paper, we propose a kernel approach for semisupervised metric learning and present in detail two special cases of this kernel approach. The metric learning problem is thus formulated as an optimization problem for kernel learning. An attractive property of the optimization problem is that it is convex and, hence, has no local optima. While a closed-form solution exists for the first special case, the second case is solved using an iterative majorization procedure to estimate the optimal solution asymptotically. Experimental results based on both synthetic and real-world data show that this new kernel approach is promising for nonlinear metric learning.
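The abstract does not give the formulation, so the following is only a minimal sketch of the standard way a kernel induces a distance, which is why learning a kernel amounts to learning a (possibly nonlinear) metric. The RBF kernel, the `gamma` parameter, and the function names are assumptions chosen for illustration, not the authors' method.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; an illustrative choice, not taken from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance induced by a kernel k in its feature space:
    d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y)."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(d2, 0.0))
```

Replacing `rbf_kernel` with a learned kernel changes the induced distance without changing this formula.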

2.
Constrained clustering methods (which usually use must-link and/or cannot-link constraints) have received much attention in the last decade. Recently, kernel adaptation or kernel learning has been considered a powerful approach for constrained clustering. However, these methods usually either allow only special forms of kernels or learn non-parametric kernel matrices and scale very poorly. Therefore, they either learn a metric that has low flexibility or are applicable only to small data sets due to their high computational complexity. In this paper, we propose a more efficient non-linear metric learning method that learns a low-rank kernel matrix from must-link and cannot-link constraints and the topological structure of the data. We formulate the proposed method as a trace ratio optimization problem and learn appropriate distance metrics by finding optimal low-rank kernel matrices. We solve the proposed optimization problem much more efficiently than SDP solvers. Additionally, we show that spectral clustering methods can be considered a special form of low-rank kernel learning methods. Extensive experiments demonstrate the superiority of the proposed method over recently introduced kernel learning methods.

3.
Metric learning has been widely studied in machine learning due to its capability to improve the performance of various algorithms. Meanwhile, multi-task learning usually leads to better performance by exploiting the information shared across all tasks. In this paper, we propose a novel framework that lets metric learning benefit from jointly training all tasks. Based on the assumption that discriminative information is retained in a common subspace for all tasks, our framework can be readily used to extend many current metric learning methods. In particular, we apply our framework to the widely used Large Margin Component Analysis (LMCA) and obtain a new model called multi-task LMCA. It performs remarkably well compared to many competitive methods. In addition, this method is able to learn a low-rank metric directly, which acts as feature reduction and enables noise compression and low storage. A series of experiments demonstrates the superiority of our method over three other comparison algorithms on both synthetic and real data.

4.
In this paper, we propose a novel approach for automatic mine detection in SOund NAvigation and Ranging (SONAR) data. The proposed framework relies on a possibilistic fusion method to classify SONAR instances as mines or mine-like objects. The proposed semisupervised algorithm minimizes an objective function that combines context identification, multi-algorithm fusion criteria, and a semisupervised learning term. The optimization aims to learn contexts as compact clusters in subspaces of the high-dimensional feature space via possibilistic semisupervised learning and feature discrimination. The semisupervised clustering component assigns a degree of typicality to each data sample to identify and reduce the influence of noise points and outliers. The approach then yields optimal fusion parameters for each context. Experiments on synthetic data sets and a standard SONAR data set show that our semisupervised local fusion outperforms individual classifiers and unsupervised local fusion.

5.
Recent years have witnessed a surge of interest in semisupervised learning. Numerous methods have been proposed for learning from partially labeled data. In this brief, a novel semisupervised learning approach based on an electrostatic field model is proposed. We treat the labeled data points as point charges, so that the remaining unlabeled data points are placed in the electrostatic fields generated by these charges. The labels of the unlabeled data points can then be regarded as the electric potentials of the electrostatic field at their corresponding positions. Moreover, we develop an efficient way to extend our method to out-of-sample data and analyze theoretically the relationship between our method and traditional graph-based methods. Finally, experimental results on both toy and real-world data sets are provided to show the effectiveness of our method.
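To make the analogy concrete, here is a minimal sketch under stated assumptions: a binary problem where labeled points carry charges of +1 or -1, a Coulomb-style 1/r potential, and prediction by the sign of the potential. The abstract does not specify the potential or the decision rule, so these choices and the names `potential` and `predict_label` are hypothetical.

```python
import math

def potential(x, charges, eps=1e-9):
    """Sum of q / r over all labeled points (assumed Coulomb-style potential).
    `charges` is a list of (position, q) pairs with q = +1 or -1."""
    total = 0.0
    for pos, q in charges:
        r = math.dist(x, pos)
        total += q / max(r, eps)  # eps avoids division by zero at a charge
    return total

def predict_label(x, charges):
    """Assumed decision rule: the sign of the potential gives the class."""
    return 1 if potential(x, charges) >= 0 else -1

# Two labeled points, one per class, on a line.
charges = [((0.0, 0.0), 1.0), ((4.0, 0.0), -1.0)]
near_positive = predict_label((1.0, 0.0), charges)
near_negative = predict_label((3.0, 0.0), charges)
```

Because the function accepts any query point, the same sketch also covers out-of-sample points, although the paper's own extension may differ.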

6.
The results of traditional clustering methods are usually unreliable because there is no guidance from data labels, whereas class labels can be predicted more reliably by semisupervised learning if the labels of part of the data are given. In this paper, we propose an active self-training clustering method, in which samples are actively selected as the training set to minimize an estimated Bayes error, and semisupervised learning is then used to perform clustering. Traditional graph-based semisupervised learning methods are not convenient for estimating the Bayes error; we develop a specific regularization framework on graphs to perform semisupervised learning, in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method in both the unsupervised and the semisupervised setting. It is worth noting that the proposed clustering method is free of initialization, while traditional clustering methods usually depend on initialization.

7.
The traditional setting of supervised learning requires a large number of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semisupervised learning has attracted much attention. Previous research on semisupervised learning mainly focuses on semisupervised classification. Although regression is almost as important as classification, semisupervised regression is largely understudied. In particular, although cotraining is a main paradigm in semisupervised learning, few works have been devoted to cotraining-style semisupervised regression algorithms. In this paper, a cotraining-style semisupervised regression algorithm, COREG, is proposed. This algorithm uses two regressors, each of which labels the unlabeled data for the other regressor, where the confidence in labeling an unlabeled example is estimated through the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
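The confidence estimate described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 1-D features, a plain k-nearest-neighbor regressor, and leave-one-out evaluation on the labeled neighbors. The names `knn_predict` and `labeling_confidence` are hypothetical.

```python
def knn_predict(train, x, k, exclude=None):
    """Mean target of the k training points nearest to x (1-D features).
    `exclude` skips one training index (leave-one-out evaluation)."""
    pool = [(abs(xi - x), yi) for i, (xi, yi) in enumerate(train) if i != exclude]
    pool.sort(key=lambda t: t[0])
    nearest = pool[:k]
    return sum(y for _, y in nearest) / len(nearest)

def labeling_confidence(labeled, x_u, k=3):
    """Return (pseudo-label, confidence) for the unlabeled point x_u.
    Confidence is the mean reduction in squared error on the k labeled
    neighbors of x_u after adding (x_u, pseudo-label) to the training set."""
    y_u = knn_predict(labeled, x_u, k)
    order = sorted(range(len(labeled)), key=lambda i: abs(labeled[i][0] - x_u))
    neighborhood = order[:k]
    augmented = labeled + [(x_u, y_u)]
    delta = 0.0
    for i in neighborhood:
        xi, yi = labeled[i]
        before = (yi - knn_predict(labeled, xi, k, exclude=i)) ** 2
        after = (yi - knn_predict(augmented, xi, k, exclude=i)) ** 2
        delta += before - after
    return y_u, delta / len(neighborhood)

# Noise-free linear data: the pseudo-labeled point cannot reduce the error,
# so the confidence comes out negative.
labeled = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
y_u, conf = labeling_confidence(labeled, 2.5, k=2)
```

In a cotraining loop, each regressor would presumably hand the other only those pseudo-labeled examples whose confidence is positive; that selection rule is an assumption here, since the abstract does not state it.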

8.
Semisupervised learning has been of growing interest over the past years, and many methods have been proposed. While existing semisupervised methods have shown promising empirical performance, their development has been based largely on heuristics. In this paper, we investigate semisupervised multicategory classification with an imperfect mixture density model. In the proposed model, the training data come from a probability distribution that can be modeled imperfectly by an identifiable mixture distribution. Furthermore, we propose a semisupervised multicategory classification method and establish its generalization error bounds. The theoretical analysis shows that the proposed method can utilize unlabeled data effectively and can achieve a fast convergence rate.

9.
Software quality assurance is a vital component of software project development. A software quality estimation model is trained using software measurement and defect (software quality) data of a previously developed release or similar project. Such an approach assumes that the development organization has experience with systems similar to the current project and that defect data are available for all modules in the training data. In software engineering practice, however, various practical issues limit the availability of defect data for modules in the training data. In addition, the organization may not have experience developing a similar system. In such cases, the task of software quality estimation, or labeling modules as fault prone or not fault prone, falls on the expert. We propose a semisupervised clustering scheme for software quality analysis of program modules with no defect data or quality-based class labels. It is a constraint-based semisupervised clustering scheme that uses k-means as the underlying clustering algorithm. Software measurement data sets obtained from multiple National Aeronautics and Space Administration software projects are used in our empirical investigation. The proposed technique is shown to help the expert make better estimations than the predictions made when the expert labels the clusters formed by an unsupervised learning algorithm. In addition, the software quality knowledge learned during the semisupervised process provided good generalization performance on multiple test data sets. An analysis of the program modules that remained unlabeled after our semisupervised clustering scheme provided useful insight into the characteristics of their software attributes.
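One simple way to add constraints to k-means is to pin the cluster of every point the expert has already labeled and let only the remaining points move. The sketch below illustrates that idea; the abstract does not describe how the paper encodes its constraints, so the 1-D feature, the seeding rule, and the name `seeded_kmeans` are assumptions for illustration.

```python
def seeded_kmeans(points, seeds, iters=10):
    """k-means on 1-D points where `seeds` maps a point index to a fixed
    cluster id supplied by the expert. Every cluster is assumed to have
    at least one seed, so no cluster can become empty."""
    clusters = sorted(set(seeds.values()))
    # Initial centroids are the means of the expert-labeled points.
    centroids = {}
    for c in clusters:
        members = [points[i] for i, lab in seeds.items() if lab == c]
        centroids[c] = sum(members) / len(members)
    assign = [None] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            if i in seeds:
                assign[i] = seeds[i]  # constraint: expert labels never change
            else:
                assign[i] = min(clusters, key=lambda c: abs(p - centroids[c]))
        for c in clusters:
            members = [p for p, a in zip(points, assign) if a == c]
            centroids[c] = sum(members) / len(members)
    return assign

# Hypothetical module metric values; "nfp" = not fault prone, "fp" = fault prone.
metric = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
labels = seeded_kmeans(metric, {0: "nfp", 3: "fp"})
```

With real software measurement data each module would be a vector of metrics, and the absolute difference would be replaced by a vector distance.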

10.
In this paper, we present a locality-constrained nonnegative robust shape interaction (LNRSI) subspace clustering method. LNRSI integrates the local manifold structure of the data into robust shape interaction (RSI) in a unified formulation, which guarantees the locality and the low-rank property of the optimal affinity graph. Compared with traditional low-rank representation (LRR) learning methods, LNRSI not only pursues the global structure of the data space through low-rank regularization but also preserves the local manifold structure, which leads to a sparse and low-rank affinity graph. Owing to the clear block-diagonal structure of the affinity graph, LNRSI is robust to noise and occlusions and achieves a higher rate of correct clustering. A theoretical analysis of the clustering effect is also given. An efficient solution based on the linearized alternating direction method with adaptive penalty (LADMAP) is developed for our method. Finally, we evaluate the performance of LNRSI on both synthetic data and real computer vision tasks, namely motion segmentation and handwritten digit clustering. The experimental results show that LNRSI outperforms several state-of-the-art algorithms.

11.
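Super-resolution reconstruction algorithms based on sparse representation and dictionary learning do not decompose the image; they learn and reconstruct from the information of the whole image directly. Low-rank matrix theory shows that an image can be decomposed into a low-rank part and a sparse part. Reconstructing each part with a method suited to its characteristics makes more effective use of the image's features. On this basis, a super-resolution reconstruction method based on low-rank matrices and dictionary learning is proposed. The method first applies low-rank decomposition to obtain the low-rank and sparse parts of the image; the low-rank part retains most of the image information. Only the low-rank part is reconstructed by dictionary learning; the sparse part takes no part in learning and is instead reconstructed by bicubic interpolation. Experimental analysis shows that reconstruction quality improves while reconstruction time is reduced, making the algorithm faster. Compared with existing algorithms, the method achieves better results in visual quality, peak signal-to-noise ratio (PSNR), and running speed.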

12.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research has sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs a nonlinear transformation globally but a linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Using some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting image retrieval performance.

13.
The problem of clustering with side information has received much recent attention, and metric learning is considered a powerful approach to this problem. Various metric learning methods have been proposed for semi-supervised clustering. Although some of the existing methods can use both positive (must-link) and negative (cannot-link) constraints, they are usually limited to learning a linear transformation (i.e., finding a global Mahalanobis metric). In this paper, we propose a framework for learning linear and non-linear transformations efficiently. We use both positive and negative constraints as well as the intrinsic topological structure of the data. We formulate our metric learning method as an appropriate optimization problem and find the global optimum of this problem. The proposed non-linear method can be considered an efficient kernel learning method that yields an explicit non-linear transformation and thus has out-of-sample generalization ability. Experimental results on synthetic and real-world data sets show the effectiveness of our metric learning method for semi-supervised clustering tasks.

14.
Semisupervised learning from different information sources   (Cited by: 2 total; self-citations: 1; citations by others: 1)
This paper studies the use of a semisupervised learning algorithm with different information sources. We first offer a theoretical explanation of why minimising the disagreement between individual models can lead to improved performance. Based on this observation, the paper proposes a semisupervised learning approach that attempts to minimise this disagreement by employing a co-updating method and making use of both labeled and unlabeled data. Three experiments that test the effectiveness of the approach are presented: (i) webpage classification from both content and hyperlinks; (ii) functional classification of genes using gene expression data and phylogenetic data; and (iii) machine self-maintenance from both sensory and image data. The results show the effectiveness and efficiency of our approach and suggest its potential for application.

15.
Low-rank representations have received a lot of interest in the application of kernel-based methods. However, these methods assume that the spectrum of the Gaussian or polynomial kernel decays rapidly. This is not always true, and violating the assumption may result in performance degradation. In this paper, we propose an effective technique for learning low-rank Mercer kernels (LMK) with a fast-decaying spectrum. What distinguishes our kernels from classical kernels (Gaussian and polynomial kernels) is that the proposed kernel always yields low-rank Gram matrices whose spectrum decays rapidly, regardless of the data distribution. Furthermore, the LMK can control the decay rate. Thus, our kernels can prevent performance degradation when low-rank approximations are used. Our algorithm scales favorably: it is linear in the number of data points and quadratic in the rank of the Gram matrix. Empirical results demonstrate that the proposed method learns a fast-decaying spectrum and significantly improves performance.

16.
Recently, graph-based semisupervised learning methods have been widely applied in the multimedia research area. However, for video semantic annotation in a multilabel setting, these methods neglect an important characteristic of video data: semantic concepts appear correlatively and interact naturally with each other rather than existing in isolation. In this paper, we incorporate this semantic correlation into graph-based semisupervised learning and propose a novel method named correlative linear neighborhood propagation to improve annotation performance. Experiments conducted on the Text REtrieval Conference VIDeo retrieval evaluation data set demonstrate its effectiveness and efficiency.

17.
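When both training and test samples are heavily corrupted by noise, the recognition performance of traditional subspace learning methods and of the classical sparse representation-based classification (SRC) method drops sharply. Sparse representation-based methods also suffer from high computational complexity. To alleviate these problems to some extent, an occluded face recognition method based on discriminative low-rank matrix recovery and collaborative representation is proposed. First, low-rank matrix recovery effectively recovers clean training samples with a low-rank structure from the corrupted training samples, and the introduction of a structural incoherence constraint effectively improves the discriminative ability of the recovered data. Then, a low-rank projection matrix between the original corrupted data and the recovered low-rank data is learned, and corrupted test samples are projected onto the corresponding low-dimensional subspace to correct them. Finally, collaborative representation-based classification (CRC) is applied to the corrected test samples to obtain the final recognition result. Experimental results on the Extended Yale B and AR databases show that the proposed method achieves better recognition performance for occluded face recognition.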

18.
We introduce new inductive, generative semisupervised mixtures with more finely grained class label generation mechanisms than in previous work. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over a component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation, with all samples first generated using a standard finite mixture and then all class labels generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis for an approximate generalized expectation-maximization model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when within-component class proportions are not constant over the feature space region "owned by" a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures and also overall gains over KNN classification. Moreover, for very small labeled fractions, our methods overall outperform supervised linear and nonlinear kernel support vector machines.

19.
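Semisupervised learning trains a classifier using labeled training samples together with unlabeled ones. This idea is incorporated into the standard SVM training procedure by adding mixed data near the decision surface, yielding a new SVM-based classifier design method, which is applied to the classification of small-sample data. Experiments show that, compared with the traditional SVM, the new SVM-based classifier greatly improves classification accuracy and also somewhat reduces the bias.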

20.
Manually labeled data sets are vital to graph-based semisupervised learning. However, in the real world, labeled data sets are often heavily imbalanced, and classifiers trained on such skewed data tend to perform poorly on low-frequency classes. In this paper, we deal with the imbalanced data case of semisupervised learning and propose a novel label matrix normalization solution called LMN to tackle the general imbalance problem. Experiments on different data sets reveal the effectiveness of the devised algorithm.
