Similar articles
20 similar articles found.
1.
In practice, many applications require a dimensionality reduction method that can handle partially labeled data. In this paper, we propose a semi-supervised dimensionality reduction framework that handles unlabeled data efficiently. Under this framework, several classical methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), maximum margin criterion (MMC), locality preserving projections (LPP) and their corresponding kernel versions, can be seen as special cases. For high-dimensional data, the framework yields a low-dimensional embedding that both discriminates multi-class sub-manifolds and preserves local manifold structure. Experiments show that our algorithms significantly improve on the accuracy of the corresponding supervised and unsupervised approaches.

2.
Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class but not to any of the labeled categories. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient way to learn from positive and unlabeled examples. Among the semi-supervised PU learning methods, however, no single approach fits every unlabeled data distribution. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that uses knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning; (2) the proposed framework achieves higher precision than the state-of-the-art method.

3.
Multiple view data, together with some domain knowledge in the form of pairwise constraints, arise in various data mining applications. Learning a hidden consensus pattern in the low-dimensional space is a challenging problem. In this paper, we propose a new method for multiple view semi-supervised dimensionality reduction. The pairwise constraints are used to derive an embedding in each view and, simultaneously, a linear transformation is introduced to make the embeddings from different pattern spaces comparable. Hence, the consensus pattern can be learned from multiple embeddings of multiple representations. We derive an iterative algorithm to solve this problem. Some theoretical analyses and out-of-sample extensions are also provided. Experiments on various data sets, together with discussion of the results, demonstrate the effectiveness of the proposed algorithm.

4.
We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
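
The random-walk construction described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a hand-picked bandwidth `epsilon` and a plain row-normalized transition matrix; the function name `diffusion_map` and all parameter values are illustrative choices, not taken from the paper, and the paper's own normalization and coarse-graining steps are not reproduced here.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components, t=1):
    """Embed the rows of X using a random walk built from a Gaussian kernel."""
    # Pairwise squared Euclidean distances between all rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)      # kernel (affinity) matrix
    d = K.sum(axis=1)                    # degree of each point

    # P = D^{-1} K is the Markov transition matrix. Its symmetric
    # conjugate A = D^{-1/2} K D^{-1/2} has the same eigenvalues.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]       # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of A.
    psi = vecs / np.sqrt(d)[:, None]

    # Skip the trivial first eigenvector (eigenvalue 1); scale by eigenvalue^t.
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return coords, vals

# Toy data (an assumption for illustration): two small 2-D clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(15, 2)),
    rng.normal(loc=(3.0, 0.0), scale=0.3, size=(15, 2)),
])
coords, vals = diffusion_map(X, epsilon=2.0, n_components=2)
```

Running k-means on `coords` instead of on `X` would correspond to the clustering in diffusion space that the abstract refers to.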

5.
Figure 8 of this article shows YaleB and CMU PIE with incorrect legend titles: YaleB (Tr=1900, Te=514, NOC=100) should be YaleB (Tr=1900, Te=514, d=100) (Fig. 8(a)); TIE (Tr=1200, Te=2880, d=100) should be PIE (Tr=1200, Te=2880, d=100) (Fig. 8(b)). In Fig. 9, the legend keys and the legend texts are mismatched. The correct figure is illustrated as follows.

6.
A well-designed graph plays a fundamental role in graph-based semi-supervised learning; however, the topological structure of the constructed neighborhood is unstable in most current approaches, since they are very sensitive to high-dimensional, sparse and noisy data. This generally leads to dramatic performance degradation. To deal with this issue, we developed a relative manifold based semi-supervised dimensionality reduction (RMSSDR) approach, which uses the relative manifold to construct a better neighborhood graph with fewer short-circuit edges. Based on the relative cognitive law and manifold distance, a relative transformation is used to construct the relative space and the relative manifold. A relative transformation can improve the ability to distinguish between data points and reduce the impact of noise, so that the result may be more intuitive, and the relative manifold reflects the manifold structure more faithfully, since data sets commonly have a nonlinear structure. Specifically, RMSSDR makes full use of pairwise constraints, which define the edge weights of the neighborhood graph by minimizing the local reconstruction error, and it preserves the global and local geometric structures of the data set. The experimental results on face data sets demonstrate that RMSSDR is better than the state-of-the-art methods it is compared with, in both classification performance and robustness.

7.
A well-designed graph tends to result in good performance for graph-based semi-supervised learning. Although most graph-based semi-supervised dimensionality reduction approaches perform very well on clean data sets, they usually cannot construct a faithful graph, which is important for good performance, when applied to high-dimensional, sparse or noisy data. This generally leads to dramatic performance degradation. To deal with these issues, this paper proposes a strategy called relative semi-supervised dimensionality reduction (RSSDR), which applies perceptual relativity to semi-supervised dimensionality reduction. In RSSDR, a relative transformation is first performed over the training samples to build the relative space. The relative transformation improves the ability to distinguish between data points and diminishes the impact of noise on semi-supervised dimensionality reduction. Second, the edge weights of the neighborhood graph are determined by minimizing the local reconstruction error in the relative space, so that both the global and the local geometric structure of the data are preserved. Extensive experiments on face, UCI, gene expression, artificial and noisy data sets validate the feasibility and effectiveness of the proposed algorithm, with promising results in both classification accuracy and robustness.

8.
Graph structure is crucial to graph-based dimensionality reduction. A mixture graph based semi-supervised dimensionality reduction (MGSSDR) method with pairwise constraints is proposed. MGSSDR first constructs multiple diverse graphs on different random subspaces of the dataset, then combines these graphs into a mixture graph and performs dimensionality reduction on this mixture graph. MGSSDR preserves the pairwise constraints and the local structure of the samples in the reduced subspace. It is also robust to noise and to the choice of neighborhood size. Experimental results on feature extraction from facial images demonstrate its effectiveness.

9.
High-dimensional data is frequently encountered and processed in real-world applications, and unlabeled samples are readily available, but labeled or pairwise constrained ones are fairly expensive to obtain. Traditionally, when a pattern is itself an $n_1 \times n_2$ image, the image first has to be vectorized into a vector pattern in $\Re^{n_1 \times n_2}$ by concatenating its pixels. However, such a vector representation fails to take into account the spatial locality of pixels in the images, which are intrinsically matrices. In this paper, we propose a tensor subspace learning-based semi-supervised dimensionality reduction algorithm (TS2DR), in which an image is naturally represented as a second-order tensor in $\Re^{n_1} \otimes \Re^{n_2}$ and domain knowledge in the form of pairwise similarity and dissimilarity constraints specifies whether pairs of instances belong to the same class or to different classes. TS2DR has an analytic form of the global structure preserving embedding transformation, which can be easily computed by eigen-decomposition. We also verify the efficiency of TS2DR through unbalanced data classification experiments on benchmark real-world databases. Numerical results show that TS2DR tends to capture the intrinsic structure of the given data and achieves better classification accuracy, while being much more efficient.

10.
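Existing semi-supervised dimensionality reduction algorithms ignore the large amount of unlabeled information in a data set and therefore cannot achieve the best reduction result. To address this problem, this paper proposes an improved weight-based, locality-preserving semi-supervised dimensionality reduction algorithm. While preserving the positive and negative constraint information, the algorithm also uses distance weights to preserve the local structure of the data set, thereby improving the reduction result. Experiments on UCI data sets show that the algorithm improves dimensionality reduction and still yields good clustering results, especially when the data distribution does not satisfy a manifold structure.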

11.
Dimensionality reduction plays an important role in many machine learning tasks. This paper studies semi-supervised dimensionality reduction using pairwise constraints. In this setting, domain knowledge is given in the form of pairwise constraints, each of which specifies whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint). A novel semi-supervised dimensionality reduction method called LGS3DR is proposed, which integrates both local and global topological structures of the data as well as pairwise constraints. The LGS3DR method is effective and has a closed-form solution. Experiments on data visualization and face recognition show that LGS3DR is superior to many existing dimensionality reduction methods.
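
To make the must-link / cannot-link setting concrete, here is a minimal sketch of a generic constraint-guided linear projection with a closed-form (eigenvector) solution. It is not the LGS3DR method, whose objective is not given in the abstract; the function name `constrained_projection`, the weights `alpha` and `beta`, and the toy data are all assumptions made for illustration. The sketch pulls must-link pairs together and pushes cannot-link pairs apart on top of a global variance term.

```python
import numpy as np

def constrained_projection(X, must_link, cannot_link, n_components,
                           alpha=1.0, beta=20.0):
    """Linear projection guided by pairwise constraints (generic sketch).

    Maximizes sum_ij S_ij * ||W^T x_i - W^T x_j||^2, where S_ij is positive
    for ordinary and cannot-link pairs and negative for must-link pairs.
    """
    n = X.shape[0]
    S = np.full((n, n), 1.0 / n**2)          # global variance term
    for i, j in cannot_link:                 # push these pairs apart
        S[i, j] += alpha / len(cannot_link)
        S[j, i] += alpha / len(cannot_link)
    for i, j in must_link:                   # pull these pairs together
        S[i, j] -= beta / len(must_link)
        S[j, i] -= beta / len(must_link)

    L = np.diag(S.sum(axis=1)) - S           # graph Laplacian of S
    M = X.T @ L @ X                          # symmetric objective matrix
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :n_components]      # top eigenvectors as columns
    return W

# Toy data: two classes separated along x, with larger spread along y.
X = np.array([[-1.0, -2.0], [-1.0, 0.0], [-1.0, 2.0],
              [1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
must_link = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
cannot_link = [(i, j) for i in range(3) for j in range(3, 6)]
W = constrained_projection(X, must_link, cannot_link, n_components=1)
```

On this toy data the constraints steer the projection toward the x-axis, which separates the two classes, even though the unconstrained variance is larger along y.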

12.
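Semi-supervised dimensionality reduction uses side information together with a large number of unlabeled samples to find an optimal low-dimensional discriminant space within a high-dimensional data space, which facilitates subsequent classification or clustering; it is regarded as an effective way to understand high-dimensional data such as gene sequences, text and face images. A general framework for semi-supervised dimensionality reduction based on pairwise constraints (SSPC) is proposed. The method first learns a discriminant adjacency matrix from the pairwise constraints and the intrinsic geometric structure of the unlabeled samples. It then applies the learned projection to map the data from the original high-dimensional space into a low-dimensional space, so that samples within a cluster become more compact while samples in different clusters are pushed as far apart as possible. The proposed algorithm not only finds an optimal linear discriminant subspace but also reveals the nonlinear structure of manifold data. Experimental results on several real data sets show that the new method outperforms current mainstream pairwise-constraint-based dimensionality reduction algorithms.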

13.
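Existing semi-supervised dimensionality reduction methods treat all side information equally and therefore cannot fully exploit the information it carries. To address this, a weighted pairwise-constraint semi-supervised local dimensionality reduction algorithm (WSLDR) is proposed. A neighbor graph is constructed to expand the side information, increasing the number of constraints. In addition, a weight-coefficient matrix for the constraints is built according to the amount of information each one carries. The side information is merged into the neighbor graph to correct it, and the optimal projection is sought from the corrected neighbor graph and the weighted pairwise constraints. The algorithm preserves the intrinsic local geometric structure of the data while making the within-class distribution more compact and the between-class distribution more dispersed. Experimental results on UCI data sets verify the effectiveness of the algorithm.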

14.
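To perform dimensionality reduction effectively in the semi-supervised multi-view setting, a semi-supervised canonical correlation analysis method is proposed that uses a non-negative low-rank graph for label propagation. The global linear neighbors captured by the non-negative low-rank graph can use information from both direct neighbors and indirectly reachable neighbors to maintain the global cluster structure, while the low-rank property keeps the graph representation compressed. After label propagation assigns estimated labels to the unlabeled samples, a soft-label matrix and a probabilistic within-class scatter matrix are built for each view. Feature discriminability is then improved by maximizing the correlation between same-class samples across views while minimizing the within-class variation in each view's low-dimensional feature space. Experiments show that the proposed method achieves better recognition performance than existing related methods and is more robust.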

15.
《Pattern recognition》2014,47(2):758-768
Sentiment analysis, which detects the subjectivity or polarity of documents, is one of the fundamental tasks in text data analytics. Recently, the number of documents available online and offline has been increasing dramatically, and preprocessed text data have more features. This development makes the data harder to analyze effectively. This paper proposes a novel semi-supervised Laplacian eigenmap (SS-LE). SS-LE removes redundant features effectively by decreasing sentiment detection errors. Moreover, it enables visualization of documents in a perceptible low-dimensional embedded space, providing a useful tool for text analytics. The proposed method is evaluated on a multi-domain review data set, in sentiment visualization and classification, against other dimensionality reduction methods. SS-LE provides a better similarity measure in the visualization result by separating positive and negative documents properly. Sentiment classification models trained on data reduced by SS-LE show higher accuracy. Overall, experimental results suggest that SS-LE has the potential to be used to visualize documents for ease of analysis and to train a predictive model in sentiment analysis. SS-LE can also be applied to any other partially annotated text data sets.

16.
17.
Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods, as it directly reflects the Euclidean distance between data points within or between classes. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm to find the optimal solution. Based on the proposed algorithm, we derive an orthogonal constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure so that it preserves the discriminative structure as well as the geometrical structure embedded in the original dataset. Many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR and SSMMC, can be improved using the proposed framework, which can also be used to formulate a corresponding kernel framework for handling nonlinear problems. Theoretical analysis indicates that there are certain relationships between the linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that our proposed algorithm is markedly superior to other state-of-the-art algorithms.

18.
Image and video classification tasks often suffer from the problem of a high-dimensional feature space. How to discover meaningful, low-dimensional representations of such high-order, high-dimensional observations remains a fundamental challenge. In this paper, we present a unified framework for tensor based dimensionality reduction, including a new tensor distance (TD) metric and a novel multilinear globality preserving embedding (MGPE) strategy. Unlike the traditional Euclidean distance, which is constrained by an orthogonality assumption, TD measures the distance between data points by considering the relationships among different coordinates of high-order data. To preserve the natural tensor structure in low-dimensional space, MGPE works directly on the high-order form of the input data and employs an iterative strategy to learn the transformation matrices. To provide a faithful global representation for datasets, MGPE aims to preserve the distances between all pairs of data points. Based on the proposed TD metric and MGPE strategy, we further derive two algorithms, tensor distance based multilinear multidimensional scaling (TD-MMDS) and tensor distance based multilinear isometric embedding (TD-MIE). TD-MMDS finds the transformation matrices by preserving the TDs between all pairs of input data in the embedded space, while TD-MIE aims to preserve all pairwise distances calculated from TDs along shortest paths in the neighborhood graph. By integrating tensor distance into tensor based embedding, TD-MMDS and TD-MIE perform tensor based dimensionality reduction throughout the whole learning procedure and achieve clear performance improvements on various standard datasets.

19.
Mónica, Daniel 《Pattern recognition》2005,38(12):2400-2408
An important objective in image analysis is dimensionality reduction. The most often used data-exploratory technique with this objective is principal component analysis, which performs a singular value decomposition on a data matrix of vectorized images. When considering a data array or tensor instead of a matrix, the high-order generalization of PCA for computing principal components offers multiple ways to decompose tensors orthogonally. As an alternative, we propose a new method based on the projection of the images as matrices and show that it leads to a better reconstruction of images than previous approaches.
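
The baseline this abstract describes, PCA computed by a singular value decomposition of a data matrix of vectorized images, can be sketched as follows. This covers only the vectorized baseline, not the matrix-projection method the authors propose; the function names and the random toy images are assumptions for illustration.

```python
import numpy as np

def pca_vectorized_images(images, n_components):
    """PCA via SVD on a data matrix whose rows are vectorized images."""
    n, h, w = images.shape
    X = images.reshape(n, h * w)           # one row per flattened image
    mean = X.mean(axis=0)
    Xc = X - mean                          # center the data
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T             # low-dimensional coordinates
    return mean, components, scores

def reconstruct(mean, components, scores, shape):
    """Map low-dimensional scores back to images of the given (h, w) shape."""
    return (scores @ components + mean).reshape((-1,) + shape)

# Toy data (an assumption): 20 random 8x6 "images".
rng = np.random.default_rng(0)
images = rng.normal(size=(20, 8, 6))
mean, components, scores = pca_vectorized_images(images, n_components=5)
approx = reconstruct(mean, components, scores, (8, 6))
```

Flattening discards which pixels are neighbors in the image grid, which is the limitation that motivates projecting images as matrices instead.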

20.
尹宝才, 张超辉, 胡永利, 孙艳丰, 王博岳 《智能系统学报》2021,16(5):963-970
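With the spread of surveillance cameras and the rapid development of data acquisition technology, multi-view data have become large in scale, high in dimension, and multi-source and heterogeneous. This leads to large storage requirements, slow transmission and high algorithmic complexity, creating the predicament of having data that is hard to use. So far, there has been relatively little research on multi-view dimensionality reduction, either in China or abroad. To address this, this paper proposes an adaptive multi-view dimensionality reduction method based on graph embedding. Building on reconstruction of the original high-dimensional data from the reduced data within each view, the method adaptively learns a similarity matrix to explore the relationships among the reduced data of different views, and learns an orthogonal projection matrix for each view to carry out multi-view dimensionality reduction. Clustering and recognition experiments on the reduced multi-view data from several data sets show that the proposed method outperforms other dimensionality reduction methods.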
