Similar Literature
1.
Semi-supervised dimensionality reduction seeks an optimal low-dimensional discriminant space within a high-dimensional data space by exploiting auxiliary information together with a large number of unlabeled samples, facilitating subsequent classification or clustering; it is regarded as an effective way to understand high-dimensional data such as gene sequences, text, and face images. This paper proposes a general framework for semi-supervised dimensionality reduction based on pairwise constraints (SSPC). The method first learns a discriminative adjacency matrix from the pairwise constraints and the intrinsic geometric structure of the unlabeled samples; it then applies the learned projection to map the data from the original high-dimensional space into a low-dimensional space, so that samples within the same cluster become more compact while samples from different clusters are pushed as far apart as possible. The proposed algorithm not only finds an optimal linear discriminant subspace but can also reveal the nonlinear structure of manifold data. Experimental results on several real-world datasets show that the new method outperforms current mainstream pairwise-constraint-based dimensionality reduction algorithms.
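As a rough illustration of the pipeline this abstract describes, the following sketch builds a kNN affinity from the unlabeled geometry, reweights it with must-link/cannot-link pairs, and solves a generalized eigenproblem for the projection. The function name, weighting scheme, and regularization are illustrative assumptions, not the paper's exact SSPC formulation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def pairwise_constrained_projection(X, must_link, cannot_link, k=10, d=2, alpha=1.0):
    """Sketch of a pairwise-constrained projection (not the paper's exact SSPC).

    X           : (n_samples, n_features) data matrix
    must_link   : list of (i, j) pairs known to share a class
    cannot_link : list of (i, j) pairs known to differ
    """
    # Intrinsic geometry of the unlabeled data: symmetric kNN affinity.
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)
    # Inject pairwise supervision into the adjacency matrix.
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0 + alpha      # pull constrained pairs together
    for i, j in cannot_link:
        W[i, j] = W[j, i] = -alpha           # push constrained pairs apart
    D = np.diag(W.sum(axis=1))
    L = D - W                                # graph Laplacian
    # Smallest generalized eigenvectors of X^T L X v = lam * X^T D X v.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :d]                       # columns = projection directions

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
P = pairwise_constrained_projection(X, must_link=[(0, 1)], cannot_link=[(0, 2)])
Z = X @ P   # low-dimensional embedding
```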

2.
Graph-based dimensionality reduction (DR) methods play an increasingly important role in many machine learning and pattern recognition applications. In this paper, we propose a novel graph-based learning scheme to conduct Graph Optimization for Dimensionality Reduction with Sparsity Constraints (GODRSC). Unlike most graph-based DR methods, where graphs are constructed in advance, GODRSC simultaneously seeks a graph and a projection matrix preserving that graph in one unified framework, resulting in an automatically updated graph. Moreover, by applying an l1 regularizer, a sparse graph is obtained, which models the "locality" structure of the data and contains natural discriminating information. Finally, extensive experiments on several publicly available UCI and face databases verify the feasibility and effectiveness of the proposed method.
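The l1-induced sparse graph can be illustrated by reconstructing each sample from all others with a Lasso, with the nonzero coefficients acting as edges. This sketch shows only the graph step, whereas GODRSC jointly updates the graph and the projection; the regularization weight is an assumed placeholder.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_graph(X, lam=0.05):
    """l1-regularized graph: reconstruct each sample from all the others.

    Returns a sparse, asymmetric weight matrix whose nonzeros play the
    role of graph edges (a sketch of the idea, not GODRSC's joint solver).
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # min ||x_i - X_others^T w||^2 + lam * ||w||_1
        model = Lasso(alpha=lam, max_iter=5000)
        model.fit(X[others].T, X[i])
        W[i, others] = model.coef_
    return W
```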

3.
In practice, many applications require a dimensionality reduction method that can deal with partially labeled data. In this paper, we propose a semi-supervised dimensionality reduction framework that efficiently handles unlabeled data. Under the framework, several classical methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), maximum margin criterion (MMC), locality preserving projections (LPP) and their corresponding kernel versions, can be seen as special cases. For high-dimensional data, the framework yields a low-dimensional embedding that both discriminates multi-class sub-manifolds and preserves local manifold structure. Experiments show that our algorithms significantly improve the accuracy of the corresponding supervised and unsupervised approaches.

4.
Image and video classification tasks often suffer from the problem of a high-dimensional feature space. How to discover meaningful, low-dimensional representations of such high-order, high-dimensional observations remains a fundamental challenge. In this paper, we present a unified framework for tensor-based dimensionality reduction comprising a new tensor distance (TD) metric and a novel multilinear globality preserving embedding (MGPE) strategy. Unlike the traditional Euclidean distance, which is constrained by an orthogonality assumption, TD measures the distance between data points by considering the relationships among the different coordinates of high-order data. To preserve the natural tensor structure in the low-dimensional space, MGPE works directly on the high-order form of the input data and employs an iterative strategy to learn the transformation matrices. To provide a faithful global representation of datasets, MGPE aims to preserve the distances between all pairs of data points. Based on the proposed TD metric and MGPE strategy, we further derive two algorithms, dubbed tensor distance based multilinear multidimensional scaling (TD-MMDS) and tensor distance based multilinear isometric embedding (TD-MIE). TD-MMDS finds the transformation matrices by keeping the TDs between all pairs of input data in the embedded space, while TD-MIE aims to preserve all pairwise distances calculated according to TDs along shortest paths in the neighborhood graph. By integrating tensor distance into tensor-based embedding, TD-MMDS and TD-MIE perform tensor-based dimensionality reduction throughout the whole learning procedure and achieve clear performance improvements on various standard datasets.
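A plausible reading of such a tensor distance is a Mahalanobis-like form whose metric matrix couples tensor coordinates through their spatial positions; the Gaussian weighting below is our assumption for illustration, not necessarily the paper's exact definition.

```python
import numpy as np

def tensor_distance(A, B, sigma=1.0):
    """Tensor-distance sketch for 2-D arrays (e.g., images).

    Unlike the Euclidean distance, coordinates are coupled through a
    metric matrix G built from the spatial positions of the entries:
    d(A, B) = sqrt((a - b)^T G (a - b)) with a, b the flattened tensors.
    """
    pos = np.array([(i, j) for i in range(A.shape[0])
                           for j in range(A.shape[1])], dtype=float)
    # Pairwise squared distances between coordinate positions.
    sq = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    G = np.exp(-sq / (2 * sigma ** 2))   # assumed Gaussian coupling
    diff = (A - B).ravel()
    return float(np.sqrt(diff @ G @ diff))

d = tensor_distance(np.ones((4, 4)), np.zeros((4, 4)))
```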

5.
A Constrained large Margin Local Projection (CMLP) technique for multimodal dimensionality reduction is proposed. We elaborate the CMLP criterion from a pairwise-constrained marginal perspective. Four effective CMLP solution schemes are presented, along with corresponding comparative analyses. An equivalent weighted least squares formulation of CMLP is also detailed. CMLP originates from the criterion of Locality Preserving Projections (LPP), but offers a number of attractive advantages over LPP. To keep the intrinsic proximity relations of inter-class and intra-class similarity pairs, localized pairwise Cannot-Link and Must-Link constraints are applied to specify the types of neighboring pairs. By utilizing the CMLP criterion, the margins between inter- and intra-class clusters are significantly enlarged, so multimodal distributions are effectively preserved. To further optimize the CMLP criterion, one feasible improvement strategy is described. Using kernel methods, we present kernelized extensions of our approaches. Mathematical comparisons and analyses between this work and related works are also detailed. Extensive simulations, including multivariate manifold visualization and classification on the benchmark UCI, ORL, YALE, UMIST, MIT CBCL and USPS datasets, are conducted to verify the efficiency of our techniques. The results reveal that our methods are highly competitive with, and even outperform, some widely used state-of-the-art algorithms.

6.
Dimensionality reduction is a major challenge in processing high-dimensional unlabelled data. Existing dimensionality reduction methods tend to rely on a similarity matrix and a spectral clustering algorithm. However, noise in the original data often makes the similarity matrix unreliable and degrades clustering performance. Moreover, existing spectral clustering methods focus only on local structures and ignore global discriminative information, which may lead to overfitting in some cases. To address these issues, this paper proposes a novel unsupervised 2-dimensional dimensionality reduction method that incorporates similarity matrix learning and global discriminant information into the dimensionality reduction procedure. In particular, the number of connected components in the learned similarity matrix equals the number of clusters. We compare the proposed method with several 2-dimensional unsupervised dimensionality reduction methods and evaluate the clustering performance of K-means on several benchmark data sets. The experimental results show that the proposed method outperforms the state-of-the-art methods.
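The claim that the number of connected components in the learned similarity matrix equals the cluster number rests on a standard spectral fact: the multiplicity of the zero eigenvalue of the graph Laplacian equals the number of connected components. A quick numerical check of that fact (not the paper's learning algorithm):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Two disconnected pairs of nodes -> two connected components.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(1)) - W                    # graph Laplacian
zero_eigs = int(np.sum(np.abs(np.linalg.eigvalsh(L)) < 1e-10))
n_comp, _ = connected_components(csr_matrix(W))
assert zero_eigs == n_comp == 2              # zero-eigenvalue multiplicity = #clusters
```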

7.
Stable orthogonal local discriminant embedding (SOLDE) is a recently proposed dimensionality reduction method in which the similarity, diversity and interclass separability of the data samples are exploited to obtain a set of orthogonal projection vectors. By combining multiple features of the data, it outperforms many prevalent dimensionality reduction methods. However, its orthogonal projection vectors are obtained by a step-by-step procedure, which makes it computationally expensive. By generalizing the objective function of SOLDE to a trace ratio problem, we propose a stable orthogonal local discriminant embedding using the trace ratio criterion (SOLDE-TR) for dimensionality reduction. An iterative procedure is provided to solve the trace ratio problem, which makes SOLDE-TR consistently faster than SOLDE. The projection vectors of SOLDE-TR always converge to a global solution, and its performance is consistently better than that of SOLDE. Experimental results on two public image databases demonstrate the effectiveness and advantages of the proposed method.
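The trace ratio problem max tr(V^T A V)/tr(V^T B V) is typically solved by the standard iteration sketched below: fix the ratio, take the top eigenvectors of A minus that ratio times B, and update. The scatter matrices A and B would come from SOLDE's similarity, diversity and separability terms, which are omitted here.

```python
import numpy as np

def trace_ratio(A, B, d, n_iter=50, tol=1e-8):
    """Generic trace-ratio solver: max_V tr(V^T A V) / tr(V^T B V), V^T V = I.

    A, B : symmetric scatter matrices (B positive semi-definite)
    d    : target dimensionality
    """
    lam = 0.0
    V = None
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(A - lam * B)
        V = vecs[:, -d:]                         # top-d eigenvectors
        new_lam = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
        if abs(new_lam - lam) < tol:             # ratio has converged
            break
        lam = new_lam
    return V, lam
```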

8.
尹宝才, 张超辉, 胡永利, 孙艳丰, 王博岳. 《智能系统学报》(CAAI Transactions on Intelligent Systems), 2021, 16(5): 963-970.
With the spread of surveillance cameras and the rapid development of data acquisition technology, multi-view data exhibit large scale, high dimensionality, and heterogeneous multi-source origins. This leads to large storage requirements, slow transmission, and high algorithmic complexity, creating a predicament in which data are abundant but hard to use. To date, research on multi-view dimensionality reduction remains limited both in China and abroad. To address this problem, this paper proposes an adaptive multi-view dimensionality reduction method based on graph embedding. Building on the requirement that the reduced data within each view reconstruct the original high-dimensional data, the method adaptively learns a similarity matrix to explore the relationships among the reduced data across different views, and learns an orthogonal projection matrix for each view to accomplish the multi-view dimensionality reduction task. Clustering and recognition experiments on the reduced multi-view data over several datasets show that the proposed graph-embedding-based adaptive multi-view dimensionality reduction method outperforms other dimensionality reduction methods.

9.
Canonical correlation analysis (CCA) is a popular and powerful dimensionality reduction method for analyzing paired multi-view data. However, when facing the semi-paired and semi-supervised multi-view data that widely exist in real-world problems, CCA usually performs poorly, because it requires data to be paired across views and is unsupervised in nature. Recently, several extensions of CCA have been proposed, but they either handle only the semi-paired scenario by utilizing structural information in each view, or only the semi-supervised scenario by incorporating discriminant information. In this paper, we present a general dimensionality reduction framework for semi-paired and semi-supervised multi-view data that naturally generalizes existing related works by using different kinds of prior information. Based on the framework, we develop a novel dimensionality reduction method, termed semi-paired and semi-supervised generalized correlation analysis (S2GCA). S2GCA exploits a small amount of paired data to perform CCA and, at the same time, uses both the global structural information captured from the unlabeled data and the local discriminative information captured from the limited labeled data to compensate for the limited pairing. Consequently, S2GCA finds directions that achieve not only maximal correlation between the paired data but also maximal separability of the labeled data. Experimental results on artificial and four real-world datasets show its effectiveness compared to existing related dimensionality reduction methods.
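For reference, plain CCA on the paired portion of a two-view dataset looks as follows; S2GCA starts from this and adds structural and discriminative terms for the unpaired and labeled samples, which this toy sketch does not attempt.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two views generated from a shared latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(60, 10))
Y = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(60, 8))

cca = CCA(n_components=2)
Zx, Zy = cca.fit_transform(X, Y)          # maximally correlated projections
corr = np.corrcoef(Zx[:, 0], Zy[:, 0])[0, 1]   # close to 1 on this toy data
```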

10.
11.
Due to noise disturbance and the limited number of training samples, the within-set and between-set sample covariance matrices in canonical correlation analysis (CCA) usually deviate from the true ones. In this paper, we re-estimate the within-set and between-set covariance matrices to reduce the negative effect of this deviation. Specifically, we use the idea of fractional order to correct the eigenvalues and singular values in the corresponding sample covariance matrices, and then construct fractional-order within-set and between-set scatter matrices that markedly alleviate the deviation problem. On this basis, a new approach is proposed to reduce the dimensionality of multi-view data for classification tasks, called fractional-order embedding canonical correlation analysis (FECCA). The proposed method is evaluated on various handwritten numeral, face and object recognition problems. Extensive experimental results on the CENPARMI, UCI, AT&T, AR, and COIL-20 databases show that FECCA is very effective and clearly outperforms existing joint dimensionality reduction and feature extraction methods in terms of classification accuracy. Moreover, its improvements in recognition rate are statistically significant in most cases at the 0.05 significance level.
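The fractional-order correction can be sketched as eigen-decomposing a scatter matrix and raising its eigenvalues to a power alpha in (0, 1], which shrinks the spread that noise and small sample sizes exaggerate; the same treatment would apply to the singular values of the between-set matrix. This is an illustrative sketch, not FECCA itself.

```python
import numpy as np

def fractional_scatter(X, alpha=0.9):
    """Fractional-order correction of a sample covariance matrix.

    Eigen-decompose the scatter matrix and raise its (non-negative)
    eigenvalues to the fractional power alpha before reconstructing.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, 0, None) ** alpha    # fractional-order eigenvalues
    return (vecs * vals) @ vecs.T             # corrected scatter matrix
```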

12.
Dimensionality reduction has many applications in pattern recognition, machine learning and computer vision. In this paper, we develop a general regularization framework for dimensionality reduction that allows the use of different functions in the cost function. This is especially important because it enables robustness in the presence of outliers. It is shown that optimizing the regularized cost function is equivalent to solving a nonlinear eigenvalue problem under certain conditions, which can be handled by the self-consistent field (SCF) iteration. Moreover, this regularization framework is applicable in unsupervised or supervised learning by defining a regularization term that encodes prior knowledge of the projected samples or projection vectors. Some linear projection methods can also be obtained from this framework by choosing different functions and imposing different constraints. Finally, we show applications of our framework on various data sets, including handwritten characters, face images, UCI data, and gene expression data.
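A generic self-consistent field (SCF) iteration for a nonlinear eigenvalue problem A(V)V = V*Lambda can be sketched as follows; the cost-function-specific construction of A(V) is the part the paper supplies, so it is left as a user-provided callback here.

```python
import numpy as np

def scf_iteration(build_A, p, d, n_iter=100, tol=1e-8):
    """SCF sketch: repeatedly take the top-d eigenvectors of A(V)
    until the spanned subspace stops changing.

    build_A : callable mapping the current (p x d) orthonormal V to a
              symmetric (p x p) matrix (problem-specific, assumed given)
    """
    rng = np.random.default_rng(0)
    V, _ = np.linalg.qr(rng.normal(size=(p, d)))  # random orthonormal start
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(build_A(V))
        V_new = vecs[:, -d:]                      # top-d eigenvectors
        # Subspace change measured via the orthogonal projectors.
        if np.linalg.norm(V_new @ V_new.T - V @ V.T) < tol:
            return V_new
        V = V_new
    return V

# When A does not depend on V, SCF reduces to a linear eigenproblem:
M = np.diag([5.0, 4.0, 1.0, 0.5])
V = scf_iteration(lambda _: M, p=4, d=2)
```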

13.
High-dimensional data are frequently encountered and processed in real-world applications; unlabeled samples are readily available, but labeled or pairwise-constrained ones are fairly expensive to obtain. Traditionally, when a pattern itself is an $n_1 \times n_2$ image, the image first has to be vectorized into a pattern in $\mathbb{R}^{n_1 \times n_2}$ by concatenating its pixels. However, such a vector representation fails to take into account the spatial locality of pixels in the images, which are intrinsically matrices. In this paper, we propose a tensor subspace learning-based semi-supervised dimensionality reduction algorithm (TS2DR), in which an image is naturally represented as a second-order tensor in $\mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2}$, and domain knowledge in the form of pairwise similarity and dissimilarity constraints is used to specify whether pairs of instances belong to the same class or different classes. TS2DR has an analytic form of the global structure preserving embedding transformation, which can be easily computed by eigen-decomposition. We verify the efficiency of TS2DR with unbalanced data classification experiments on benchmark real-world databases. Numerical results show that TS2DR tends to capture the intrinsic structure of the given data and achieves better classification accuracy while being much more efficient.

14.
15.
We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
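A minimal diffusion-map construction along these lines: form a Gaussian affinity, normalize it to a Markov random walk, and embed with the leading nontrivial eigenvectors. This sketches only the embedding; the paper's coarse-graining and operator-compression analysis are separate.

```python
import numpy as np

def diffusion_map(X, d=2, sigma=1.0, t=1):
    """Diffusion-map embedding via the Markov random walk on the data."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))             # Gaussian affinity
    deg = K.sum(axis=1)
    # Symmetric conjugate of the row-stochastic walk P = D^{-1} K.
    S = K / np.sqrt(deg[:, None] * deg[None, :])
    vals, vecs = np.linalg.eigh(S)
    vecs = vecs / np.sqrt(deg)[:, None]            # back to eigenvectors of P
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Skip the trivial constant eigenvector; diffuse for t steps.
    return vecs[:, 1:d + 1] * (vals[1:d + 1] ** t)

# Running k-means on this embedding is what the paper's
# quantization-distortion bound justifies.
```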

16.
High-dimensional electrocardiogram (ECG) data contain many irrelevant features, making it difficult for supervised machine learning techniques to achieve high sensitivity and high specificity at the same time. After preprocessing the ECG data, including correcting baseline wander, removing high-frequency noise, and fitting polynomial features, this paper proposes a classification model based on a supervised multiple correspondence analysis (MCA) dimensionality reduction technique to automatically classify heartbeats. The method discretizes the continuous ECG data into categorical data and develops a supervised MCA dimensionality reduction technique to extract the key features of the ECG data, after which various classification algorithms automatically classify the heartbeat data. Tests on the ECG dataset of the PTB diagnostic database show that, compared with several supervised machine learning classification techniques, the classification algorithms within the supervised MCA framework can automatically classify ECG heartbeats with higher sensitivity and specificity.
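A basic unsupervised MCA can be sketched as correspondence analysis on the one-hot indicator matrix of the discretized signals; the paper's supervised variant additionally exploits class labels, which this sketch omits. The binning choices and sklearn details below are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

def mca_embedding(X_cont, n_bins=8, d=2):
    """Discretize continuous signals, then embed with a basic MCA.

    MCA is implemented here as correspondence analysis on the one-hot
    indicator matrix: SVD of the standardized residuals.
    """
    disc = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='uniform')
    X_cat = disc.fit_transform(X_cont)
    Z = OneHotEncoder(sparse_output=False).fit_transform(X_cat)
    P = Z / Z.sum()                            # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :d] * s[:d]) / np.sqrt(r)[:, None]   # row principal coords
```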

17.
Graph OLAP: a multi-dimensional framework for graph data analysis
Databases and data warehouse systems have been evolving from handling normalized spreadsheets stored in relational databases to managing and analyzing diverse application-oriented data with complex interconnecting structures. Reflecting this emerging trend, graph data have been growing rapidly and showing critical importance in many applications, such as the analysis of XML, social networks, the Web, biological data, multimedia data and spatiotemporal data. Can we extend the useful functions of databases and data warehouse systems to handle graph-structured data? In particular, OLAP (On-Line Analytical Processing) has been a popular tool for fast and user-friendly multi-dimensional analysis of data warehouses. Can we OLAP graphs? Unfortunately, to the best of our knowledge, no OLAP tools are available that can interactively view and analyze graph data from different perspectives and at multiple granularities. In this paper, we argue that it is critically important to OLAP graph-structured data, and we propose a novel Graph OLAP framework. According to this framework, given a graph dataset whose nodes and edges are associated with respective attributes, a multi-dimensional model can be built to enable efficient on-line analytical processing, so that any portion of the graphs can be generalized or specialized dynamically, offering multiple, versatile views of the data. The contributions of this work are three-fold. First, starting from basic definitions, i.e., what dimensions and measures are in the Graph OLAP scenario, we develop a conceptual framework for data cubes on graphs. We also look into different semantics of OLAP operations and classify the framework into two major subcases: informational OLAP and topological OLAP. Second, we show how a graph cube can be materialized by calculating a special kind of measure called an aggregated graph, and how to implement this efficiently. This includes both full materialization and partial materialization, where constraints are enforced to obtain an iceberg cube. Due to the increased structural complexity of the data, aggregated graphs that depend on the underlying "network" properties of the graph dataset are much harder to compute than their traditional OLAP counterparts. Third, to provide more flexible, interesting and informative OLAP of graphs, we further propose a discovery-driven multi-dimensional analysis model that ensures OLAP is performed in an intelligent manner, guided by expert rules and knowledge discovery processes. We outline such a framework and discuss some challenging research issues for discovery-driven Graph OLAP.
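An informational roll-up that materializes an aggregated graph can be illustrated relationally: group nodes by an attribute and merge parallel edges between groups into weighted edges. This toy pandas encoding is ours, not the paper's implementation; the column names are hypothetical.

```python
import pandas as pd

# Node table with a roll-up attribute, and a weighted edge table.
nodes = pd.DataFrame({'id':   [1, 2, 3, 4],
                      'dept': ['DB', 'DB', 'ML', 'ML']})
edges = pd.DataFrame({'src': [1, 1, 2, 3],
                      'dst': [2, 3, 4, 4],
                      'w':   [1, 1, 1, 1]})

# Roll nodes up to the 'dept' level and aggregate edge weights between groups.
g = nodes.set_index('id')['dept']
agg = (edges.assign(src=edges['src'].map(g), dst=edges['dst'].map(g))
            .groupby(['src', 'dst'], as_index=False)['w'].sum())
print(agg)   # edges between dept-level super-nodes, with summed weights
```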

18.
This paper develops a supervised discriminant technique, called graph embedding discriminant analysis (GEDA), for dimensionality reduction of high-dimensional data in small sample size problems. GEDA can be seen as a linear approximation of a multimanifold-based learning framework that takes the nonlocal property into account in addition to the marginal and local properties. GEDA seeks a set of projections that not only compact the intraclass samples and maximize the interclass margin, but also maximize the nonlocal scatter at the same time. This characteristic makes GEDA more intuitive and more powerful than linear discriminant analysis (LDA) and marginal fisher analysis (MFA). The proposed method is applied to face recognition and is examined on the Yale, ORL and AR face image databases. The experimental results show that GEDA consistently outperforms LDA and MFA when the training sample size per class is small.

19.
Data visualization of high-dimensional data is possible through the use of dimensionality reduction techniques. However, when deciding which dimensionality reduction technique to use in practice, quantitative metrics are necessary for evaluating the results of the transformation and the visualization of the lower-dimensional embedding. In this paper, we propose a manifold visualization metric based on the pairwise correlation of geodesic distances in a data manifold. This metric is compared with other metrics based on the Euclidean distance, Mahalanobis distance, City Block metric, Minkowski metric, cosine distance, Chebyshev distance, and Spearman distance. The results of applying different dimensionality reduction techniques to various types of nonlinear manifolds are compared and discussed. Our experiments show that the proposed metric is suitable for quantitatively evaluating the results of dimensionality reduction techniques when the data lie on an open planar nonlinear manifold. This has practical significance in the implementation of knowledge-based visualization systems and the application of knowledge-based dimensionality reduction methods.
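One way to realize such a metric, under our reading of the abstract: approximate geodesic distances by shortest paths on a kNN graph of the original data and correlate them with pairwise distances in the embedding. The function name and the choice of Pearson correlation are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr
from sklearn.neighbors import kneighbors_graph

def geodesic_correlation(X_high, X_low, k=10):
    """Correlate graph geodesics in the original space with pairwise
    distances in the low-dimensional embedding (higher = more faithful).
    """
    G = kneighbors_graph(X_high, n_neighbors=k, mode='distance')
    D_geo = shortest_path(G, method='D', directed=False)  # Dijkstra geodesics
    D_low = squareform(pdist(X_low))
    iu = np.triu_indices_from(D_geo, k=1)
    mask = np.isfinite(D_geo[iu])            # ignore disconnected pairs
    r, _ = pearsonr(D_geo[iu][mask], D_low[iu][mask])
    return r
```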

20.