Found 17 similar documents (search time: 250 ms)
1.
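Semi-supervised clustering has been a research hotspot in recent years. Traditional methods add a limited amount of background knowledge to an unsupervised algorithm to improve clustering performance. However, most semi-supervised clustering techniques are proximity- or density-based and handle high-dimensional data poorly, so feature reduction must be incorporated into the semi-supervised clustering process. To address this problem, a new semi-supervised clustering framework is proposed. The algorithm first preprocesses the data using the transitivity of sample constraints, then projects the features onto a low-dimensional space to reduce dimensionality, and finally clusters the reduced samples with a semi-supervised algorithm. Experimental comparisons with the main current dimensionality reduction methods show that the method handles high-dimensional data effectively and clusters well.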
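The abstract does not say how constraint transitivity is applied. A minimal sketch of one common reading follows, assuming the constraints are must-link and cannot-link pairs: must-link pairs are merged with a union-find structure, and each cannot-link is extended to every cross pair of the two merged groups. The function name `close_constraints` and its signature are hypothetical and not taken from the paper.

```python
from itertools import combinations


def close_constraints(n, must_link, cannot_link):
    """Expand pairwise constraints over n samples using transitivity.

    must_link / cannot_link are iterables of (i, j) index pairs.
    Returns (groups, ml_pairs, cl_pairs); raises ValueError on contradictions.
    This is an illustrative assumption, not the paper's exact preprocessing.
    """
    parent = list(range(n))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link behaves as an equivalence relation: merge linked samples.
    for i, j in must_link:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside one group is an implied must-link.
    ml_pairs = set()
    for members in groups.values():
        ml_pairs.update(combinations(members, 2))

    # A cannot-link between two groups applies to every cross pair.
    cl_pairs = set()
    for i, j in cannot_link:
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError(f"samples {i} and {j} are both must- and cannot-linked")
        for a in groups[ri]:
            for b in groups[rj]:
                cl_pairs.add((min(a, b), max(a, b)))

    return sorted(groups.values()), sorted(ml_pairs), sorted(cl_pairs)
```

With five samples, must-links (0, 1) and (1, 2), and a cannot-link (2, 3), the closure implies that 0 and 2 must be linked and that 0 and 1 cannot be linked with 3.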
2.
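To overcome two shortcomings of marginal Fisher analysis (MFA), namely that it uses only a small number of labeled samples and that its neighborhood construction does not adequately meet the neighborhood requirements of manifold learning, a semi-supervised discriminant analysis algorithm with adaptive neighborhood selection based on local linear structure is proposed. An adaptive algorithm enlarges or shrinks the number of nearest neighbors k when constructing neighborhoods so as to preserve the local linear structure. MFA reduces dimensionality using the small number of labeled samples while UDP learns from the large number of unlabeled samples, so the dimensionality of high-dimensional face data is reduced in a semi-supervised manner. Finally, experimental results on the ORL and YALE face databases verify the effectiveness of the algorithm.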
3.
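High-dimensional data are ubiquitous in the real world, but they often contain a large amount of redundant and noisy information, which prevents many traditional clustering algorithms from performing well on them. In practice, the cluster structure of high-dimensional data is often found to be embedded in a lower-dimensional subspace, so dimensionality reduction has become a key technique for uncovering that structure. Among the many dimensionality reduction methods, graph-based ones are a focus of research. However, most graph-based dimensionality reduction algorithms have two problems: (1) they need to compute or learn an adjacency graph, which is computationally expensive; (2) the reduction process does not take into account what the reduced data will be used for. To address both problems, a fast unsupervised dimensionality reduction algorithm based on maximum entropy, MEDR, is proposed. MEDR combines linear projection with the maximum entropy clustering model and uses an efficient iterative optimization algorithm to search for the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR does not require an adjacency graph as input and has time complexity linear in the number of samples. Experimental results on real data sets show that, compared with traditional dimensionality reduction methods, MEDR finds a better projection matrix from the high-dimensional space to the low-dimensional subspace, so that the projected data are better suited to clustering.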
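For orientation, the clustering half of such a model can be sketched with the standard maximum entropy clustering updates: soft memberships proportional to exp(-distance² / gamma) and membership-weighted centers. This sketch is not MEDR itself. The learned linear projection is omitted, and the function names and the `gamma` parameter are illustrative assumptions.

```python
import math


def max_entropy_memberships(points, centers, gamma=1.0):
    """Soft memberships u[i][j] proportional to exp(-||x_i - v_j||^2 / gamma)."""
    memberships = []
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
        # Subtract the smallest distance for numerical stability.
        base = min(d2)
        weights = [math.exp(-(d - base) / gamma) for d in d2]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships


def update_centers(points, memberships):
    """Membership-weighted means of the points, one per cluster."""
    k = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(k):
        weight = sum(u[j] for u in memberships)
        centers.append([
            sum(u[j] * x[d] for u, x in zip(memberships, points)) / weight
            for d in range(dim)
        ])
    return centers
```

Alternating the two functions gives a plain maximum entropy clustering loop. In a method like the one described, the points would first be mapped through the current projection matrix, which this sketch leaves out.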
4.
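Compared with unsupervised clustering, semi-supervised clustering uses some prior information to better mine and understand the intrinsic structure of the data and to follow user preferences closely. Existing typical semi-supervised clustering algorithms are suitable only for low-dimensional data. This paper proposes a novel semi-supervised clustering algorithm based on discriminant analysis to address the clustering of high-dimensional data. The new algorithm first uses principal component analysis to project the high-dimensional data and then clusters the data in the projected space with an algorithm based on spherical K-means; next, it uses the clustering result to reduce the dimensionality of the input-space data with linear discriminant analysis; finally, it clusters the data again in the projected space. Experiments on a collection of real data sets show that the proposed algorithm not only handles high-dimensional data effectively but also improves clustering performance.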
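The spherical K-means step named in this abstract can be sketched in a few lines. This is plain, unsupervised spherical K-means under stated assumptions: the PCA and linear discriminant analysis stages and any use of prior information are omitted, and `spherical_kmeans` with its explicit `init_centers` argument is a hypothetical helper, not the paper's procedure.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def spherical_kmeans(points, init_centers, n_iter=20):
    """Cluster by cosine similarity; returns (labels, unit-length centers)."""
    data = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    labels = [0] * len(data)
    for step in range(n_iter):
        # Assignment: pick the center with the largest dot product (cosine).
        new_labels = [
            max(range(len(centers)),
                key=lambda j: sum(a * b for a, b in zip(x, centers[j])))
            for x in data
        ]
        # Update: each center becomes the normalized sum of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, new_labels) if lab == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
        # Stop once the assignment no longer changes.
        if step > 0 and new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

Because both data and centers are kept at unit length, only the direction of each vector matters, which is why this variant is often used on high-dimensional sparse data such as text.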
5.
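To address nonlinear feature extraction and the shortage of labeled samples in face recognition, a method is proposed for computing orthogonal semi-supervised discriminant vectors in kernel space. The algorithm uses a kernel function to map face data into a high-dimensional nonlinear space. In that space, it applies Marginal Fisher Analysis (MFA) to reduce the dimensionality of the small number of labeled samples and uses Unsupervised Discriminant Projection (UDP) to learn from the large number of unlabeled samples, constructing the objective function in a semi-supervised manner. When solving the eigenvalue problem, the optimal projection vectors are found under an orthogonality constraint and are then used for face recognition. Experiments on the ORL and YALE face databases verify the effectiveness of the algorithm.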
6.
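Discriminative semi-supervised clustering analysis with pairwise constraints  Cited by: 10 (self-citations: 1, citations by others: 9)
Some existing typical semi-supervised clustering methods have difficulty resolving violations of pairwise constraints effectively and, at the same time, cannot handle high-dimensional data. A discriminative semi-supervised clustering analysis method based on pairwise constraints is proposed to solve both problems together. The method uses the supervision information effectively to integrate dimensionality reduction and clustering: it clusters the data in the projected space with a pairwise-constrained K-means algorithm and then uses the clustering result to select the projection space. The algorithm also reduces the computational complexity of constraint-based semi-supervised clustering and resolves the violation of pairwise constraints during clustering. Experimental results on a collection of real data sets show that, compared with existing related semi-supervised clustering algorithms, the new method not only handles high-dimensional data but also effectively improves clustering performance.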
7.
8.
9.
10.
11.
Most existing semi-supervised clustering algorithms are not designed to handle high-dimensional data. On the other hand, semi-supervised dimensionality reduction methods do not necessarily improve clustering performance, because they ignore the inherent relationship between subspace selection and clustering. To mitigate these problems, we present a semi-supervised clustering algorithm using adaptive distance metric learning (SCADM), which performs semi-supervised clustering and distance metric learning simultaneously. SCADM applies the clustering results to learn a distance metric and then projects the data onto a low-dimensional space where the separability of the data is maximized. Experimental results on real-world data sets show that the proposed method can effectively deal with high-dimensional data and provides appealing clustering performance.
12.
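Clustering high-dimensional, complex data usually requires dimensionality reduction before clustering. However, common dimensionality reduction methods do not consider the within-class aggregation of the data or the correlations among samples, so it is hard to ensure that the reduction method matches the clustering algorithm, which leads to a loss of clustering information. The extreme learning machine autoencoder (ELM-AE), a nonlinear unsupervised dimensionality reduction method, learns quickly and generalizes well, and in recent years it has been widely applied to dimensionality reduction and ...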
13.
Mining Projected Clusters in High-Dimensional Spaces  Cited by: 1 (self-citations: 0, citations by others: 1)
Bouguessa Mohamed, Wang Shengrui. Knowledge and Data Engineering, IEEE Transactions on, 2009, 21(4): 507-522
Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. To address this problem, a number of projected clustering algorithms have been proposed. However, most of them encounter difficulties when clusters hide in subspaces with very low dimensionality. These challenges motivate our effort to propose a robust partitional distance-based projected clustering algorithm. The algorithm consists of three phases. The first phase performs attribute relevance analysis by detecting dense and sparse regions and their location in each attribute. Starting from the results of the first phase, the goal of the second phase is to eliminate outliers, while the third phase aims to discover clusters in different subspaces. The clustering process is based on the K-means algorithm, with the computation of distance restricted to subsets of attributes where object values are dense. Our algorithm is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real datasets.
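The idea of restricting the distance computation to each cluster's relevant attributes can be illustrated with a short sketch. It assumes the relevant attribute subsets are already known, whereas the abstract derives them in its first phase. The names `subspace_distance` and `assign_projected` and the division by the square root of the subspace size are illustrative choices, not the authors' definitions.

```python
import math


def subspace_distance(x, center, dims):
    """Euclidean distance between x and center using only the attributes in dims."""
    return math.sqrt(sum((x[d] - center[d]) ** 2 for d in dims))


def assign_projected(points, centers, relevant_dims):
    """Assign each point to the cluster closest within that cluster's own subspace.

    Distances are divided by sqrt(number of dims) so that clusters with
    different subspace sizes stay comparable (an illustrative assumption).
    """
    labels = []
    for x in points:
        scores = [
            subspace_distance(x, c, dims) / math.sqrt(len(dims))
            for c, dims in zip(centers, relevant_dims)
        ]
        labels.append(scores.index(min(scores)))
    return labels
```

Each distance touches only the listed attributes, so the cost per comparison depends on the subspace size and not on the full dimensionality.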
14.
15.
16.
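MicroRNAs (miRNAs) are a class of small non-coding RNAs that play important regulatory roles in organisms, and predicting miRNAs helps in studying and understanding their biological functions. The previously proposed dimensionality reduction algorithm based on pairwise constraints, local semi-supervised linear discriminant analysis (LSLDA), preserves the local structure and discriminative power of the data while reducing the dimensionality of miRNA data, and can effectively improve miRNA prediction performance. Building on LSLDA, a new ensemble algorithm, ensemble of local semi-supervised linear discriminant analysis (En-LSLDA), is proposed. The algorithm combines the classification results obtained with different numbers of constraints and takes the combined result as the final classification, further improving miRNA prediction performance. Experimental results on miRNA data sets show that En-LSLDA is effective and feasible. Experimental results on UCI data sets further verify that the proposed ensemble method is also applicable to other data sets.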
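The abstract does not state how the classification results for different numbers of constraints are combined. As a minimal sketch, assuming simple majority voting, the combination step could look like this. The function `majority_vote` is hypothetical and may differ from the rule En-LSLDA actually uses.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    predictions[m][i] is the label that classifier m assigns to sample i.
    Ties go to the label seen first in classifier order.
    Majority voting is an assumption; the source does not name the rule.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

Here each inner list would hold the predictions of one classifier trained after reduction with a particular number of pairwise constraints.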
17.
A well-designed graph tends to yield good performance in graph-based semi-supervised learning. Although most graph-based semi-supervised dimensionality reduction approaches perform very well on clean data sets, they usually cannot construct a faithful graph, which is essential for good performance, on high-dimensional, sparse, or noisy data. This generally leads to a dramatic degradation in performance. To deal with these issues, this paper proposes a strategy called relative semi-supervised dimensionality reduction (RSSDR), which applies perceptual relativity to semi-supervised dimensionality reduction. In RSSDR, a relative transformation is first performed on the training samples to build the relative space. The relative transformation improves the ability to distinguish among data points and diminishes the impact of noise on semi-supervised dimensionality reduction. Second, the edge weights of the neighborhood graph are determined by minimizing the local reconstruction error in the relative space, so that the graph preserves both the global and the local geometric structure of the data. Extensive experiments on face, UCI, gene expression, artificial, and noisy data sets validate the feasibility and effectiveness of the proposed algorithm, with promising results in both classification accuracy and robustness.