首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 29 毫秒
1.
均值漂移谱聚类(MSSC)算法为模式识别聚类任务提供了一种较新的方案.然而由于其内嵌均值漂移过程的时问复杂度与样本容量呈平方关系,其在大数据集环境的实用性受到大大削弱.利用快速压缩集密度估计器(FRSDE)替代Parren窗密度估计式(PW)并融合基于图的松弛聚类(GRC)方法,提出了快速均值漂移谱聚类(FMSSC)算法.相比原MSSC,该算法的总体渐进时间复杂度与样本容量呈线性关系,并具有自适应性和便捷性.  相似文献   

2.
孙琳  张同珍 《计算机仿真》2006,23(6):279-281
该文分析了现有的对帧间特征差进行阈值比较的镜头分割方法,以及通过颜色空间和可调时间阈值进行视频聚类方法的不准确性,针对教案视频中大量文字内容体现出的特有的纹理特征,提出了使用基于灰度级共生矩阵纹理特征的C均值模糊聚类算法进行教案视频镜头分割.算法选取灰度级共生矩阵统计量之一的惯性矩作为度量帧间相似性的特征值,并根据教案视频中手写操作的特点调整特征向量,以此作为样本数据点进行模糊聚类.实验结果显示这种方法对于教案视频镜头的分割具有较好效果.  相似文献   

3.
为了改善谱聚类图像分割的精准性和时效性,文中提出融入局部几何特征的流形谱聚类图像分割算法.首先,考虑图像数据的流形结构,在数据点的K近邻域内执行局部PCA,得到数据间本征维数的关系.然后,引入流形学习中的局部线性重构技术,通过混合线性分析器得到数据间局部切空间的相似性,结合二者构造含有局部几何特征的相似性矩阵.再利用Nystr?m技术逼近待分割图像的特征向量,对构造的k个主特征向量执行谱聚类.最后,在Berkeley数据集上的对比实验验证文中算法的准确性和时效性优势.  相似文献   

4.
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, in the k-modes-type algorithms, the performance of their clustering depends on initial cluster centers and the number of clusters needs be known or given in advance. This paper proposes a novel initialization method for categorical data which is implemented to the k-modes-type algorithms. The proposed method can not only obtain the good initial cluster centers but also provide a criterion to find candidates for the number of clusters. The performance and scalability of the proposed method has been studied on real data sets. The experimental results illustrate that the proposed method is effective and can be applied to large data sets for its linear time complexity with respect to the number of data points.  相似文献   

5.
The recent emergence of the discrete fractional Fourier transform has spurred research activity aiming at generating Hermite–Gaussian-like (HGL) orthonormal eigenvectors of the discrete Fourier transform (DFT) matrix F. By exploiting the unitarity of matrix F – resulting in the orthogonality of its eigenspaces pertaining to the distinct eigenvalues – the problem decouples into finding orthonormal eigenvectors for each eigenspace separately. A Direct Sequential Evaluation by constrained Optimization Algorithm (DSEOA) is contributed for the generation of optimal orthonormal eigenvectors for each eigenspace separately. This technique is direct in the sense that it does not require the generation of initial orthonormal eigenvectors as a prerequisite for obtaining the final optimal ones. The resulting eigenvectors are optimal in the sense of being as close as possible to samples of the Hermite–Gaussian functions. The technique is found to be numerically robust because total pivoting is allowed in performing the QR matrix decomposition step. The DSEOA is proved to be theoretically equivalent to each of the Gram–Schmidt algorithm (GSA) and the sequential orthogonal Procrustes algorithm (SOPA). However the three techniques are algorithmically quite distinct. An extensive comparative simulation study shows that the DSEOA is by far the most numerically robust technique among all sequential algorithms thus paying off for its relatively long computation time.  相似文献   

6.
Bagging-based spectral clustering ensemble selection   总被引:2,自引:0,他引:2  
Traditional clustering ensemble methods combine all obtained clustering results at hand. However, we can often achieve a better clustering solution if only parts of the clustering results available are combined. In this paper, we generalize the selective clustering ensemble algorithm proposed by Azimi and Fern and a novel clustering ensemble method, SELective Spectral Clustering Ensemble (SELSCE), is proposed. The component clusterings of the ensemble system are generated by spectral clustering (SC) capable of engendering diverse committees. The random scaling parameter, Nyström approximation are used to perturb SC for producing the components of the ensemble system. After the generation of component clusterings, the bagging technique, usually applied in supervised learning, is used to assess the component clustering. We randomly pick part of the available clusterings to get a consensus result and then compute normalized mutual information (NMI) or adjusted rand index (ARI) between the consensus result and the component clusterings. Finally, the components are ranked by aggregating multiple NMI or ARI values. The experimental results on UCI dataset and images demonstrate that the proposed algorithm can achieve a better result than the traditional clustering ensemble methods.  相似文献   

7.
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of eigenspace is carried out which shows that (a) not every eigenvectors of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal natural grouping of the input data/patterns even given sparse and noisy data.  相似文献   

8.
目的:在分布式视频编码中,由于相关噪声残差子带不同区域具有不同的变化特性,子带级拉普拉斯估计法对整个子带用同一参数估计并不准确,为了进一步提高噪声模型估计精度,本文提出了一种基于改进的模糊C均值(FCM)聚类的模型估计方法。方法:本文算法针对每一解码子带选取不同的特征矢量;采用改进的模糊C均值进行聚类;为了避免类别样本数较少而溢出,采用阈值控制法求取相应的模型参数;然后用重建子带更新下一解码子带的特征矢量;直到一帧中所有子带解码完成。针对模糊C均值对初始聚类中心的敏感性,本文采用首先随机生成隶属度矩阵的方法来缓解聚类陷入局部最优的问题。结果:从实验效果和算法复杂度角度考虑,将残差样本聚为8类。实验结果表明,本文聚类算法可以更加准确地模拟帧内不同区域的不同信道噪声特性,对于运动越剧烈的序列效果越好,相对于子带级拉普拉斯估计,平均增益达1dB。结论:本文提出了一种新的相关噪声估计方法,针对不同的子带选取不同的特征矢量,并重建更新。实验结果表明,本文算法能更好地描述相关噪声特性,获得系统性能的提高。  相似文献   

9.
Spectral clustering is one of the most popular and important clustering methods in pattern recognition, machine learning, and data mining. However, its high computational complexity limits it in applications involving truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph Laplacian with O(n3) time complexity. To address this problem, we propose a novel method called anchor-based spectral clustering (ASC) by employing anchor points of data. Specifically, m (m ? n) anchor points are selected from the dataset, which can basically maintain the intrinsic (manifold) structure of the original data. Then a mapping matrix between the original data and the anchors is constructed. More importantly, it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC, is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics. Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness and efficiency.  相似文献   

10.
Density based clustering techniques like DBSCAN are attractive because it can find arbitrary shaped clusters along with noisy outliers. Its time requirement is O(n2) where n is the size of the dataset, and because of this it is not a suitable one to work with large datasets. A solution proposed in the paper is to apply the leaders clustering method first to derive the prototypes called leaders from the dataset which along with prototypes preserves the density information also, then to use these leaders to derive the density based clusters. The proposed hybrid clustering technique called rough-DBSCAN has a time complexity of O(n) only and is analyzed using rough set theory. Experimental studies are done using both synthetic and real world datasets to compare rough-DBSCAN with DBSCAN. It is shown that for large datasets rough-DBSCAN can find a similar clustering as found by the DBSCAN, but is consistently faster than DBSCAN. Also some properties of the leaders as prototypes are formally established.  相似文献   

11.
The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the performance of the k-modes clustering algorithm which converges to numerous local minima strongly depends on initial cluster centers. Currently, most methods of initialization cluster centers are mainly for numerical data. Due to lack of geometry for the categorical data, these methods used in cluster centers initialization for numerical data are not applicable to categorical data. This paper proposes a novel initialization method for categorical data which is implemented to the k-modes algorithm. The method integrates the distance and the density together to select initial cluster centers and overcomes shortcomings of the existing initialization methods for categorical data. Experimental results illustrate the proposed initialization method is effective and can be applied to large data sets for its linear time complexity with respect to the number of data objects.  相似文献   

12.
Frommer and Glassner [Frommer, A. and Glassner, U., 1998, Restarted GMRES for shifted linear systems, SIAM Journal on Scientific Computing, 19, 15–26.] develop a variant of the restarted GMRES method for shifted linear systems at the expense of only one matrix–vector multiplication per iteration. However, restarting slows down the convergence, even stagnation. We present a variant of the restarted GMRES augmented with some approximate eigenvectors for the shifted systems. The convergence can be much faster at little extra expense. Numerical experiments show its efficiency.  相似文献   

13.
汤立伟  张家珲  彭勇  孔万增 《计算机应用研究》2021,38(4):1084-1087,1096
谱聚类算法存在两个不足:a)将图的构造与谱分解割裂成两个独立的阶段,导致了结果的次优性;b)常用的基于l2范数度量谱特征向量的相似性具有噪声敏感性。为了克服上述两点不足,提出基于联合结构化图学习与l1范数谱嵌入的鲁棒聚类算法(记为CLRL1)。在该算法框架下,一方面图的学习过程与聚类过程可以有效结合起来进行协同优化,另一方面l1范数的使用可以很好地约束谱特征向量的相似性以提升算法的鲁棒性。在多个常用数据集上进行的实验结果表明,改进算法聚类性能得到了明显提升。  相似文献   

14.
半监督谱聚类特征向量选择算法   总被引:7,自引:0,他引:7  
对于一个K类问题,Ng-Jordan-Weiss(NJW)谱聚类算法通常采用数据规范化亲和度矩阵的前K个最大特征值对应的特征向量作为数据的一种表示。然而,对于某些模式识别问题,这K个特征向量不一定能够体现原始数据的结构。文中提出一种半监督谱聚类特征向量选择算法。该算法利用一定量的监督信息寻找能够体现数据结构的特征向量组合,进而获得优于传统谱聚类算法的聚类性能。UCI标准数据集和MNIST手写体数据集上的仿真实验验证该算法的有效性和鲁棒性。  相似文献   

15.
基于稀疏Parzen窗密度估计的快速自适应相似度聚类方法   总被引:1,自引:1,他引:0  
相似度聚类方法(Similarity-based clustering method, SCM)因其简单易实现和具有鲁棒性而广受关注. 但由于内含相似度聚类算法 (Similarity clustering algorithm, SCA)的高时间复杂度和凝聚型层次聚类 (Agglomerative hierarchical clustering, AHC) 的高空间复杂度, SCM不适用大数据集场合. 本文首先发现了SCM和核密度估计问题的本质联系, 并以此入手, 通过快速压缩集密度估计器(Fast reduced set density estimator, FRSDE)和基于图的松弛聚类(Graph-based relaxed clustering, GRC)算法提出了快速自适应相似度聚类方法(Fast adaptive similarity-based clustering method, FASCM). 相比于原SCM, 该方法的主要优点是: 1)其总体渐近时间复杂度与样本容量呈线性关系; 2) 不依赖于人工经验的干预, 具有了自适应性. 由此, FASCM适用于大数据集环境. 该方法的有效性在图像分割应用中进行了验证.  相似文献   

16.
贾洪杰  丁世飞  史忠植 《软件学报》2015,26(11):2836-2846
谱聚类将聚类问题转化成图划分问题,是一种基于代数图论的聚类方法.在求解图划分目标函数时,一般利用Rayleigh熵的性质,通过计算Laplacian矩阵的特征向量将原始数据点映射到一个低维的特征空间中,再进行聚类.然而在谱聚类过程中,存储相似矩阵的空间复杂度是O(n2),对Laplacian矩阵特征分解的时间复杂度一般为O(n3),这样的复杂度在处理大规模数据时是无法接受的.理论证明,Normalized Cut图聚类与加权核k-means都等价于矩阵迹的最大化问题.因此,可以用加权核k-means算法来优化Normalized Cut的目标函数,这就避免了对Laplacian矩阵特征分解.不过,加权核k-means算法需要计算核矩阵,其空间复杂度依然是O(n2).为了应对这一挑战,提出近似加权核k-means算法,仅使用核矩阵的一部分来求解大数据的谱聚类问题.理论分析和实验对比表明,近似加权核k-means的聚类表现与加权核k-means算法是相似的,但是极大地减小了时间和空间复杂性.  相似文献   

17.
Spectral clustering techniques are heuristic algorithms aiming to find approximate solutions to difficult graph-cutting problems, usually NP-complete, which are useful to clustering. A fundamental working hypothesis of these techniques is that the optimal partition of K classes can be obtained from the first K eigenvectors of the graph normalized Laplacian matrix LN if the gap between the K-th and the K+1-th eigenvalue of LN is sufficiently large. If the gap is small a perturbation may swap the corresponding eigenvectors and the results can be very different from the optimal ones.In this paper we suggest a weaker working hypothesis: the optimal partition of K classes can be obtained from a K-dimensional subspace of the first M>K eigenvectors, where M is a parameter chosen by the user. We show that the validity of this hypothesis can be confirmed by the gap size between the K-th and the M+1-th eigenvalue of LN. Finally we present and analyse a simple probabilistic algorithm that generalizes current spectral techniques in this extended framework. This algorithm gives results on real world graphs that are close to the state of the art by selecting correct K-dimensional subspaces of the linear span of the first M eigenvectors, robust to small changes of the eigenvalues.  相似文献   

18.
Most knowledge discovery in databases (KDD) research is concentrated on supervised inductive learning. Conceptual clustering is an unsupervised inductive learning technique that organizes observations into an abstraction hierarchy without using predefined class values. However, a typical conceptual clustering algorithm is not suitable for a KDD task because of space and time constraints. Furthermore, typical incremental and non-incremental clustering algorithms are not designed for a partitioned data set. In this paper, we present a conceptual clustering algorithm that works on partitioned data. The proposed algorithm improves the clustering process by using less computation time and less space while maintaining the clustering quality.  相似文献   

19.
Systemic candidiasis (SC) is associated with high morbidity and mortality, because it generally affects patients with severe underlying diseases and its diagnosis is difficult and often delayed, resulting in delayed therapy. We used serological proteome analysis to screen serum anti‐Candida IgG antibody‐reactivity profiles in 24 patients under intensive care, 12 of which had confirmed SC (fungal cultures), and in 12 healthy subjects. A total of 15 immunogenic proteins from Candida albicans protoplast lysates were differentially immunorecognized by serum IgG antibodies from SC patients compared to controls. Two‐way hierarchical clustering and principal‐component analyses of these antibody‐reactivity patterns accurately differentiated SC patients from controls. Anti‐Eno1p IgG antibodies were found to be present at high abundance in SC patients and be an important molecular fingerprint in serum for SC diagnosis. Differential anti‐Eno1p IgG antibody reactivity was further validated by a tag capture ELISA and a Western blot assay in 45 SC patients and 118 non‐SC subjects. Both quantitative assays provided comparable analytical, diagnostic and prognostic performances, and verified initial proteomic‐profiling results. If confirmed in prospective cohort studies, these anti‐Eno1p IgG antibodies might be useful for SC diagnosis. However, these, at least as measured by these clinical platforms, appear to have limited prognostic value in SC patients.  相似文献   

20.
一种解决大规模数据集问题的核主成分分析算法   总被引:4,自引:0,他引:4  
史卫亚  郭跃飞  薛向阳 《软件学报》2009,20(8):2153-2159
提出一种大规模数据集求解核主成分的计算方法.首先使用Gram矩阵生成一个Gram-power矩阵,根据线性代数的理论可知,新形成的矩阵和原先的Gram矩阵具有相同的特征向量.因此,可以把Gram矩阵的每一列看成核空间迭代算法的输入样本,这样,无须使用特征分解即可迭代地计算出核主成分.该算法的空间复杂度只有O(m);在大规模数据集的情况下,时间复杂度也降低为O(pkm).实验结果表明了所提出算法的有效性.更为重要的是,在大规模数据集的情况下,当传统的特征分解技术无法使用时,该方法仍然可以提取非线性特征.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号