Similar literature
Found 20 similar articles; search took 171 ms.
1.
Recently, sparse component analysis (SCA) has become a hot topic in blind source separation (BSS) research. In place of independent component analysis (ICA), SCA can be used to solve underdetermined mixtures efficiently. The two-step approach (TSA) is one of the typical methods for SCA-based BSS problems: it estimates the mixing matrix before separating the sources. K-means clustering is often used to estimate the mixing matrix, but it relies strongly on prior knowledge of the number of sources, and estimating that number is itself an obstacle. In this paper, a fuzzy clustering method is proposed to estimate the number of sources and the mixing matrix simultaneously. The sources are then recovered by the shortest path method (SPM). Simulations show the effectiveness and robustness of the proposed method.

2.
In this paper, we present a perturbation analysis for the matrices in the multiway normalized cut spectral clustering method based on the matrix perturbation theory. The analytical results show that the eigenvalues and the eigenspaces of the normalized Laplacian matrices are continuous. Therefore, clustering algorithms can be designed according to the special properties of the normalized Laplacian matrices in the ideal case and the method can be extended to the general case based on the continuity of the eigenvalues and the eigenspaces of the normalized Laplacian matrices. The numerical results are consistent with the theoretical results.
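The continuity claim for the eigenvalues can be illustrated with a standard bound. The following is only a sketch: it assumes the usual symmetric normalized Laplacian (the abstract does not say which normalization the paper uses), and the symbols W (affinity matrix), D (diagonal degree matrix), and E (a symmetric perturbation) are introduced here for illustration. With eigenvalues sorted in the same order on both sides, Weyl's inequality gives:

```latex
L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}, \qquad
\bigl|\lambda_i(L_{\mathrm{sym}} + E) - \lambda_i(L_{\mathrm{sym}})\bigr| \le \|E\|_2 ,
\quad i = 1, \dots, n .
```

A small perturbation of the ideal block-diagonal case therefore moves each eigenvalue by at most the spectral norm of the perturbation; the bounds derived in the paper itself may be sharper or stated differently.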

3.
Kernel selection is one of the key issues in both recent research on and applications of kernel methods. It is usually done by minimizing either an estimate of the generalization error or some other related performance measure. The use of notions of stability to estimate the generalization error has attracted much attention in recent years. Unfortunately, the existing notions of stability, proposed to derive theoretical generalization error bounds, are difficult to use for kernel selection in practice. It is well known that the kernel matrix contains most of the information needed by kernel methods, and that its eigenvalues play an important role. Therefore, we introduce a new notion of stability, called spectral perturbation stability, to study the kernel selection problem. This stability quantifies the spectral perturbation of the kernel matrix with respect to changes in the training set. We establish the connection between spectral perturbation stability and the generalization error. By minimizing the derived generalization error bound, we propose a new kernel selection criterion that can guarantee good generalization properties. In our criterion, the perturbation of the eigenvalues of the kernel matrix is efficiently computed by solving the derivative of a newly defined generalized kernel matrix. Both theoretical analysis and experimental results demonstrate that our criterion is sound and effective.

4.
This paper proposes a sampling-based hierarchical approach to meeting the computational demands of spectral clustering methods when applied to image segmentation. The authors first define the distance between a pixel and a cluster, and then derive a new theorem to estimate the number of samples needed for clustering. Finally, by introducing a scale parameter into the similarity function, a novel spectral-clustering-based image segmentation method is developed. An important characteristic of the approach is that, in the course of image segmentation, one needs not only to tune the scale parameter to merge small clusters or split large clusters, but also to take samples from the data set at the different scales. The multiscale and stochastic nature of the method makes it feasible to apply it to very large grouping problems. In addition, it lets the segmentation run in time that is linear in the size of the image. Experimental results on various synthetic and real-world images show the effectiveness of the approach.

5.
Co-clustering treats a data matrix symmetrically, in that a partitioning of rows can induce a partitioning of columns, and vice versa. It has been shown to be advantageous over traditional clustering. However, the computational complexity of most co-clustering algorithms is high, which limits their effectiveness on large datasets. A recently proposed sampling-based matrix decomposition method can achieve linear computational complexity, but the selected rows and columns cannot effectively represent a large sparse dataset, and many unselected rows and columns cannot be mapped to the selected ones because they share no features in common, so its performance is impaired. To address this problem, we propose a fast co-clustering framework based on ranking and sampling, in which only representative samples are selected for co-clustering and the remaining samples can be easily labeled by their neighbors among the clustered samples. Extensive experiments on large text datasets show that our approach uses very few samples to achieve, in linear time, results comparable to those of state-of-the-art co-clustering algorithms of nonlinear computational complexity.

6.
A novel framework for fuzzy modeling and model-based control design is described. Based on the theory of fuzzy constraint processing, the fuzzy model can be viewed as a generalized Takagi-Sugeno (TS) fuzzy model with fuzzy functional consequences. It uses multivariate antecedent membership functions obtained by granular-prototype fuzzy clustering methods and consequent fuzzy equations obtained by fuzzy regression techniques. Constrained optimization is used to estimate the consequent parameters, where the constraints are based on control-relevant a priori knowledge about the modeled process. The fuzzy-constraint-based approach provides the following features. 1) The knowledge base of a constraint-based fuzzy model can incorporate information with various types of fuzzy predicates. Consequently, it is easy to fuse different types of knowledge. The knowledge can come from data-driven approaches and/or from control-relevant physical models. 2) A corresponding inference mechanism for the proposed model can deal with heterogeneous information granules. 3) Both numerical and linguistic inputs can be accepted for predicting new outputs. The proposed techniques are demonstrated by means of two examples: a nonlinear function-fitting problem and the well-known Box-Jenkins gas furnace process. The first example shows that the proposed model uses fewer fuzzy predicates while achieving results similar to those of the traditional rule-based approach, while the second shows that the performance can be significantly improved when the control-relevant constraints are considered.

7.
The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. These two steps can be very time intensive when a PRTA is to be induced for massive or even unbounded datasets. The latter step can be processed efficiently, as there exist scalable online clustering algorithms. However, the creation of the PTA can still be very time consuming. To overcome this problem, we propose a genuine online PRTA induction approach that incorporates new instances by first collapsing them and then using a maximum-frequent-pattern-based clustering. The approach is tested against a predefined synthetic automaton and real-world datasets, for which the approach is scalable and stable. Moreover, we present a broad evaluation on a real-world disease group dataset that shows the applicability of such a model to the analysis of medical processes.

8.
Clustering has long been an important data processing task in different applications. Typically, it attempts to partition the available data into groups according to their underlying distributions, and each cluster is represented by a center or an exemplar. In this paper, a new clustering algorithm called gravitational-force-based affinity propagation (GAP) is proposed, based on the well-known Newton's law of universal gravitation. It views the available data points as nodes of a network (or planets of a universe) and the clusters and their corresponding exemplars can be obtained by transmitting affinity messages based on the gravitational forces between data points in a network. While GAP is inspired by the recently proposed affinity propagation (AP) clustering approach, it provides a new definition of the similarity between data points which makes the AP process more convincing and at the same time facilitates the differentiation of data points' importance. The experimental results show that the GAP clustering algorithm, with comparable clustering accuracy, is even more efficient than the original AP clustering approach.

9.
Weighted complex dynamical networks with heterogeneous delays in both continuous-time and discrete-time domains are controlled by applying local feedback injections to a small fraction of network nodes. Some generic stability criteria ensuring delay-independent stability are derived for such controlled networks in terms of linear matrix inequalities (LMIs), which guarantee that by placing a small number of feedback controllers on some nodes the whole network can be pinned to some desired homogeneous states. In some particular cases, a single controller can achieve the control objective. It is found that stabilization of such pinned networks is completely determined by the dynamics of the individual uncoupled node, the overall coupling strength, the inner-coupling matrix, and the smallest eigenvalue of the coupling and control matrix. Numerical simulations of a weighted network composed of 3-dimensional nonlinear systems are finally given for illustration and verification.

10.
Clustering data streams has drawn a lot of attention in the last few years due to their ever-growing presence. Data streams pose additional challenges for clustering, such as limited time and memory and one-pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolve over time, and the number of clusters is not known in advance. In a data stream environment, noise appears occasionally due to various factors. Density-based methods are a notable class of methods for clustering data streams: they can discover clusters of arbitrary shape, detect noise, and do not need the number of clusters in advance. Because of the characteristics of data streams, traditional density-based clustering is not directly applicable. Recently, many density-based clustering algorithms have been extended to data streams. The main idea in these algorithms is to use density-based methods in the clustering process while overcoming the constraints imposed by the data stream's nature. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms for data streams and discuss their uniqueness and limitations, but also explain how they address the challenges of clustering data streams. Moreover, we investigate the evaluation metrics used to validate cluster quality and measure the algorithms' performance. It is hoped that this survey will serve as a stepping stone for researchers studying data stream clustering, particularly density-based algorithms.

11.
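When applied to image segmentation, traditional spectral clustering builds the similarity matrix from feature similarity alone and ignores the spatial proximity of pixels. To address this shortcoming, a new similarity measure is proposed: a Gaussian kernel function of a weighted Euclidean distance, which makes full use of both the feature similarity and the spatial proximity information in the image to construct the similarity matrix. During spectral mapping, the Nyström approximation is used to estimate the similarity matrix and its eigenvectors, which greatly reduces the computational complexity of solving for the similarity matrix and lowers memory consumption. The resulting low-dimensional vector subspace is clustered with a newer clustering algorithm, affinity propagation, which avoids the sensitivity to initial values and the tendency to fall into local optima of the K-means step used in traditional spectral clustering. Experiments show that the algorithm achieves better segmentation results than traditional spectral clustering.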
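A Gaussian kernel of a weighted Euclidean distance that combines pixel features with pixel positions might look like the sketch below. The abstract does not give the formula, so the exact form, the function name `weighted_gaussian_affinity`, and the parameters `w_feat`, `w_pos`, and `sigma` are all assumptions made for illustration.

```python
import math

def weighted_gaussian_affinity(feat_a, pos_a, feat_b, pos_b,
                               w_feat=1.0, w_pos=0.5, sigma=1.0):
    """Affinity between two pixels (hypothetical form, not the paper's formula).

    d^2 = w_feat * ||feat_a - feat_b||^2 + w_pos * ||pos_a - pos_b||^2
    affinity = exp(-d^2 / (2 * sigma^2))
    """
    # Squared distance in feature space (e.g. gray level or color).
    d_feat = sum((x - y) ** 2 for x, y in zip(feat_a, feat_b))
    # Squared distance in image coordinates (spatial proximity).
    d_pos = sum((x - y) ** 2 for x, y in zip(pos_a, pos_b))
    d2 = w_feat * d_feat + w_pos * d_pos
    return math.exp(-d2 / (2.0 * sigma ** 2))

# Two pixels with the same gray level: the closer pair gets the higher affinity.
near = weighted_gaussian_affinity([0.5], (0, 0), [0.5], (0, 1))
far = weighted_gaussian_affinity([0.5], (0, 0), [0.5], (0, 3))
```

With feature similarity alone, `near` and `far` would be equal; the spatial term is what separates them.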

12.
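Cluster analysis is a common analytical method, and spectral clustering, one of its branches, has attracted much attention because, among other properties, it is not constrained by the shape of the sample distribution. To keep track of current research on spectral clustering, this survey compares numerous optimized spectral clustering algorithms and groups them into three categories from three perspectives: semi-supervised learning, choice of the second-stage clustering algorithm, and optimization of execution efficiency. The optimization ideas of each category are reviewed. Classical multiway spectral clustering and its underlying theory are introduced, and the similarity matrix, together with the reasons for and effects of the choice of its eigenvalues and eigenvectors, is analyzed in order to make clear the importance of the eigenvector matrix and the need for optimization. Based on the differences in improvement strategy, the improvement ideas, research status, strengths, and weaknesses of each category are reviewed and summarized. Spectral clustering and its optimized variants are compared experimentally on UCI data sets and handwriting data sets, and future research directions for optimized spectral clustering are discussed.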

13.
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of the eigenspace is carried out which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the cluster number correctly and reveal the natural grouping of the input data/patterns even given sparse and noisy data.

14.
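Jiang Yong (蒋勇), Tan Huailiang (谭怀亮), Li Guangwen (李光文). Journal of Computer Applications (《计算机应用》), 2011, 31(9): 2546-2550
Spectral clustering is hard to scale to large data sets because of its large memory footprint and high time complexity. To address this, a QR algorithm with multiple partitioning and two-way (upward and downward) deflation is used to compute the eigenvectors corresponding to the eigenvalues and thereby reduce dimensionality; samples in the mapped space are then constructed and clustered with a quantum genetic spectral clustering algorithm. The mapping supplies low-dimensional input to the subsequent quantum genetic spectral clustering, and the quantum genetic algorithm converges quickly to the global optimum and is insensitive to initialization, so good clustering results can be obtained. Experimental results show that clustering with this algorithm has better convergence and stability, and reaches better global optima, than single methods such as spectral clustering, K-means, and the NJW algorithm.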

15.
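Traditional spectral clustering does not address the effect that sample points in the overlapping regions between clusters have on clustering quality during cluster partitioning. To address this, a spectral clustering algorithm based on local covariance matrices is proposed. Its main contribution is a new way to compute the affinity (similarity) matrix between samples: small subsets are formed from the Euclidean distances between sample points, the covariance of each subset is computed, overlapping points are removed by thresholding, and the similarity matrix is built from the remaining points. The similarity matrix is then eigendecomposed, and the classical k-means algorithm clusters the matrix formed by the eigenvectors. Experiments on real data sets such as Control show that the algorithm outperforms the comparison algorithms on metrics such as clustering accuracy and normalized mutual information.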

16.
In recent years, spectral clustering has become one of the most popular clustering algorithms in areas of pattern analysis and recognition. This algorithm uses the eigenvalues and eigenvectors of a normalized similarity matrix to partition the data, and is simple to implement. However, when the image is corrupted by noise, spectral clustering cannot obtain satisfactory segmentation performance. In order to overcome the noise sensitivity of the standard spectral clustering algorithm, a novel fuzzy spectral clustering algorithm with robust spatial information for image segmentation (FSC_RS) is proposed in this paper. Firstly, a non-local-weighted sum image of the original image is generated by utilizing the pixels with a similar configuration of each pixel. Then a robust gray-based fuzzy similarity measure is defined by using the fuzzy membership values among gray values in the newly generated image. Thus, the similarity matrix obtained by this measure depends only on the number of gray levels and can be easily stored. Finally, the spectral graph partitioning method can be applied to this similarity matrix to group the gray values of the newly generated image, and then the corresponding pixels in the image are reclassified to obtain the final segmentation result. Segmentation experiments on synthetic and real images show that the proposed method outperforms traditional spectral clustering methods and spatial fuzzy clustering in efficiency and robustness.

17.
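To address the insufficient use of prior knowledge in network fault detection and the need of most spectral clustering algorithms to fix the number of clusters in advance, a new semi-supervised automatic spectral clustering algorithm is proposed that combines pairwise-constraint propagation with automatic determination of the number of clusters. A new similarity measure is learned to satisfy the constraints, and the NJW clustering algorithm is improved to perform automatic spectral clustering on the eigenvectors of the unnormalized Laplacian matrix, thereby improving clustering performance. Experiments on UCI benchmark data sets and measured network data show that the algorithm achieves higher clustering accuracy than the compared algorithms and meets the practical needs of network fault detection.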

18.
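Sensitivity of spectral clustering algorithms to the order of input data   Total citations: 2, self-citations: 1, citations by others: 1
Using matrix analysis, the matrix representation underlying the execution of spectral clustering is reconstructed. It is found that different input orders of the data yield Affinity matrices and Laplacian matrices that are similar to one another. Consequently, the matrices Y formed from the eigenvectors of the Laplacian matrices are also similar, while the K-means algorithm that takes the row vectors of Y as input depends on the choice of the initial k objects. This explains why spectral clustering is sensitive to the input order of the data.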
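The effect of input order on the matrices can be checked numerically. The sketch below assumes a Gaussian affinity and the unnormalized Laplacian L = D - A; the abstract does not specify either, and the function names and toy data are hypothetical. It verifies that shuffling the input only relabels rows and columns, that is, L' = P L Pᵀ for a permutation matrix P.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian affinities with a zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                      / (2.0 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def laplacian(affinity):
    """Unnormalized graph Laplacian L = D - A."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    return [[(degree[i] if i == j else 0.0) - affinity[i][j]
             for j in range(n)] for i in range(n)]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
perm = [2, 0, 3, 1]                       # a different input order
shuffled = [points[p] for p in perm]

L = laplacian(gaussian_affinity(points))
L_shuffled = laplacian(gaussian_affinity(shuffled))

# Entry (i, j) of the shuffled Laplacian equals entry (perm[i], perm[j])
# of the original one, up to floating-point rounding.
same_up_to_relabeling = all(
    math.isclose(L_shuffled[i][j], L[perm[i]][perm[j]], abs_tol=1e-12)
    for i in range(len(points)) for j in range(len(points)))
```

Because the two Laplacians differ only by a relabeling, their eigenvalues coincide and the rows of the eigenvector matrix are permuted accordingly; any remaining order dependence comes from how the final K-means step picks its initial objects.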

19.
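Li Pengqing (李鹏清), Li Yangding (李扬定), Deng Xuelian (邓雪莲), Li Yonggang (李永钢), Fang Yue (方月). Computer Science (《计算机科学》), 2018, 45(Z11): 458-461, 467
When building the similarity matrix, traditional spectral clustering considers only the point-to-point distances between data points and ignores the implicit relationships among them. To address this, a SimRank-based spectral clustering algorithm is proposed. The algorithm first builds an adjacency matrix from undirected graph data and computes a SimRank-based similarity matrix; it then builds the Laplacian matrix from the similarity matrix, normalizes it, and performs spectral decomposition; finally, it applies k-means clustering to the resulting eigenvectors. Experiments on UCI benchmark data sets such as Zoo show that the proposed algorithm outperforms existing distance-similarity-based spectral clustering algorithms such as LRR (Low Rank Representation) on three evaluation metrics: clustering accuracy, normalized mutual information, and purity.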
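The first step, computing a SimRank similarity matrix from an undirected adjacency matrix, can be sketched with the standard SimRank iteration. This is a minimal illustration rather than the authors' implementation: the function name `simrank`, the decay factor 0.8, and the fixed iteration count are assumptions, and the later Laplacian and k-means steps are omitted.

```python
def simrank(adj, decay=0.8, iterations=10):
    """Iterative SimRank on an undirected graph given as a 0/1 adjacency matrix.

    s(a, a) = 1, and for a != b:
    s(a, b) = decay / (|N(a)| * |N(b)|) * sum of s(i, j) over i in N(a), j in N(b)
    """
    n = len(adj)
    neighbors = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            new[a][a] = 1.0
            for b in range(a + 1, n):
                if not neighbors[a] or not neighbors[b]:
                    continue  # isolated nodes keep similarity 0
                total = sum(sim[i][j] for i in neighbors[a] for j in neighbors[b])
                score = decay * total / (len(neighbors[a]) * len(neighbors[b]))
                new[a][b] = new[b][a] = score
        sim = new
    return sim

# Toy graph: nodes 0 and 1 are not connected to each other,
# but both are connected to node 2.
adjacency = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
]
similarity = simrank(adjacency)
```

Nodes 0 and 1 end up similar because they share a neighbor, which is the kind of implicit relationship that a purely distance-based similarity would miss.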

20.
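Song Yan (宋艳), Yin Jun (殷俊). Journal of Computer Applications (《计算机应用》), 2005, 40(11): 3211-3216
To address the problem that the similarity matrix constructed in spectral clustering does not ensure high similarity among data points within the same cluster, a multi-view spectral clustering algorithm based on shared nearest neighbors (MV-SNN) is presented. First, the algorithm raises the similarity of pairs of data points that share many nearest neighbors, so that data in the same cluster are more similar to one another. Next, the improved similarity matrices of the individual views are summed to obtain a global similarity matrix. Finally, because ordinary spectral clustering still needs the k-means algorithm at the end to partition the data points, a rank constraint on the Laplacian matrix is introduced so that the final cluster structure is obtained directly from the global similarity matrix. Experimental results show that, compared with several other multi-view spectral clustering algorithms, MV-SNN improves on three clustering measures (accuracy, purity, and normalized mutual information) by 1% to 20% and reduces clustering time by about 50%, indicating better clustering performance in less time.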
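The first two steps (boosting similarity by shared nearest neighbors, then summing over views) can be sketched as follows. The abstract does not state the boosting formula, so the factor `1 + shared / k`, the Gaussian base similarity, and the names `knn_sets`, `snn_similarity`, and `fuse_views` are assumptions for illustration; the rank-constrained Laplacian step is not shown.

```python
import math

def knn_sets(points, k):
    """Index set of the k nearest neighbors of every point (self excluded)."""
    sets = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        sets.append(set(others[:k]))
    return sets

def snn_similarity(points, k=2, sigma=1.0):
    """Gaussian similarity scaled up by the share of common nearest neighbors.

    Hypothetical rule: sim = base * (1 + |N_i & N_j| / k).
    """
    nbrs = knn_sets(points, k)
    n = len(points)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            base = math.exp(-math.dist(points[i], points[j]) ** 2
                            / (2.0 * sigma ** 2))
            shared = len(nbrs[i] & nbrs[j])
            sim[i][j] = sim[j][i] = base * (1.0 + shared / k)
    return sim

def fuse_views(view_matrices):
    """Element-wise sum of the per-view similarity matrices."""
    n = len(view_matrices[0])
    return [[sum(m[i][j] for m in view_matrices) for j in range(n)]
            for i in range(n)]

# Two well-separated groups of three points, used here as a single view.
view = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sim = snn_similarity(view, k=2)
global_sim = fuse_views([sim, sim])
```

Points 0 and 1 share one of their two nearest neighbors, so their similarity is raised by a factor of 1.5, while pairs from different groups share none and keep their base similarity.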
