首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 687 毫秒
1.
In this paper, we propose a new parallel clustering algorithm, named Parallel Bisecting k-means with Prediction (PBKP), for message-passing multiprocessor systems. Bisecting k-means tends to produce clusters of similar sizes, and according to our experiments, it produces clusters with smaller entropy (i.e., purer clusters) than k-means does. Our PBKP algorithm fully exploits the data-parallelism of the bisecting k-means algorithm, and adopts a prediction step to balance the workloads of multiple processors to achieve a high speedup. We implemented PBKP on a cluster of Linux workstations and analyzed its performance. Our experimental results show that the speedup of PBKP is linear with the number of processors and the number of data points. Moreover, PBKP scales up better than the parallel k-means with respect to the dimension and the desired number of clusters. This research was supported in part by AFRL/Wright Brothers Institute (WBI).  相似文献   

2.
As a classic NP-hard problem in machine learning and computational geometry, the k-means problem aims to partition the given points into k sets to minimize the within-cluster sum of squares. The k-means problem with penalties (k-MPWP), as a generalizing problem of the k-means problem, allows a point that can be either clustered or penalized with some positive cost. In this paper, we mainly apply the parallel seeding algorithm to the k-MPWP, and show sufficient analysis to bound the expected solution quality in the case where both the number of iterations and the number of points sampled in each iteration can be given arbitrarily. The approximate guarantee can be obtained as , where is a polynomial function involving the maximal ratio M between the penalties. On one hand, this result can be viewed as a further improvement on the parallel algorithm for k-MPWP given by Li et al., where the number of iterations depends on other factors. On the other hand, our result also generalizes the one solving the k-means problem presented by Bachem et al., because k-MPWP is a variant of the k-means problem. Moreover, we present a numerical experiment to show the effectiveness of the parallel algorithm for k-means with penalties.  相似文献   

3.
Clustering is a popular data analysis and data mining technique. A popular technique for clustering is based on k-means such that the data is partitioned into K clusters. However, the k-means algorithm highly depends on the initial state and converges to local optimum solution. This paper presents a new hybrid evolutionary algorithm to solve nonlinear partitional clustering problem. The proposed hybrid evolutionary algorithm is the combination of FAPSO (fuzzy adaptive particle swarm optimization), ACO (ant colony optimization) and k-means algorithms, called FAPSO-ACO–K, which can find better cluster partition. The performance of the proposed algorithm is evaluated through several benchmark data sets. The simulation results show that the performance of the proposed algorithm is better than other algorithms such as PSO, ACO, simulated annealing (SA), combination of PSO and SA (PSO–SA), combination of ACO and SA (ACO–SA), combination of PSO and ACO (PSO–ACO), genetic algorithm (GA), Tabu search (TS), honey bee mating optimization (HBMO) and k-means for partitional clustering problem.  相似文献   

4.
贾洪杰  丁世飞  史忠植 《软件学报》2015,26(11):2836-2846
谱聚类将聚类问题转化成图划分问题,是一种基于代数图论的聚类方法.在求解图划分目标函数时,一般利用Rayleigh熵的性质,通过计算Laplacian矩阵的特征向量将原始数据点映射到一个低维的特征空间中,再进行聚类.然而在谱聚类过程中,存储相似矩阵的空间复杂度是O(n2),对Laplacian矩阵特征分解的时间复杂度一般为O(n3),这样的复杂度在处理大规模数据时是无法接受的.理论证明,Normalized Cut图聚类与加权核k-means都等价于矩阵迹的最大化问题.因此,可以用加权核k-means算法来优化Normalized Cut的目标函数,这就避免了对Laplacian矩阵特征分解.不过,加权核k-means算法需要计算核矩阵,其空间复杂度依然是O(n2).为了应对这一挑战,提出近似加权核k-means算法,仅使用核矩阵的一部分来求解大数据的谱聚类问题.理论分析和实验对比表明,近似加权核k-means的聚类表现与加权核k-means算法是相似的,但是极大地减小了时间和空间复杂性.  相似文献   

5.
Rough k-means clustering algorithm and its extensions are introduced and successfully applied to real-life data where clusters do not necessarily have crisp boundaries. Experiments with the rough k-means clustering algorithm have shown that it provides a reasonable set of lower and upper bounds for a given dataset. However, the same weight was used for all the data objects in a lower or upper approximate set when computing the new centre for each cluster while the different impacts of the objects in a same approximation were ignored. An improved rough k-means clustering based on weighted distance measure with Gaussian function is proposed in this paper. The validity of this algorithm is demonstrated by simulation and experimental analysis.  相似文献   

6.
The problem of optimal non-hierarchical clustering is addressed. A new algorithm combining differential evolution and k-means is proposed and tested on eight well-known real-world data sets. Two criteria (clustering validity indexes), namely TRW and VCR, were used in the optimization of classification. The classification of objects to be optimized is encoded by the cluster centers in differential evolution (DE) algorithm. It induced the problem of rearrangement of centers in the population to ensure an efficient search via application of evolutionary operators. A new efficient heuristic for this rearrangement was also proposed. The plain DE variants with and without the rearrangement were compared with corresponding hybrid k-means variants. The experimental results showed that hybrid variants with k-means algorithm are essentially more efficient than the non-hybrid ones. Compared to a standard k-means algorithm with restart, the new hybrid algorithm was found more reliable and more efficient, especially in difficult tasks. The results for TRW and VCR criterion were compared. Both criteria provided the same optimal partitions and no significant differences were found in efficiency of the algorithms using these criteria.  相似文献   

7.
无线传感器网络HEDSA数据聚合研究   总被引:1,自引:0,他引:1  
归奕红 《计算机工程》2011,37(7):160-162
现有普通安全方案不能满足无线传感器网络高安全性和高效率要求,为此,提出一种同态加密与数字签名算法。利用同态加密技术对加密的数据进行聚合,提高网络的数据传输效率,通过数字签名提供数据的完整性和不可否认性鉴别。理论证明和仿真实验表明,该算法具有较高的安全性和效率。  相似文献   

8.
The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach. They iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve the k-means algorithm. However, they require storing the whole affinity matrix or computing this matrix at each iteration. This makes both algorithms time consuming and memory demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need of computing or storing the whole affinity matrix and thereby to reduce computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms.  相似文献   

9.
Fast and exact out-of-core and distributed k-means clustering   总被引:1,自引:2,他引:1  
Clustering has been one of the most widely studied topics in data mining and k-means clustering has been one of the popular clustering algorithms. K-means requires several passes on the entire dataset, which can make it very expensive for large disk-resident datasets. In view of this, a lot of work has been done on various approximate versions of k-means, which require only one or a small number of passes on the entire dataset.In this paper, we present a new algorithm, called fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes on the entire dataset and provably produces the same cluster centres as reported by the original k-means algorithm. The algorithm uses sampling to create initial cluster centres and then takes one or more passes over the entire dataset to adjust these cluster centres. We provide theoretical analysis to show that the cluster centres thus reported are the same as the ones computed by the original k-means algorithm. Experimental results from a number of real and synthetic datasets show speedup between a factor of 2 and 4.5, as compared with k-means.This paper also describes and evaluates a distributed version of FEKM, which we refer to as DFEKM. This algorithm is suitable for analysing data that is distributed across loosely coupled machines. Unlike the previous work in this area, DFEKM provably produces the same results as the original k-means algorithm. Our experimental results show that DFEKM is clearly better than two other possible options for exact clustering on distributed data, which are down loading all data and running sequential k-means or running parallel k-means on a loosely coupled configuration. Moreover, even in a tightly coupled environment, DFEKM can outperform parallel k-means if there is a significant load imbalance. Ruoming Jin is currently an assistant professor in the Computer Science Department at Kent State University. He received a BE and a ME degree in computer engineering from Beihang University (BUAA), China in 1996 and 1999, respectively. He earned his MS degree in computer science from University of Delaware in 2001, and his Ph.D. degree in computer science from the Ohio State University in 2005. His research interests include data mining, databases, processing of streaming data, bioinformatics, and high performance computing. He has published more than 30 papers in these areas. He is a member of ACM and SIGKDD. Anjan Goswami studied robotics at the Indian Institute of Technology at Kanpur. While working with IBM, he was interested in studying computer science. He then obtained a masters degree from the University of South Florida, where he worked on computer vision problems. He then transferred to the PhD program in computer science at OSU, where he did a Masters thesis on efficient clustering algorithms for massive, distributed and streaming data. On successful completion of this, he decided to join a web-service-provider company to do research in designing and developing high-performance search solutions for very large structured data. Anjan' favourite recreations are studying and predicting technology trends, nature photography, hiking, literature and soccer. Gagan Agrawal is an Associate Professor of Computer Science and Engineering at the Ohio State University. He received his B.Tech degree from Indian Institute of Technology, Kanpur, in 1991, and M.S. and Ph.D degrees from University of Maryland, College Park, in 1994 and 1996, respectively. His research interests include parallel and distributed computing, compilers, data mining, grid computing, and data integration. He has published more than 110 refereed papers in these areas. He is a member of ACM and IEEE Computer Society. He received a National Science Foundation CAREER award in 1998.  相似文献   

10.
Adapting k-means for supervised clustering   总被引:2,自引:1,他引:1  
k-means is traditionally viewed as an algorithm for the unsupervised clustering of a heterogeneous population into a number of more homogeneous groups of objects. However, it is not necessarily guaranteed to group the same types (classes) of objects together. In such cases, some supervision is needed to partition objects which have the same label into one cluster. This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to be used as a classifier algorithm. The output field itself cannot be used in the clustering but it is used in developing a suitable metric defined on other fields. The proposed algorithm combines Simulated Annealing with the modified k-means algorithm. We apply the proposed algorithm to real data sets, and compare the output of the resultant classifier to that of C4.5.  相似文献   

11.
Zhou  Jukai  Liu  Tong  Zhu  Jingting 《Multimedia Tools and Applications》2019,78(23):33415-33434

K-means clustering is one of the most popular clustering algorithms and has been embedded in other clustering algorithms, e.g. the last step of spectral clustering. In this paper, we propose two techniques to improve previous k-means clustering algorithm by designing two different adjacent matrices. Extensive experiments on public UCI datasets showed the clustering results of our proposed algorithms significantly outperform three classical clustering algorithms in terms of different evaluation metrics.

  相似文献   

12.
以K-means为代表的聚类算法被广泛地应用在许多领域, 但是K-means不能直接处理不完整数据集. km-means是一种处理不完整数据集的聚类算法, 通过调整局部距离计算方式, 减少不完整数据对聚类过程的影响. 然而km-means初始化阶段选取的聚类中心存在较大的不可靠性, 容易陷入局部最优解. 针对此问题, 本文引入可信度, 提出了结合可信度的km-means聚类算法, 通过可信度调整距离计算, 增大初始化过程中选取聚类中心的可靠性, 提高聚类算法的准确度. 最后, 通过UCI和UCR数据集验证算法的有效性.  相似文献   

13.
In recent years, there have been numerous attempts to extend the k-means clustering protocol for single database to a distributed multiple database setting and meanwhile keep privacy of each data site. Current solutions for (whether two or more) multiparty k-means clustering, built on one or more secure two-party computation algorithms, are not equally contributory, in other words, each party does not equally contribute to k-means clustering. This may lead a perfidious attack where a party who learns the outcome prior to other parties tells a lie of the outcome to other parties. In this paper, we present an equally contributory multiparty k-means clustering protocol for vertically partitioned data, in which each party equally contributes to k-means clustering. Our protocol is built on ElGamal's encryption scheme, Jakobsson and Juels's plaintext equivalence test protocol, and mix networks, and protects privacy in terms that each iteration of k-means clustering can be performed without revealing the intermediate values.  相似文献   

14.
This paper proposes a new method to weight subspaces in feature groups and individual features for clustering high-dimensional data. In this method, the features of high-dimensional data are divided into feature groups, based on their natural characteristics. Two types of weights are introduced to the clustering process to simultaneously identify the importance of feature groups and individual features in each cluster. A new optimization model is given to define the optimization process and a new clustering algorithm FG-k-means is proposed to optimize the optimization model. The new algorithm is an extension to k-means by adding two additional steps to automatically calculate the two types of subspace weights. A new data generation method is presented to generate high-dimensional data with clusters in subspaces of both feature groups and individual features. Experimental results on synthetic and real-life data have shown that the FG-k-means algorithm significantly outperformed four k-means type algorithms, i.e., k-means, W-k-means, LAC and EWKM in almost all experiments. The new algorithm is robust to noise and missing values which commonly exist in high-dimensional data.  相似文献   

15.
Fully Homomorphic Encryption (FHE) is a powerful encryption system in cloud computing that allows homomorphic computations on encrypted data without decrypting them. Multi-key fully homomorphic encryption (MFHE), as an extension to FHE, allows homomorphic computations on ciphertexts encrypted under different keys. However, most MFHE schemes require a Common Random/Reference String (CRS), while the few that do not are based on the Learning With Errors (LWE) problem, which means that they can only deal with single bit plaintext. Consequently, MFHE schemes based on the Ring Learning With Errors (RLWE) problem are more desirable, as they can handle polynomial plaintext. Requiring the CRS seems to weaken the semantic definition of MFHE, where all users generate their own keys independently. In this paper, we study the RLWE-based MFHE in the CRS model and propose the first RLWE-based MFHE without a CRS. To this end, we remove the CRS by designing a relinearization algorithm without a CRS. Like previous MFHE schemes, our RLWE-based MFHE without CRS has a simple 1-round threshold decryption, which implies a 3-round secure MPC protocol in the plain model from the RLWE assumption.  相似文献   

16.
类似于多密钥全同态加密(Multi-key Fully Homomorphic Encryption,MFHE),多身份全同态加密(Multi-id Identity-basedFully Homomorphic Encryption,MIBFHE)允许对不同用户的密文进行关于任意函数的同态计算,且后者因具有加密密钥易获取、密钥托管和密钥撤销易实现等特点,具有更深远的应用前景。
Canetti等人在PKC 2017上给出了一个框架,可将身份加密方案(Identity-based Encryption,IBE)和MFHE方案转换成MIBFHE方案。若用基于DLWE假设的IBE方案和Brakerski与Perlman的全动态MFHE方案(以下简称BP方案),可得到全动态的MIBFHE方案,但密文规模较大,为O(n5log5q),这里n,q是DLWE假设的参数,且紧致性相比于MFHE方案变弱。因密文规模是影响通信效率的主要因素,本文构造了一个密文规模较小和紧致性较强的MIBFHE方案框架,且仅用了MFHE这一个构件,然后用BP方案去实例化,得到了全动态的、选择性安全的MIBFHE方案,其密文规模为O(nlogq).  相似文献   

17.
In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step or backward Euler is solved via stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results compared to k-means algorithms in the sense that it decreased the objective function (the cluster) and is much more robust to initialization.  相似文献   

18.
The rapid development of earth observation technology has produced large quantities of remote-sensing data. Unsupervised classification (i.e. clustering) of remote-sensing images, an important means to acquire land-use/cover information, has become increasingly in demand due to its simplicity and ease of application. Traditional methods, such as k-means, struggle to solve this NP-hard (Non-deterministic Polynomial hard) image classification problem. Particle swarm optimization (PSO), always achieving better result than k-means, has recently been applied to unsupervised image classification. However, PSO was also found to be easily trapped on local optima. This article proposes a novel unsupervised Levy flight particle swarm optimization (ULPSO) method for image classification with balanced exploitation and exploration capabilities. It benefits from a new searching strategy: the worst particle in the swarm is targeted and its position is updated with Levy flight at each iteration. The effectiveness of the proposed method was tested with three types of remote-sensing imagery (Landsat Thematic Mapper (TM), Flightline C1 (FLC), and QuickBird) that are distinct in terms of spatial and spectral resolution and landscape. Our results showed that ULPSO is able to achieve significantly better and more stable classification results than k-means and the other two intelligent methods based on genetic algorithm (GA) and particle swarm optimization (PSO) over all of the experiments. ULPSO is, therefore, recommended as an effective alternative for unsupervised remote-sensing image classification.  相似文献   

19.

We have proposed a robust, secure and efficient image encryption algorithm based on chaotic maps and algebraic structure. Nowadays, the chaotic cryptosystems gained more attention due to their efficiency, the assurance of robustness and high sensitivity corresponding to initial conditions. In literature, there are many encryption algorithms that can simply guarantees security while the schemes based on chaotic systems only promises the uncertainty, both of them can not encounter the needs of current scenario. To tackle this issue, this article proposed an image encryption algorithm based on Lorenz chaotic system and primitive irreducible polynomial substitution box. First, we have proposed 16 different S-boxes based on projective general linear group and 16 primitive irreducible polynomials of Galois field of order 256, and then utilized these S-boxes with combination of chaotic map in image encryption scheme. Three chaotic sequences can be produced by the disturbed of Lorenz chaotic system corresponding to variables x, y and z. We have constructed a new pseudo random chaotic sequence ki based on x, y and z. The plain image is encrypted by the use of chaotic sequence ki and XOR operation to get a ciphered image. To show the strength of presented image encryption, some renowned analyses are performed.

  相似文献   

20.
Abstract

Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limiting partition is truly fuzzy in the sense that the values of its component membership functions differ substantially from 0 or 1 over certain regions of X. Thus, unlike ISODATA, the “fuzzy” algorithm signals the presence or absence of CWS clusters in X. Furthermore, the fuzzy algorithm seems significantly less prone to the “cluster-splitting” tendency of ISODATA and may also be less easily diverted to uninteresting locally optimal partitions. Finally, for data sets X consisting of dense CWS clusters embedded in a diffuse background of strays, the structure of X is accurately reflected in the limiting partition generated by the fuzzy algorithm. Mathematical arguments and numerical results are offered in support of the foregoing assertions.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号