Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Partitional clustering is a common approach to cluster analysis. Although many algorithms have been proposed, partitional clustering remains a challenging problem with respect to the reliability and efficiency of recovering high quality solutions in terms of its criterion functions. In this paper, we propose a niching genetic k-means algorithm (NGKA) for partitional clustering, which aims at reliably and efficiently identifying high quality solutions in terms of the sum of squared errors criterion. Within the NGKA, we design a niching method, which encourages mating among similar clustering solutions while allowing for some competition among dissimilar solutions, and integrate it into a genetic algorithm to prevent premature convergence during the evolutionary clustering search. Further, we incorporate one step of the k-means operation into the regeneration steps of the resulting niching genetic algorithm to improve its computational efficiency. The proposed algorithm was applied to cluster both simulated data and gene expression data and compared with previous work. Experimental results clearly show that the NGKA is an effective clustering algorithm and outperforms two other genetic algorithm based clustering methods implemented for comparison.
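The sum of squared errors criterion and the single k-means step mentioned in this abstract can be illustrated with a minimal sketch. It is not the NGKA implementation: the function names (`sse`, `kmeans_step`) and the rule of keeping an empty cluster's old center are assumptions made here for illustration, and the genetic and niching parts are omitted entirely.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def sse(points: List[Point], centers: List[Point]) -> float:
    """Sum of squared errors: each point against its nearest center."""
    labels = assign(points, centers)
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def kmeans_step(points: List[Point], centers: List[Point]) -> List[List[float]]:
    """One k-means iteration: reassign points, then recompute centroids.

    Assumption: a cluster that receives no points keeps its old center.
    """
    labels = assign(points, centers)
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            new_centers.append(list(old))
            continue
        new_centers.append([sum(m[d] for m in members) / len(members)
                            for d in range(len(old))])
    return new_centers
```

In a hybrid of this kind, a step like `kmeans_step` would be applied to each candidate solution's centers so that the evolutionary search spends less effort on local refinement; a single step never increases the SSE.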

2.
Clustering techniques have received attention in many fields of study such as engineering, medicine, biology and data mining. The aim of clustering is to group similar data points together. The K-means algorithm is one of the most common techniques used for clustering. However, the results of K-means depend on the initial state and converge to local optima. Many studies have sought to overcome this local optima problem. This paper presents an efficient hybrid evolutionary optimization algorithm, called K-MICA, which combines the Modified Imperialist Competitive Algorithm (MICA) with K-means (K) for optimally clustering N objects into K clusters. The new Hybrid K-ICA algorithm is tested on several data sets and its performance is compared with those of MICA, ACO, PSO, Simulated Annealing (SA), Genetic Algorithm (GA), Tabu Search (TS), Honey Bee Mating Optimization (HBMO) and K-means. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.

3.
Clustering is a very powerful data mining technique for topic discovery from text documents. The partitional clustering algorithms, such as the family of k-means, are reported to perform well on document clustering. They treat the clustering problem as an optimization process of grouping documents into k clusters so that a particular criterion function is minimized or maximized. Usually, the cosine function is used to measure the similarity between two documents in the criterion function, but it may not work well when the clusters are not well separated. To solve this problem, we applied the concepts of neighbors and link, introduced in [S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366], to document clustering. If two documents are similar enough, they are considered neighbors of each other. The link between two documents is the number of their common neighbors. Instead of considering only the pairwise similarity, the neighbors and link bring global information into the measurement of the closeness of two documents. In this paper, we propose to use the neighbors and link for the family of k-means algorithms in three aspects: a new method to select initial cluster centroids based on the ranks of candidate documents; a new similarity measure which uses a combination of the cosine and link functions; and a new heuristic function for selecting a cluster to split based on the neighbors of the cluster centroids. Our experimental results on real-life data sets demonstrated that our proposed methods can significantly improve the accuracy of document clustering without increasing the execution time much.
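The neighbor and link concepts described in this abstract can be sketched in a few lines. This is an illustrative reading, not the authors' code: the threshold name `theta`, the dense term-vector representation, and the convention that every document counts as its own neighbor are assumptions made here.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbor_matrix(docs: List[Sequence[float]], theta: float) -> List[List[int]]:
    """nbr[i][j] is 1 when documents i and j are at least theta-similar."""
    n = len(docs)
    return [[1 if cosine(docs[i], docs[j]) >= theta else 0 for j in range(n)]
            for i in range(n)]


def link(nbr: List[List[int]], i: int, j: int) -> int:
    """Number of common neighbors of documents i and j."""
    return sum(a * b for a, b in zip(nbr[i], nbr[j]))
```

For the three toy documents `[1, 0]`, `[1, 1]` and `[0, 1]` with `theta = 0.7`, the first and third have cosine similarity 0 yet share one neighbor (the second document), so their link is 1. That is the kind of global information the pairwise cosine alone does not capture.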

4.
Adapting k-means for supervised clustering   Total citations: 2, self-citations: 1, citations by others: 1
k-means is traditionally viewed as an algorithm for the unsupervised clustering of a heterogeneous population into a number of more homogeneous groups of objects. However, it is not necessarily guaranteed to group the same types (classes) of objects together. In such cases, some supervision is needed to partition objects which have the same label into one cluster. This paper demonstrates how the popular k-means clustering algorithm can be profitably modified to be used as a classifier algorithm. The output field itself cannot be used in the clustering but it is used in developing a suitable metric defined on other fields. The proposed algorithm combines Simulated Annealing with the modified k-means algorithm. We apply the proposed algorithm to real data sets, and compare the output of the resultant classifier to that of C4.5.

5.
Ant colony optimization (ACO) and particle swarm optimization (PSO) are two popular algorithms in swarm intelligence. Recently, a continuous ACO named ACOR was developed to solve continuous optimization problems. This study incorporated ACOR with PSO to improve the search ability, investigating four types of hybridization as follows: (1) sequence approach, (2) parallel approach, (3) sequence approach with an enlarged pheromone-particle table, and (4) global best exchange. These hybrid systems were applied to data clustering. The experimental results on public UCI datasets show that the proposed hybrid systems outperform K-means, standalone PSO, and standalone ACOR. Among the four strategies of hybridization, the sequence approach with the enlarged pheromone table is superior to the other approaches because the enlarged pheromone table diversifies the generation of new solutions of ACOR and PSO, which prevents the search from becoming trapped in a local optimum.

6.
The problem of optimal non-hierarchical clustering is addressed. A new algorithm combining differential evolution and k-means is proposed and tested on eight well-known real-world data sets. Two criteria (clustering validity indexes), namely TRW and VCR, were used in the optimization of classification. The classification of objects to be optimized is encoded by the cluster centers in the differential evolution (DE) algorithm. This encoding raises the problem of rearranging the centers in the population to ensure an efficient search via application of evolutionary operators. A new efficient heuristic for this rearrangement was also proposed. The plain DE variants with and without the rearrangement were compared with corresponding hybrid k-means variants. The experimental results showed that hybrid variants with the k-means algorithm are substantially more efficient than the non-hybrid ones. Compared to a standard k-means algorithm with restart, the new hybrid algorithm was found more reliable and more efficient, especially in difficult tasks. The results for the TRW and VCR criteria were compared. Both criteria provided the same optimal partitions and no significant differences were found in efficiency of the algorithms using these criteria.

7.
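Jia Hongjie, Ding Shifei, Shi Zhongzhi. Journal of Software (《软件学报》), 2015, 26(11): 2836-2846
Spectral clustering transforms the clustering problem into a graph partitioning problem and is a clustering method based on algebraic graph theory. To solve the graph partitioning objective function, one generally exploits the properties of the Rayleigh quotient: the eigenvectors of the Laplacian matrix are computed to map the original data points into a low-dimensional feature space, where clustering is then performed. However, in spectral clustering, storing the similarity matrix has space complexity O(n^2), and the eigendecomposition of the Laplacian matrix generally has time complexity O(n^3); such complexity is unacceptable when processing large-scale data. It has been proven that Normalized Cut graph clustering and weighted kernel k-means are both equivalent to a matrix trace maximization problem. The weighted kernel k-means algorithm can therefore be used to optimize the Normalized Cut objective function, which avoids the eigendecomposition of the Laplacian matrix. Weighted kernel k-means, however, needs to compute the kernel matrix, whose space complexity is still O(n^2). To address this challenge, an approximate weighted kernel k-means algorithm is proposed that uses only part of the kernel matrix to solve the spectral clustering problem on large data. Theoretical analysis and experimental comparison show that approximate weighted kernel k-means achieves clustering performance similar to that of weighted kernel k-means while greatly reducing time and space complexity.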
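To make concrete why weighted kernel k-means depends on the kernel matrix, the sketch below computes the squared feature-space distance from a point to a weighted cluster mean using only kernel entries. This is the standard kernel-trick expansion, not the approximate algorithm the abstract proposes; the function name `kernel_sq_dist` and its argument layout are assumptions made here for illustration.

```python
from typing import List, Sequence


def kernel_sq_dist(K: Sequence[Sequence[float]],
                   w: Sequence[float],
                   members: List[int],
                   i: int) -> float:
    """Squared distance from phi(x_i) to the weighted mean of a cluster.

    Expands ||phi(x_i) - m||^2 with m = sum_j w_j phi(x_j) / sum_j w_j,
    so only kernel values K[a][b] = <phi(x_a), phi(x_b)> are needed.
    """
    s = sum(w[j] for j in members)
    # Cross term: inner product between phi(x_i) and the cluster mean.
    cross = sum(w[j] * K[i][j] for j in members) / s
    # Squared norm of the cluster mean (quadratic in cluster size when done naively).
    within = sum(w[j] * w[l] * K[j][l] for j in members for l in members) / (s * s)
    return K[i][i] - 2.0 * cross + within
```

With a linear kernel on the one-dimensional points 0, 2 and 10, the distance from the third point to the unweighted mean of the first two comes out as (10 - 1)^2 = 81, matching the ordinary Euclidean computation. The `within` term touches every pair of cluster members, which is where the O(n^2) kernel matrix cost arises.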

8.
Cooperative coevolution (CC) has been used to improve the performance of evolutionary algorithms (EAs) on complex optimization problems in a divide-and-conquer way. In this paper, we show that the CC framework can be very helpful in improving the performance of particle swarm optimization (PSO) on clustering high-dimensional datasets. Based on the CC framework, the original partitional clustering problem is first decomposed into several subproblems, each of which is then evolved by an optimizer independently. We employ a very simple but efficient optimization algorithm, namely bare-bone particle swarm optimization (BPSO), as the optimizer to solve each subproblem cooperatively. In addition, we design a new centroid-based encoding schema for each particle and apply the Chernoff bounds to decide a proper population size. The experimental results on synthetic and real-life datasets illustrate the effectiveness and efficiency of the BPSO and CC framework. The comparisons show the proposed algorithm significantly outperforms five EA-based clustering algorithms, i.e., PSO, SRPSO, ACO, ABC and DE, and K-means on most of the datasets.

9.
This study proposed a novel PSO–SVM model that hybridized the particle swarm optimization (PSO) and support vector machines (SVM) to improve the classification accuracy with a small and appropriate feature subset. This optimization mechanism combined the discrete PSO with the continuous-valued PSO to simultaneously optimize the input feature subset selection and the SVM kernel parameter setting. The hybrid PSO–SVM data mining system was implemented via a distributed architecture using the web service technology to reduce the computational time. In a heterogeneous computing environment, the PSO optimization was performed on the application server and the SVM model was trained on the client (agent) computer. The experimental results showed the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy.

10.
Fast and exact out-of-core and distributed k-means clustering   Total citations: 1, self-citations: 2, citations by others: 1
Clustering has been one of the most widely studied topics in data mining and k-means clustering has been one of the popular clustering algorithms. K-means requires several passes on the entire dataset, which can make it very expensive for large disk-resident datasets. In view of this, a lot of work has been done on various approximate versions of k-means, which require only one or a small number of passes on the entire dataset. In this paper, we present a new algorithm, called fast and exact k-means clustering (FEKM), which typically requires only one or a small number of passes on the entire dataset and provably produces the same cluster centres as reported by the original k-means algorithm. The algorithm uses sampling to create initial cluster centres and then takes one or more passes over the entire dataset to adjust these cluster centres. We provide theoretical analysis to show that the cluster centres thus reported are the same as the ones computed by the original k-means algorithm. Experimental results from a number of real and synthetic datasets show speedup between a factor of 2 and 4.5, as compared with k-means. This paper also describes and evaluates a distributed version of FEKM, which we refer to as DFEKM. This algorithm is suitable for analysing data that is distributed across loosely coupled machines. Unlike the previous work in this area, DFEKM provably produces the same results as the original k-means algorithm. Our experimental results show that DFEKM is clearly better than two other possible options for exact clustering on distributed data, which are downloading all data and running sequential k-means or running parallel k-means on a loosely coupled configuration. Moreover, even in a tightly coupled environment, DFEKM can outperform parallel k-means if there is a significant load imbalance.
Ruoming Jin is currently an assistant professor in the Computer Science Department at Kent State University. He received a BE and a ME degree in computer engineering from Beihang University (BUAA), China in 1996 and 1999, respectively. He earned his MS degree in computer science from University of Delaware in 2001, and his Ph.D. degree in computer science from the Ohio State University in 2005. His research interests include data mining, databases, processing of streaming data, bioinformatics, and high performance computing. He has published more than 30 papers in these areas. He is a member of ACM and SIGKDD.
Anjan Goswami studied robotics at the Indian Institute of Technology at Kanpur. While working with IBM, he was interested in studying computer science. He then obtained a masters degree from the University of South Florida, where he worked on computer vision problems. He then transferred to the PhD program in computer science at OSU, where he did a Masters thesis on efficient clustering algorithms for massive, distributed and streaming data. On successful completion of this, he decided to join a web-service-provider company to do research in designing and developing high-performance search solutions for very large structured data. Anjan's favourite recreations are studying and predicting technology trends, nature photography, hiking, literature and soccer.
Gagan Agrawal is an Associate Professor of Computer Science and Engineering at the Ohio State University. He received his B.Tech degree from Indian Institute of Technology, Kanpur, in 1991, and M.S. and Ph.D degrees from University of Maryland, College Park, in 1994 and 1996, respectively. His research interests include parallel and distributed computing, compilers, data mining, grid computing, and data integration. He has published more than 110 refereed papers in these areas. He is a member of ACM and IEEE Computer Society. He received a National Science Foundation CAREER award in 1998.

11.
The rapid development of earth observation technology has produced large quantities of remote-sensing data. Unsupervised classification (i.e. clustering) of remote-sensing images, an important means to acquire land-use/cover information, has become increasingly in demand due to its simplicity and ease of application. Traditional methods, such as k-means, struggle to solve this NP-hard (Non-deterministic Polynomial hard) image classification problem. Particle swarm optimization (PSO), which consistently achieves better results than k-means, has recently been applied to unsupervised image classification. However, PSO was also found to be easily trapped in local optima. This article proposes a novel unsupervised Levy flight particle swarm optimization (ULPSO) method for image classification with balanced exploitation and exploration capabilities. It benefits from a new searching strategy: the worst particle in the swarm is targeted and its position is updated with Levy flight at each iteration. The effectiveness of the proposed method was tested with three types of remote-sensing imagery (Landsat Thematic Mapper (TM), Flightline C1 (FLC), and QuickBird) that are distinct in terms of spatial and spectral resolution and landscape. Our results showed that ULPSO is able to achieve significantly better and more stable classification results than k-means and the other two intelligent methods based on genetic algorithm (GA) and particle swarm optimization (PSO) over all of the experiments. ULPSO is, therefore, recommended as an effective alternative for unsupervised remote-sensing image classification.
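The worst-particle strategy in this abstract can be sketched as follows. The abstract does not give the update rule, so everything specific here is an assumption: the step is drawn with Mantegna's algorithm (a common way to generate Levy-distributed steps), and `beta`, `scale` and the function names are hypothetical choices for illustration rather than ULPSO's actual parameters.

```python
import math
import random
from typing import List


def levy_sigma(beta: float) -> float:
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """Draw one heavy-tailed step: mostly small, occasionally very large."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def relocate_worst(positions: List[List[float]], costs: List[float],
                   rng: random.Random, scale: float = 0.01) -> List[List[float]]:
    """Return a copy of the swarm with only the highest-cost particle moved."""
    worst = max(range(len(costs)), key=costs.__getitem__)
    moved = [list(p) for p in positions]
    moved[worst] = [x + scale * levy_step(rng) for x in moved[worst]]
    return moved
```

The heavy tail of the step distribution is what gives the worst particle an occasional long jump out of a poor region while leaving the rest of the swarm to keep exploiting good ones.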

12.
Clustering is an important and popular technique in data mining. It partitions a set of objects in such a manner that objects in the same cluster are more similar to one another than to objects in different clusters according to certain predefined criteria. K-means is a simple yet efficient method used in data clustering. However, K-means has a tendency to converge to local optima and depends on the initial values of the cluster centers. In the past, many heuristic algorithms have been introduced to overcome this local optima problem. Nevertheless, these algorithms also suffer from several shortcomings. In this paper, we present an efficient hybrid evolutionary data clustering algorithm referred to as K-MCI, in which we combine K-means with modified cohort intelligence. Our proposed algorithm is tested on several standard data sets from the UCI Machine Learning Repository and its performance is compared with other well-known algorithms such as K-means, K-means++, cohort intelligence (CI), modified cohort intelligence (MCI), genetic algorithm (GA), simulated annealing (SA), tabu search (TS), ant colony optimization (ACO), honey bee mating optimization (HBMO) and particle swarm optimization (PSO). The simulation results are very promising in terms of solution quality and convergence speed.

13.
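Membrane computing (also known as P systems or membrane systems) is a novel distributed, parallel computing model. To handle data clustering problems, a membrane clustering algorithm using a hybrid evolutionary mechanism is proposed. It uses a tissue P system consisting of three cells to find the optimal cluster centers for a data set to be clustered. Its objects represent candidate cluster centers, and the three cells use three different evolutionary mechanisms, respectively: genetic operators, a velocity-position model, and a differential evolution mechanism. The velocity-position model and the differential evolution mechanism used are, however, improved versions designed around this particular membrane structure and transport mechanism. This hybrid evolutionary mechanism can enhance the diversity of objects in the system and improve convergence. Under the control of the hybrid evolutionary mechanism and the transport mechanism, the membrane clustering algorithm can determine a good partition of a data set. The proposed membrane clustering algorithm is evaluated on three artificial data sets and five real data sets and compared with k-means and several evolutionary clustering algorithms. Statistical significance tests establish the advantage of the proposed membrane clustering algorithm.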

14.
In this paper, we propose a new parallel clustering algorithm, named Parallel Bisecting k-means with Prediction (PBKP), for message-passing multiprocessor systems. Bisecting k-means tends to produce clusters of similar sizes, and according to our experiments, it produces clusters with smaller entropy (i.e., purer clusters) than k-means does. Our PBKP algorithm fully exploits the data-parallelism of the bisecting k-means algorithm, and adopts a prediction step to balance the workloads of multiple processors to achieve a high speedup. We implemented PBKP on a cluster of Linux workstations and analyzed its performance. Our experimental results show that the speedup of PBKP is linear with the number of processors and the number of data points. Moreover, PBKP scales up better than the parallel k-means with respect to the dimension and the desired number of clusters. This research was supported in part by AFRL/Wright Brothers Institute (WBI).
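The entropy measure this abstract uses to compare cluster purity can be sketched briefly. This is a common definition rather than one taken from the paper: each cluster's entropy is computed over the true class labels of its members and the clusters are combined as a size-weighted average. The base-2 logarithm and the function names are assumptions made here for illustration.

```python
import math
from collections import Counter
from typing import Hashable, List


def cluster_entropy(labels: List[Hashable]) -> float:
    """Entropy (in bits) of the class labels inside one cluster."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def total_entropy(clusters: List[List[Hashable]]) -> float:
    """Size-weighted average entropy over all non-empty clusters."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters if c)
```

A cluster holding a single class scores 0, and a cluster split evenly between two classes scores 1 bit, so a lower total means purer clusters.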

15.
This paper presents selective regeneration particle swarm optimization (SRPSO), a novel algorithm developed based on particle swarm optimization (PSO). It contains two new features: unbalanced parameter setting and particle regeneration operation. The unbalanced parameter setting enables fast convergence of the algorithm and the particle regeneration operation allows the search to escape from local optima and explore for better solutions. This algorithm is applied to data clustering problems for performance evaluation, and a hybrid algorithm (KSRPSO) of the K-means clustering method and SRPSO is developed. In the conducted numerical experiments, SRPSO and KSRPSO are compared to the original PSO algorithm, K-means, as well as other methods proposed in other studies. The results demonstrate that SRPSO and KSRPSO are efficient, accurate, and robust methods for data clustering problems.

16.
Swarm-inspired optimization has become very popular in recent years. Particle swarm optimization (PSO) and ant colony optimization (ACO) algorithms have attracted the interest of researchers due to their simplicity, effectiveness and efficiency in solving complex optimization problems. Both ACO and PSO have been successfully applied to the traveling salesman problem (TSP). The performance of the conventional PSO algorithm on small problems with moderate dimensions and search space is very satisfactory. As the search space gets more complex, conventional approaches tend to offer poor solutions. This paper presents a novel approach by introducing a PSO that is modified by the ACO algorithm to improve performance. The new hybrid method (PSO–ACO) is validated using the TSP benchmarks, and the empirical results, considering the completion time and the best length, illustrate that the proposed method is efficient.

17.
The fuzzy c-partition entropy approach for threshold selection is an effective approach for image segmentation. The approach models the image with a fuzzy c-partition, which is obtained using parameterized membership functions. The ideal threshold is determined by searching for an optimal parameter combination of the membership functions such that the entropy of the fuzzy c-partition is maximized. The computation becomes expensive as the number of parameters needed to determine the membership function increases. In this paper, a recursive algorithm is proposed for the fuzzy 2-partition entropy method, where the membership functions are chosen as an S-function and a Z-function with three parameters. The proposed recursive algorithm eliminates many repeated computations, thereby reducing the computational complexity significantly. The proposed method is tested using several real images, and its processing time is compared with those of the basic exhaustive algorithm, genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO) and simulated annealing (SA). Experimental results show that the proposed method is more effective than the basic exhaustive search algorithm, GA, PSO, ACO and SA.

18.
Semi-supervised model-based document clustering: A comparative study   Total citations: 4, self-citations: 0, citations by others: 4
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations when the set of labels available in labeled data is not complete, i.e., unlabeled data contain new classes that are not present in labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is the best when available labels are complete whereas the feedback-based approach excels when available labels are incomplete. Editor: Andrew Moore

19.
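Zhu Erzhou, Sun Yue, Zhang Yuanxiang, Gao Xin, Ma Ruhui, Li Xuejun. Journal of Software (《软件学报》), 2021, 32(10): 3085-3103
Cluster analysis is a research hotspot in statistics, pattern recognition, machine learning and related fields. Through effective cluster analysis, the intrinsic structure and characteristics of a data set can be well revealed. However, owing to the nature of unsupervised learning, existing clustering methods still face problems such as unstable clustering results and the inability to correctly cluster data sets with various structures. To address these problems, first, the clustering ideas of the K-means algorithm and hierarchical clustering are combined to propose a hybrid clustering algorithm, K-means-AHC. Second, drawing on the idea of inflection-point detection, a new clustering validity index based on the average synthesis degree, DAS (difference of average synthesis degree), is proposed to evaluate the quality of the clustering results of the K-means-AHC algorithm. Finally, the K-means-AHC algorithm and the DAS index are combined to design an effective method for finding the optimal number of clusters and the optimal partition of a data set. In the experiments, the K-means-AHC algorithm was tested on data sets with various structures; the results show that the algorithm improves the accuracy of cluster analysis without adding much time overhead. At the same time, the new DAS index outperforms the commonly used existing clustering validity indices in evaluating clustering results.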

20.
In order to overcome premature convergence in particle swarm optimization (PSO), we introduce dynamical crossover, a crossover operator with variable lengths and positions, into PSO; the resulting algorithm is denoted CPSO. To overcome the drawbacks of the $k$-means algorithm, namely that it finds only convex clusters and is sensitive to the initial points, a hybrid clustering algorithm based on CPSO is proposed. The difference between this work and existing ones is that CPSO is introduced into $k$-means for the first time. Experimental results on several data sets illustrate that the proposed clustering algorithm can completely overcome the shortcomings of $k$-means algorithms and obtain correct clustering results. An application to image segmentation illustrates that the proposed algorithm performs well.
