首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Recent years have witnessed a growing interest in the information bottleneck theory. Among the relevant algorithms in the extant literature, the sequential Information Bottleneck (sIB) algorithm is recognized for its balance between accuracy and complexity. However, like many other optimization techniques, it still suffers from the problem of getting easily trapped in local optima. To that end, our study proposed an iterative sIB algorithm (isIB) based on mutation for the clustering problem. From initial solution vectors of cluster labels generated by a seeding the sIB algorithm, our algorithm randomly selects a subset of elements and mutates the cluster labels according to the optimal mutation rate. The results are iteratively optimized further using genetic algorithms. Finally, the experimental results on the benchmark data sets validate the advantage of our iterative sIB algorithm over the sIB algorithm in terms of both accuracy and efficiency.  相似文献   

2.
This paper proposes an improved latent semantic analysis (LSA) model to represent textual document and takes advantage of a fuzzy logic based genetic algorithm (FLGA) for clustering. The standard genetic algorithm (GA) in conventional vector space model is rather difficult to deal with because the high dimensional encoding of GA makes it explore the optimal solution in a complicated space which is prone to cause an overflow problem. The LSA-based corpus model not only reduces the dimensions drastically, but also creates an underlying semantic structure which enhances its ability of distinguishing documents in terms of concepts and indirectly improves the ability of GA for clustering (genetic clustering). A novel FLGA is proposed in conjunction with this semantic model in this study. According to the nature of biological evolution, several fuzzy controllers are given to adaptively adjust and optimize the behaviors of the GA which can effectively prevent the premature convergence to a suboptimum solution. The experiment results show that the fuzzy logic controllers enhance the ability of the GA to explore the global optimum solution, and the utilization of the LSA-based text representation method to FLGA further improves its clustering performance.  相似文献   

3.
Clustering with a genetically optimized approach   总被引:5,自引:0,他引:5  
Describes a genetically guided approach to optimizing the hard (J 1) and fuzzy (Jm) c-means functionals used in cluster analysis. Our experiments show that a genetic algorithm (GA) can ameliorate the difficulty of choosing an initialization for the c-means clustering algorithms. Experiments use six data sets, including the Iris data, magnetic resonance, and color images. The genetic algorithm approach is generally able to find the lowest known Jm value or a Jm associated with a partition very similar to that associated with the lowest Jm value. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can be represented in terms of cluster centers. A series random initializations of fuzzy/hard c-means, where the partition associated with the lowest Jm value is chosen, can produce an equivalent solution to the genetic guided clustering approach given the same amount of processor time in some domains  相似文献   

4.
This paper develops theory and algorithms concerning a new metric for clustering data. The metric minimizes the total volume of clusters, where the volume of a cluster is defined as the volume of the minimum volume ellipsoid (MVE) enclosing all data points in the cluster. This metric is scale-invariant, that is, the optimal clusters are invariant under an affine transformation of the data space. We introduce the concept of outliers in the new metric and show that the proposed method of treating outliers asymptotically recovers the data distribution when the data comes from a single multivariate Gaussian distribution. Two heuristic algorithms are presented that attempt to optimize the new metric. On a series of empirical studies with Gaussian distributed simulated data, we show that volume-based clustering outperforms well-known clustering methods such as k-means, Ward's method, SOM, and model-based clustering.  相似文献   

5.
Partitional clustering is a common approach to cluster analysis. Although many algorithms have been proposed, partitional clustering remains a challenging problem with respect to the reliability and efficiency of recovering high quality solutions in terms of its criterion functions. In this paper, we propose a niching genetic k-means algorithm (NGKA) for partitional clustering, which aims at reliably and efficiently identifying high quality solutions in terms of the sum of squared errors criterion. Within the NGKA, we design a niching method, which encourages mating among similar clustering solutions while allowing for some competitions among dissimilar solutions, and integrate it into a genetic algorithm to prevent premature convergence during the evolutionary clustering search. Further, we incorporate one step of k-means operation into the regeneration steps of the resulted niching genetic algorithm to improve its computational efficiency. The proposed algorithm was applied to cluster both simulated data and gene expression data and compared with previous work. Experimental results clear show that the NGKA is an effective clustering algorithm and outperforms two other genetic algorithm based clustering methods implemented for comparison.  相似文献   

6.
With the development of the World Wide Web, document clustering is receiving more and more attention as an important and fundamental technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. A good document clustering approach can assist computers in organizing the document corpus automatically into a meaningful cluster hierarchy for efficient browsing and navigation, which is very valuable for complementing the deficiencies of traditional information retrieval technologies. In this paper, we study the performance of different density-based criterion functions, which can be classified as internal, external or hybrid, in the context of partitional clustering of document datasets. In our study, a weight was assigned to each document, which defined its relative position in the entire collection. To show the efficiency of the proposed approach, the weighted methods were compared to their unweighted variants. To verify the robustness of the proposed approach, experiments were conducted on datasets with a wide variety of numbers of clusters, documents and terms. To evaluate the criterion functions, we used the WebKb, Reuters-21578, 20Newsgroups-18828, WebACE and TREC-5 datasets, as they are currently the most widely used benchmarks in document clustering research. To evaluate the quality of a clustering solution, a wide spectrum of indices, three internal validity indices and seven external validity indices, were used. The internal validity indices were used for evaluating the within-cluster scatter and between cluster separations. The external validity indices were used for comparing the clustering solutions produced by the proposed criterion functions with the “ground truth” results. Experiments showed that our approach significantly improves clustering quality. In this paper, we developed a modified differential evolution (DE) algorithm to optimize the criterion functions. This modification accelerates the convergence of DE and, unlike the basic DE algorithm, guarantees that the received solution will be feasible.  相似文献   

7.
In this research, a data clustering algorithm named as non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC) based on K-modes method which combines fuzzy genetic algorithm and multi-objective optimization was proposed to improve the clustering quality on categorical data. The proposed method uses fuzzy membership value as chromosome. In addition, due to this innovative chromosome setting, a more efficient solution selection technique which selects a solution from non-dominated Pareto front based on the largest fuzzy membership is integrated in the proposed algorithm. The multiple objective functions: fuzzy compactness within a cluster (π) and separation among clusters (sep) are used to optimize the clustering quality. A series of experiments by using three UCI categorical datasets were conducted to compare the clustering results of the proposed NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indexes for comparison. The experimental result showed that the proposed method can obtain better clustering quality in terms of ARI, π, and sep simultaneously with shorter computation time.  相似文献   

8.
Categorical data clustering is a difficult and challenging task due to the special characteristic of categorical attributes: no natural order. Thus, this study aims to propose a two-stage method named partition-and-merge based fuzzy genetic clustering algorithm (PM-FGCA) for categorical data. The proposed PM-FGCA uses a fuzzy genetic clustering algorithm to partition the dataset into a maximum number of clusters in the first stage. Then, the merge stage is designed to select two clusters among the clusters that generated in the first stage based on its inter-cluster distances and merge two selected clusters to one cluster. This procedure is repeated until the number of clusters equals to the predetermined number of clusters. Thereafter, some particular instances in each cluster are considered to be re-assigned to other clusters based on the intra-cluster distances. The proposed PM-FGCA is implemented on ten categorical datasets from UCI machine learning repository. In order to evaluate the clustering performance, the proposed PM-FGCA is compared with some existing methods such as k-modes algorithm, fuzzy k-modes algorithm, genetic fuzzy k-modes algorithm, and non-dominated sorting genetic algorithm using fuzzy membership chromosomes. Adjusted Ranked Index (ARI), Normalized Mutual Information (NMI), and Davies–Bouldin (DB) index are selected as three clustering validation indices which are represented to both external index (i.e., ARI and NMI) and internal index (i.e., DB). Consequently, the experimental result shows that the proposed PM-FGCA outperforms the benchmark methods in terms of the tested indices.  相似文献   

9.
Fuzzy c-means (FCM) algorithm is an important clustering method in pattern recognition, while the fuzziness parameter, m, in FCM algorithm is a key parameter that can significantly affect the result of clustering. Cluster validity index (CVI) is a kind of criterion function to validate the clustering results, thereby determining the optimal cluster number of a data set. From the perspective of cluster validation, we propose a novel method to select the optimal value of m in FCM, and four well-known CVIs, namely XB, VK, VT, and SC, for fuzzy clustering are used. In this method, the optimal value of m is determined when CVIs reach their minimum values. Experimental results on four synthetic data sets and four real data sets have demonstrated that the range of m is [2, 3.5] and the optimal interval is [2.5, 3].  相似文献   

10.
Clustering a large volume of data in a distributed environment is a challenging issue. Data stored across multiple machines are huge in size, and solution space is large. Genetic algorithm deals effectively with larger solution space and provides better solution. In this paper, we proposed a novel clustering algorithm for distributed datasets, using combination of genetic algorithm (GA) with Mahalanobis distance and k-means clustering algorithm. The proposed algorithm is two phased; in phase 1, GA is applied in parallel on data chunks located across different machines. Mahalanobis distance is used as fitness value in GA, which considers covariance between the data points and thus provides a better representation of initial data. K-means with K-means\( ++ \) initialization is applied in phase 2 on intermediate output to get final result. The proposed algorithm is implemented on Hadoop framework, which is inherently designed to deal with distributed datasets in a fault-tolerant manner. Extensive experiments were conducted for multiple real-life and synthetic datasets to measure performance of our proposed algorithm. Results were compared with MapReduce-based algorithms, mrk-means, parallel k-means and scaling GA.  相似文献   

11.
遗传聚类算法往往需要较大的种群规模才能得到最优解,导致收敛速度慢,针对这一问题,本文提出一种基于自组织映射的超启发遗传聚类算法。首先利用自组织映射把数据空间转换到特征空间,再在特征空间里利用遗传算法进行搜索,然后进行反映射,即把聚类结果在数据空间里表现,从而得到一组解,同时利用K-means算法在数据空间里进行粗聚类,获得另一组解,最后比较2组解的聚类结果,相同的样本保留,不同的再次聚类,进而有效地保证了最优解的获得。计算机仿真实验验证了所提算法在种群规模较小的情况下,可以获得较高的准确率。   相似文献   

12.
The clustering phenomenon of defects usually occurs in semiconductor manufacturing. However, previous studies did not pay much attention to the influence of clustering phenomenon for estimating fraction nonconforming of a wafer. Thus, this paper presents a systematic estimation model with considering relevant variables about clustering defects for fraction nonconforming of a wafer. The method combines back-propagation neural network (BPNN) with genetic algorithm (GA) to obtain an estimation model. In this study, GA aims to optimize the parameters of BPNN. Five relevant variables: number of defects (ND), squared coefficient of angle variation (SCVA) for defects, squared coefficient of distance variation (SCVD) for defects, defect cluster index (CIM), and the number of cluster groups (NCG) for defects by self-organized map (SOM) are utilized as inputs for GA–BPNN. Finally, a simulation case and a real-world case are used to confirm the effectiveness of proposed method.  相似文献   

13.
In this work, a novel multi-phase modified shuffled frog leaping algorithm (MPMSFLA) framework is presented to solve the multi-depot vehicle routing problem (MDVRP) more quickly. The presented algorithm adopts the K-means algorithm to execute the clustering analyses for all customers, generates a frog population according to the result of the clustering analyses, and then proceeds to the three-phase process. In the first phase, a cluster MSFLA local search is carried out for each cluster. In the second phase, the algorithm selects good individuals through a binary tournament to construct a new population and then performs a global optimization for all customers and depots using the global MSFLA. In the third phase, a cluster adjustment is implemented for the population to generate new clusters. These procedures continue until the convergence criterion is satisfied. The experimental results show that our algorithm can achieve a high quality solution within a short runtime for the MDVRP, the MDVRP with time windows (MDVRPTW) and the capacitated vehicle routing problem (CVRP). The proposed algorithm is suitable for solving large-scale problems.  相似文献   

14.
In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in the application of genetic algorithms (GAs) for document clustering is thousands or even tens of thousands of dimensions in feature space which is typical for textual data. Because the most straightforward and popular approach represents texts with the vector space model (VSM), that is, each unique term in the vocabulary represents one dimension. Latent semantic indexing (LSI) is a successful technology in information retrieval which attempts to explore the latent semantics implied by a query or a document through representing them in a dimension-reduced space. Meanwhile, LSI takes into account the effects of synonymy and polysemy, which constructs a semantic structure in textual data. GA belongs to search techniques that can efficiently evolve the optimal solution in the reduced space. We propose a variable string length genetic algorithm which has been exploited for automatically evolving the proper number of clusters as well as providing near optimal data set clustering. GA can be used in conjunction with the reduced latent semantic structure and improve clustering efficiency and accuracy. The superiority of GAL approach over conventional GA applied in VSM model is demonstrated by providing good Reuter document clustering results.  相似文献   

15.
The vehicle routing problem (VRP) plays a central role in the optimization of distribution networks. Since some classical instances with 75 nodes resist the best exact solution methods, most researchers concentrate on metaheuristics for solving real-life problems. Contrary to the VRP with time windows, no genetic algorithm (GA) can compete with the powerful tabu search (TS) methods designed for the VRP. This paper bridges the gap by presenting a relatively simple but effective hybrid GA. In terms of average solution cost, this algorithm outperforms most published TS heuristics on the 14 classical Christofides instances and becomes the best solution method for the 20 large-scale instances generated by Golden et al.Scope and purposeThe framework of this research is the development of effective metaheuristics for hard combinatorial optimization problems met in vehicle routing. It is surprising to notice in the literature the absence of effective genetic algorithms (GA) for the vehicle routing problem (VRP, the main capacitated node routing problem), contrary to node routing problems with time windows or arc routing problems. Earlier attempts were based on chromosomes with trip delimiters and needed a repair procedure to get feasible children after each crossover. Such procedures are known to weaken the genetic transmission of information from parents to children. This paper proposes a GA without trip delimiters, hybridized with a local search procedure. At any time, a chromosome can be converted into an optimal VRP solution (subject to chromosome sequence), thanks to a special splitting procedure. This design choice avoids repair procedures and enables the use of classical crossovers like OX. The resulting algorithm is flexible, relatively simple, and very effective when applied to two sets of standard benchmark instances ranging from 50 to 483 customers.  相似文献   

16.
基于可变染色体长度的遗传K均值聚类算法   总被引:2,自引:2,他引:0  
针对传统K-均值聚类算法需要事先确定聚类数,以及对初始质心的选择具有敏感性,从而容易陷入局部极值点的缺点,使用了一种基于可变染色体编码长度的遗传算法对传统K-均值聚类进行改进.该算法可以在事先不确定K值的情况下,通过多次的选择、交叉.变异的遗传操作,最终得到最优的聚类数,以及最优的初始质心集.通过Reuters数据集的实验结果表明,基于该算法的聚类划分结果明显优于传统K-均值聚类算法,并且好过基于固定染色体编码长度遗传算法的K-均值聚类算法.  相似文献   

17.
Clustering divides data into meaningful or useful groups (clusters) without any prior knowledge. It is a key technique in data mining and has become an important issue in many fields. This article presents a new clustering algorithm based on the mechanism analysis of chaotic ant swarm (CAS). It is an optimization methodology for clustering problem which aims to obtain global optimal assignment by minimizing the objective function. The proposed algorithm combines three advantages into one: finding global optimal solution to the objective function, not sensitive to clusters with different size and density and suitable to multi-dimensional data sets. The quality of this approach is evaluated on several well-known benchmark data sets. Compared with the popular clustering method named k-means algorithm and the PSO-based clustering technique, experimental results show that our algorithm is an effective clustering technique and can be used to handle data sets with complex cluster sizes, densities and multiple dimensions.  相似文献   

18.
In this study, a novel approach via improved genetic algorithm (IGA)-based fuzzy observer is proposed to realise exponential optimal H synchronisation and secure communication in multiple time-delay chaotic (MTDC) systems. First, an original message is inserted into the MTDC system. Then, a neural-network (NN) model is employed to approximate the MTDC system. Next, a linear differential inclusion (LDI) state-space representation is established for the dynamics of the NN model. Based on this LDI state-space representation, this study proposes a delay-dependent exponential stability criterion derived in terms of Lyapunov's direct method, thus ensuring that the trajectories of the slave system approach those of the master system. Subsequently, the stability condition of this criterion is reformulated into a linear matrix inequality (LMI). Due to GA's random global optimisation search capabilities, the lower and upper bounds of the search space can be set so that the GA will seek better fuzzy observer feedback gains, accelerating feedback gain-based synchronisation via the LMI-based approach. IGA, which exhibits better performance than traditional GA, is used to synthesise a fuzzy observer to not only realise the exponential synchronisation, but also achieve optimal H performance by minimizing the disturbance attenuation level and recovering the transmitted message. Finally, a numerical example with simulations is given in order to demonstrate the effectiveness of our approach.  相似文献   

19.
遗传算法是一种模拟自然进化的优化搜索算法,它仅依靠适应度函数就可以搜索最优解.介绍了一种基于遗传算法的聚类分析方法,采用浮点数编码方式对聚类的中心进行编码,并用特征向量与相应聚类中心的欧氏距离的和来判断聚类划分的质量,通过选择、交叉和变异操作对聚类中心的编码进行优化,得到使聚类划分效果最好的聚类中心.实验结果显示,该方...  相似文献   

20.
In this paper the problem of automatic clustering a data set is posed as solving a multiobjective optimization (MOO) problem, optimizing a set of cluster validity indices simultaneously. The proposed multiobjective clustering technique utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Here variable number of cluster centers is encoded in the string. The number of clusters present in different strings varies over a range. The points are assigned to different clusters based on the newly developed point symmetry based distance rather than the existing Euclidean distance. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. Thus the proposed clustering technique is able to detect both the proper number of clusters and the appropriate partitioning from data sets either having hyperspherical clusters or having point symmetric clusters. A new semi-supervised method is also proposed in the present paper to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号