Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
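Ant colony algorithms, a new type of optimization method, are highly adaptable and robust. Clustering methods based on ant colony algorithms have already been applied in current data mining research. This paper proposes a novel strategy for the unsupervised data clustering problem: pheromones control the ants' random movement to improve the algorithm's efficiency, and multiple ants with different movement speeds cluster independently and in parallel to improve clustering quality. Experimental results show that the method is effective.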

2.
This paper presents a multi-ant colonies approach for clustering data that consists of several parallel, independent ant colonies and a queen ant agent. Each ant colony process uses different ant moving speeds and different versions of the probability conversion function to generate various clustering results with an ant-based clustering algorithm. These results are sent to the queen ant agent and combined by a hypergraph model to calculate a new similarity matrix. The new similarity matrix is returned to each ant colony process, which re-clusters the data using the new information. Experimental evaluation shows that the average performance of the aggregated multi-ant colonies algorithms exceeds that of the single ant-based clustering algorithm and the popular K-means algorithm. The results also show that the lowest outliers strategy for selecting the current data set gives the best performance.
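The abstract does not spell out its probability conversion function. As a point of reference only, the basic ant-based clustering model of Deneubourg et al. is commonly written with the pick-up and drop probabilities sketched below, where `f` is the similarity of an item to its local neighbourhood. The constants `k1` and `k2` and their default values are illustrative assumptions, not values taken from the paper.

```python
def pick_probability(f: float, k1: float = 0.1) -> float:
    """Chance that an unladen ant picks up an item with local similarity f (0..1)."""
    # An item that fits its neighbourhood poorly (small f) is likely to be picked up.
    return (k1 / (k1 + f)) ** 2


def drop_probability(f: float, k2: float = 0.15) -> float:
    """Chance that a laden ant drops its item where the local similarity is f (0..1)."""
    # An item is likely to be dropped where it resembles its neighbours (large f).
    return (f / (k2 + f)) ** 2
```

Colonies that use different versions of these functions, or different constants, would produce different partitions of the same data, which is the kind of variety the queen ant agent then combines.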

3.
A novel ant-based clustering algorithm using the kernel method   Total citations: 1, self-citations: 0, other citations: 1
A novel ant-based clustering algorithm integrated with the kernel method (ACK) is proposed. There are two aspects to the integration. First, kernel principal component analysis (KPCA) is applied to modify the random projection of objects when the algorithm is first run. This projection can create rough clusters and improve the algorithm's efficiency. Second, ant-based clustering is performed in the feature space rather than in the input space. The distance between objects in the feature space, which is calculated by the kernel function of the object vectors in the input space, is used as a similarity measure. The algorithm uses an ant movement model in which each object is viewed as an ant. The ant determines its movement according to the fitness of its local neighbourhood. The proposed algorithm incorporates the merits of kernel-based clustering into ant-based clustering. Comparisons with other classic algorithms on several synthetic and real datasets demonstrate that the ACK method performs well in terms of efficiency and clustering quality.
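A feature-space distance can be computed from kernel evaluations alone, using the identity ||φ(x) − φ(y)||² = k(x, x) − 2k(x, y) + k(y, y). The sketch below illustrates that identity. The Gaussian (RBF) kernel and the `gamma` value are assumptions chosen for illustration, since the abstract does not say which kernel ACK uses.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def feature_space_distance(x, y, kernel=rbf_kernel):
    """Distance between phi(x) and phi(y), computed without forming phi explicitly."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(sq, 0.0))
```

With an RBF kernel, k(x, x) = 1 for every x, so the distance is bounded above by √2 no matter how far apart the inputs are.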

4.
Aggregation pheromone density based data clustering   Total citations: 1, self-citations: 0, other citations: 1
Ants, bees and other social insects deposit pheromone (a type of chemical) to communicate with other members of their community. A pheromone that causes clumping or clustering behavior in a species and brings individuals into closer proximity is called an aggregation pheromone. This article presents a new algorithm (called APC) for clustering data sets based on this property of aggregation pheromone found in ants. An ant is placed at the location of each data point, and the ants are allowed to move in the search space to find points with higher pheromone density. The movement of an ant is governed by the amount of pheromone deposited at different points of the search space. The more pheromone is deposited, the greater the aggregation of ants. This leads to the formation of homogeneous groups of data. The proposed algorithm is evaluated on a number of well-known benchmark data sets using different cluster validity measures. Results are compared with those obtained using two popular standard clustering techniques, namely the average linkage agglomerative and k-means clustering algorithms, and with an ant-based method called adaptive time-dependent transporter ants for clustering (ATTA-C). Experimental results demonstrate the potential of the proposed APC algorithm in terms of both solution (clustering) quality and execution time compared to the other algorithms on a large number of data sets.

5.
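Different parameter values in ant colony algorithms often have a major impact on performance and solution efficiency. Building on earlier research on the combined ant colony clustering method, this paper focuses on how the number of ants m, a parameter of the ant colony algorithm within the combined method KMAOC, affects the performance of KMAOC. Experiments are run with different values of m, and several sets of experiments yield sound recommendations for configuring the number of ants m in KMAOC.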

6.
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that no specification of the number of output clusters is required. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting as the latter involve manual setting of a similarity threshold, or estimating the number of clusters in advance, while the fuzzy ant based clustering algorithm does not.

7.
Ant-based clustering is a type of clustering algorithm that imitates the behavior of ants. To improve efficiency, increase adaptability to non-Gaussian datasets and simplify the algorithm's parameters, a novel ant-based clustering algorithm using Renyi entropy (NAC-RE) is proposed. Renyi entropy is applied in two ways. First, Kernel Entropy Component Analysis (KECA) is applied to modify the random projection of objects when the algorithm is first run. This projection can create rough clusters and improve the algorithm's efficiency. Second, a novel ant movement model governed by Renyi entropy is proposed. The model treats each object as an ant. When the object (ant) moves to a new region, the Renyi entropy in its local neighborhood changes. The difference in entropy governs whether the object should move or stay in place. The new model avoids complex parameters that influence the clustering results. A theoretical analysis based on the kernel method shows that the Renyi entropy metric is feasible and superior to a distance metric. The novel algorithm was compared with other classic ones on several well-known benchmark datasets. The Friedman test and the corresponding Nemenyi test are applied to compare the algorithms’ performance. The results indicate that NAC-RE obtains better results on non-linearly separable datasets while its parameters remain simple.
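The abstract does not state which entropy estimator NAC-RE uses. A common choice in kernel entropy work is the Parzen-window estimate of the order-2 Renyi entropy, H₂ = −log((1/N²) Σᵢ Σⱼ k(xᵢ, xⱼ)). The sketch below assumes that estimator with an unnormalized Gaussian kernel, so the value is defined only up to an additive constant; the width `sigma` is a hypothetical parameter.

```python
import math


def renyi_quadratic_entropy(points, sigma=1.0):
    """Parzen-window estimate of the order-2 Renyi entropy of a neighbourhood."""
    n = len(points)
    total = 0.0
    for p in points:
        for q in points:
            sq = sum((a - b) ** 2 for a, b in zip(p, q))
            # Unnormalized Gaussian kernel between the two points.
            total += math.exp(-sq / (2.0 * sigma ** 2))
    return -math.log(total / (n * n))
```

A compact neighbourhood gives a low value and a scattered one gives a higher value, so comparing the estimate before and after a candidate move yields an entropy difference of the kind the movement model relies on.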

8.
Evolving clusters in gene-expression data   Total citations: 1, self-citations: 0, other citations: 1
Clustering is a useful exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, there are only a few works that deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters to be either specified in advance or selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based upon numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs using six gene-expression datasets in which the right clusters are known a priori. The results illustrate that all the proposed algorithms perform well in gene-expression data, although statistical comparisons in terms of the computational efficiency of each algorithm point out that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k.

9.
Research on a combined ant colony clustering method   Total citations: 2, self-citations: 0, other citations: 2
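Clustering algorithms based on ant colony algorithms have already been applied in current data mining research. To address the shortcomings that ant colony clustering shows in its early phase, a combined ant colony clustering method is proposed. The idea is to introduce K-means as a preprocessing step for the ant colony algorithm: K-means quickly and roughly determines the cluster centers, and its result serves as the initial value for the subsequent ant colony clustering. This effectively resolves problems such as the slow early convergence of the ant colony algorithm.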
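The preprocessing step described above can be sketched as a few iterations of plain K-means that return rough cluster centers. This is a minimal illustration, not the paper's implementation: the function name, the iteration count and the seed are hypothetical, and the ant colony stage that would consume the centers is not shown.

```python
import random


def kmeans_centers(points, k, iterations=10, seed=0):
    """Run a few Lloyd iterations and return rough cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers
```

Because only a rough starting point is needed, a small fixed number of iterations is enough here; the refinement is left to the ant colony stage.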

10.
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets. In this method, we compute the contribution of each attribute to different clusters. We propose a new cost function for a k-means type algorithm. One advantage of this algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of attribute contributions to different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results.

11.
Algorithms for clustering Web search results have to be efficient and robust. Furthermore, they must be able to cluster a data set without using any kind of a priori information, such as the required number of clusters. Clustering algorithms inspired by the behavior of real ants generally meet these requirements. In this article we propose a novel approach to ant-based clustering, based on fuzzy logic. We show that it improves on existing approaches and illustrate how our algorithm can be applied to the problem of Web search results clustering. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 455–474, 2007.

12.
In cluster analysis, one of the most challenging problems is determining the number of clusters in a data set, which is a basic input parameter for most clustering algorithms. To solve this problem, many algorithms have been proposed for either numerical or categorical data sets. However, these algorithms are not very effective for a mixed data set containing both numerical and categorical attributes. To overcome this deficiency, a generalized mechanism is presented in this paper by integrating Rényi entropy and complement entropy. The mechanism is able to uniformly characterize within-cluster entropy and between-cluster entropy and to identify the worst cluster in a mixed data set. In order to evaluate the clustering results for mixed data, an effective cluster validity index is also defined in this paper. Furthermore, by introducing a new dissimilarity measure into the k-prototypes algorithm, we develop an algorithm to determine the number of clusters in a mixed data set. The performance of the algorithm has been studied on several synthetic and real world data sets. The comparisons with other clustering algorithms show that the proposed algorithm is more effective in detecting the optimal number of clusters and generates better clustering results.

13.
Clustering is a powerful machine learning technique that groups “similar” data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver “qbsolv.” The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
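The "straightforward approach" mentioned above can be made concrete for a toy data set. The sketch below enumerates every assignment of N points to K labels and keeps the one with the smallest sum of within-cluster pairwise Euclidean distances. The function names are hypothetical, and this illustrates only the exhaustive baseline, not the paper's quantum annealing formulation.

```python
import math
from itertools import product


def within_cluster_cost(points, labels):
    """Sum of Euclidean distances between every pair of points sharing a label."""
    cost = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                cost += math.dist(points[i], points[j])
    return cost


def brute_force_clustering(points, k):
    """Try all k**N label assignments and return the cheapest one with its cost."""
    best = min(
        product(range(k), repeat=len(points)),
        key=lambda labels: within_cluster_cost(points, labels),
    )
    return best, within_cluster_cost(points, best)
```

The search space has K^N entries, so four points and two clusters need only 16 evaluations, while 50 points and two clusters would already need about 10^15, which is why heuristic or annealing-based methods are used instead.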

14.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory by considering that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations that affect the outcome of many clustering algorithms. As an application of this algorithm, it is used to enhance the Hough transform technique.

15.
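An improved ant clustering algorithm based on a point symmetry distance is proposed. Instead of using the Euclidean distance to compute the similarity of objects within a cluster, the algorithm uses a new point symmetry distance. When handling data sets with symmetry properties, it can effectively identify the number of clusters and a suitable partition of the given data set. In the algorithm, artificial ants represent data objects and search for the most suitable clustering partition according to the clustering rules the algorithm specifies. Finally, clustering experiments on different data sets were run with both this algorithm and the standard ant clustering algorithm. The experimental results confirm the effectiveness of the algorithm.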

16.
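Application of an information-entropy-based ant colony clustering algorithm to customer segmentation   Total citations: 1, self-citations: 0, other citations: 1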
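The traditional ant colony clustering algorithm requires many parameters to be set and takes a long time to cluster. The information-entropy-based ant colony clustering algorithm uses information entropy to change the rules by which ants pick up and drop data, which reduces the number of parameters and shortens the clustering time. The algorithm is applied to customer segmentation, and the resulting segmentation is compared with that obtained by the traditional ant colony clustering algorithm. Experiments show that the information-entropy-based ant colony clustering algorithm can speed up the clustering process for customer segmentation.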

17.
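In existing adaptive ant clustering algorithms, the adaptive parameters are often set to empirically chosen values, which affects clustering quality. To address this problem, an improved method is proposed that uses a fast simulated annealing algorithm to adjust the adaptive parameters of ant clustering dynamically. An intrusion detection system built on this algorithm does not need the number of clusters to be specified in advance, nor does it require conditions such as the number of normal behaviors being far greater than the number of intrusions. Simulation experiments on the KDD CUP 1999 data set show that the algorithm produces fairly good clusters and detects unknown intrusions well.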

18.
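The traditional ant colony clustering algorithm requires many parameters to be set and takes a long time to cluster. The information-entropy-based ant colony clustering algorithm uses information entropy to change the rules by which ants pick up and drop data, which reduces the number of parameters and shortens the clustering time. The algorithm is applied to customer segmentation, and the resulting segmentation is compared with that obtained by the traditional ant colony clustering algorithm. Experiments show that the information-entropy-based ant colony clustering algorithm can speed up the clustering process for customer segmentation.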

19.
Clustering algorithms can be optimized using nature-inspired techniques. Many algorithms inspired by nature, such as the firefly algorithm and the ant colony optimization algorithm, have improved clustering results. k-means is a popular clustering technique but is prone to local optima, a limitation that various hybrids have overcome. k-means++ is a hybrid k-means clustering algorithm that provides a procedure for initializing the cluster centres. In the proposed work, hybrids of nature-inspired techniques using the cuckoo and krill herd algorithms are implemented on the k-means++ algorithm to enhance cluster quality and generate optimized clusters. The designed algorithms are implemented, and the results are compared with their counterparts. Performance parameters such as accuracy, f-measure, error rate, standard deviation, CPU time, and cluster quality check are used to measure the clustering capabilities of these algorithms. The results indicate the high performance of the newly designed algorithms.
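The k-means++ initialization mentioned above picks the first centre uniformly at random and each later centre with probability proportional to its squared distance from the nearest centre already chosen. The sketch below shows that seeding step only. The function name and `seed` parameter are hypothetical, and the code assumes the data contain at least k distinct points. The cuckoo and krill herd hybrids from the abstract are not reproduced here.

```python
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centres with k-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first centre: uniform over the data
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Far-away points are proportionally more likely to become the next centre.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

Points that coincide with an existing centre get weight zero, so the seeding spreads the centres out before the usual k-means iterations begin.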

20.
An adaptive subspace clustering algorithm for high-dimensional data streams   Total citations: 1, self-citations: 0, other citations: 1
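Clustering high-dimensional data streams is a research hotspot in data mining. Because data streams are large in volume, change rapidly and are high-dimensional, many clustering algorithms fail to achieve good clustering quality. This paper proposes SAStream, an adaptive subspace clustering algorithm for high-dimensional data streams. The algorithm improves the micro-cluster structure of HPStream and defines candidate clusters. It computes the distance from a newly arriving data point to the centroids of the candidate clusters only within the corresponding subspaces, which reduces the number of micro-clusters examined during clustering. The resulting micro-clusters are stored in a pyramidal time frame, and a time decay function is used to delete expired micro-clusters. When the stream volume is large, the algorithm automatically adjusts the boundary radius and the cluster selection factor according to the monitored system resource usage, thereby tuning the clustering granularity. Experimental results show that the algorithm offers good clustering quality and fast data processing.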
