Found 20 similar documents (search time: 15 ms)
1.
Parallel algorithms on SIMD (single-instruction stream multiple-data stream) machines for hierarchical clustering and cluster validity computation are proposed. The machine model uses a parallel memory system and an alignment network to facilitate parallel access to both the pattern matrix and the proximity matrix. For a problem with N patterns, the number of memory accesses is reduced from O(N^3) on a sequential machine to O(N^2) on an SIMD machine with N PEs.
2.
Efficient parallel hierarchical clustering algorithms (total citations: 3; self-citations: 0; citations by others: 3)
Clustering of data has numerous applications and has been studied extensively. Though most of the algorithms in the literature are sequential, many parallel algorithms have also been designed. In this paper, we present parallel algorithms with better performance than known algorithms. We consider algorithms that work well in the worst case as well as algorithms with good expected performance.
3.
Xiaolu Zhang 《International journal of systems science》2013,44(3):562-576
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are joined. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
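The merge loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes each HFS has already been reduced to an equal-length tuple of membership values, and it assumes average linkage between clusters, since the abstract does not say how a merged cluster is compared with the others. The names `weighted_hamming` and `agglomerate` are hypothetical.

```python
from itertools import combinations

def weighted_hamming(a, b, weights):
    """Weighted Hamming distance between two equal-length membership vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def agglomerate(items, weights, target_clusters):
    """Merge the closest pair of clusters until target_clusters remain."""
    # Each cluster is a list of indices into items; start with singletons.
    clusters = [[i] for i in range(len(items))]

    def cluster_distance(c1, c2):
        # Assumption: average linkage over all cross-cluster pairs.
        total = sum(weighted_hamming(items[i], items[j], weights)
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return clusters

# Illustrative data only: four two-element membership vectors.
data = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.85, 0.95)]
result = agglomerate(data, weights=(0.5, 0.5), target_clusters=2)
```

With the illustrative data, the two low-membership vectors end up in one cluster and the two high-membership vectors in the other. Swapping in a weighted Euclidean distance only requires replacing `weighted_hamming`.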
4.
Applying agglomerative hierarchical clustering algorithms to component identification for legacy systems (total citations: 1; self-citations: 0; citations by others: 1)
Jian Feng Cui, Heung Seok Chae 《Information and Software Technology》2011,53(6):601-614
Context
Component identification, the process of evolving a legacy system into a finely organized component-based software system, is a critical part of software reengineering. Currently, many component identification approaches have been developed based on agglomerative hierarchical clustering algorithms. However, there is a lack of thorough investigation into which algorithm is appropriate for component identification.
Objective
This paper focuses on analyzing agglomerative hierarchical clustering algorithms in software reengineering, and then identifying their respective strengths and weaknesses in order to apply them effectively in future practical applications.
Method
A series of experiments was conducted for 18 clustering strategies combined according to various similarity measures, weighting schemes and linkage methods. Eleven subject systems with different application domains and source code sizes were used in the experiments. The component identification results are evaluated by the proposed size, coupling and cohesion criteria.
Results
The experimental results suggested that the employed similarity measures, weighting schemes and linkage methods can have various effects on component identification results with respect to the proposed size, coupling and cohesion criteria, so the hierarchical clustering algorithms produced quite different clustering results.
Conclusions
According to the experimental results, it can be concluded that it is difficult for a given clustering algorithm to produce perfectly satisfactory results. Nevertheless, these algorithms demonstrated varied capabilities to identify components with respect to the proposed size, coupling and cohesion criteria.
5.
Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging (total citations: 2; self-citations: 0; citations by others: 2)
Cheng-Ru Lin, Ming-Syan Chen 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(2):145-159
Data clustering has attracted a lot of research attention in the field of computational statistics and data mining. In most related studies, the dissimilarity between two clusters is defined as the distance between their centroids or the distance between the two closest (or farthest) data points. However, all of these measures are vulnerable to outliers, and removing the outliers precisely is yet another difficult task. In view of this, we propose a new similarity measure, referred to as cohesion, to measure the intercluster distances. By using this new measure of cohesion, we have designed a two-phase clustering algorithm, called cohesion-based self-merging (abbreviated as CSM), which runs in time linear in the size of the input data set. Combining the features of partitional and hierarchical clustering methods, algorithm CSM partitions the input data set into several small subclusters in the first phase and then continuously merges the subclusters based on cohesion in a hierarchical manner in the second phase. The time and space complexities of algorithm CSM are analyzed. As shown by our performance studies, cohesion-based clustering is very robust and possesses excellent tolerance to outliers in various workloads. More importantly, algorithm CSM is shown to be able to cluster data sets of arbitrary shapes very efficiently and to provide better clustering results than prior methods.
6.
We show that for any data set in any metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k, the induced k-clustering has cost at most eight times that of the optimal k-clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithm is similar in simplicity and efficiency to popular agglomerative heuristics for hierarchical clustering, and we show that these heuristics have unbounded approximation factors.
7.
On-line hierarchical clustering (total citations: 1; self-citations: 0; citations by others: 1)
Most of the techniques used in the literature for hierarchical clustering are based on off-line operation. The main contribution of this paper is a new algorithm for on-line hierarchical clustering that finds the k nearest objects to each object introduced so far; these k nearest objects are continuously updated as new objects arrive. Once the final object has arrived, the objects and their k nearest objects are sorted to produce the hierarchical dendrogram. Results of applying the new algorithm to real and synthetic data, and of simulation experiments, show that the new technique is quite efficient and, in many respects, superior to traditional off-line hierarchical methods.
8.
For streaming data that arrive continuously such as multimedia data and financial transactions, clustering algorithms are typically allowed to scan the data set only once. Existing research in this domain mainly focuses on improving the accuracy of clustering. In this paper, a novel density-based hierarchical clustering scheme for streaming data is proposed in order to improve both accuracy and effectiveness; it is based on the agglomerative clustering framework. Traditionally, clustering algorithms for streaming data often use the cluster center to represent the whole cluster when conducting cluster merging, which may lead to unsatisfactory results. We argue that even if the data set is accessed only once, some parameters, such as the variance within cluster, the intra-cluster density and the inter-cluster distance, can be calculated accurately. This may bring measurable benefits to the process of cluster merging. Furthermore, we employ a general framework that can incorporate different criteria and, given the same criteria, will produce similar clustering results for both streaming and non-streaming data. In experimental studies, the proposed method demonstrates promising results with reduced time and space complexity.
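The claim that within-cluster variance can be computed exactly in a single pass can be illustrated with a standard technique. The sketch below uses Welford's online update together with the usual pairwise formula for combining two summaries. It is not taken from the paper: the class name `ClusterSummary`, the restriction to one-dimensional values, and the use of population variance are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ClusterSummary:
    """Running count, mean and sum of squared deviations for one cluster."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        # Welford's update: each point is seen exactly once.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.n if self.n else 0.0

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        # Combine two summaries without revisiting the raw data.
        n = self.n + other.n
        if n == 0:
            return ClusterSummary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return ClusterSummary(n, mean, m2)

# Illustrative stream split across two clusters that are later merged.
a, b = ClusterSummary(), ClusterSummary()
for x in (1.0, 2.0, 3.0):
    a.add(x)
for x in (4.0, 5.0, 6.0, 7.0):
    b.add(x)
merged = a.merge(b)
```

Because `merge` needs only the three stored numbers per cluster, two clusters can be combined during agglomeration with the same variance a full rescan of the raw points would give.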
9.
Stefano Rizzi 《Pattern recognition letters》1998,19(14):1293-1300
In this paper we propose an encoding scheme and ad hoc operators for a genetic approach to hierarchical graph clustering. Given a connected graph whose vertices correspond to points within a Euclidean space and a fitness function, a hierarchy of graphs in which each vertex corresponds to a connected subgraph of the graph below is generated. Both the number of clustering levels and the number of clusters on each level are not fixed a priori and are subject to optimization.
10.
《Journal of Computer and System Sciences》2006,72(3):425-443
We formulate and (approximately) solve hierarchical versions of two prototypical problems in discrete location theory, namely, the metric uncapacitated k-median and facility location problems. Our work yields new insights into hierarchical clustering, a widely used technique in data analysis. For example, we show that every metric space admits a hierarchical clustering that is within a constant factor of optimal at every level of granularity with respect to the average (squared) distance objective. A key building block of our hierarchical facility location algorithm is a constant-factor approximation algorithm for an “incremental” variant of the facility location problem; the latter algorithm may be of independent interest.
11.
Clustering is a well-known technique for identifying intrinsic structures and extracting useful information from large amounts of data. One of the most extensively used clustering techniques is the fuzzy c-means (FCM) algorithm. However, minimizing the standard FCM objective function becomes computationally difficult when the data set is large and the data objects carry measurement uncertainty. Furthermore, it is difficult to set optimal parameters for FCM. The goal of this paper is therefore to produce an alternative generalization of FCM clustering techniques, called quadratic entropy based fuzzy c-means, that can deal with more complicated data. The paper develops effective quadratic entropy fuzzy c-means methods that combine a regularization function, quadratic terms, mean distance functions, and kernel distance functions. It gives a complete quadratic entropy framework for constructing effective quadratic entropy based fuzzy clustering algorithms, and it establishes an effective way of estimating memberships and updating centers by minimizing the proposed objective functions. To reduce the number of iterations of the proposed techniques, the paper proposes a new algorithm for initializing the cluster centers. To assess cluster validity and choose the number of clusters, the silhouette method is used. For the first time, the synthetic control chart time series is segmented directly with the proposed methods to examine their performance; the results show that the proposed clustering techniques have advantages over standard FCM and the very recent ClusterM-k-NN in segmenting this time series.
12.
Knowledge and Information Systems - Uncertainty about data appears in many real-world applications and an important issue is how to manage, analyze and solve optimization problems over such data....
13.
Parallel clustering algorithms (total citations: 3; self-citations: 0; citations by others: 3)
Clustering techniques play an important role in exploratory pattern analysis, unsupervised learning and image segmentation applications. Many clustering algorithms, both partitional clustering and hierarchical clustering, require intensive computation, even for a modest number of patterns. This paper presents two parallel clustering algorithms. For a clustering problem with N = 2^n patterns and M = 2^m features, the time complexity of the traditional partitional clustering algorithm on a single processor computer is O(MNK), where K is the number of clusters. The proposed algorithm on an SIMD computer with MN processors has a time complexity O(K(n + m)). The time complexity of the proposed single-link hierarchical clustering algorithm is reduced from O(MN^2) of the uniprocessor algorithm to O(nN) with MN processors.
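To get a feel for the size of these reductions, the short sketch below evaluates the four growth terms for one hypothetical problem size, reading the pattern and feature counts as powers of two (N = 2^n, M = 2^m). Constant factors hidden by the O-notation are ignored, so the numbers indicate orders of magnitude only; the chosen values of n, m and K are illustrative assumptions.

```python
def partitional_sequential(M: int, N: int, K: int) -> int:
    """Growth term M*N*K for the single-processor partitional algorithm."""
    return M * N * K

def partitional_parallel(n: int, m: int, K: int) -> int:
    """Growth term K*(n + m) for the SIMD algorithm with M*N processors."""
    return K * (n + m)

def single_link_sequential(M: int, N: int) -> int:
    """Growth term M*N^2 for the uniprocessor single-link algorithm."""
    return M * N * N

def single_link_parallel(n: int, N: int) -> int:
    """Growth term n*N for the single-link algorithm with M*N processors."""
    return n * N

# Hypothetical problem size: 1024 patterns, 16 features, 8 clusters.
n, m, K = 10, 4, 8
N, M = 2 ** n, 2 ** m

seq = partitional_sequential(M, N, K)    # 16 * 1024 * 8
par = partitional_parallel(n, m, K)      # 8 * (10 + 4)
hier_seq = single_link_sequential(M, N)  # 16 * 1024 * 1024
hier_par = single_link_parallel(n, N)    # 10 * 1024
```

For this size the partitional term drops from 131072 to 112, and the single-link term from 16777216 to 10240, at the cost of using M*N = 16384 processors.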
14.
Reusable components for partitioning clustering algorithms (total citations: 1; self-citations: 1; citations by others: 0)
Boris Delibašić, Kathrin Kirchner, Johannes Ruhland, Miloš Jovanović, Milan Vukićević 《Artificial Intelligence Review》2009,32(1-4):59-75
Clustering algorithms are well-established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions for specific sub-problems in the clustering process. These solutions are linked together in a clustering algorithm, and they define the process and the structure of the algorithm. Frequently, many of these solutions occur in more than one clustering algorithm. Mostly, new clustering algorithms include frequently occurring solutions to typical sub-problems from clustering, as well as from other machine-learning algorithms. The problem is that these solutions are usually integrated in their algorithms, and that original algorithms are not designed to share solutions to sub-problems outside the original algorithm easily. We propose a way of designing cluster algorithms, and of improving existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. Thus we identify reusable components, first, as solutions to characteristic sub-problems in partitioning cluster algorithms, and, further, identify a generic structure for the design of partitioning cluster algorithms. We analyze some partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM), and identify reusable components in them. We give examples of how new cluster algorithms can be designed based on them.
15.
E. T. Y. Lee 《Computing》1986,36(3):229-238
Two algorithms appeared (Lee [6] and Boehm [1]), both in this journal, for the computation of derivatives of a B-spline series that are faster than the repeated application of de Boor's algorithm to the derived series. Here we report on some test examples which show, however, that both of these are less stable than the repeated de Boor algorithm.
16.
We consider the problem of control of hierarchical Markov decision processes and develop a simulation based two-timescale actor-critic algorithm in a general framework. We also develop certain approximation algorithms that require less computation and satisfy a performance bound. One of the approximation algorithms is a three-timescale actor-critic algorithm, while the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms. We then present numerical experiments on a problem of production planning in semiconductor fabs on which we compare the performance of all algorithms together with policy iteration. Algorithms based on certain Hadamard matrix based deterministic perturbations are found to show the best results.
17.
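To make a clustered network better suited to data fusion, the properties of the minimum spanning tree (MST) are studied, and a new distributed multi-level clustering algorithm based on these MST properties is justified and implemented. During clustering, each node runs the algorithm independently, using the locally generated MST to propagate and fuse connection information and thereby complete the network clustering at the current level. After repeated fusion of connection information, a multi-level clustered network suited to data fusion gradually takes shape. Experimental analysis shows that the algorithm converges quickly and consumes few resources.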
18.
In this paper, we present an efficient global illumination technique, and then we discuss the results of its extensive experimental validation. The technique is a hybrid of cluster-based hierarchical and progressive radiosity techniques, which does not require storing links between interacting surfaces and clusters. We tested our technique by applying a multistage validation procedure, which we designed specifically for global illumination solutions. First, we experimentally validate the algorithm against analytically derived and measured real-world data to check how calculation speed is traded for lighting simulation accuracy for various clustering and meshing scenarios. Then we test the algorithm performance and rendering quality by directly comparing the virtual and real-world images of a complex environment.
19.
20.
《Computer Communications》2007,30(14-15):2826-2841
The past few years have witnessed increased interest in the potential use of wireless sensor networks (WSNs) in applications such as disaster management, combat field reconnaissance, border protection and security surveillance. Sensors in these applications are expected to be remotely deployed in large numbers and to operate autonomously in unattended environments. To support scalability, nodes are often grouped into disjoint and mostly non-overlapping clusters. In this paper, we present a taxonomy and general classification of published clustering schemes. We survey different clustering algorithms for WSNs, highlighting their objectives, features, complexity, etc. We also compare these clustering algorithms based on metrics such as convergence rate, cluster stability, cluster overlapping, location-awareness and support for node mobility.