首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到15条相似文献,搜索用时 15 毫秒
1.
2.
Clustering is a useful data mining technique which groups data points such that the points within a single group have similar characteristics, while the points in different groups are dissimilar. Density-based clustering algorithms such as DBSCAN and OPTICS are one kind of widely used clustering algorithms. As there is an increasing trend of applications to deal with vast amounts of data, clustering such big data is a challenging problem. Recently, parallelizing clustering algorithms on a large cluster of commodity machines using the MapReduce framework have received a lot of attention.In this paper, we first propose the new density-based clustering algorithm, called DBCURE, which is robust to find clusters with varying densities and suitable for parallelizing the algorithm with MapReduce. We next develop DBCURE-MR, which is a parallelized DBCURE using MapReduce. While traditional density-based algorithms find each cluster one by one, our DBCURE-MR finds several clusters together in parallel. We prove that both DBCURE and DBCURE-MR find the clusters correctly based on the definition of density-based clusters. Our experimental results with various data sets confirm that DBCURE-MR finds clusters efficiently without being sensitive to the clusters with varying densities and scales up well with the MapReduce framework.  相似文献   

3.
Khuller  S.  Raghavachari  B.  Young  N. 《Algorithmica》1995,14(4):305-321
We give a simple algorithm to find a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous tradeoff: given the two trees and a>0, the algorithm returns a spanning tree in which the distance between any vertex and the root of the shortest-path tree is at most 1+2 times the shortest-path distance, and yet the total weight of the tree is at most 1+2/ times the weight of a minimum spanning tree. Our algorithm runs in linear time and obtains the best-possible tradeoff. It can be implemented on a CREW PRAM to run a logarithmic time using one processor per vertex.Current research supported by NSF Research Initiation Award CCR-9307462. This work was done while this author was supported by NSF Grants CCR-8906949, CCR-9103135, and CCR-9111348.Part of this work was done while this, author was at the University of Maryland Institute for Advanced Computer Studies (UMIACS) and supported by NSF Grants CCR-8906949 and CCR-9111348.  相似文献   

4.
5.
针对传统最小生成树聚类算法需要事先知道聚类数目和使用静态全局分类依据,导致聚类密度相差较大时,算法有效性下降,计算复杂度大等问题,提出一种改进的最小生成树自适应分层聚类算法,根据最近邻关系,自动为每个聚类簇设定独立的阈值,使之适应分布密度相差较大的情况,并能自动确定聚类数目。实验表明,算法具有较好的性能,尤其对数据密度分布不均匀的情况也能得到较好的聚类结果。  相似文献   

6.
Summary In this paper, we present an efficient distributed protocol for constructing a minimum-weight spanning tree (MST). Gallager, Humblet and Spira [5] proposed a protocol for this problem with time and message complexitiesO(N logN) andO(E+NlogN) respectively. A protocol withO(N) time complexity was proposed by Awerbuch [1]. We show that the time complexity of the protocol in [5] can also be expressed asO((D+d) logN), whereD is the maximum degree of a node andd is a diameter of the MST and therefore this protocol performs better than the protocol in [1] wheneverD+d<N/logN. We give a protocol which requiresO(min(N, (D+d)logN)) time andO(E+NlogNlogN/loglogN) messages. The protocol constructs a minimum spanning tree by growing disjoint subtrees of the MST (which are referred to asfragments). Fragments having the same minimum-weight outgoing edge are combined until a single fragment which spans the entire network remains. The protocols in [5] and [1] enforce a balanced growth of fragments. We relax the requirement of balanced growth and obtain a highly asynchronous protocol. In this protocol, fast growing fragments combine more often and there-fore speed up the execution. Gurdip Singh received the B. Tech degree in Computer Science from Indian Institute of Technology, New Delhi in 1986 and the M.S. and Ph.D. degrees in Computer Science from State University of New York at Stony Brook in 1989 and 1991 respectively. Since 1991, he has been an Assistant Professor in the Department of Computing and Information Sciences at Kansas State University. His research interest include design and verification of distributed algorithms, communication networks and distributed shared memories. Arthur Bernstein received his PhD in Electrical Engineering from Columbia University. He is currently a professor in the Computer Science Department at the State University of New York at Stony Brook. His research interests center around concurrency in programming and data-base systems.This work was supported by NSF under grants CCR8701671, CCR8901966 and CCR9211621  相似文献   

7.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

8.
We present the first optimal parallel algorithms for the verification and sensitivity analysis of minimum spanning trees. Our algorithms are deterministic and run inO(logn) time and require linear-work in the CREW PRAM model. These algorithms are used as a subroutine in the linear-work randomized algorithm for finding minimum spanning trees of Cole, Klein, and Tarjan. Research partially supported by a National Science Foundation Graduate Fellowship and by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center, Grant No. NSF-STC88-09648. Research at Princeton University was partially supported by the National Science Foundation, Grant No. CCR-8920505, the Office of Naval Research, Contract No. N00014-91-J-1463, and by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center, Grant No. NSF-STC88-09648.  相似文献   

9.
发现现有的针对非均匀分簇路由算法没有充分考虑簇首与基站之间最优路径选择,而导致传输路径上的能量消耗不均衡的问题。为了更好地均衡传输路径上节点能量的消耗,提出了基于最小生成树的非均匀分簇的路由算法。该算法利用节点剩余能量和节点到基站的距离选举簇首,然后通过建立最小生成树搜寻最优传输路径,这样可以减少传输路径上的能量消耗,有效地解决能耗不均衡问题。理论分析和实验结果均表明,该算法无论在存活节点个数还是在能量消耗上都明显优于EEUC算法和EBCA。  相似文献   

10.
Parallel updates of minimum spanning trees (MSTs) have been studied in the past. These updates allowed a single change in the underlying graph, such as a change in the cost of an edge or an insertion of a new vertex. Multiple update problems for MSTs are concerned with handling more than one such change. In the sequential case multiple update problems may be solved using repeated applications of an efficient algorithm for a single update. However, for efficiency reasons, parallel algorithms for multiple update problems must consider all changes to the underlying graph simultaneously. In this paper we describe parallel algorithms for updating an MST whenk new vertices are inserted or deleted in the underlying graph, when the costs ofk edges are changed, or whenk edge insertions and deletions are performed. For multiple vertex insertion update, our algorithm achieves time and processor bounds ofO(log n·logk) and nk/(logn·logk), respectively, on a CREW parallel random access machine. These bounds are optimal for dense graphs. A novel feature of this algorithm is a transformation of the previous MST andk new vertices to a bipartite graph which enables us to obtain the above-mentioned bounds.  相似文献   

11.
The problem of verifying a Minimum Spanning Tree (MST) was introduced by Tarjan in a sequential setting. Given a graph and a tree that spans it, the algorithm is required to check whether this tree is an MST. This paper investigates the problem in the distributed setting, where the input is given in a distributed manner, i.e., every node “knows” which of its own emanating edges belong to the tree. Informally, the distributed MST verification problem is the following. Label the vertices of the graph in such a way that for every node, given (its own state and label and) the labels of its neighbors only, the node can detect whether these edges are indeed its MST edges. In this paper, we present such a verification scheme with a maximum label size of O(log n log W), where n is the number of nodes and W is the largest weight of an edge. We also give a matching lower bound of Ω(log n log W) (as long as W > (log n)1+ε for some fixed ε > 0). Both our bounds improve previously known bounds for the problem. For the related problem of tree sensitivity also presented by Tarjan, our method yields rather efficient schemes for both the distributed and the sequential settings. A preliminary version of this work was presented in ACM PODC 2006. A. Korman was supported in part at the Technion by an Aly Kaufman fellowship. S. Kutten was supported in part by a grant from the Israeli Ministry for Science and Technology.  相似文献   

12.
We study the problem of sharing in a fair manner the cost of a service provided to a set of players in the context of Cooperative Game Theory. We introduce a new fairness measure capturing the dissatisfaction (or happiness) of each player and we propose two cost sharing methods minimizing the maximum or average dissatisfaction of the clients for the classical minimum spanning tree game.  相似文献   

13.
In real life applications we often have the following problem: How to find the reasonable assignment strategy to satisfy the source and destination requirement without shipping goods from any pairs of prohibited sources simultaneously to the same destination so that the total cost can be minimized. This kind of problem is known as the transportation problem with exclusionary side constraint (escTP). Since this problem is one of nonlinear programming models, it is impossible to solve this problem using a traditional linear programming software package (i.e., LINDO). In this paper, an evolutionary algorithm based on a genetic algorithm approach is proposed to solve it. We adopt a Prüfer number to represent the candidate solution to the problem and design the feasibility of the chromosome. Moreover, to handle the infeasible chromosome, here we also propose the repairing procedure. In order to improve the performance of the genetic algorithm, the fuzzy logic controller (FLC) is used to dynamically control the genetic operators. Comparisons with other conventional methods and the spanning tree-based genetic algorithm (st-GA) are presented and the results show the proposed approach to be better as a whole.  相似文献   

14.
We study in this paper the problem of finding in a graph a subset of k edges whose deletion causes the largest increase in the weight of a minimum spanning tree. We propose for this problem an explicit enumeration algorithm whose complexity, when compared to the current best algorithm, is better for general k but very slightly worse for fixed k. More interestingly, unlike in the previous algorithms, we can easily adapt our algorithm so as to transform it into an implicit enumeration algorithm based on a branch and bound scheme. We also propose a mixed integer programming formulation for this problem. Computational results show a clear superiority of the implicit enumeration algorithm both over the explicit enumeration algorithm and the mixed integer program.  相似文献   

15.
The vertex updating problem for a minimum spanning tree (MST) is defined as follows: Given a graphG=(V, E G) and an MSTT forG, find a new MST forG to which a new vertexz has been added along with weighted edges that connectz with the vertices ofG. We present a set of rules that produce simple optimal parallel algorithms that run inO(lgn) time usingn/lgn EREW PRAM processors, wherenV¦. These algorithms employ any valid tree-contraction schedule that can be produced within the stated resource bounds. These rules can also be used to derive simple linear-time sequential algorithms for the same problem. The previously best-known parallel result was a rather complicated algorithm that usedn processors in the more powerful CREW PRAM model. Furthermore, we show how our solution can be used to solve the multiple vertex updating problem: Update a given MST whenk new vertices are introduced simultaneously. This problem is solved inO(lgk·lgn) parallel time using (k·n)/(lgk·lgn) EREW PRAM processors. This is optimal for graphs having (kn) edges.Part of this work was done while P. Metaxas was with the Department of Mathematics and Computer Science, Dartmouth College.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号