Similar articles
20 similar articles found.
1.
We consider a distributed system where each node keeps a local count for items (similar to elections, where nodes are ballot boxes and items are candidates). A top-k query in such a system asks which k items have the largest global count across all nodes in the system. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates which approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, the communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node. An extended abstract of this paper appeared in Proc. 13th Int. Colloquium on Structural Information and Communication Complexity, SIROCCO 2006, Lecture Notes in Computer Science 4056, pp. 319–333.
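
To see why sampling can make traffic depend on global scores rather than on how they are split among nodes, here is a minimal sketch under simplifying assumptions: each node independently keeps each unit of its local count with probability p and sends only the kept units to a coordinator, which ranks items by their sampled totals. This is not the paper's Monte Carlo algorithm; the coordinator and the names `thin_counts` and `approx_top_k` are hypothetical.

```python
import random
from collections import Counter

def thin_counts(local_counts, p, rng):
    """Report each unit of a node's count independently with probability p
    (binomial thinning). Illustrative only; not the paper's protocol."""
    report = Counter()
    for item, count in local_counts.items():
        kept = sum(1 for _ in range(count) if rng.random() < p)
        if kept:
            report[item] = kept
    return report

def approx_top_k(nodes, k, p, seed=0):
    """Merge the thinned reports of all nodes and rank items by sampled count."""
    rng = random.Random(seed)
    merged = Counter()
    for local_counts in nodes:
        merged.update(thin_counts(local_counts, p, rng))
    # Dividing by p would give unbiased estimates of the global counts,
    # but it does not change the ranking, so we rank the raw sampled counts.
    return [item for item, _ in merged.most_common(k)]

nodes = [{"a": 50, "b": 5}, {"a": 40, "c": 30}, {"b": 10, "c": 25}]
print(approx_top_k(nodes, k=2, p=0.2))
```

Under this scheme the expected number of units reported for an item is p times its global count, no matter how that count is partitioned among the nodes.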

2.
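As distributed systems grow in scale and computational complexity increases, both the mean time to repair failures in distributed computing and the communication overhead of fault-tolerant computation are rising. Combining distributed coded computing with replica redundancy, a new fault-tolerant algorithm is proposed. Map nodes apply the idea of distributed coded computing, assigning data redundantly to multiple compute nodes to create coded intermediate results, which reduces the amount of data that compute nodes transfer in the shuffle phase. Reduce nodes decode the received coded intermediate results, which both verifies the correctness of the intermediate results and yields the final result. Experimental results show that, in a MapReduce-based distributed computing framework, compared with the triple modular redundancy and two-phase triple modular redundancy fault-tolerant algorithms, the proposed algorithm completes fault-tolerant computation while reducing the communication overhead and mean time to repair during computation, and improves the availability and reliability of the distributed system.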
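
The decode-and-verify step can be illustrated with a deliberately simple code: a single parity value over k partial results of a sum aggregation. This is a hypothetical stand-in for the paper's coded intermediate results, assuming one redundant node holds every block; it shows how redundancy lets the reducer both recover a lost result and detect a wrong one, not the paper's actual encoding or shuffle layout.

```python
def map_phase(blocks):
    """Each map task emits a partial sum of its block; one redundant node
    (assumed to hold every block) also emits a parity value, their total."""
    partials = [sum(block) for block in blocks]
    return partials + [sum(partials)]

def reduce_phase(received):
    """received: k partial results (None if a map node failed) plus the parity.
    Recovers a single lost partial, or detects an inconsistent one."""
    *partials, parity = received
    missing = [i for i, v in enumerate(partials) if v is None]
    if len(missing) > 1 or (missing and parity is None):
        raise RuntimeError("too many failures to decode")
    if missing:
        # Erasure decoding: the lost partial equals parity minus the others.
        i = missing[0]
        partials[i] = parity - sum(v for v in partials if v is not None)
    elif parity is not None and sum(partials) != parity:
        raise ValueError("parity mismatch: corrupted intermediate result")
    return sum(partials)
```

For k map tasks the reducer receives k + 1 values, whereas triple modular redundancy would ship 3k; unlike majority voting, a single parity value detects a corrupted result but cannot tell which one is wrong.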

3.
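Existing outlier detection algorithms for wireless sensor networks do not fully account for the spatio-temporal correlation of the data or the distributed nature of the network, which leads to low detection accuracy, heavy communication, and high computational complexity. To address this, an online outlier detection algorithm based on spatio-temporal correlation with distributed computation and filtering is proposed. On each sensor node, the algorithm uses the temporal correlation of sensor readings to generate candidate outliers, then uses spatial correlation to filter the candidates into local outliers; finally, the local outliers from all sensor nodes are collected at the sink node to obtain the global outliers. Exploiting spatio-temporal correlation improves detection accuracy, while distributed computation and filtering reduce communication and computation. Both theoretical analysis and experimental results show that the algorithm outperforms existing algorithms.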
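
The three stages (temporal candidate, spatial filter, collection at the sink) can be sketched as follows. The z-score test over a sliding window, the median-of-neighbors rule, and the thresholds `z` and `tol` are hypothetical choices for illustration, not the paper's detection criteria.

```python
from statistics import mean, median, stdev

def is_temporal_candidate(history, reading, z=3.0):
    """Temporal test: the reading deviates from the node's recent window
    by more than z sample standard deviations."""
    sigma = max(stdev(history), 1e-9)
    return abs(reading - mean(history)) > z * sigma

def local_outliers(node_id, history, reading, neighbor_readings, z=3.0, tol=2.0):
    """Spatial filter: keep a candidate only if it also disagrees with the
    median of the neighbors' current readings by more than tol."""
    if not is_temporal_candidate(history, reading, z):
        return []
    if abs(reading - median(neighbor_readings)) <= tol:
        return []  # neighbors changed too: likely a real event, not an outlier
    return [(node_id, reading)]

def sink_collect(per_node_results):
    """The sink only receives the (usually short) local outlier lists."""
    return [outlier for result in per_node_results for outlier in result]
```

Filtering locally means that a node whose reading jumps together with its neighbors (a genuine event) sends nothing to the sink.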

4.
A dominating set is a subset of the nodes of a graph such that every node is either in the set or adjacent to a node in the set. A minimum dominating set approximation is a dominating set that is not much larger than a dominating set with the fewest possible number of nodes. This article summarizes the state of the art in finding minimum dominating set approximations in distributed systems, where each node locally executes a protocol on its own, communicating with its neighbors in order to achieve a solution with good global properties. Moreover, we present a number of recent results for specific families of graphs in detail. A unit disk graph is given by an embedding of the nodes in the Euclidean plane, where two nodes are joined by an edge exactly if they are at distance at most one. For this family of graphs, we prove an asymptotically tight lower bound on the trade-off between time complexity and approximation ratio of deterministic algorithms. Next, we consider graphs of small arboricity, whose edge sets can be decomposed into a small number of forests. We give two algorithms, a randomized one excelling in its approximation ratio and a uniform deterministic one which is faster and simpler. Finally, we show that in planar graphs, which can be drawn in the Euclidean plane without intersecting edges, a constant approximation factor can be ensured within a constant number of communication rounds.
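
The definitions above can be made concrete with a sequential baseline: build a unit disk graph from points in the plane and run the classic greedy heuristic, which repeatedly picks the node covering the most not-yet-dominated nodes and achieves a logarithmic approximation ratio. This is a centralized sketch for intuition only, not one of the distributed algorithms surveyed in the article; the function names are hypothetical.

```python
import math

def unit_disk_graph(points):
    """Join two nodes exactly if their Euclidean distance is at most one."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 1.0:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_dominating_set(adj):
    """Repeatedly add the node whose closed neighborhood covers the most
    undominated nodes (ties go to the lowest-numbered node)."""
    undominated = set(adj)
    ds = set()
    while undominated:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        ds.add(best)
        undominated -= {best} | adj[best]
    return ds

def is_dominating(adj, ds):
    return all(v in ds or adj[v] & ds for v in adj)

points = [(0, 0), (0.9, 0), (1.8, 0), (2.7, 0), (3.6, 0)]  # a path of 5 nodes
adj = unit_disk_graph(points)
ds = greedy_dominating_set(adj)
```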

5.
6.
A performance model for analysis of heterogeneous multi-cluster systems (total citations: 1; self-citations: 0; citations by others: 1)
This paper addresses the problem of performance modeling for large-scale heterogeneous distributed systems, with an emphasis on multi-cluster computing systems. Since the overall performance of a distributed system often depends on the effectiveness of its communication network, the study of the interconnection networks for these systems is very important. Performance modeling is required to avoid poorly chosen components and architectures, and to avoid discovering a serious shortfall during system testing just before deployment. However, the multiplicity of components and the associated complexity make performance analysis of distributed computing systems a challenging task. To this end, we present an analytical performance model for the interconnection networks of heterogeneous multi-cluster systems. The analysis is based on a parametric family of fat-trees, the m-port n-tree, and on a deterministic routing algorithm proposed in this paper. The model is validated through comprehensive simulation, which demonstrates that the proposed model exhibits a good degree of accuracy for various system organizations and under different working conditions.
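
For a sense of scale of the m-port n-tree family, the sketch below computes network size from (m, n), assuming the commonly cited definition in which an m-port n-tree built from m-port switches connects 2(m/2)^n processing nodes using (2n-1)(m/2)^(n-1) switches. These formulas are an assumption to check against the paper; the function name is hypothetical.

```python
def m_port_n_tree_size(m, n):
    """Return (processing_nodes, switches) for an m-port n-tree,
    assuming nodes = 2*(m/2)**n and switches = (2n - 1)*(m/2)**(n - 1)."""
    if m < 2 or m % 2 or n < 1:
        raise ValueError("m must be an even port count >= 2 and n >= 1")
    half = m // 2
    nodes = 2 * half ** n
    switches = (2 * n - 1) * half ** (n - 1)
    return nodes, switches

for m, n in [(4, 3), (8, 2), (16, 3)]:
    print(m, n, m_port_n_tree_size(m, n))
```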

7.
A distributed hash table (DHT) is an infrastructure to support resource discovery in large distributed systems. In a DHT, data items such as resources, indexes of resources or resource metadata, are distributed across an overlay network based on a hash function. However, this may not be desirable in commercial applications such as Grid and cloud computing whereby the presence of multiple administrative domains leads to the issues of data ownership and self-economic interests. In this paper, we present R-DHT (Read-only DHT), a DHT-based resource discovery scheme without distributing data items. To map each data item back onto its resource owner, a physical host, we virtualize each host into virtual nodes. Nodes are further organized as a segment-based overlay network which increases node failure resiliency without replicating data items. We demonstrate the feasibility of our proposed scheme by presenting R-Chord, an implementation of R-DHT using Chord as the underlying overlay graph, with lookup and maintenance optimizations. Through analytical and simulation analyses, we evaluate the performance of R-DHT and compare it with traditional DHTs in terms of lookup path length, resiliency to node failures, and maintenance overhead. Overall, we found that R-DHT is effective and efficient for resource indexing and discovery in large distributed systems with a strong commercial requirement.
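
The idea of virtualizing each host into per-item virtual nodes, so that a lookup lands on the owner instead of on a node storing a copy, can be sketched with a toy identifier ring. The ID layout (hash of the key in the high bits, hash of the host in the low bits), the bit widths, and the class name `ToyRDHT` are assumptions for illustration, and Chord's finger-table routing is replaced by a sorted list.

```python
import bisect
import hashlib

KEY_BITS = 32   # hypothetical split of the identifier space
HOST_BITS = 32

def h(value, bits):
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class ToyRDHT:
    def __init__(self):
        self.ring = []    # sorted virtual-node IDs
        self.owner = {}   # virtual-node ID -> physical host

    def register(self, host, key):
        """The host joins one virtual node per key it owns; no data moves."""
        vid = (h(key, KEY_BITS) << HOST_BITS) | h(host, HOST_BITS)
        if vid not in self.owner:
            bisect.insort(self.ring, vid)
        self.owner[vid] = host

    def lookup(self, key):
        """Scan the virtual nodes whose ID prefix is this key's hash."""
        prefix = h(key, KEY_BITS)
        i = bisect.bisect_left(self.ring, prefix << HOST_BITS)
        hosts = []
        while i < len(self.ring) and self.ring[i] >> HOST_BITS == prefix:
            hosts.append(self.owner[self.ring[i]])
            i += 1
        return hosts
```

Because the lookup returns the owning hosts themselves, each administrative domain keeps its data and only advertises that it has it.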

8.
We present a method for parallel block-sparse matrix-matrix multiplication on distributed memory clusters. By using a quadtree matrix representation, data locality is exploited without prior information about the matrix sparsity pattern. A distributed quadtree matrix representation is straightforward to implement due to our recent development of the Chunks and Tasks programming model [Parallel Comput. 40, 328 (2014)]. The quadtree representation combined with the Chunks and Tasks model leads to favorable weak and strong scaling of the communication cost with the number of processes, as shown both theoretically and in numerical experiments. Matrices are represented by sparse quadtrees of chunk objects. The leaves in the hierarchy are block-sparse submatrices. Sparsity is dynamically detected by the matrix library and may occur at any level in the hierarchy and/or within the submatrix leaves. When graphics processing units (GPUs) are available, both CPUs and GPUs are used for leaf-level multiplication work, thus making use of the full computing capacity of each node. The performance is evaluated for matrices with different sparsity structures, including examples from electronic structure calculations. Compared to methods that do not exploit data locality, our locality-aware approach reduces communication significantly, achieving essentially constant communication per node in weak scaling tests.
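
A sequential sketch of the quadtree idea: all-zero blocks are stored as `None` and skipped during recursion, so sparsity is detected at any level without knowing the pattern in advance. It ignores the Chunks and Tasks runtime, distribution, and GPUs; `LEAF` and the function names are hypothetical.

```python
LEAF = 2  # leaf block size (hypothetical); matrix size must be LEAF * 2**k

def build(dense, r0=0, c0=0, size=None):
    """Dense square matrix -> quadtree. None marks an all-zero block."""
    size = len(dense) if size is None else size
    if all(dense[r][c] == 0 for r in range(r0, r0 + size) for c in range(c0, c0 + size)):
        return None
    if size == LEAF:
        return [row[c0:c0 + size] for row in dense[r0:r0 + size]]
    h = size // 2
    return (build(dense, r0, c0, h), build(dense, r0, c0 + h, h),
            build(dense, r0 + h, c0, h), build(dense, r0 + h, c0 + h, h))

def add(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, list):  # leaf + leaf
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return tuple(add(x, y) for x, y in zip(a, b))

def mul(a, b):
    if a is None or b is None:
        return None  # a zero block: no work and nothing to communicate
    if isinstance(a, list):  # dense leaf-level multiplication
        n = len(a)
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    a00, a01, a10, a11 = a
    b00, b01, b10, b11 = b
    return (add(mul(a00, b00), mul(a01, b10)), add(mul(a00, b01), mul(a01, b11)),
            add(mul(a10, b00), mul(a11, b10)), add(mul(a10, b01), mul(a11, b11)))

def to_dense(q, n):
    out = [[0] * n for _ in range(n)]
    def fill(node, r0, c0, size):
        if node is None:
            return
        if isinstance(node, list):
            for i in range(size):
                for j in range(size):
                    out[r0 + i][c0 + j] = node[i][j]
            return
        h = size // 2
        for child, (dr, dc) in zip(node, [(0, 0), (0, h), (h, 0), (h, h)]):
            fill(child, r0 + dr, c0 + dc, h)
    fill(q, 0, 0, n)
    return out
```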

9.
Computer Networks, 2007, 51(12): 3595-3616
As mobile ad hoc network (MANET) systems research has matured and several testbeds have been built to study MANETs, research has focused on developing new MANET applications such as collaborative games, collaborative computing, messaging systems, distributed security schemes, MANET middleware, peer-to-peer file sharing systems, voting systems, resource management and discovery, vehicular computing, and collaborative education systems. The growing set of diverse applications developed for MANETs poses far more complex traffic patterns than the simple one-to-one traffic pattern, so the one-to-one pattern widely used in previous protocol studies no longer adequately reflects the relative performance of these protocols when they are deployed to support these emerging applications. As a first step towards effectively supporting newly developed and future MANET applications, this paper studies the performance impact of diverse traffic patterns on routing protocols in MANETs. Specifically, we propose a new communication model that extends the previous one to include a more general traffic pattern that varies the number of connections per source node. We study the performance impact of traffic patterns on various routing protocols via detailed simulations of an ad hoc network of 112 mobile nodes. Our simulation results show that many of the conclusions drawn in previous protocol comparison studies no longer hold under the new traffic patterns. These results motivate the need for performance evaluation of ad hoc networks to include not only rich and diverse mobility models, as has been done in the past, but also diverse traffic patterns that stress a wide set of protocol design issues.
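
The generalized traffic model, in which the number of connections per source node varies, can be illustrated by a small pattern generator. The function name and its parameters are hypothetical; setting `conns_per_source=1` recovers the one-to-one pattern, and the same total number of flows can be spread over many sources or concentrated on a few.

```python
import random

def traffic_pattern(num_nodes, num_sources, conns_per_source, seed=0):
    """Return (src, dst) flows: each of num_sources distinct sources opens
    conns_per_source connections to distinct destinations other than itself."""
    if conns_per_source > num_nodes - 1:
        raise ValueError("not enough distinct destinations")
    rng = random.Random(seed)
    sources = rng.sample(range(num_nodes), num_sources)
    flows = []
    for s in sources:
        others = [d for d in range(num_nodes) if d != s]
        flows.extend((s, d) for d in rng.sample(others, conns_per_source))
    return flows

one_to_one = traffic_pattern(112, 40, 1)   # 40 flows from 40 sources
one_to_many = traffic_pattern(112, 10, 4)  # 40 flows from 10 sources
```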

10.
Wang Haoyun, Xu Huanliang, Ren Shougang. Computer Science (计算机科学), 2012, 39(10): 54-59, 64
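Based on an assessment of the security degree of relay nodes, a distributed multipath relay routing protocol for P2P networks based on node security degree, NSD-DPMRR (Distributed Protocol for Multipath Relay Routing based on Node's Security Degree), is proposed. The protocol computes, in a distributed manner, the optimal sending rate of the source node and the optimal forwarding rate of each relay node. Simulation experiments show that the protocol minimizes the harm that malicious relay nodes do to data transmission while maximizing the legitimate data received by the destination node, which ensures the security and effectiveness of relay routing; the protocol also has low complexity.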

11.
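A parallel mining algorithm for large datasets, based on the Hadoop distributed computing platform, is presented. The algorithm vertically partitions the unstructured raw dataset and the intermediate result files, which ensures that the complete set of frequent itemsets is obtained, and assigns each vertical partition to a different Hadoop compute node. This reduces the data each compute node must store and, in turn, the number of intersection operations each node performs, improving parallel mining efficiency. Experimental results show that the algorithm solves the problems of heavy data communication, large volumes of intermediate data, and large numbers of intersection operations that arise when mining large datasets, and that it is efficient and scalable.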
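
The vertical layout and the intersection operations it relies on can be sketched in the Eclat style: each item maps to the set of transaction ids containing it, and the support of a pair is the size of the intersection of the two sets. Assigning prefix classes round-robin to workers is our assumption rather than the paper's partitioning scheme; only 2-itemsets are mined, and the workers are simulated in one process.

```python
def to_vertical(transactions):
    """Horizontal transactions -> vertical layout: item -> set of transaction ids."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def worker_pairs(frequent, items, worker, workers, min_support):
    """Worker `worker` owns prefix classes items[worker], items[worker + workers], ...
    Support of {a, b} is the size of the intersection of their tid sets."""
    found = {}
    for idx in range(worker, len(items), workers):
        a = items[idx]
        for b in items[idx + 1:]:
            support = len(frequent[a] & frequent[b])
            if support >= min_support:
                found[(a, b)] = support
    return found

def frequent_pairs(transactions, min_support, workers):
    tidsets = to_vertical(transactions)
    frequent = {i: t for i, t in tidsets.items() if len(t) >= min_support}
    items = sorted(frequent)
    result = {}
    for w in range(workers):  # each iteration stands in for one compute node
        result.update(worker_pairs(frequent, items, w, workers, min_support))
    return result
```

Because every pair (a, b) with a < b belongs to exactly one prefix class, the union of the workers' results is complete; in a real deployment, a worker needs the tid sets of its own prefix items and of the items that follow them.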

12.
Information Fusion, 2008, 9(3): 399-411
Information fusion can assist in the development of sensor network applications by merging capabilities, raw data and decisions from multiple sensors through distributed and collaborative integration algorithms. In this paper, we introduce a multi-layered, middleware-driven, multi-agent, interoperable architecture for distributed sensor networks that bridges the gap between the programmable application layer consisting of software agents and the physical layer consisting of sensor nodes. We adopt an energy-efficient, fault-tolerant approach for collaborative information processing among multiple sensor nodes using a mobile-agent-based computing model. In this model the sink/base-station deploys mobile agents that migrate from node to node following a certain itinerary, either pre-determined or determined on-the-fly, and fuse the information/data locally at each node. This way, the intelligence is distributed throughout the network edge and communication cost is reduced to make the sensor network energy-efficient. We evaluate the performance of our mobile-agent-based approach as well as that of the traditional client/server-based computing model, vis-à-vis energy consumption and execution time, through both analytical study and simulation. We draw important conclusions based on our findings. Finally, we consider a collaborative target classification application, supported by our architectural framework, to illustrate the efficacy of the mobile-agent-based computing model.
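
A back-of-the-envelope cost model makes the trade-off between the two computing models concrete. It counts bytes times hops only, and every size and hop count below is a hypothetical scenario; the assumption that local fusion keeps the carried result at a fixed size is ours, not the paper's analytical model.

```python
def client_server_bytes(raw_sizes, hops_to_sink):
    """Every sensor ships its raw data to the sink along its own multi-hop path."""
    return sum(size * hops for size, hops in zip(raw_sizes, hops_to_sink))

def mobile_agent_bytes(itinerary_hops, agent_code_size, fused_result_size):
    """The agent (its code plus the fused result so far) travels hop by hop from
    the sink through every node and back; fusing locally at each node is assumed
    to keep the carried result at a fixed size."""
    return itinerary_hops * (agent_code_size + fused_result_size)

# Hypothetical scenario: 10 nodes, each 3 hops from the sink, 1000 bytes of raw data;
# the itinerary is 3 hops out, 9 hops between nodes, and 3 hops back.
cs = client_server_bytes([1000] * 10, [3] * 10)
ma = mobile_agent_bytes(3 + 9 + 3, agent_code_size=500, fused_result_size=100)
print(cs, ma)
```

Under these assumptions the comparison can flip when the agent code is large relative to the raw data or when the itinerary is long.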

13.
Current group communication services have mostly been implemented in homogeneous distributed computing environments. This limits their applicability, because most modern distributed computing environments are heterogeneous. This paper describes the design, implementation, and performance evaluation of a CORBA group communication service. Implementing a group communication service with CORBA enables it to operate in a heterogeneous distributed computing environment. To evaluate the effect of CORBA on the performance of a group communication service, this paper provides a detailed comparison of the performance measured from three implementations of an atomic broadcast protocol and a group membership protocol. Two of these implementations use CORBA, while the third uses UDP sockets for interprocess communication. The main conclusion is that heterogeneity can be achieved in group communication services by implementing them using CORBA, but at a substantial performance cost. This cost can be reduced to a certain extent by carefully choosing a design and tuning various protocol parameters such as buffer sizes and timer values.

14.
Broadcast, the process of disseminating information in a distributed system whereby a message originating at one node is sent to all other nodes, is a very important issue in distributed computing. All-to-all broadcast is the process by which every node broadcasts its own piece of information to all other nodes. In this paper, we first develop an optimal all-to-all broadcast scheme for one-port communication, in which each node can send out only one message per communication step, and then extend our results to multi-port communication, i.e., k-port communication, in which each node can send out k messages per communication step. We prove that the proposed schemes are optimal for the model considered in the sense that they not only require the minimal number of communication steps but also incur the minimal number of messages.
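
For intuition about the one-port step bound, here is a simulation of recursive doubling, a standard all-to-all broadcast scheme that is not necessarily the paper's. It assumes n is a power of two and that one message may carry all pieces a node has accumulated. Since each node sends one message per step, the number of nodes holding any given piece can at most double per step, so log2(n) steps is a lower bound, and recursive doubling meets it.

```python
def recursive_doubling_allgather(n):
    """Simulate all-to-all broadcast on n = 2**d nodes. In the step using bit b,
    node i exchanges everything it knows with partner i XOR b (one message each
    way). Returns each node's knowledge and the number of steps used."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    known = [{i} for i in range(n)]
    steps = 0
    bit = 1
    while bit < n:
        known = [known[i] | known[i ^ bit] for i in range(n)]
        bit <<= 1
        steps += 1
    return known, steps
```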

15.
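A privacy-preserving naive Bayes classification protocol for vertically partitioned data is presented. Using homomorphic encryption, threshold cryptography, and digital envelope techniques, the protocol classifies vertically partitioned data while ensuring that no information related to the result is leaked to the other parties. Theoretical analysis shows that the protocol is secure and has low communication and computational complexity.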
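
The additive homomorphism that such protocols rely on (for example, combining per-party encrypted class scores without decrypting them individually) can be illustrated with textbook Paillier on toy primes. This is an insecure toy for intuition only: Paillier is one well-known additively homomorphic scheme and is not necessarily the one the paper uses, and the threshold decryption and digital envelopes are not modeled. Requires Python 3.9+ for `math.lcm`.

```python
import math
import random

def keygen(p, q):
    """Textbook Paillier with g = n + 1 (toy primes only; not secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g**lam mod n**2) = lam mod n
    return (n, n + 1), (lam, mu, n)

def encrypt(pub, m, rng):
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    lam, mu, n = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen(61, 53)
rng = random.Random(1)
c_sum = add_encrypted(pub, encrypt(pub, 17, rng), encrypt(pub, 25, rng))
```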

16.
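To address information security and data communication in cooperative UAV operations, a blockchain-based distributed UAV data security model is proposed. First, lightweight encryption is used to restructure the UAV blockchain, and a distributed blockchain network model suited to IoT edge-computing scenarios is designed. Then, smart contracts are invoked to share blockchain data securely, and, combining a reputation evaluation scheme with the idea of delegated proof of stake, a proof-of-work method that incorporates the consensus protocol is proposed to complete data transactions. Experimental results show that, as an instance of secure data sharing, the proposed method drives the reputation value of attacked UAVs down to an untrusted state and can effectively suppress malicious attacks under different attack modes; the transaction rate of normal nodes running the adaptive proof-of-work consensus algorithm increases by a factor of 3 to 4. The method thus provides a security guarantee for UAV data sharing.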

17.
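Design and implementation of a reliable and scalable group communication system (total citations: 2; self-citations: 0; citations by others: 2)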
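Group communication systems are a key component of distributed collaborative systems that support consistency and fault tolerance. To meet the needs of large-scale collaborative applications, this paper designs and implements SGCS, a reliable and scalable group communication system, by combining gossip-based protocols with deterministic protocols. The system consists mainly of a reliable message transport service and a group membership service. It combines a gossip-based reliable multicast protocol with deterministic message recovery, flow control, and ordering protocols, and a gossip-based failure detection protocol with a deterministic view agreement protocol; together with an optimistic virtual synchrony mechanism, these combinations give the system good scalability, reliability, and flexibility.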
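
The scalability of the gossip side comes from how quickly push gossip spreads a message: each informed node forwards it to `fanout` random peers per round, so the informed set grows by at most a factor of fanout + 1 per round and full coverage typically takes on the order of log n rounds. The simulation below is a generic sketch, not SGCS's protocol; the slow last few rounds, spent reaching stragglers, are plausibly one reason to pair gossip with deterministic recovery.

```python
import random

def push_gossip_rounds(n, fanout=1, seed=0, max_rounds=10_000):
    """Rounds until all n nodes hold a message started at node 0, when every
    informed node pushes it to `fanout` uniformly random peers each round."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        if rounds >= max_rounds:
            raise RuntimeError("did not converge")
        new = set()
        for _node in informed:
            for _ in range(fanout):
                new.add(rng.randrange(n))
        informed |= new
        rounds += 1
    return rounds

for n in (100, 1000, 10000):
    print(n, push_gossip_rounds(n, fanout=2))
```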

18.
There is an important class of interactive multimedia applications that deals with stream data from distributed sources. Indexing the data temporally facilitates ordering individual streams as well as correlating items from different streams. The Stampede programming system organizes stream data into channels, which are distributed and synchronized data structures that contain timestamped items. A Stampede program is a data flow graph of threads and channels. Stampede semantics for channels allow concurrent access from multiple threads for input and output. While a channel holds timestamped items, the semantics do not place any restriction on either the production or the consumption order of these items. Furthermore, timestamps of items in a channel need not be contiguous. These flexibilities are required due to the dynamic and parallel structure of the stream-oriented applications targeted by the Stampede system. Under such circumstances, a key issue is the “garbage collection” (GC) of channel items. In this paper, we present and compare three different GC algorithms: 1) REF is a simple algorithm that keeps a reference count on individual items; 2) TGC is a distributed algorithm for computing a global low watermark for timestamp values of interest in the entire application; 3) DGC is another distributed algorithm that uses information about the dependencies between the producers and consumers of data streams to compute a low watermark local to each node of the data flow graph. DGC can simultaneously eliminate garbage from channels and unneeded computations from threads. In tests performed using an interactive application, DGC achieves a nearly 30 percent reduction in the application memory footprint compared to TGC and REF. DGC and REF are also shown to be more scalable than TGC.
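
A minimal, single-process sketch of what a global low watermark buys: once no thread can still request a timestamp below some value, every channel item below it is garbage, even though timestamps are sparse and arrive in any order. How each thread's mark is computed and propagated is the substance of TGC and DGC and is not modeled here; `Channel` and `consumer_marks` are hypothetical.

```python
class Channel:
    """Timestamped items; timestamps need not be contiguous or arrive in order."""

    def __init__(self):
        self.items = {}

    def put(self, ts, payload):
        self.items[ts] = payload

    def collect_below(self, watermark):
        """Reclaim every item whose timestamp is below the watermark."""
        dead = sorted(ts for ts in self.items if ts < watermark)
        for ts in dead:
            del self.items[ts]
        return dead

def global_low_watermark(consumer_marks):
    """consumer_marks maps each thread to the smallest timestamp it may still
    request (hypothetical bookkeeping); nothing below the minimum is needed."""
    return min(consumer_marks.values())
```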

19.
We study the following distributed access problem which arises naturally in many settings: given a set of n data items shared among n nodes in a distributed network, all nodes want to access all (or a subset of) the items residing on different nodes in a conflict-free manner. In addition, items may move from one node to another during access. Our goal is to design distributed protocols so that all nodes access all the desired items as quickly as possible, while at the same time not overloading the storage space of any one node. Using centralized coordination among the nodes, it is easy to design an optimal scheme in which all nodes can access all the items in n−1 steps, storing only one item at any time. We show that a simple randomized distributed protocol performs almost as well as the optimal (centralized) scheme but with no coordination overhead. Our protocol takes O(n) time with high probability to access all n items, which is asymptotically as good as the optimal centralized scheme. The protocol guarantees that the maximum load (the maximum number of items stored in any node) at any time is at most O(log n/log log n) with high probability, which is only slightly larger than the Ω(1) load of the optimal scheme. Our analysis involves a stochastic analysis of a “balls into bins” problem in a dynamic setting where balls (data items) move into bins (nodes) on request, and we study the time and load requirements to move all the balls to the requested bins. A short version of this paper appeared in the Proceedings of the 24th Annual ACM Symposium on Principles of Distributed Computing (PODC), 2005.
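
The log n / log log n load bound comes from balls-into-bins behavior. The simulation below covers only the static version (n items placed uniformly at random on n nodes), not the dynamic setting the paper analyzes, and the asymptotic bound hides constants, so the printed maxima should be read as matching the trend rather than the exact value of the formula.

```python
import math
import random
from collections import Counter

def max_load(n, seed=0):
    """Throw n balls (items) into n bins (nodes) uniformly at random and
    return the number of balls in the fullest bin."""
    rng = random.Random(seed)
    loads = Counter(rng.randrange(n) for _ in range(n))
    return max(loads.values())

for n in (10**3, 10**4, 10**5):
    print(n, max_load(n), round(math.log(n) / math.log(math.log(n)), 2))
```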

20.
We present algorithms for binary cube networks that emulate butterfly network computations on binary-reflected Gray-coded data in the same time complexity as that required for binary-coded data. The code conversion required for the emulation with binary-reflected Gray-coded data is either performed in local memories or through concurrent exchanges. The emulation of a butterfly network with one or two rows mapped to each binary cube node requires n communication cycles on an n-cube. For more than two rows per node, one additional communication cycle is required for every pair of rows, with concurrent communication on all channels of every node. The encoding upon completion can be either binary, or binary-reflected Gray code, or any combination thereof, without affecting the communication complexity.
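
The code conversion itself is short: the binary-reflected Gray code of b is b XOR (b >> 1), and the inverse is a prefix XOR over the bits. The helper `butterfly_partner_node` (a hypothetical name, assuming one row per cube node) shows why conversion matters at all. Because the Gray code is linear over XOR, the stage-j butterfly partner of a Gray-coded row, for j ≥ 1, lies in the node differing in bits j and j-1, i.e. two cube dimensions away rather than one. This is our illustration, not the paper's emulation algorithm.

```python
def binary_to_gray(b):
    """Binary-reflected Gray code of b."""
    return b ^ (b >> 1)

def gray_to_binary(g):
    """Inverse: XOR of all right shifts of g (a prefix XOR over the bits)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def butterfly_partner_node(g, j):
    """Node holding the stage-j butterfly partner of the row stored at
    Gray-coded node g (one row per node assumed)."""
    return binary_to_gray(gray_to_binary(g) ^ (1 << j))

print([binary_to_gray(i) for i in range(8)])
```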
