期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Dynamic load distribution using anti-tasks and load state vectors

Qin Lu Sau-Ming Lau Kwong-Sak Leung 《Concurrency and Computation》1998,10(14):1251-1269

Polling-based load distribution (LD) algorithms suffer from two weaknesses: (i) load information exchanged during a polling session is confined to the two negotiating nodes only; (ii) as the distributed system grows in size (in terms of the number of constituent nodes), a larger number of polling sessions, and thus a higher amount of network bandwidth consumption and CPU overhead, are needed. We propose a new LD algorithm which is based on anti-tasks and load state vectors. This new algorithm avoids the above weaknesses of polling-based LD algorithms. Anti-tasks are composite agents which travel around a distributed system to facilitate the pairing up of task senders and receivers, as well as the collection and dissemination of load information. Time-stamped load information of processing nodes is stored in load state vectors which, when used together with anti-tasks, encourage mutual sharing of load information among processing nodes. Anti-tasks, which make use of load state vectors to decide their traveling paths, are spontaneously directed towards processing nodes having high transient workload, thus allowing their surplus workload to be relocated quickly. Using simulations, we evaluate the performance of our new algorithm by comparing its performance with a number of well-known polling-based load distribution algorithms. We found that our algorithm provides significant reduction of mean task response time over a large range of system sizes. The cost of achieving this performance gain in terms of CPU overhead and channel bandwidth consumption is generally comparable to the other algorithms we studied. © 1998 John Wiley & Sons, Ltd. 相似文献

2.

MESHMdl event spaces — A coordination middleware for self-organizing applications in ad hoc networks

Klaus Gero Michael A. 《Pervasive and Mobile Computing》2007,3(4):467-487

Mobile ad hoc networks (MANETs) are gaining importance as a promising technology for flexible, proximity-based, mobile communication. However, the inherent dynamics of MANETs imposes strong limitations on the design of distributed applications. They need to be able to adapt to changing conditions quickly and organize themselves in terms of component placement and communication habits. In this paper, we present MESHMdl, a middleware that provides a high level of awareness and decoupling for application components to make them more flexible and adaptable. We focus on the Event Space as the central communication medium of MESHMdl. The Event Space offers a simple, unified communication interface for inter-agent communication as well as for communication with the middleware and resource access. Furthermore, it serves as a means for flexibly extending a MESHMdl daemon. We investigate the performance of the Event Space on different mobile devices and show that it is superior to comparable systems. 相似文献

3.

Distributed Model Checking: From Abstract Algorithms to Concrete Implementations

Christophe Joubert 《Electronic Notes in Theoretical Computer Science》2003,89(1):114

Distributed Model Checking (DMC) is based on several distributed algorithms, which are often complex and error prone. In this paper, we consider one fundamental aspect of DMC design: message passing communication, the implementation of which presents hidden tradeoffs often dismissed in DMC related literature. We show that, due to such communication models, high level abstract DMC algorithms might face implicit pitfalls when implemented concretely. We illustrate our discussion with a generic distributed state space generation algorithm. 相似文献

4.

Hierarchical decision tree induction in distributed genomic databases

Bar-Or A. Keren D. Schuster A. Wolff R. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(8):1138-1151

Classification based on decision trees is one of the important problems in data mining and has applications in many fields. In recent years, database systems have become highly distributed, and distributed system paradigms, such as federated and peer-to-peer databases, are being adopted. In this paper, we consider the problem of inducing decision trees in a large distributed network of genomic databases. Our work is motivated by the existence of distributed databases in healthcare and in bioinformatics, and by the emergence of systems which automatically analyze these databases, and by the expectancy that these databases will soon contain large amounts of highly dimensional genomic data. Current decision tree algorithms require high communication bandwidth when executed on such data, which are large-scale distributed systems. We present an algorithm that sharply reduces the communication overhead by sending just a fraction of the statistical data. A fraction which is nevertheless sufficient to derive the exact same decision tree learned by a sequential learner on all the data-in the network. Extensive experiments using standard synthetic SNP data show that the algorithm utilizes the high dependency among attributes, typical to genomic data, to reduce communication overhead by up to 99 percent. Scalability tests show that the algorithm scales well with both the size of the data set, the dimensionality of the data, and the size of the distributed system. 相似文献

5.

Approximate distributed top-<Emphasis Type="Italic">k</Emphasis> queries

Boaz Patt-Shamir Allon Shafrir 《Distributed Computing》2008,21(1):1-22

We consider a distributed system where each node keeps a local count for items (similar to elections where nodes are ballot boxes and items are candidates). A top-k query in such a system asks which are the k items whose global count, across all nodes in the system, is the largest. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates which approximates the top-k items. The algorithm is motivated by sensor networks in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, the communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity when compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only polylogarithmic number of communication bits per node. An extended abstract of this paper appeared in Proc. 13th Int. Colloquium on Structural Information and Communication Complexity, SIROCCO 2006, Lecture Notes in Computer Science 4056, pp. 319–333. 相似文献

6.

一种支持分布式进程迁移的动态负载平衡征募算法的研究 总被引：1，自引：0，他引：1

喻占武胡瑞敏庞丽萍李德仁《小型微型计算机系统》1999,20(5):321-325

负载平衡是分布式系统必须考虑的问题,本文介绍的征募算法独立于网络拓扑结构,其思想可以应用到分布式系统中,征募算法的设计思想向传统负载平衡算法提出了挑战,它不但克服了投标算法的缺点,而且在减小通讯开销和提高处理机利用率两方面作了很多努力,使其成为一种高效的分布式进程迁移和动态负载平衡策略。我们在分布式ＵＮＩＸ系统上实现并验证了征募算法的高效性。相似文献

7.

A fault-tolerant algorithm for replicated data management 总被引：1，自引：0，他引：1

Rangarajan S. Setia S. Tripathi S.K. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(12):1271-1282

We examine the tradeoff between message overhead and data availability that arises in the design of fault-tolerant algorithms for replicated data management in distributed systems. We propose a property called asymptotically high resiliency which is useful for evaluating the fault-tolerance of replica control algorithms and distributed mutual exclusion algorithms. We present a new algorithm for replica control that can be tailored (through a design parameter) to achieve the desired balance between low message overhead and high data availability. Further, we show that for a message overhead of O(√(Nlog N)), our algorithm can achieve asymptotically high resiliency 相似文献

8.

Scalable distributed Louvain algorithm for community detection in large graphs

Sattar Naw Safrin Arifuzzaman Shaikh 《The Journal of supercomputing》2022,78(7):10275-10309

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale for emerging large-scale data. Scalable parallel algorithms are necessary to process large graph datasets. In this work, we show a comparative analysis of our different parallel implementations of Louvain algorithm. We design parallel algorithms for Louvain method in shared memory and distributed memory settings. Developing distributed memory parallel algorithms is challenging because of inter-process communication and load balancing issues. We incorporate dynamic load balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and get better or comparable performance . We identify the challenges in developing distributed memory algorithm and provide an optimized solution DPLAL showing performance analysis of the algorithm on large-scale real-world networks from different domains.

相似文献

9.

Distributed stream join query processing with semijoins

Tri Minh Tran Byung Suk Lee 《Distributed and Parallel Databases》2010,27(3):211-254

This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively. 相似文献

10.

A Computation+Communication Load Balanced Loop Partitioning Method for Distributed Memory Systems

Santosh Pande Tareq Bali 《Journal of Parallel and Distributed Computing》1999,58(3):251

Due to a significant communication overhead of sending and receiving data, the loop partitioning approaches on distributed memory systems must guarantee not just the computation load balance but computation+communication load balance. The previous approaches in loop partitioning have achieved a communication-free, computation load balanced iteration space partitioning solution for a limited subset of DOALL loops. But a large category of DOALL loops inevitably result in communication and the trade-offs between computation and communication must be carefully analyzed for these loops in order to balance out the combined computation time and communication overheads. In this work, we describe a partitioning approach based on the above motivation for the general cases of DOALL loops. Our goal is to achieve a computation+communication load balanced partitioning through static data and iteration space distribution. Our approach first performs partitioning of iteration and data spaces of a loop nest by analyzing communication and parallelism; it then performs architecture-dependent analysis to adjust the granularity of partitions, load balance each partition with respect to total computation+communication, and then performs mapping of partitions onto the available number of processors. This multiphase partitioning method works as follows. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest and determines a set of directions for reducing a larger degree of communication by trading a lesser degree of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following a set of direction vectors such that the data references are maximally localized and reused, eliminating a larger communication volume than parallelism. We then perform data space partitioning based on a new larger partition owns rule to minimize the communication overhead for a compute intensive partition by localizing its references relatively more than a smaller noncompute intensive partition. A partition interaction graph is then constructed which is used by the architecture-dependent analysis phase to merge the partitions to achieve granularity adjustment, computation+communication load balance, and mapping on the actual number of available processors. Relevant theory and algorithms are developed along with a performance evaluation on the Cray T3D. 相似文献

11.

Distributed Algorithms for Rate-Adaptive Media Streams

Steven Weber Vilas Veeraraghavan 《Networks and Spatial Economics》2008,8(1):61-94

相似文献

12.

A work-efficient distributed algorithm for reachability analysis

Orna Grumberg Tamir Heyman Assaf Schuster 《Formal Methods in System Design》2006,29(2):157-175

This work presents a novel distributed symbolic algorithm for reachability analysis that can effectively exploit, as needed, a large number of machines working in parallel. The novelty of the algorithm is in its dynamic allocation and reallocation of processes to tasks and in its mechanism for recovery from local state explosion. As a result, the algorithm is work-efficient: it utilizes only those resources that are actually needed. In addition, its high adaptability makes it suitable for exploiting the resources of very large and heterogeneous distributed, nondedicated environments. Thus, it suitable for verifying very large systems. We implemented our algorithm in a tool called Division. Our experimental results show that the algorithm is indeed work-efficient. Although the goal of this research is to check larger models, the results also indicate that the algorithm can obtain high speedups, because communication overhead is very small. 相似文献

13.

Fast convergence to network fairness

Ageliki Tsioliaridou^{Author Vitae} Vassilis Tsaoussidis Author Vitae 《Journal of Systems and Software》2010,83(5):745-762

Most AQM algorithms, such as RED, assure fairness through randomness in congestion notification. However, randomness results in fair allocation of network resources only when time limitations are not considered. This is not compatible with the current Internet, where traffic oscillations are frequent and the demand for fair treatment is rather urgent, due to short duration of most applications. Given the short duration of most modern Internet applications, fast convergence to fairness is necessitated. In this paper, we use fairness as the major criterion to adjust traffic and present a corresponding algorithm of active queue management, which is called Explicit Global Congestion Notifier (EGCN). EGCN notifies flows almost simultaneously about incipient congestion by marking packets arriving at the router’s queue, when the load in the network increases and buffer overflow is expected. This is a new approach compared with the random notification policy of RED or ECN. EGCN distributes the burden to adjust backward to more flows and consequently allows for smoother window adjustments. We elaborate on the properties of system-wide response in terms of fairness, smoothness and efficiency. Simulation results demonstrate a clear-cut advantage of the proposed scheme. 相似文献

14.

Load Balancing Parallel Explicit State Model Checking

Rahul Kumar Eric G. Mercer 《Electronic Notes in Theoretical Computer Science》2005,128(3):19

This paper first identifies some of the key concerns about the techniques and algorithms developed for parallel model checking; specifically, the inherent problem with load balancing and large queue sizes resultant in a static partition algorithm. This paper then presents a load balancing algorithm to improve the run time performance in distributed model checking, reduce maximum queue size, and reduce the number of states expanded before error discovery. The load balancing algorithm is based on generalized dimension exchange (GDE). This paper presents an empirical analysis of the GDE based load balancing algorithm on three different supercomputing architectures—distributed memory clusters, Networks of Workstations (NOW) and shared memory machines. The analysis shows increased speedup, lower maximum queue sizes and fewer total states explored before error discovery on each of the architectures. Finally, this paper presents a study of the communication overhead incurred by using the load balancing algorithm, which although significant, does not offset performance gains. 相似文献

15.

Efficient task migration algorithm for distributed systems

Suen T.T.Y. Wong J.S.K. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):488-499

The objective of the study was to achieve balanced load among processors, reduce the communication overhead of the load balancing algorithm, and improve respource utilization, which results in better average resonse time. A communication protocol and a fully distributed algorithm for dynamic load balancing through task migration in a connected N-processor network are presented. Each processor communicates its load directly with only a subset (of the size √ N) of processors, reducing communication traffic and average response time. It is proved that the given algorithm will perform task migration even if there is only one light load processor and one heavy load processor in the system. Simulation results show that the proposed scheme can save up to 60% of the protocol messages used by the broadcast algorithms and can reduce the average response time 相似文献

16.

PRAM programming: in theory and in practice

D. S. Lecomber C. J. Siniolakis K. R. Sujithan 《Concurrency and Computation》2000,12(4):211-226

That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the forseeable future. The current generation of parallel hardware prominently features distributed memory and high‐performance interconnection networks—very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed‐memory machines—from embarassingly parallel problems to sorting and numerical analysis. In contrast it is known that for other classes of problem PRAM‐style shared‐memory simulation on a distributed‐memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines—theoretical and actual—in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256‐processor Cray T3D, an 8‐node IBM SP/2 and a 4‐node shared‐memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct‐mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming. Copyright © 2000 John Wiley & Sons, Ltd. 相似文献

17.

Benchmarking hybrid algorithms for distributed constraint optimisation games

Archie C. Chapman Alex Rogers Nicholas R. Jennings 《Autonomous Agents and Multi-Agent Systems》2011,22(3):385-414

In this paper, we consider algorithms for distributed constraint optimisation problems (DCOPs). Using a potential game characterisation of DCOPs, we decompose eight DCOP algorithms, taken from the game theory and computer science literatures, into their salient components. We then use these components to construct three novel hybrid algorithms. Finally, we empirical evaluate all eleven algorithms, in terms of solution quality, timeliness and communication resources used, in a series of graph colouring experiments. Our experimental results show the existence of several performance trade-offs (such as quick convergence to a solution, but with a cost of high communication needs), which may be exploited by a system designer to tailor a DCOP algorithm to suit their mix of requirements. 相似文献

18.

Randomized gossip algorithms for maintaining a distributed bulletin board with guaranteed age properties

Lior Amar Amnon Barak Zvi Drezner Michael Okun 《Concurrency and Computation》2009,21(15):1907-1927

Scalable computer systems, including clusters and multi‐cluster grids, require routine exchange of information about the state of system‐wide resources among their nodes. Gossip‐based algorithms are popular for providing such information services due to their simplicity, fault tolerance and low communication overhead. This paper presents a randomized gossip algorithm for maintaining a distributed bulletin board among the nodes of a scalable computer system. In this algorithm each node routinely disseminates its most recently acquired information while maintaining a snapshot of the other nodes' states. The paper provides analytical approximations for the expected average age, the age distribution and the expected maximal age for the acquired information at each node. We confirm our results by measurements of the performance of the algorithm on a multi‐cluster campus grid with 256 nodes and by simulations of configurations with up to 2048 nodes. The paper then presents practical enhancements of the algorithm, which makes it more suitable for a real system. Such enhancements include using fixed‐size messages, reducing the number of messages sent to inactive nodes and supporting urgent information. The enhanced algorithm guarantees the age properties of the information at each node in the configurations with an arbitrary number of inactive nodes. It is being used in our campus grid for resource discovery, for dynamic assignment of processes to the best available nodes, for load‐balancing and for on‐line monitoring. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

19.

A Distributed Drafting Algorithm for Load Balancing

《IEEE transactions on pattern analysis and machine intelligence》1985,(10):1153-1161

It is desirable for the load in a distributed system to be balanced evenly. A dynamic process migration protocol is needed in order to achieve load balancing in a user transparent manner. A distributed algorthim for load balancing which is network topology independent is proposed in this paper. Different network topologies and low-level communications protocols affect the choice of only some system design parameters. The "drafting" algorithm attempts to compromise two contradictory goals: maximize the processor utilization and minimize the communication overhead. The main objective of this paper is to describe the dynamic process migration protocol based on the proposed drafting algorithm. A sample distributed system is used to further illustrate the drafting algorithm and to show how to define system design parameters. The system performance is measured by simulation experiments based on the sample system. 相似文献

20.

Tool integration for flexible simulation of distributed algorithms

Shashank Khanvilkar Sol M. Shatz 《Software》2001,31(14):1363-1380

Over the last two decades, considerable research has been done in distributed operating systems, which can be attributed to faster processors and better communication technologies. A distributed operating system requires distributed algorithms to provide basic operating system functionality like mutual exclusion, deadlock detection, etc. A number of such algorithms have been proposed in the literature. Traditionally, these distributed algorithms have been presented in a theoretical way, with limited attempts to simulate actual working models. This paper discusses our experience in simulating distributed algorithms with the aid of some existing tools, including OPNET and Xplot. We discuss our efforts to define a basic model‐based framework for rapid simulation and visualization, and illustrate how we used this framework to evaluate some classic algorithms. We have also shown how the performance of different algorithms can be compared based on some collected statistics. To keep the focus of this paper on the approach itself, and our experience with tool integration, we only discuss some relatively simple models. Yet, the approach can be applied to more complex algorithm specifications. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献