期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Eventual Leader Election with Weak Assumptions on Initial Knowledge,Communication Reliability,and Synchrony

Antonio Fernández Anta Ernesto Jiménez Michel Raynal 《计算机科学技术学报》2010,25(6):1267-1281

his paper considers the eventual leader election problem in asynchronous message-passing systems where an arbitrary number t of processes can crash (t < n, where n is the total number of processes). It considers weak assumptions both on the initial knowledge of the processes and on the network behavior. More precisely, initially, a process knows only its identity and the fact that the process identities are different and totally ordered (it knows neither n nor t). Two eventual leader election protocols and a lower bound are presented. The first protocol assumes that a process also knows a lower bound α on the number of processes that do not crash. This protocol requires the following behavioral properties from the underlying network: the graph made up of the correct processes and fair lossy links is strongly connected, and there is a correct process connected to (n − f) − α other correct processes (where f is the actual number of crashes in the considered run) through eventually timely paths (paths made up of correct processes and eventually timely links). This protocol is not communication-efficient in the sense that each correct process has to send messages forever. The second protocol is communication-efficient: after some time, only the final common leader has to send messages forever. This protocol does not require the processes to know α, but requires stronger properties from the underlying network: each pair of correct processes has to be connected by fair lossy links (one in each direction), and there is a correct process whose n − f − 1 output links to the rest of correct processes have to be eventually timely. A matching lower bound result shows that any eventual leader election protocol must have runs with this number of eventually timely links, even if all processes know all the processes identities. In addition to being communication-efficient, the second protocol has another noteworthy efficiency property, namely, be the run finite or infinite, all the local variables and message fields have a finite domain in the run. 相似文献

2.

Computing global functions in asynchronous distributed systems with perfect failure detectors

Helary J.-M. Hurfin M. Mostefaoui A. Raynal M. Tronel F. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(9):897-909

A Global Data is a vector with one entry per process. Each entry must be filled with an appropriate value provided by the corresponding process. Several distributed computing problems amount to compute a function on a global data. This paper proposes a protocol to solve such problems in the context of asynchronous distributed systems where processes may fail by crashing. The main problem that has to be solved lies in computing the global data and in providing each noncrashed process with a copy of it, despite the possible crash of some processes. To be consistent, the global data must contain, at least, all the values provided by the processes that do not crash. This defines the Global Data Computation (GDC) problem. To solve this problem, processes execute a sequence of asynchronous rounds during which they construct, in a decentralized way, the value of the global data and eventually each process gets a copy of it. To cope with process crashes, the protocol uses a perfect failure detector. The proposed protocol has been designed to be time efficient: it allows early decision. Let t be the maximum number of processes that may crash, t相似文献

3.

Consensus-based fault-tolerant total order multicast

Fritzke U. Jr Ingels P. Mostefaoui A. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(2):147-156

While total order broadcast (or atomic broadcast) primitives have received a lot of attention, this paper concentrates on total order multicast to multiple groups in the context of asynchronous distributed systems in which processes may suffer crash failures. “Multicast to Multiple Groups” means that each message is sent to a subset of the process groups composing the system, distinct messages possibly having distinct destination groups. “Total Order” means that all message deliveries must be totally ordered. This paper investigates a consensus-based approach to solve this problem and proposes a corresponding protocol to implement this multicast primitive. This protocol is based on two underlying building blocks, namely, uniform reliable multicast and uniform consensus. Its design characteristics lie in the two following properties. The first one is a minimality property, more precisely, only the sender of a message and processes of its destination groups have to participate in the total order multicast of the message. The second property is a locality property: No execution of a consensus has to involve processes belonging to distinct groups (i.e., consensus is executed on a “per group” basis). This locality property is particularly useful when one is interested in using the total order multicast primitive in large-scale distributed systems. In addition to a correctness proof, an improvement that reduces the cost of the protocol is also suggested 相似文献

4.

Failure detection and consensus in the crash-recovery model 总被引：2，自引：0，他引：2

Marcos Kawazoe Aguilera Wei Chen Sam Toueg 《Distributed Computing》2000,13(2):99-125

Summary. We study the problems of failure detection and consensus in asynchronous systems in which processes may crash and recover, and links may lose messages. We first propose new failure detectors that are particularly suitable to the crash-recovery model. We next determine under what conditions stable storage is necessary to solve consensus in this model. Using the new failure detectors, we give two consensus algorithms that match these conditions: one requires stable storage and the other does not. Both algorithms tolerate link failures and are particularly efficient in the runs that are most likely in practice – those with no failures or failure detector mistakes. In such runs, consensus is achieved within time and with 4 n messages, where is the maximum message delay and n is the number of processes in the system. Received: May 1998 / Accepted: November 1999 相似文献

5.

Performance-based path determination for interprocessorcommunication in distributed computing systems

JunSeong Kim Lilja D.J. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(3):316-327

The different types of messages used by a parallel application program executing in a distributed computing system can each have unique characteristics so that no single communication network can produce the lowest latency for all messages. For instance, short control messages may be sent with the lowest overhead on one type of network, such as Ethernet, while bulk data transfers may be better suited to a different type of network, such as Fibre Channel or HIPPI. This work investigates how to exploit multiple heterogeneous communication networks that interconnect the same set of processing nodes using a set of techniques we call performance-based path determination (PBPD). The performance-based path selection (PBPS) technique selects the best (lowest latency) network among several for each individual message to reduce the communication overhead of parallel programs. The performance-based path aggregation (PBPA) technique, on the other hand, aggregates multiple networks into a single virtual network to increase the available bandwidth. We test the PBPD techniques on a cluster of SGI multiprocessors interconnected with Ethernet, Fibre Channel, and HiPPI networks using a custom communication library built on top of the TCP/IP protocol layers. We find that PBPS can reduce communication overhead in applications compared to using either network alone, while aggregating networks into a single virtual network can reduce communication latency for bandwidth-limited applications. The performance of the PBPD techniques depends on the mix of message sizes in the application program and the relative overheads of the networks, as demonstrated in our analytical models 相似文献

6.

Early stopping in Global Data Computation

Delporte-Gallet C. Fauconnier H. Helary J.-M. Raynal M. 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(9):909-921

The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system where processes can crash, but equipped with a perfect failure detector. This protocol requires that processes execute asynchronous computation rounds. The number of rounds is upper bounded by min(f+2, t+1, n), where n, t, and f represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when t相似文献

7.

Efficient elections in chordal ring networks

Hagit Attiya Jan van Leeuwen Nicola Santoro Shmuel Zaks 《Algorithmica》1989,4(1-4):437-446

We study the message complexity of the problem of distributively electing a leader in chordal rings. Such networks consist of a basic ring with additional links, the extreme cases being the oriented ring and the complete graph with a full sense of direction. We present a general election algorithm for these networks, and prove its optimality. As a corollary, we show thatO(logn) chords at each processor suffice to obtain an algorithm that uses at mostO(n) messages; this improves and extends a previous work, where an algorithm, also usingO(n) messages, was suggested for the case where alln-1 chords exist (the oriented complete network). 相似文献

8.

基于历史相遇间隔的机会网络路由协议

下载免费PDF全文

张翼周四望《计算机工程》2011,37(14):85-87

针对大多数机会网络路由协议在寻找端到端通信链路时不能很好地抓住节点社会性质的问题,提出一种基于历史相遇间隔(HICR) 协议的路由算法。HICR协议利用社会关系的特点,根据节点之间的历史相遇间隔判断它们的亲密程度,转发消息给离目的节点更亲近的节点,使得消息朝更靠近目的节点方向发送。仿真结果表明,该HICR协议在网络资源有限的的情况下,与Epidemic协议和Prophet协议相比,能获得更高的消息交付率。相似文献

9.

Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks

《Journal of Parallel and Distributed Computing》2004,64(5):649-661

Checkpointing with rollback recovery is a well-known method for achieving fault-tolerance in distributed systems. In this work, we introduce algorithms for checkpointing and rollback recovery on asynchronous unidirectional and bi-directional ring networks. The proposed checkpointing algorithms can handle multiple concurrent initiations by different processes. While taking checkpoints, processes do not have to take into consideration any application message dependency. The synchronization is achieved by passing control messages among the processes. Application messages are acknowledged. Each process maintains a list of unacknowledged messages. Here we use a logical checkpoint, which is a standard checkpoint (i.e., snapshot of the process) plus a list of messages that have been sent by this process but are unacknowledged at the time of taking the checkpoint. The worst case message complexity of the proposed checkpointing algorithm is O(kn) when k initiators initiate concurrently. The time complexity is O(n). For the recovery algorithm, time and message complexities are both O(n). 相似文献

10.

Efficient rollback-recovery technique in distributed computingsystems

Ge-Ming Chiu Cheng-Ru Young 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(6):565-577

We propose an approach for implementing rollback recovery in a distributed computing system. A concept of logical ring is introduced for the maintenance of information required for consistent recovery from a system crash. Message processing order of a process is kept by all other processes on its logical ring. Transmission of data messages are accompanied by the circulation of the associated order messages on the ring. The sizes of the order messages are small. In addition, redundant transmission of order information is avoided, thereby reducing the communication overhead incurred during failure free operation. Furthermore, updating of the order information and garbage collection task are simplified in the proposed mechanism. Our approach does not require information about message processing order be written to stable storage; in fact, the time consuming operations of saving information in stable storage are confined to the checkpointing activities. When failures occur, a surviving process need roll back only if some preceding order information is totally lost, which is relatively unlikely considering the ever growing speed of communication networks. It is shown that a system can recover correctly as long as there exists at least one surviving process 相似文献

11.

Synchronous atomic broadcast for redundant broadcast channels 总被引：4，自引：3，他引：1

Flaviu Cristian 《Real-Time Systems》1990,2(3):195-212

We propose a synchronous atomic broadcast protocol for distributed real-time systems based on redundant broadcast channels. The protocol can tolerate a finite number f of concurrent processor crash failures, channel adapter performance failures and channel omission failures. Its message cost is optimal: when no failures occur only f+1 messages are sent per broadcast. The cost implications of providing tolerance to other failure classes are also investigated. 相似文献

12.

Efficient elections in chordal ring networks

Hagit Attiya Jan van Leeuwen Nicola Santoro Shmuel Zaks 《Algorithmica》1989,4(1):437-446

We study the message complexity of the problem of distributively electing a leader in chordal rings. Such networks consist of a basic ring with additional links, the extreme cases being the oriented ring and the complete graph with a full sense of direction. We present a general election algorithm for these networks, and prove its optimality. As a corollary, we show thatO(logn) chords at each processor suffice to obtain an algorithm that uses at mostO(n) messages; this improves and extends a previous work, where an algorithm, also usingO(n) messages, was suggested for the case where alln-1 chords exist (the oriented complete network).The work of this author was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant No. A2415.The work of this author was supported in part by Technion Research Grant No. 120-0641. 相似文献

13.

基于车载自组织网络的紧急报文传输方法

慕健高强朱建启《计算机应用研究》2019,36(4)

针对车载自组网中信息传输时延问题进行了研究,提出一个紧急报文的传输方法。通过Netfilter架构截取得到报文的数据信息,利用Linux虚拟设备重新组装安全警告信息报文,直接指向物理网卡发送给目的终端。传输过程减少了传统的缓存等待时间和TCP/IP对等层的额外开销,结果表明在传输过程中有效降低了传输时延,提高了车辆的行驶安全。相似文献

14.

A fully adaptive routing algorithm for dynamically injuredhypercubes, meshes, and tori

Ming-Jer Tsai Sheng-De Wang 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(2):163-174

Unicast V is a progressive, misrouting algorithm for packet or virtual cut-through networks. A progressive protocol forwards a message at an intermediate node if a nonfaulty profitable link is available and waits, deroutes, or aborts otherwise. A misrouting protocol uses both profitable and nonprofitable links at each node; thus, a message can move farther away from its destination at some steps. Unicast V is simple for hardware implementation, requires a very small message overhead, and makes routing decisions by local failure information only. However, it is claimed to be partially adaptive and to be able to tolerate static faults in hypercubes only. In this paper, we uncover some new features of Unicast V: (1) it is fully-adaptive, (2) it also applies to meshes and tori, and (3) it can tolerate dynamic faults by careful implementation. In addition, we also provide bounds on the performance of the algorithm 相似文献

15.

A fault-tolerant protocol for atomic broadcast

Luan S.-W. Gligor V.D. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(3):271-285

A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, delay of messages, and network partitioning in an arbitrary network of fail-stop sites (i.e. no Byzantine site behavior is tolerated). The protocol is based on majority-concensus decisions to commit on unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages where N is the number of sites. This overhead is distributed among the messages of which the delivery decision is made and the heavier the broadcast message traffic, the lower the overhead per broadcast message becomes. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis of this protocol is presented, showing that this protocol commits with high probability under realistic operating conditions without invoking termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable 相似文献

16.

Total ordering group communication protocol based on coordinating sequencers for multiple overlapping groups

《Journal of Parallel and Distributed Computing》2005,65(4):437-447

This paper presents a total ordering protocol for group communication systems with multiple overlapping groups. Our protocol takes advantages of the simplicity of the sequencer-based approach, but employs multiple sequencers to achieve better load balance. For a given message, the sequencer of the destination group constructs a sequencing array by requesting for relative delivery positions from the sequencers of the overlapping groups. The sequencing array is used by any receiving process to determine the delivery sequence of the message. The notion of logical clock is used for determining the relative delivery sequences of the messages. The coordination between the sequencers is performed in a simple, asynchronous and non-blocking manner. The delivery operation at a receiving process is very simple, and a message can be delivered as soon as it becomes deliverable. These factors amount to a significant saving of the computing and communication overhead for the system. As the protocol demands the minimum effort from the group members it is suitable for mobile computing environment in which mobile devices are typically tight on resources. In the paper, we also present some ways to enhance the performance of the system. Extension of the protocol to encompass the preservation of causality, the dynamic group membership and the failure recovery is included in the paper. 相似文献

17.

Fault-tolerance and self-stabilization: impossibility results and solutions using self-stabilizing failure detectors

JOFFROY BEAUQUIER SYNNÖVE KEKKONEN-MONETA 《International journal of systems science》2013,44(11):1177-1187

This paper focuses on protocols that are simultaneously resilient to permanent failures (crash faults) and transient failures (memory and message corruption). First, we show that asynchronous round-based and fault-tolerant protocols cannot be transformed into protocols that are simultaneously fault-tolerant and self-stabilizing (ftss), as is otherwise possible in the synchronous mode of computation. Secondly, we show that it is impossible to find the number of processes (i.e. the size) on a family of networks, as it has been proven for the ring network. Finally, we present a ftss protocol for solving ring size by assuming that each process accesses a failure detector. We also propose two self-stabilizing implementations for the failure detector that differ in their degree of tolerance to transient failures. 相似文献

18.

一种间歇性连接移动网络自适应路由协议

蒋庆丰门朝光田泽宇马英瑞《计算机工程与科学》2018,40(1):52-57

为提高间歇性连接移动网络的消息发送效率,提出一种基于移动自组网OLSR协议的自适应路由协议ARPBO。ARPBO在网络连通时通过OLSR协议快速转发消息;在网络中断时对OLSR协议进行扩展,从消息发送节点的局部连通网络中有效选择下一跳节点,然后通过延迟容忍网络的"存储-携带-转发"机制转发消息。实验结果表明,该路由协议能够在网络存在间歇性连接时获得较高的传递成功率和较低的传递时延。相似文献

19.

无线传感器网络中一种改进的能效数据收集协议

马晨明王万良洪榛《计算机科学》2015,42(2):65-69,75

基于连通支配集的虚拟骨干是减少支配节点数量和限制路由搜索空间的关键技术,对于优化无线传感器网络生命起到重要作用。ViTAMin协议不但能通过关闭一些非必要节点产生虚拟骨干,而且能将采集的数据沿着距离基站能耗最低的路径进行发送,以节省能量。针对ViTAMin可能会产生非连通网络且支配节点能耗不均衡的问题,提出了一种基于虚拟骨干的能效数据收集协议EEVB。理论分析证明,EEVB能够以O(n)的时间与信息复杂度构造连通支配集,仿真实验进一步证实EEVB能够以较小的能耗开销构建规模较小的连通支配集,并有效延长网络的生命时间。相似文献

20.

Bounded phase analysis of message-passing programs

Ahmed Bouajjani Michael Emmi 《International Journal on Software Tools for Technology Transfer (STTT)》2014,16(2):127-146

We describe a novel technique for bounded analysis of asynchronous message-passing programs with ordered message queues. Our bounding parameter does not limit the number of pending messages, nor the number of “context-switches” between processes. Instead, we limit the number of process communication cycles, in which an unbounded number of messages are sent to an unbounded number of processes across an unbounded number of contexts. We show that remarkably, despite the potential for such vast exploration, our bounding scheme gives rise to a simple and efficient program analysis by reduction to sequential programs. As our reduction avoids explicitly representing message queues, our analysis scales irrespectively of queue content and variation. 相似文献