
1.

Byzantine agreement in the presence of mixed faults on processors and links

Hin-Sing Siu Yeh-Hao Chin Wei-Pang Yang 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(4):335-345

In the early stages, the Byzantine agreement (BA) problem was studied with a single fault type on processors in either a fully connected or a non-fully connected network. Subsequently, the single-fault assumption was extended to mixed faults (also referred to as the hybrid fault model) on processors. For the case of both processor and link failures, the problem has been examined in a fully connected network with a single fault type, namely an arbitrary fault. To lift the limitations of a fully connected network and a single fault type, the problem is reconsidered in a general network. The processors and links in such a network can both be subject to different types of faults simultaneously. The proposed protocol uses the minimum number of message exchanges and can tolerate the maximum number of allowable faulty components while enabling each fault-free processor to reach an agreement.

2.

Grouping Byzantine Agreement

《Computer Standards & Interfaces》2006,28(1):75-92

The reliability of distributed systems has always been an important research topic. The Byzantine Agreement (BA) problem, in which the fault-free processors must agree on a common value, is one of the most fundamental problems studied in distributed systems. In previous work, the problem was studied in a fully connected or non-fully connected network with fallible processors. In this paper, the BA problem is reexamined in a group-oriented network, which has the feature of grouping and whose topology does not have to be fully connected. We also enlarge the fault-tolerance capability by allowing dormant faults and malicious faults (also called the dual failure mode) to exist in a group-oriented network simultaneously. The proposed protocol is more efficient than the traditional BA protocols and can tolerate the maximum number of tolerable faulty processors.

3.

Optimal agreement protocol in malicious faulty processors and faulty links

Kuo-Qin Yan Chin Y.H. Shu-Ching Wang 《Knowledge and Data Engineering, IEEE Transactions on》1992,4(3):266-280

Traditionally, the problems of Byzantine agreement, consensus, and interactive consistency are studied in a fully connected network in which only processors are subject to malicious failure. These problems are reexamined under the assumption of malicious faults on both processors and links. The proposed protocols use the minimum number of message exchanges and can tolerate the maximum number of allowable faulty components while enabling each fault-free processor to reach a common agreement in the cases of processor failure, link failure, or both processor and link failure.

4.

From immediate agreement to eventual agreement: early stopping agreement protocol for dynamic networks with malicious faulty processors

Chien-Fu Cheng Kuo-Tang Tsai 《The Journal of supercomputing》2012,62(2):874-894

With the rapid advancement of wireless networking technology, networks have evolved from static to dynamic, and the reliability of dynamic networks has become an important issue. A solution to this issue can be derived from solutions to the Byzantine Agreement (BA) problem. The BA problem can be solved by protocols that make processors reach an agreement through message exchange. These protocols can be divided into Immediate Byzantine Agreement (IBA) protocols and Eventual Byzantine Agreement (EBA) protocols. In IBA protocols, the number of rounds of message exchange is determined by the total number of processors in the network. Even if no faulty processor is present in the network, IBA protocols still require a fixed number of rounds of message exchange, which wastes time. In contrast, EBA protocols dynamically adjust the number of rounds of message exchange according to the interference of faulty processors. In terms of efficiency, EBA protocols therefore outperform IBA protocols. Because the existing EBA protocols have been designed for static networks, they cannot work on dynamic networks. In this paper, we revisit the EBA problem in dynamic networks to increase the reliability of dynamic networks. Simulations are conducted to validate that the proposed protocol requires the minimum number of rounds of message exchange and can tolerate the maximum number of malicious faulty processors compared to other existing protocols.
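The contrast between fixed-round and early-stopping protocols can be illustrated with round counts. The sketch below is not the protocol proposed in the paper. It assumes the classical bounds for static synchronous networks: t + 1 rounds for immediate agreement and min(f + 2, t + 1) rounds for early stopping, where t = ⌊(n−1)/3⌋ is the number of tolerated faults and f is the number of faults actually present. The function names are hypothetical.

```python
def tolerated_faults(n: int) -> int:
    """Assumed classical bound: n processors tolerate t = floor((n - 1) / 3) faults."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def iba_rounds(n: int) -> int:
    """Immediate agreement: a fixed t + 1 rounds, regardless of actual faults."""
    return tolerated_faults(n) + 1


def eba_rounds(n: int, f: int) -> int:
    """Early stopping (assumed classical bound): min(f + 2, t + 1) rounds,
    where f is the number of faults that actually occur."""
    t = tolerated_faults(n)
    if not 0 <= f <= t:
        raise ValueError("f must satisfy 0 <= f <= t")
    return min(f + 2, t + 1)


if __name__ == "__main__":
    n = 10
    for f in range(tolerated_faults(n) + 1):
        print(f"n={n}, f={f}: IBA={iba_rounds(n)} rounds, EBA={eba_rounds(n, f)} rounds")
```

Under these assumed bounds, the early-stopping count matches the fixed count only when the number of actual faults approaches t, which is the efficiency argument the abstract makes.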

5.

Byzantine Agreement under dual failure mobile network

《Computer Standards & Interfaces》2006,28(4):475-492

Networks are trending towards wireless systems that provide support for mobile computing. The Byzantine Agreement (BA) protocols used in static networks do not perform well in a dynamically changing mobile environment. Mobile commerce and related applications are necessary for wireless networks. Several properties of a wireless network play important roles. For example, the processors in a wireless network are highly mobile: they can join or leave the network at any time. Although mobile technology has brought greater convenience, it is comparatively more dangerous. Wireless systems are susceptible to security flaws such as attacks by hackers, and the number of allowable faulty components within the system is also decreased. To increase the number of allowable faulty components and ensure network security, a simple, secure and efficient protocol, BAM, is proposed to handle the BA problem. The fault symptoms include malicious and dormant faults. Furthermore, the proposed protocol uses the minimum number of message exchange rounds to make all healthy processors agree on a common value and can tolerate the maximum number of allowable faulty components. The proposed method also ensures message security and increases the system's fault-tolerance capability.

6.

Optimal communication algorithms for hypercubes

D. P. Bertsekas C. Özveren G. D. Stamoulis P. Tseng J. N. Tsitsiklis 《Journal of Parallel and Distributed Computing》1991,11(4)

We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.
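To make the network model concrete, the following sketch shows the hypercube topology and a simple counting bound for the first problem (one processor sending a different packet to every other processor). It does not reproduce the paper's algorithms. It assumes that processors are labelled with d-bit integers and that each link carries at most one packet per time step; both function names are hypothetical.

```python
from typing import List


def hypercube_neighbors(node: int, d: int) -> List[int]:
    """Neighbors of `node` in a d-dimensional hypercube: flip one of the d bits."""
    if not 0 <= node < 2 ** d:
        raise ValueError("node must be a d-bit label")
    return [node ^ (1 << k) for k in range(d)]


def scatter_lower_bound(d: int) -> int:
    """Counting bound for single-node scatter (assumed link model):
    the source must emit 2**d - 1 distinct packets over its d links,
    one packet per link per step, so at least ceil((2**d - 1) / d) steps."""
    if d < 1:
        raise ValueError("d must be at least 1")
    packets = 2 ** d - 1
    return -(-packets // d)  # ceiling division


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))
    print(scatter_lower_bound(4))
```

The bound only says how many steps any scatter schedule needs under the assumed link model; whether and how it is attained is the subject of the paper itself.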

7.

Broadcasting Spanning Forests on a Multiple-Access Channel

Bogdan S. Chlebus Karol Golab Dariusz R. Kowalski 《Theory of Computing Systems》2003,36(6):711-733

The problem of finding a spanning forest of a graph in a distributed-processing environment is studied. If an input graph is weighted, then the goal is to find a minimum-weight spanning forest. The processors communicate by broadcasting. The output consists of the edges that make a spanning forest and have been broadcast on the network. Input edges are distributed among the processors, with each edge held by one processor. The underlying broadcast network is implemented as a multiple-access channel. If exactly one processor attempts to perform a broadcast, then the broadcast is successful. A message broadcast successfully is delivered to all the processors in one step. If more than one processor broadcasts simultaneously, then the messages interfere with each other and no processor can receive any of them. Optimality of algorithmic solutions is investigated by comparing deterministic with randomized algorithms, and adaptive with oblivious ones. Lower bounds are proved that either justify the optimality of specific algorithms or show that the optimal performance depends on the class of algorithms.
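The channel rule described above (a step succeeds only when exactly one processor broadcasts) can be sketched as a single function. This is only a model of the channel semantics, not of the paper's spanning-forest algorithms; the function name and the dictionary representation of broadcast attempts are assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional


def channel_step(attempts: Dict[Hashable, object]) -> Optional[object]:
    """Simulate one step of a multiple-access channel.

    `attempts` maps each processor that broadcasts in this step to its message.
    Returns the message delivered to all processors, or None when the channel
    is silent (no attempt) or the attempts collide (more than one).
    """
    if len(attempts) == 1:
        return next(iter(attempts.values()))
    return None


if __name__ == "__main__":
    print(channel_step({"p1": ("u", "v")}))                    # single broadcaster
    print(channel_step({"p1": ("u", "v"), "p2": ("v", "w")}))  # collision
```

In this simplified model a listener cannot tell silence from collision, since both return None; whether processors can distinguish the two is a modelling choice the abstract does not specify.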

8.

Software dependability in the Tandem GUARDIAN system

Inhwan Lee Iyer R.K. 《IEEE Transactions on Software Engineering》1995,21(5):455-467

Based on extensive field failure data for Tandem's GUARDIAN operating system, the paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially attributed to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling based on the data shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.

9.

The incremental agreement

M.L. Chiang L.Y. Tseng 《Information Processing Letters》2008,107(5):165-170

To achieve reliable distributed systems, fault tolerance must be studied. One of the most important fault-tolerance problems is the Byzantine Agreement (BA) problem. The primary issue in BA is that fault-free processors must reach a common agreement even in cases where faults persist. In this field, fault diagnosis protocols have been proposed so that each fault-free processor detects and locates a common set of faulty processors. In this study, incremental agreement is used to enable each processor to reach agreement while executing the fault diagnosis protocol, using the minimum number of rounds of message exchange in the presence of dual failure characteristics of processors.

10.

Eventual strong consensus with fault detection in the presence of dual failure mode on processors under dynamic networks

《Journal of Network and Computer Applications》2012,35(4):1260-1276

The fault tolerance capability and reliability of a distributed system can be enhanced if the Strong Consensus (SC) problem is properly addressed. Most of the extant SC protocols are designed for static networks. Moreover, the number of rounds of message exchange required by all of the extant SC protocols is determined by the total number of processors in the network rather than by the actual number of faulty processors. Even if there are only a few faulty processors, or none, the SC protocols may waste a lot of time and memory space on many unnecessary rounds of message exchange. Thus, this paper revisits the SC problem in dynamic networks and uses two rules, the Detection Rule for Malicious fault in dynamic network (DRM_dyn) and the Early Stopping Rule for Strong Consensus protocol in dynamic networks (ESRSC_dyn), to reduce the time consumption and space complexity of SC protocols. DRM_dyn is a rule that detects malicious processors, and ESRSC_dyn is a rule that determines whether the messages collected are enough for reaching a strong consensus. In short, the proposed SC protocol not only works in dynamic networks consisting of both dormant processors and malicious processors (dual failure mode) but also ensures that all correct processors reach an SC value within fewer rounds of message exchange than required by the extant SC protocols.

11.

A new solution for the Byzantine agreement problem

Hui-Ching Hsieh 《Journal of Parallel and Distributed Computing》2011,71(10):1261-1277

Reliability is an important research topic in distributed computing systems consisting of a large number of processors. To achieve reliability, the fault-tolerance scheme of the distributed computing system must be revised. This kind of problem is known as the Byzantine agreement (BA) problem. It requires all fault-free processors to agree on a common value, even if some components are corrupt. Consequently, there have been significant studies of this agreement problem in distributed systems. However, the traditional BA protocols run ⌊(n−1)/3⌋+1 consecutive rounds of message exchange to make each fault-free processor reach an agreement. Because a large number of messages results in a large protocol overhead, those protocols are inefficient and impractical, especially in network environments that have a large number of processors. In this study, we propose a novel and efficient protocol to reduce the number of messages. Our protocol can collect, compare and replace the received values to find the reliable processors and replace the values sent by the unreliable processors. Subsequently, each processor can agree on a common value through three rounds of message exchange. Furthermore, the proposed protocol can use the minimum number of messages to tolerate the maximum number of faulty components in a distributed system.
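The round count ⌊(n−1)/3⌋+1 quoted above grows with the number of processors n, whereas the proposed protocol is stated to need three rounds. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the constant 3 is taken directly from the abstract rather than derived here.

```python
def traditional_rounds(n: int) -> int:
    """Rounds used by traditional BA protocols per the abstract: floor((n - 1) / 3) + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3 + 1


# Round count claimed for the proposed protocol in the abstract.
PROPOSED_ROUNDS = 3


if __name__ == "__main__":
    for n in (4, 10, 100, 1000):
        print(f"n={n}: traditional={traditional_rounds(n)}, proposed={PROPOSED_ROUNDS}")
```

For small networks (n up to 9) the traditional count is already three rounds or fewer, so the claimed saving in rounds appears only as n grows.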

12.

The anatomy study of server-initial agreement for general hierarchy wired/wireless networks

Chien-Fu Cheng Shu-Ching Wang Tyne Liang 《Computer Standards & Interfaces》2009,31(1):219-226

Byzantine Agreement (BA) plays a key role in fault-tolerant distributed system design. A number of solutions to the BA problem based on various network model assumptions have been proposed. However, most existing BA protocols are designed for purely wired or purely wireless networks. In practice, most current networks are combined wired and wireless environments. In this paper, we extend the BA problem to a combined wired/wireless network consisting of both powerful stationary processors and low-power mobile processors. The communication overhead of BA protocols is inherently large, and secure group communication is important. The protocols proposed in this paper use the hierarchical model concept to reduce the communication overhead and provide secure group communications well suited for combined wired/wireless networks.

13.

Broadcasting Sequential Processes (BSP)

Gehani Narain H. 《IEEE Transactions on Software Engineering》1984,(4):343-351

Communication in a broadcast protocol multiprocessor (BPM) is inherently different from that in distributed systems formed by explicit links between processors. A message broadcast by a processor in a BPM is received directly by all other processors in the network instead of being restricted to only one processor. Broadcasting is an inexpensive way of communicating with a large number of processors on a BPM. In this paper I will describe a new approach to user-level distributed programming called broadcast programming, i.e., distributed programs written as cooperating broadcasting sequential processes (BSP). Existing concurrent programming languages do not provide facilities to exploit the broadcast capability of a BPM. The idea of distributed programs written as BSP is tailored to exploiting a BPM architecture but is not restricted to such an architecture; however, implementation of the broadcast capability may not be as efficient on other architectures. I will illustrate the utility and convenience of broadcast programming with many examples. These examples will also be used to explore the suitability and advantages of BSP and to determine appropriate facilities for BSP.

14.

A note on consensus on dual failure modes

Hin-Sing Siu Yeh-Hao Chin Wei-Pang Yang 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(3):225-230

F.J. Meyer and D.K. Pradhan (1991) proposed the MS (for "mixed-sum") algorithm to solve the Byzantine Agreement (BA) problem with dual failure modes: arbitrary faults (Byzantine faults) and dormant faults (essentially omission faults and timing faults). Our study indicates that this algorithm uses an inappropriate method to eliminate the effects of dormant faults and that the bound on the number of allowable faulty processors is overestimated. This paper corrects the algorithm and gives a new bound on the number of allowable faulty processors.

15.

Multi-Fault Tolerance for Cartesian Data Distributions

Nawab Ali Sriram Krishnamoorthy Mahantesh Halappanavar Jeff Daily 《International journal of parallel programming》2013,41(3):469-493

Faults are expected to play an increasingly important role in how algorithms and applications are designed to run on future extreme-scale systems. Algorithm-based fault tolerance is a promising approach that involves modifications to the algorithm to recover from faults with lower overheads than replicated storage and a significant reduction in lost work compared to checkpoint-restart techniques. Fault-tolerant linear algebra algorithms employ additional processors that store parities along the dimensions of a matrix to tolerate multiple, simultaneous faults. Existing approaches assume regular data distributions (blocked or block-cyclic) with the failures of each data block being independent. To match the characteristics of failures on parallel computers, we extend these approaches to mapping parity blocks in several important ways. First, we handle parity computation for generalized Cartesian data distributions with each processor holding arbitrary subsets of blocks in a Cartesian-distributed array. Second, techniques to handle correlated failures, i.e., multiple processors that can be expected to fail together, are presented. Third, we handle the colocation of parity blocks with the data blocks and do not require them to be on additional processors. Several alternative approaches, based on graph matching, are presented that attempt to balance the memory overhead on processors while guaranteeing the same fault tolerance properties as existing approaches that assume independent failures on regular blocked data distributions. Evaluation of these algorithms demonstrates that the additional desirable properties are provided by the proposed approach with minimal overhead.
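The idea of storing parities so that lost data blocks can be rebuilt can be shown in its simplest form. The sketch below uses a single XOR parity over equally sized byte blocks and recovers one lost block. This is an assumption chosen for brevity: it is not the paper's parity-mapping scheme, which places parities along matrix dimensions to tolerate multiple and correlated failures. The function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (a single parity block)."""
    if not blocks:
        raise ValueError("at least one block is required")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_lost_block(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block: XOR of the parity and all surviving blocks."""
    return xor_parity(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_parity(data)
    # Pretend the processor holding data[1] failed.
    print(recover_lost_block([data[0], data[2]], parity) == data[1])
```

A single parity of this kind tolerates only one lost block per parity group, which is why schemes for multiple simultaneous faults need several parities and a careful placement of them.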

16.

The anatomy study of consensus agreement in MANETs

Mao-Lun Chiang Lin-Yu Tseng 《Computers & Electrical Engineering》2010,36(1):234-253

Reliability is an important research topic in distributed systems. To achieve fault tolerance in distributed systems, healthy processors need to reach a common agreement before performing certain special tasks, even if faults exist. This problem is called the Byzantine Agreement (BA) problem, and it must be addressed. In general, the traditional BA problem is solved in well-defined networks. However, Mobile Ad-hoc Networks (MANETs) are increasing in popularity, and their network topology is dynamic in nature. In this paper, the BA problem is re-examined in MANETs. Our protocol uses the minimum number of message exchanges to reach an agreement within the distributed system while tolerating the maximum number of faulty processors in MANETs.

17.

Fault-Tolerant Scheduling for Real-Time Embedded Control Systems (total citations: 8; self-citations: 0; citations by others: 8)

Chun-Hua Yang Geert Deconinck Wei-Hua Gui 《Journal of Computer Science and Technology》2004,19(2):0-0

With the increasing complexity of industrial applications, an embedded control system (ECS) must process a number of hard real-time tasks and needs fault tolerance to assure high reliability. Considering the characteristics of real-time tasks in ECS, an integrated algorithm is proposed to schedule real-time tasks and to guarantee that all real-time tasks are completed before their deadlines even in the presence of faults. Based on the nonpreemptive critical-section protocol (NCSP), this paper analyzes the blocking time introduced by resource conflicts among related tasks in fault-tolerant multiprocessor systems. An extended schedulability condition is presented to check the feasibility of assigning a given task to a processor. A primary/backup approach and on-line replacement of failed processors are used to tolerate processor failures. The analysis reveals that the integrated algorithm bounds the blocking time, requires limited overhead in the number of processors, and still assures good processor utilization. This is also demonstrated by simulation results. Both analysis and simulation show the effectiveness of the proposed algorithm in ECS.

18.

The optimal generalized Byzantine Agreement in Cluster-based Wireless Sensor Networks

《Computer Standards & Interfaces》2014,36(5):821-830

A Wireless Sensor Network (WSN) is a wireless network consisting of spatially distributed autonomous devices using sensor nodes in a wide range of applications in various domains. In the future, WSNs are expected to be integrated into the "Internet of Things" (IoT), where sensor nodes join the Internet dynamically and use it to collaborate and accomplish their tasks. Because communications in a WSN can produce a broadcast storm, the Cluster-based Wireless Sensor Network (CWSN) was proposed to mitigate it. However, the fault tolerance and reliability of CWSNs must be carefully investigated and analyzed. To cope with the influence of faulty components, reaching a common agreement in the presence of faults before performing certain tasks is essential. The Byzantine Agreement (BA) problem is a fundamental problem in fault-tolerant distributed systems. To enhance the fault tolerance and reliability of CWSNs, the BA problem in CWSNs is revisited in this paper. A new BA protocol is proposed that adapts to the CWSN and derives its limit of allowable faulty components, while maintaining the minimum number of message exchanges.

19.

Reaching Agreement among Virtual Subnets in Hybrid Failure Mode

Shu-Ching Wang Kuo-Qin Yan Shun-Sheng Wang Guang-Yan Zheng 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(9):1252-1262

Fault tolerance is an important research topic in the study of distributed systems. To cope with the influence of faulty components, reaching a common agreement in the presence of faults before performing certain tasks is essential. The Byzantine Agreement (BA) problem is a fundamental problem in fault-tolerant distributed systems. In previous studies, protocols dealing with the BA problem focused on static networks; however, these do not perform well in dynamically changing mobile networks. The best-known mobile network is the Mobile Ad-hoc Network (MANET). To enhance fault tolerance and MANET reliability, the BA problem in virtual subnets of MANET is revisited in this paper. The proposed protocol is called the Hybrid Agreement Protocol (HAP). It achieves agreement on a common value among all functional mobile processors in a minimal number of message exchange rounds, and can tolerate a maximal number of allowable faulty components in the virtual subnet of MANET.

20.

On the design of communication-aware fault-tolerant scheduling algorithms for precedence constrained tasks in grid computing systems with dedicated communication devices

Qin Zheng Bharadwaj Veeravalli 《Journal of Parallel and Distributed Computing》2009

Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as geographically distributed nodes often co-operate to execute a task. The primary-backup approach is a common methodology for fault tolerance, wherein each task has a primary and a backup on two different processors. In this paper, we address the problem of how to schedule DAGs in Grids with communication delays so that service failures can be avoided in the presence of processor faults. The challenge is that, as tasks in a DAG depend on each other, a task must be scheduled so that it will succeed when any of its predecessors fails due to a processor failure. We first propose a communication model and determine when communications between a backup and the backups of its successors are necessary. Then we determine when a backup can start and which processors are eligible for it, so as to guarantee that every DAG can complete upon any processor failure. We develop two algorithms to schedule backups, which minimize response time and replication cost, respectively. We also develop a suboptimal algorithm which targets minimizing replication cost while not affecting response time. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.
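The basic primary-backup constraint described above, that each task has a primary and a backup on two different processors, can be sketched as follows. This is a deliberately simplified illustration: it uses a round-robin placement and ignores DAG precedence, communication delays, response time and replication cost, all of which the paper's algorithms address. The function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

Placement = Dict[Hashable, Tuple[Hashable, Hashable]]


def place_primary_backup(tasks: Sequence[Hashable],
                         processors: Sequence[Hashable]) -> Placement:
    """Assign each task a (primary, backup) pair on two different processors,
    using a simple round-robin rule (an assumption for illustration only)."""
    if len(set(processors)) < 2 or len(set(processors)) != len(processors):
        raise ValueError("need at least two distinct processors")
    placement: Placement = {}
    for i, task in enumerate(tasks):
        primary = processors[i % len(processors)]
        backup = processors[(i + 1) % len(processors)]
        placement[task] = (primary, backup)
    return placement


def tolerates_single_failure(placement: Placement, failed: Hashable) -> bool:
    """True if every task still has at least one copy on a non-failed processor."""
    return all(primary != failed or backup != failed
               for primary, backup in placement.values())


if __name__ == "__main__":
    plan = place_primary_backup(["t1", "t2", "t3"], ["P0", "P1", "P2"])
    print(plan)
    print(all(tolerates_single_failure(plan, p) for p in ["P0", "P1", "P2"]))
```

Because the primary and the backup of a task never share a processor, any single processor failure leaves one copy of every task intact; handling dependencies between tasks is what makes the scheduling problem in the paper harder than this placement rule.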