Similar Literature
20 similar documents found (search took 13 ms)
1.
Backward error recovery or rollback recovery is a well-known technique used in the design of reliable uni-processor computer systems. However, use of this technique for design of reliable distributed computer systems could result in uncontrolled rollback which is known as the domino effect. Here, we formally present the reasons for the domino effect by specifically identifying the unmatched receives (backward dependency) and unmatched sends (retransmission dependency) that occur during reexecution as the sole causes. A technique is developed which selectively stores the received messages in order to account for the backward dependencies and which coordinates the checkpoints to eliminate the retransmission dependencies, thus achieving domino-free rollback. Rollback is confined only to the interacting processes by dependency tracking of the messages. Further, we present a scheme for identifying and purging possibly contaminated messages during recovery. Our scheme is compared with some of the existing schemes and its significant advantages are highlighted.
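The dependency tracking that confines rollback to interacting processes can be sketched in a few lines. The class and function names below are illustrative assumptions, not the authors' implementation: a failed process drags into the rollback exactly those processes that received messages from it, directly or transitively.

```python
from collections import defaultdict

class Process:
    """Toy process that logs received messages and remembers which
    peers it has depended on since its last checkpoint."""
    def __init__(self, pid):
        self.pid = pid
        self.received_from = set()   # backward dependencies
        self.logged_msgs = []        # stored messages for replay

    def receive(self, sender, msg):
        self.received_from.add(sender.pid)
        self.logged_msgs.append((sender.pid, msg))

def rollback_set(failed_pid, processes):
    """Transitive closure of processes whose state depends on
    messages sent (directly or indirectly) by the failed process."""
    dependents = defaultdict(set)    # sender -> processes that received from it
    for p in processes.values():
        for s in p.received_from:
            dependents[s].add(p.pid)
    affected, frontier = {failed_pid}, [failed_pid]
    while frontier:
        for nxt in dependents[frontier.pop()] - affected:
            affected.add(nxt)
            frontier.append(nxt)
    return affected

procs = {i: Process(i) for i in range(4)}
procs[1].receive(procs[0], "m1")   # P1 depends on P0
procs[2].receive(procs[1], "m2")   # P2 depends on P1 (P3 never interacts)
print(sorted(rollback_set(0, procs)))
```

Only P0, P1 and P2 roll back here; P3 is untouched, which is the domino-free confinement the abstract describes.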

2.
In the rollback recovery of large‐scale long‐running applications in a distributed environment, pessimistic message logging protocols enable failed processes to recover independently, though at the expense of logging every message synchronously during fault‐free execution. In contrast, coordinated checkpointing protocols avoid message logging, but they are poor in scalability with a sharply increased coordinating overhead as the system grows. With the aim of achieving efficient rollback recovery by trading off logging overhead and coordinating overhead, this paper suggests a partitioning of the system into clusters, and then presents a scheme to implement the conversion between these overheads. Using the proposed conversion, coordination can be introduced to reduce the unbearable logging overhead found in some systems, whereas proper logging can be employed to alleviate the unacceptable coordinating overhead in others. Furthermore, heuristics are introduced to address the issue of how to partition the system into clusters in order to speed up the recovery process and to improve recovery efficiency. Performance evaluation results indicate that our scheme can lower the overall system overhead effectively. Copyright © 2008 John Wiley & Sons, Ltd.

3.
In this paper we analyze a retrial queue that can be used to model fault-tolerant systems with checkpointing and rollback recovery. We assume that the service time of each job is decomposed into N modules, at the end of each of which a checkpoint is established. Checkpointing and rollback recovery consists, basically, of saving periodically the state of the system on a secure device so that, upon recovery from a system failure, the system can resume the computation from the most recent checkpoint, rather than from the beginning. Upon a successful service completion of a job, the server activates a timer and remains awake. If the timer expires without a request, the server departs for a vacation. Upon returning from the vacation, the server activates the timer again. Furthermore, both idle and vacation periods can be interrupted by the server in order to perform secondary jobs. Applications of this model can be found in power saving of mobile devices in a half-duplex communication system operating in a wireless environment, and in long-running software applications. We investigate the stability condition and present a steady-state analysis. We also apply a mean value analysis to obtain useful performance measures, and prove that the model satisfies the stochastic decomposition property. Useful energy metrics are determined, and constrained optimization problems are formulated and used to obtain extensive numerical results.
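To see why checkpointing after each module pays off, here is a rough Monte Carlo sketch. All parameters and names are invented for illustration, and this is not the paper's retrial-queue analysis: failures arrive with exponential inter-arrival times; with checkpointing only the current module's work is lost, without it the whole job restarts.

```python
import random

def completion_time(n_modules=5, t_mod=1.0, fail_rate=0.3,
                    checkpoint=True, rng=None):
    """Elapsed time to finish the job under random failures."""
    rng = rng or random.Random(0)
    elapsed, done = 0.0, 0
    while done < n_modules:
        # work that must complete before the next safe point
        remaining = t_mod if checkpoint else (n_modules - done) * t_mod
        ttf = rng.expovariate(fail_rate)       # time to next failure
        if ttf >= remaining:
            elapsed += remaining
            done = done + 1 if checkpoint else n_modules
        else:
            elapsed += ttf                     # failure: uncheckpointed work lost
    return elapsed

rng = random.Random(42)
avg = lambda cp: sum(completion_time(checkpoint=cp, rng=rng)
                     for _ in range(2000)) / 2000
with_cp, without_cp = avg(True), avg(False)
print(f"mean time with checkpoints: {with_cp:.2f}, without: {without_cp:.2f}")
```

With these made-up parameters the checkpointed runs finish roughly twice as fast on average: checkpoints bound the rollback distance to one module rather than the whole job.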

4.
Checkpoint and rollback recovery is a well‐known technique for providing fault tolerance to long‐running distributed applications. Performance of a checkpoint and recovery protocol depends on the characteristics of the application and the system on which it runs. However, given an application and system environment, there is no easy way to identify which checkpoint and recovery protocol will be most suitable for it. Conventional approaches require implementing the application with all the protocols under consideration, running them on the desired system, and comparing their performances. This process can be very tedious and time consuming. This paper first presents the design and implementation of a simulation environment, distributed process simulation or dPSIM, which enables easy implementation and evaluation of checkpoint and recovery protocols. The tool enables the protocols to be simulated under a wide variety of application, system, and network characteristics. The paper then presents performance evaluation of five checkpoint and recovery protocols. These protocols are implemented and executed in dPSIM under different simulated application, system, and network characteristics. Copyright © 2003 John Wiley & Sons, Ltd.

5.
An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. A common time base is established in the system using a hardware clock synchronization algorithm. This common time base is coupled with the idea of pseudo-recovery points to develop a checkpointing algorithm that has the following advantages: reduced wait for commitment for establishing recovery lines, fewer messages to be exchanged, and less memory requirement. These advantages are assessed quantitatively by developing a probabilistic model.

6.
The existing user‐level checkpointing schemes support only a limited portion of multithreaded programs because they are derived from the schemes for single‐threaded applications. This paper addresses the impact of thread suspension point on rollback recovery, and presents a checkpointing scheme for multithreaded processes. Unlike the existing schemes in which the checkpointer suspends every working thread, our scheme employs a distinctive strategy in which every working thread suspends itself. This technique manages to avoid the suspension point in the API code or kernel code, ensuring correct rollback recovery. Our scheme supports inter‐thread synchronization and thread lifetime. Copyright © 2006 John Wiley & Sons, Ltd.

7.
Checkpointing with rollback recovery is a well-known method for achieving fault-tolerance in distributed systems. In this work, we introduce algorithms for checkpointing and rollback recovery on asynchronous unidirectional and bi-directional ring networks. The proposed checkpointing algorithms can handle multiple concurrent initiations by different processes. While taking checkpoints, processes do not have to take into consideration any application message dependency. The synchronization is achieved by passing control messages among the processes. Application messages are acknowledged. Each process maintains a list of unacknowledged messages. Here we use a logical checkpoint, which is a standard checkpoint (i.e., snapshot of the process) plus a list of messages that have been sent by this process but are unacknowledged at the time of taking the checkpoint. The worst case message complexity of the proposed checkpointing algorithm is O(kn) when k initiators initiate concurrently. The time complexity is O(n). For the recovery algorithm, time and message complexities are both O(n).

8.
Checkpoint and recovery protocols are commonly used in distributed applications for providing fault tolerance. The performance of a checkpoint and recovery protocol is judged by the amount of computation it can save against the amount of overhead it incurs. This performance depends on different system and application characteristics, as well as protocol specific parameters. Hence, no single checkpoint and recovery protocol works equally well for all applications, and given a distributed application and a system it will run on, it is important to choose a protocol that will give the best performance for that system and application. In this paper, we present a scheme to automatically identify a suitable checkpoint and recovery protocol for a given distributed application running on a given system. The scheme involves a novel technique for finding the similarity between the communication pattern of two distributed applications that is also of independent interest. The similarity measure is based on a graph similarity problem, for which we present a heuristic. Extensive experimental results are shown both for the graph similarity heuristic and the automatic identification scheme to show that an appropriate checkpoint and recovery protocol can be chosen automatically for a given application.

9.
Probabilistic memory-based collaborative filtering
Memory-based collaborative filtering (CF) has been studied extensively in the literature and has proven to be successful in various types of personalized recommender systems. In this paper, we develop a probabilistic framework for memory-based CF (PMCF). While this framework has clear links with classical memory-based CF, it allows us to find principled solutions to known problems of CF-based recommender systems. In particular, we show that a probabilistic active learning method can be used to actively query the user, thereby solving the "new user problem." Furthermore, the probabilistic framework allows us to reduce the computational cost of memory-based CF by working on a carefully selected subset of user profiles, while retaining high accuracy. We report experimental results based on two real-world data sets, which demonstrate that our proposed PMCF framework allows an accurate and efficient prediction of user preferences.
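For orientation, the classical memory-based CF that PMCF starts from can be sketched as a similarity-weighted vote over neighbors. The users, items and ratings below are toy data, and this is the plain neighborhood method, not the probabilistic framework itself.

```python
import math

ratings = {  # user -> {item: rating on a 1-5 scale}
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1},
}

def cosine_sim(r1, r2):
    """Cosine similarity over co-rated items only."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    num = sum(r1[i] * r2[i] for i in common)
    den = (math.sqrt(sum(r1[i] ** 2 for i in common))
           * math.sqrt(sum(r2[i] ** 2 for i in common)))
    return num / den

def predict(active, item, db):
    """Similarity-weighted average of the neighbors' ratings for item."""
    pairs = [(cosine_sim(db[active], r), r[item])
             for u, r in db.items() if u != active and item in r]
    wsum = sum(abs(s) for s, _ in pairs)
    return sum(s * v for s, v in pairs) / wsum if wsum else None

print(round(predict("u1", "d", ratings), 2))
```

The prediction for item "d" blends u2's rating of 4 and u3's rating of 1, weighted by how similar each user's history is to u1's.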

10.
To meet the disaster-tolerance needs of long-running, time-constrained critical applications, a checkpoint-based disaster recovery method is proposed. During execution, the application's intermediate state is saved periodically and transferred asynchronously to a remote site; after a disaster, the application need not be rerun from the beginning but can automatically resume from the most recent checkpoint. An adaptive method for setting the checkpoint parameters is given, and a prototype system is built to validate the effectiveness of the technique.

11.
Concurrency control algorithms have traditionally been based on locking and timestamp ordering mechanisms. Recently optimistic schemes have been proposed. In this paper a distributed, multi-version, optimistic concurrency control scheme is described which is particularly advantageous in a query-dominant environment. The drawbacks of the original optimistic concurrency control scheme, namely that inconsistent views may be seen by transactions (potentially causing unpredictable behavior) and that read-only transactions must be validated and may be rolled back, have been eliminated in the proposed algorithm. Read-only transactions execute in a completely asynchronous fashion and are therefore processed with very little overhead. Furthermore, the probability that read-write transactions are rolled back has been reduced by generalizing the validation algorithm. The effects of global transactions on local transaction processing are minimized. The algorithm is also free from deadlock and cascading rollback problems. Divyakant Agrawal is currently a graduate student in the Department of Computer Science at the State University of New York at Stony Brook. He received his B.E. degree from Birla Institute of Technology and Science, Pilani, India in 1980. He worked with Tata Burroughs Limited from 1980 to 1982. He completed his M.S. degree in Computer Science from SUNY at Stony Brook in 1984. His research interests include design of algorithms for concurrent systems, optimistic protocols and distributed systems. Arthur Bernstein is a Professor of Computer Science at the State University of New York at Stony Brook. His research is concerned with the design and verification of algorithms involving asynchronous activity and with languages for expressing such algorithms. Pankaj Gupta is currently a graduate student in the Department of Computer Science at the State University of New York at Stony Brook. He received an M.S. degree in Electrical Engineering from SUNY at Stony Brook in 1982 and an M.S. degree in Computer Science from SUNY at Stony Brook in 1985. His research interests include distributed systems, concurrency control, and databases. Soumitra Sengupta is currently a graduate student in the Department of Computer Science at the State University of New York at Stony Brook. He received his B.E. degree from Birla Institute of Technology and Science, Pilani, India in 1980. He worked with Tata Consultancy Services from 1980 to 1982. He completed his M.S. degree in Computer Science from SUNY at Stony Brook in 1984. His research interests include distributed algorithms, logic databases and concurrency control. This work was supported by the National Science Foundation under grant DCR-8502161 and the Air Force Office of Scientific Research under grant AFOSR 810197.

12.
Many real world problems, such as circuit designing and planning, can be encoded into the maximum satisfiability problem (MAX-SAT). To solve MAX-SAT, many effective local search heuristic algorithms have been reported in the literature. This paper aims to study how useful information could be gathered during the search history and used to enhance local search heuristic algorithms. For this purpose, we present an adaptive memory-based local search heuristic (denoted by AMLS) for solving MAX-SAT. The AMLS algorithm uses several memory structures to define new rules for selecting the next variable to flip at each step and additional adaptive mechanisms and diversification strategies. The effectiveness and efficiency of our AMLS algorithm is evaluated on a large range of random and structured MAX-SAT and SAT instances, many of which are derived from real world applications. The computational results show that AMLS competes favorably, in terms of several criteria, with four state-of-the-art SAT and MAX-SAT solvers AdaptNovelty+, AdaptG2WSATp, IRoTS and RoTS.
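As a point of reference, a bare-bones WalkSAT-style flip heuristic for MAX-SAT looks as follows. This is a standard baseline sketch, not the memory-based AMLS algorithm with its adaptive rules and diversification strategies.

```python
import random

def num_satisfied(clauses, assign):
    """Count clauses with at least one satisfied literal."""
    return sum(any((lit > 0) == assign[abs(lit)] for lit in c)
               for c in clauses)

def max_sat_local_search(clauses, n_vars, steps=1000, p_noise=0.3, seed=0):
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_score = dict(assign), num_satisfied(clauses, assign)
    for _ in range(steps):
        unsat = [c for c in clauses
                 if not any((lit > 0) == assign[abs(lit)] for lit in c)]
        if not unsat:
            break                        # everything satisfied: optimal
        clause = rng.choice(unsat)
        if rng.random() < p_noise:       # noise move: random variable in clause
            var = abs(rng.choice(clause))
        else:                            # greedy move: flip giving best score
            var = max((abs(l) for l in clause),
                      key=lambda v: num_satisfied(
                          clauses, {**assign, v: not assign[v]}))
        assign[var] = not assign[var]
        score = num_satisfied(clauses, assign)
        if score > best_score:
            best_score, best = score, dict(assign)
    return best_score, best

# literals: positive int = variable, negative int = its negation
clauses = [(1, 2), (-1, 3), (-2, -3), (2, 3)]
score, assignment = max_sat_local_search(clauses, 3)
print(score)
```

Memory-based extensions like AMLS replace the uniform random choices above with rules informed by the search history (e.g., flip counts and recency), which is what the abstract is about.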

13.
In this work a visual-based autonomous system capable of memorizing and recalling sensory-motor associations is presented. The robot's behaviors are based on learned associations between its sensory inputs and its motor actions. Perception is divided into two stages. The first one is functional: algorithmic procedures extract in real time visual features such as disparity and local orientation from the input images. The second stage is mnemonic: the features produced by the different functional areas are integrated with motor information and memorized or recalled. An efficient memory organization and fast information retrieval enable the robot to learn to navigate and to avoid obstacles without the need for an internal metric reconstruction of the external environment. Received: 22 November 1996 / Accepted: 18 November 1997

14.
A route navigation method for a mobile robot with an omnidirectional image sensor is described. The route is memorized from a series of consecutive omnidirectional images of the horizon when the robot moves to its goal. While the robot is navigating to the goal point, input is matched against the memorized spatio-temporal route pattern by using dual active contour models, and the exact robot position and orientation is estimated from the converged shape of the active contour models.

15.
16.
To appropriately utilize the rapidly growing amount of data and information is a big challenge for people and organizations. Standard information retrieval methods, using sequential processing combined with syntax-based indexing and access methods, have not been able to adequately handle this problem. We are currently investigating a different approach, based on a combination of massive parallel processing with case-based (memory-based) reasoning methods. Given the problems of purely syntax-based retrieval methods, we suggest ways of incorporating general domain knowledge into memory-based reasoning. Our approach is related to the properties of the parallel processing microchip MS160, particularly targeted at fast information retrieval from very large data sets. Within this framework different memory-based methods are studied, differing in the type and representation of cases, and in the way that the retrieval methods are supported by explicit general domain knowledge. Cases can be explicitly stored information retrieval episodes, virtually stored abstractions linked to document records, or merely the document records themselves. General domain knowledge can be a multi-relational semantic network, a set of term dependencies and relevances, or compiled into a global similarity metric. This paper presents the general framework, discusses the core issues involved, and describes three different methods illustrated by examples from the domain of medical diagnosis.

17.
Memory-based collaborative filtering (CF) makes recommendations based on a collection of user preferences for items. The idea underlying this approach is that the interests of an active user will more likely coincide with those of users who share similar preferences to the active user. Hence, the choice and computation of a similarity measure between users is critical to rating items. This work proposes a similarity update method that uses an iterative message passing procedure. Additionally, this work deals with a drawback of using the popular mean absolute error (MAE) for performance evaluation, namely that it ignores the distribution of ratings. A novel modulation method and an accuracy metric are presented in order to minimize the predictive accuracy error and to evenly distribute predicted ratings over true rating scales. Preliminary results show that the proposed similarity update and prediction modulation techniques significantly improve the predicted rankings.

18.
A hybrid approach of neural network and memory-based learning to data mining
We propose a hybrid prediction system of neural network and memory-based learning. Neural network (NN) and memory-based reasoning (MBR) are frequently applied to data mining with various objectives. They have common advantages over other learning strategies. NN and MBR can be directly applied to classification and regression without additional transformation mechanisms. They also have strength in learning the dynamic behavior of the system over a period of time. Unfortunately, they have shortcomings when applied to data mining tasks. Though the neural network is considered as one of the most powerful and universal predictors, the knowledge representation of NN is unreadable to humans, and this "black box" property restricts the application of NN to data mining problems, which require proper explanations for the prediction. On the other hand, MBR suffers from the feature-weighting problem. When MBR measures the distance between cases, some input features should be treated as more important than other features. Feature weighting should be executed prior to prediction in order to provide the information on the feature importance. In our hybrid system of NN and MBR, the feature weight set, which is calculated from the trained neural network, plays the core role in connecting both learning strategies, and the explanation for prediction can be given by obtaining and presenting the most similar examples from the case base. Moreover, the proposed system has advantages in the typical data mining problems such as scalability to large datasets, high dimensions, and adaptability to dynamic situations. Experimental results show that the hybrid system has a high potential in solving data mining problems.
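The memory-based side of such a hybrid can be sketched as feature-weighted k-nearest-neighbor retrieval. In the abstract the weights come out of the trained neural network; the fixed `weights` tuple below is a stand-in for that step so the sketch stays self-contained, and all data are invented.

```python
import math

def weighted_dist(a, b, w):
    """Euclidean distance with per-feature importance weights."""
    return math.sqrt(sum(wi * (ai - bi) ** 2
                         for ai, bi, wi in zip(a, b, w)))

def knn_predict(query, cases, labels, weights, k=3):
    """Majority vote among the k nearest cases; the nearest cases
    double as a human-readable explanation for the prediction."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_dist(query, cases[i], weights))
    top = [labels[i] for i in ranked[:k]]
    return max(set(top), key=top.count)

cases = [(1.0, 0.1), (0.9, 0.8), (0.2, 0.9), (0.1, 0.2)]
labels = ["pos", "pos", "neg", "neg"]
weights = (0.9, 0.1)   # stand-in for NN-derived feature importances
print(knn_predict((0.8, 0.9), cases, labels, weights))
```

Because the first feature carries almost all the weight, the query is judged mainly by its first coordinate, which is exactly the effect the NN-derived weights are meant to provide.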

19.
We show how to support efficient back traversal in a unidirectional list, using small memory and with essentially no slowdown in forward steps. Using O(lg n) memory for a list of size n, the i'th back-step from the farthest point reached so far takes O(lg i) time in the worst case, while the overhead per forward step is at most ε for arbitrarily small constant ε > 0. An arbitrary sequence of forward and back steps is allowed. A full trade-off between memory usage and time per back-step is presented: k vs. k·n^(1/k) and vice versa. Our algorithms are based on a novel pebbling technique which moves pebbles on a virtual binary, or n^(1/k)-ary, tree that can only be traversed in a pre-order fashion.
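A much simpler relative of this trade-off illustrates the idea: keep a reference to every B-th node while walking forward, and serve a back-step by replaying from the nearest checkpoint. That gives O(B)-time back-steps with O(n/B) memory; it is a simplified sketch with invented names, not the paper's O(lg n) pebbling scheme.

```python
class Node:
    def __init__(self, val, nxt=None):
        self.val, self.next = val, nxt

class BackTraverser:
    """Back-steps over a singly linked list via periodic checkpoints."""
    def __init__(self, head, block=4):
        self.block = block
        self.checkpoints = [head]    # nodes at positions 0, B, 2B, ...
        self.pos, self.cur = 0, head

    def forward(self):
        self.cur = self.cur.next
        self.pos += 1
        # record a new checkpoint the first time we reach position k*B
        if (self.pos % self.block == 0
                and self.pos // self.block == len(self.checkpoints)):
            self.checkpoints.append(self.cur)

    def back(self):
        self.pos -= 1
        node = self.checkpoints[self.pos // self.block]
        for _ in range(self.pos % self.block):   # replay at most B-1 steps
            node = node.next
        self.cur = node

# unidirectional list 0 -> 1 -> ... -> 9
head = None
for v in reversed(range(10)):
    head = Node(v, head)
t = BackTraverser(head)
for _ in range(7):
    t.forward()
t.back()
t.back()
print(t.cur.val)
```

Choosing B ≈ √n balances the two costs; the paper's pebbling technique drives the memory all the way down to O(lg n) while keeping back-steps cheap.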

20.
Performance Evaluation, 1987, 7(2): 111-124
We analyze a probabilistic model of two processors sharing a common task, such as distributed simulation, proceeding at possibly different rates and with each allowed its own clock and ‘virtual time’. When the task requires it, the processors communicate by messages which are stamped with the local time at the point of origination. This feature is also used to synchronize the processors by ‘rollback’, a technique in which a processor receiving a message in its past is required to set its clock and state back as necessary to guarantee correctness. The probabilistic model incorporates, for each processor: (i) a distribution for the amount by which the local clock is incremented at each state transition, and (ii) a probability indicating the likelihood of generating a message at each state transition. For the case of exponential distributions we obtain, by the method of Wiener-Hopf factorization, explicit formulas for the steady-state distribution of the difference in virtual times, the average rate of progress and the average amount of rollback per state transition. We introduce a performance measure which reflects both the average rate of progress and the cost of implementing rollback. This performance measure is optimized with respect to the system parameters and an explicit solution is obtained. This result provides remarkably explicit insights into optimal operations.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号