Similar Documents
20 similar documents found.
1.
The mobility database, which stores users' location records, is essential for connecting calls to mobile users in personal communication networks. If the mobility database fails, calls to mobile users may not be set up in time. This paper studies failure restoration of the mobility database. We study per-user location record checkpointing schemes, which checkpoint each user's record into non-volatile storage from time to time. When the mobility database fails, the user location records can be restored from the backup storage. Numerical analysis is used to choose the checkpointing interval that minimizes the overall cost. The cost function we consider includes the cost of checkpointing a user's location record and the cost of paging a user because of an invalid location record. Our results indicate that when user registration intervals are exponentially distributed, the user record should never be checkpointed if checkpointing costs more than paging; otherwise, if paging costs more, the user record should always be checkpointed when the user registers.
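For exponentially distributed registration intervals, the decision rule reported above is all-or-nothing, so it reduces to a single cost comparison. A minimal sketch, assuming per-operation costs `checkpoint_cost` and `paging_cost` (hypothetical names) expressed in the same units. The abstract does not say what happens when the two costs are equal, so the tie is resolved arbitrarily here:

```python
def should_checkpoint_on_registration(checkpoint_cost: float, paging_cost: float) -> bool:
    """All-or-nothing rule for exponentially distributed registration intervals.

    Returns True if the user's record should be checkpointed at every
    registration and False if it should never be checkpointed.
    Tie (equal costs): resolved as 'never', an arbitrary assumption.
    """
    if checkpoint_cost < 0 or paging_cost < 0:
        raise ValueError("costs must be non-negative")
    # Checkpoint only when paging after a lost record is the more expensive event.
    return paging_cost > checkpoint_cost
```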

2.
This paper studies failure restoration of mobility databases (specifically, VLRs and HLRs) for personal communication networks. We model VLR restoration with and without checkpointing, and derive the optimal VLR checkpointing interval that balances the checkpointing cost against the paging cost. We also model GSM periodic location updating (location confirmation) to quantify the relationship between the location confirmation frequency and the number of lost calls. The HLR failure restoration procedures for IS-41 and GSM are described, and we quantify the number of calls lost during an HLR failure. Neither the IS-41 nor the GSM procedure can identify the VLRs that the HLR needs to access after a failure. We propose an algorithm to identify these VLRs, which can be used to aggressively restore an HLR after its failure.

3.
4.
We herein propose a heuristic redundancy selection algorithm that combines resubmission, replication, and checkpointing redundancies to reduce the resiliency overhead in fault-tolerant workflow scheduling. The appropriate combination of these redundancies for workflow tasks is obtained in two consecutive phases. First, to compute the replication vector (number of task replicas), we apportion the set of provisioned resources among concurrently executing tasks according to their needs. Subsequently, we obtain the optimal checkpointing interval for each task as a function of the number of replicas and the characteristics of the tasks and the computational environment. We formulate the problem of obtaining the optimal checkpointing interval for replicated tasks in situations where checkpoint files can be exchanged among computational resources. The results of our simulation experiments, on both randomly generated workflow graphs and real-world applications, demonstrated that both the proposed replication vector computation algorithm and the proposed checkpointing scheme reduced the resiliency overhead.
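The abstract does not say how resources are apportioned "according to their needs". As a hypothetical illustration only, the sketch below gives every task one copy and splits the remaining resources as extra replicas in proportion to a per-task need weight, using the largest-remainder (Hamilton) method. The function `replication_vector` and the weights are assumptions, not the authors' algorithm:

```python
from typing import List


def replication_vector(total_resources: int, needs: List[float]) -> List[int]:
    """Hypothetical apportionment of resources to concurrently executing tasks.

    Every task gets one copy (its primary); the spare resources become extra
    replicas, split in proportion to `needs` by the largest-remainder method.
    """
    n = len(needs)
    if n == 0:
        return []
    if total_resources < n:
        raise ValueError("need at least one resource per task")
    if any(w < 0 for w in needs):
        raise ValueError("needs must be non-negative")
    spare = total_resources - n
    weight_sum = sum(needs)
    if weight_sum == 0:
        shares = [spare / n] * n  # no information: split evenly
    else:
        shares = [spare * w / weight_sum for w in needs]
    extra = [int(s) for s in shares]  # floor, since shares are non-negative
    leftover = spare - sum(extra)
    # Hand leftover units to the tasks with the largest fractional parts.
    order = sorted(range(n), key=lambda i: shares[i] - extra[i], reverse=True)
    for i in order[:leftover]:
        extra[i] += 1
    return [1 + e for e in extra]
```

For example, 10 resources and needs `[3, 1, 1]` yield replica counts `[5, 3, 2]`.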

5.
Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques so that the probability of meeting the deadlines, i.e., the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing; however, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is capable of finding the distribution that results in the maximal LoC in much shorter time than the exhaustive search approach, while considering only a few clusters.

6.
A commonly used model for signal fading in many types of communication channels is that the amplitude of the received signal at a given time is a Rayleigh-distributed random variable. In this paper we show how classical statistical techniques may be applied to the problem of estimating the Rayleigh distribution parameter (i.e., the mean), given samples from the distribution. In particular, we first consider the problem of estimating the population mean, given a sequence of independent samples. We derive an unbiased maximum-likelihood estimator. We show that this estimator is unique, and since it is based on a sufficient statistic, it is therefore "best" in the Blackwell-Rao sense of minimizing expected loss. Using this estimator, we then develop confidence intervals whose length can be used as a guide in selecting the required sample size. We then consider the same estimation problem when the signal samples are obtained from the output of a logarithmic receiver. We derive an interval estimator which does not require taking the antilogs of the log samples, and which is not appreciably worse than the "best" estimator.
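The abstract parametrizes the Rayleigh distribution by its mean, and the paper's exact estimator may differ. The sketch below instead uses the common scale parametrization f(x; σ) = (x/σ²)·exp(−x²/(2σ²)), for which the maximum-likelihood estimate of σ² is Σx²/(2n). It depends on the data only through the sufficient statistic Σx² and is unbiased because E[X²] = 2σ². The function names and the inverse-CDF sampler are illustrative assumptions:

```python
import math
import random
from typing import List, Sequence


def rayleigh_sigma2_mle(samples: Sequence[float]) -> float:
    """MLE of sigma^2 for f(x) = x / sigma^2 * exp(-x^2 / (2 sigma^2)).

    Uses only the sufficient statistic sum(x^2); unbiased since E[X^2] = 2 sigma^2.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return sum(x * x for x in samples) / (2 * len(samples))


def rayleigh_mean(sigma2: float) -> float:
    """Mean of the Rayleigh distribution: sigma * sqrt(pi / 2)."""
    return math.sqrt(sigma2) * math.sqrt(math.pi / 2)


def sample_rayleigh(sigma: float, n: int, rng: random.Random) -> List[float]:
    """Inverse-CDF sampling: X = sigma * sqrt(-2 ln U), with U uniform on (0, 1]."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
```

With `rng = random.Random(0)` and 20,000 samples drawn at σ = 2, the estimate lands close to σ² = 4. Plugging the estimate into `rayleigh_mean` gives an estimate of the mean, but by Jensen's inequality that plug-in value is slightly biased low, unlike the σ² estimate itself.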

8.
This paper presents selective checkpointing and rollback schemes for MT-OO (multithreaded, object-oriented) programs, which need checkpointing mechanisms more sophisticated than traditional process-level checkpointing. The program model, theoretical foundations, and an implementation of the selective checkpointing and rollback schemes are described. The usefulness of the schemes is demonstrated by using them to implement conversations, a higher-level fault-tolerance scheme. The performance implications are studied on a prototype Internet e-commerce server. Using the selective schemes in the prototype server appreciably reduced the loss of work in the presence of faults, and the benefits are more pronounced at higher levels of concurrency in the server. In the presence of faults, the selective scheme usually outperforms the hypothetical zero-cost global scheme with respect to completion times. The experiments also show the vast difference between the sizes of selective checkpoints and global checkpoints. The concurrent sessions scheme (based on the concept of relaxed conversations) required 160 checkpoints in less than an hour; traditionally, such a scheme would be considered outrageous, but the selective schemes still improve performance in the presence of faults.
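To illustrate the difference between selective and global checkpoints, the toy object store below snapshots and rolls back only the objects a "conversation" names, leaving all other state untouched. It is a hypothetical sketch built on `copy.deepcopy`, not the paper's MT-OO implementation; the class and method names are assumptions:

```python
import copy
from typing import Any, Dict, Iterable


class SelectiveCheckpointer:
    """Toy object store that checkpoints only a chosen subset of objects."""

    def __init__(self) -> None:
        self.objects: Dict[str, Any] = {}
        self._snapshots: Dict[str, Dict[str, Any]] = {}

    def checkpoint(self, tag: str, names: Iterable[str]) -> None:
        # Deep-copy just the selected objects instead of the whole process state.
        self._snapshots[tag] = {n: copy.deepcopy(self.objects[n]) for n in names}

    def rollback(self, tag: str) -> None:
        # Restore only the objects captured under `tag`; copy again so the
        # snapshot stays reusable for a later rollback.
        for name, value in self._snapshots[tag].items():
            self.objects[name] = copy.deepcopy(value)
```

After `checkpoint("conv1", ["cart"])`, a fault inside the conversation rolls back `cart` while unrelated objects such as a `log` keep their progress, which is why selective checkpoints can be much smaller than global ones.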

9.
Real-time computer systems are often used in harsh environments, such as aerospace and industry, where they are subject to many transient faults during operation. Checkpointing reduces the recovery time from a transient fault by saving intermediate states of a task in reliable storage and, on detection of a fault, restoring a previously saved state. The interval between checkpoints affects the execution time of the task: inserting more checkpoints at shorter intervals reduces the reprocessing time after faults, but each checkpoint has an execution cost, so extra checkpoints increase the overall task execution time. The trade-off between reprocessing time and checkpointing overhead therefore leads to an optimal checkpoint placement strategy with respect to certain performance measures. Real-time control systems are characterized by the timely and correct execution of iterative tasks within deadlines. Reliability is the probability that a system functions according to its specification over a period of time. This paper reports on the reliability of a checkpointed real-time control system in which errors are detected at checkpointing time, and uses reliability as the performance measure for finding the optimal checkpointing strategy. For a single-task control system, the reliability over a mission time is derived using a Markov model. Because errors are detected only at checkpointing time, reliability jitters with the number of checkpoints, which makes it necessary to use other search algorithms to find the optimal number of checkpoints. By exploiting the properties of this jitter, a simple algorithm is provided that finds the optimal number of checkpoints efficiently. Finally, the reliability model is extended to multiple tasks by means of a task allocation algorithm.
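To make the checkpoint-count trade-off concrete, the sketch below uses a deliberately simple first-order model rather than the paper's Markov reliability model: a task of length `task_len` with `n` equally spaced checkpoints of cost `ckpt_cost` each, faults arriving at rate `fault_rate`, and each fault forcing re-execution of half a segment on average. All names and the model itself are assumptions. Under this smooth model the optimum is easy to find; the abstract's point is that detecting errors only at checkpoints makes the real reliability curve jitter, which is why a search over checkpoint counts is needed:

```python
def expected_time(n: int, task_len: float, ckpt_cost: float, fault_rate: float) -> float:
    """First-order expected completion time with n equally spaced checkpoints.

    Toy model (an assumption): faults are rare, segments have length
    task_len / (n + 1), and each fault re-executes half a segment on average.
    """
    segment = task_len / (n + 1)
    expected_faults = fault_rate * task_len
    return task_len + n * ckpt_cost + expected_faults * segment / 2


def best_checkpoint_count(task_len: float, ckpt_cost: float, fault_rate: float,
                          n_max: int = 1000) -> int:
    # Exhaustive search over n: cheap here, and it would still work if the
    # cost curve were jagged rather than smooth.
    return min(range(n_max + 1),
               key=lambda n: expected_time(n, task_len, ckpt_cost, fault_rate))
```

With `task_len=100`, `ckpt_cost=1` and `fault_rate=0.02`, the expected time is 100 + n + 100/(n + 1), which is minimized at n = 9 (expected time 119).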

10.
This paper studies failure restoration of the mobility database for the Universal Mobile Telecommunications System (UMTS). We consider a per-user checkpointing approach for the home location register (HLR) database, in which individual HLR records are saved into a backup database from time to time. When a failure occurs, the backup records are restored to the mobility database. We first describe a commonly used basic checkpoint algorithm and then propose a new checkpoint algorithm. An analytic model is developed to compare the two algorithms in terms of the checkpoint cost and the probability that an HLR backup record is obsolete; the model is validated against simulation experiments. Numerical examples indicate that our new algorithm may significantly outperform the basic algorithm on both performance measures.

11.
In this paper we develop a mathematical model for determining a periodic inspection schedule in a preventive maintenance program for a single machine subject to random failure. We formulate the problem as a profit maximization model with a general failure time distribution. We show that under certain conditions on the probability density function of failure, a unique optimal inspection interval can be obtained. When the failure times are exponentially distributed, we propose alternative optimal and heuristic procedures to find exact and approximate inspection intervals. Our heuristic solution method is shown numerically to be more efficient than an earlier published heuristic procedure. We also investigate the sensitivity of the optimal inspection interval and the expected profit per unit time to changes in the two parameters of the Weibull time-to-failure distribution.

12.
Power-neutral system design avoids energy buffers by powering the load directly from the energy harvester. In case of a power loss, checkpointing methods ensure forward progress by preserving the volatile system state in non-volatile memory. Timely detection of upcoming power losses is essential for a reliable checkpointing process. Moreover, various applications require early detection, e.g., to ensure that atomic operations are finalized. However, common voltage-threshold-based methods only allow short-term detection. In this paper we propose a new methodology that allows early detection by exploiting physical characteristics of the harvester. To this end, we consider small-scale kinetic energy harvesters that employ rotatably mounted mechanical masses to drive electromagnetic generators. Due to the inertia of these masses, the power output does not stop abruptly but decays gradually after the excitation of the harvester ends. We investigate the relationship between the initial excitation intensity (as reflected in the output frequency), the load current, and the remaining period of power availability. Our results indicate that this relationship makes it possible to predict the power duration from the output frequency of the harvester. We show that power losses can be detected up to one order of magnitude earlier with our frequency-based method than with state-of-the-art voltage-based methods.
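A minimal sketch of a frequency-based predictor, under two assumptions not stated in the abstract: the harvester's AC output frequency is estimated by counting rising zero crossings, and the remaining power duration is read from an offline calibration table of (frequency, remaining time) pairs measured for a given load. Both functions and the table format are hypothetical; the paper's actual method may differ:

```python
import bisect
from typing import List, Sequence, Tuple


def zero_crossing_frequency(samples: Sequence[float], sample_rate_hz: float) -> float:
    """Estimate the frequency of an AC signal from its rising zero crossings."""
    rising = [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]
    if len(rising) < 2:
        return 0.0  # not enough full cycles to estimate a period
    samples_per_cycle = (rising[-1] - rising[0]) / (len(rising) - 1)
    return sample_rate_hz / samples_per_cycle


def predict_remaining_ms(freq_hz: float, table: List[Tuple[float, float]]) -> float:
    """Linearly interpolate remaining power time from a calibration table.

    `table` is a hypothetical list of (frequency_hz, remaining_ms) pairs sorted
    by frequency; values outside the table are clamped to its end points.
    """
    freqs = [f for f, _ in table]
    if freq_hz <= freqs[0]:
        return table[0][1]
    if freq_hz >= freqs[-1]:
        return table[-1][1]
    j = bisect.bisect_right(freqs, freq_hz)
    (f0, t0), (f1, t1) = table[j - 1], table[j]
    return t0 + (t1 - t0) * (freq_hz - f0) / (f1 - f0)
```

A checkpointing runtime could poll this estimate and start saving state once the predicted remaining time drops below the time needed to finish the current atomic operation plus the checkpoint itself.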

13.
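Liu Yongpeng (刘勇鹏), Wang Feng (王锋), Lu Kai (卢凯), Liu Yongyan (刘勇燕). Acta Electronica Sinica (电子学报), 2012, 40(2): 223-229.
In large-scale parallel computing systems, parallel checkpointing makes a large number of nodes save their computation state simultaneously, causing a huge file storage overhead and enormous access pressure on the communication and storage systems. Data compression can shrink checkpoint files and thereby reduce both the storage overhead and the access pressure on the communication and storage systems, but it also introduces additional compression computation overhead. For heterogeneous parallel computing systems, this paper proposes a pipelined parallel compressed checkpointing technique that uses a series of optimizations to reduce the computation latency introduced by compression, including pipelined double write-buffer queues, merging of file write operations, a GPU-accelerated pipelined compression algorithm, and multi-process scheduling of GPU resources. The paper describes the implementation of this technique on the Tianhe-1 (天河一号) system and presents a comprehensive evaluation of the resulting checkpointing system. Experimental data show that the method is feasible, efficient, and practical in large-scale heterogeneous parallel computing systems.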
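A CPU-only sketch of the pipelining idea, assuming Python's `zlib` as a stand-in for the paper's GPU-accelerated compressor: the calling thread compresses checkpoint chunks while a writer thread writes the compressed output, and a bounded queue with two slots plays the role of a double write buffer. The length prefix and payload of each chunk are combined into a single write call, a small-scale analogue of merging write operations. The function names and the framing format are hypothetical:

```python
import queue
import threading
import zlib
from typing import BinaryIO, Iterable

_SENTINEL = None  # marks the end of the stream (zlib output is never None)


def pipelined_compressed_write(chunks: Iterable[bytes], out: BinaryIO, depth: int = 2) -> None:
    """Overlap compression of chunk i+1 with the write of chunk i."""
    q: "queue.Queue" = queue.Queue(maxsize=depth)  # bounded: at most `depth` buffers in flight

    def writer() -> None:
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            # One merged write: 4-byte big-endian length prefix plus payload.
            out.write(len(item).to_bytes(4, "big") + item)

    t = threading.Thread(target=writer)
    t.start()
    try:
        for chunk in chunks:
            q.put(zlib.compress(chunk))  # blocks while both buffers are full
    finally:
        q.put(_SENTINEL)
        t.join()


def read_back(data: bytes) -> bytes:
    """Reassemble the original checkpoint from the length-prefixed stream."""
    parts, pos = [], 0
    while pos < len(data):
        n = int.from_bytes(data[pos:pos + 4], "big")
        parts.append(zlib.decompress(data[pos + 4:pos + 4 + n]))
        pos += 4 + n
    return b"".join(parts)
```

The sketch does not propagate write errors from the writer thread; a production version would need to, so that the producer does not block forever on a full queue.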

14.
Bandwidth aggregation is a key research issue in integrating heterogeneous wireless networks, since it can substantially increase throughput and reliability and thereby enhance streaming video quality. However, burst loss in unreliable wireless channels is a severe challenge that significantly degrades the effectiveness of bandwidth aggregation. Previous studies mainly address this problem reactively, by increasing the forward error correction (FEC) redundancy. In this paper, we propose a loss-tolerant bandwidth aggregation approach (LTBA), which proactively leverages the channel diversity of heterogeneous wireless networks to overcome burst loss. First, we allocate the FEC packets according to the 'loss-free' bandwidth of each wireless network connecting the multihomed client. Second, we deliberately insert intervals between the FEC packets' departures while still respecting the delay constraint. LTBA thus reduces consecutive packet losses under a burst-loss assumption. Using the Gilbert loss model and a continuous-time Markov chain, we prove analytically that LTBA outperforms the existing 'back-to-back' transmission schemes. We evaluate performance in Exata, and the emulation results show that LTBA outperforms existing approaches in improving video quality in terms of PSNR (peak signal-to-noise ratio).
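The two LTBA steps can be sketched as follows, under assumptions the abstract does not spell out: a network's "loss-free" bandwidth is taken to be its bandwidth times (1 − loss rate), any rounding remainder goes to the networks with the highest effective bandwidth, and departures on a network are spaced evenly between a start time and the delay deadline instead of back to back. Both functions are hypothetical illustrations, not the authors' algorithm:

```python
from typing import Dict, List


def allocate_packets(n_packets: int, bandwidth: Dict[str, float],
                     loss: Dict[str, float]) -> Dict[str, int]:
    """Split packets across networks in proportion to 'loss-free' bandwidth,
    assumed here to be bandwidth * (1 - loss_rate)."""
    eff = {k: bandwidth[k] * (1.0 - loss[k]) for k in bandwidth}
    total = sum(eff.values())
    alloc = {k: int(n_packets * v / total) for k, v in eff.items()}
    # Give the rounding remainder to the networks with the highest effective bandwidth.
    remainder = n_packets - sum(alloc.values())
    for k in sorted(eff, key=eff.get, reverse=True)[:remainder]:
        alloc[k] += 1
    return alloc


def spaced_departures(count: int, start: float, deadline: float) -> List[float]:
    """Spread `count` departures evenly over [start, deadline) instead of
    sending them back to back, so a single loss burst hits fewer packets."""
    if count <= 0:
        return []
    gap = (deadline - start) / count
    return [start + i * gap for i in range(count)]
```

For example, with equal raw bandwidth but a 50% loss rate on network `a`, 10 packets split as 3 on `a` and 7 on `b`.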

15.
The selection of an optimal checkpointing strategy has most often been considered in the transaction processing environment, where systems are allowed unlimited repairs. In this environment an optimal strategy maximizes the time spent in the normal operating state and, consequently, the rate of transaction processing. This paper seeks a checkpoint strategy that maximizes the probability of critical-task completion on a system with limited repairs. Such systems can undergo failure and repair only until a repair time exceeds a specified threshold, at which point the system is deemed to have failed completely. For such systems, a model is derived that yields the probability of completing the critical task when each checkpoint operation has a fixed cost. The optimal number of checkpoints can increase as system reliability improves. The model is extended to include a constraint that enforces timely completion of the critical task.

16.
The growing trend towards VLSI implementation of crucial tasks in critical applications has increased both the demand for and the scope of fault-tolerant VLSI systems. In this paper, we present a self-recovering microarchitecture synthesis system. In a self-recovering microarchitecture, intermediate results are compared at regular intervals and, if correct, saved in registers (checkpointing). On detecting a fault, the self-recovering microarchitecture rolls back to a previous checkpoint and retries. The proposed synthesis system comprises a heuristic and an optimal subsystem. The heuristic synthesis subsystem has two components: the checkpoint insertion algorithm identifies good checkpoints by successively eliminating clock cycle boundaries that either have a high checkpoint overhead or violate the retry period constraint, while the novel edge-based scheduler assigns edges to clock cycle boundaries in addition to scheduling nodes to clock cycles. Checkpoint insertion and edge-based scheduling are intertwined using a flexible synthesis methodology. We additionally present an integer linear programming (ILP) model for the self-recovering microarchitecture synthesis problem. The resulting ILP formulation can minimize either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints.
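A software analogue of the compare, checkpoint and retry cycle, assuming each step is a pure function run on two redundant "units" whose outputs are compared. The fault injector `fault` and all other names are hypothetical, and for simplicity the sketch checkpoints after every step rather than selecting checkpoint boundaries as the paper's synthesis system does:

```python
from typing import Callable, List, Optional

Step = Callable[[int], int]


def run_with_rollback(steps: List[Step], state: int,
                      fault: Optional[Callable[[int, int], bool]] = None,
                      max_retries: int = 3) -> int:
    """Duplicate each step, compare, checkpoint on agreement, retry on mismatch.

    `fault(step_index, attempt)` is a hypothetical injector: when it returns
    True, the second unit's result is corrupted for that attempt.
    """
    checkpoint = state  # last verified state (the "saved registers")
    for i, step in enumerate(steps):
        for attempt in range(max_retries + 1):
            a = step(checkpoint)
            b = step(checkpoint)
            if fault is not None and fault(i, attempt):
                b ^= 1  # flip one bit to model a transient fault
            if a == b:
                checkpoint = a  # results agree: commit as the new checkpoint
                break
            # Mismatch: discard both results and retry from the checkpoint.
        else:
            raise RuntimeError(f"step {i} failed after {max_retries} retries")
    return checkpoint
```

A transient fault costs one re-execution of the affected step, while a permanent fault exhausts the retry budget and is reported instead of silently corrupting the result.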

17.
Time-based coordinated checkpointing protocols are well suited to mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during each checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computing systems over Mobile IP is proposed. The protocol reduces the number of checkpoints per checkpointing process to nearly the minimum, so that fewer checkpoints need to be transmitted over the costly wireless link. Our protocol also performs very well at minimizing the number and size of messages transmitted in the wireless network. In addition, the protocol is nonblocking because inconsistencies can be avoided using the information piggybacked on every message. Therefore, the protocol imposes very little overhead on a mobile host with limited resources. Additionally, by taking advantage of reliable timers in mobile support stations, the time-based checkpointing protocol can adapt to wide area networks.
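Many time-based protocols avoid coordination messages by piggybacking a checkpoint sequence number (csn) on application messages: a receiver that sees a larger csn than its own takes a forced checkpoint before delivering the message, so no message ends up recorded as received but not as sent. The sketch below illustrates only that generic mechanism, not the paper's protocol for minimizing the number of checkpoints; the `Process` class and its methods are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """Hypothetical process in a time-based checkpointing scheme."""
    pid: int
    csn: int = 0    # index of the latest checkpoint taken
    state: int = 0  # toy application state
    checkpoints: List[Tuple[int, int]] = field(default_factory=list)  # (csn, saved state)

    def _checkpoint(self, new_csn: int) -> None:
        self.csn = new_csn
        self.checkpoints.append((new_csn, self.state))

    def on_timer(self, k: int) -> None:
        # k-th expiry of the local timer. Skip it if a message already forced
        # checkpoint k, so no coordination message is ever needed.
        if self.csn < k:
            self._checkpoint(k)

    def send(self, payload: int) -> Tuple[int, int]:
        # Piggyback the current csn on every application message.
        return (self.csn, payload)

    def receive(self, msg: Tuple[int, int]) -> None:
        sender_csn, payload = msg
        if sender_csn > self.csn:
            # The message was sent after the sender's checkpoint sender_csn, so
            # it must not appear as received in our checkpoint sender_csn:
            # take that checkpoint first (this avoids orphan messages).
            self._checkpoint(sender_csn)
        self.state += payload
```

Because the forced checkpoint happens locally, the receiver never blocks waiting for other processes, which matches the nonblocking property described above.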

18.
This paper describes the reliability MicroKernel (RMK) framework, a loadable kernel module (or device driver) that provides application-aware reliability and dynamically configures reliability mechanisms. Characteristics of application/system execution are exploited transparently through application-aware reliability techniques to achieve low-latency detection and low-overhead checkpointing. The RMK prototype is implemented in both Linux and Windows, and it supports detection of application/OS failures and transparent application checkpointing. Experimental results show that system hang detection and application hang detection, which exploit characteristics of application and system behavior, can achieve high coverage (100% observed in our experiments) with a low false positive rate. Moreover, the performance overhead of RMK and its detection/checkpointing mechanisms is small: 0.6% for application hang detection and 0.1% for transparent application checkpointing in the experiments.
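RMK's detectors exploit application and system behaviour inside the kernel, and the abstract gives no implementation details. As a much simpler user-level stand-in, the sketch below flags a task as hung when it has not reported a heartbeat within a timeout. The class name, the heartbeat API and the injectable clock are all assumptions made for illustration and testability:

```python
import time
from typing import Callable, Dict, List


class HangDetector:
    """Flag monitored tasks that have not reported progress within `timeout_s`."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can use a fake clock
        self.last_beat: Dict[str, float] = {}

    def heartbeat(self, task: str) -> None:
        # Called by (or on behalf of) the task whenever it makes progress.
        self.last_beat[task] = self.clock()

    def hung_tasks(self) -> List[str]:
        # Any task silent for longer than the timeout is reported as hung.
        now = self.clock()
        return sorted(t for t, ts in self.last_beat.items() if now - ts > self.timeout_s)
```

The timeout sets the trade-off the abstract alludes to: a shorter timeout lowers detection latency but raises the false positive rate for tasks that are merely slow.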

19.
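Automatic Reconfiguration Mechanism for Workstation Cluster Systems   Total citations: 7 (self-citations: 0, citations by others: 7)
Workstation cluster systems have become one of the mainstream directions in the development of parallel processing. As the application domains of cluster systems expand and their scale keeps growing, the requirements on their availability are rising. Designing highly available cluster systems requires focused research on system reconfiguration techniques. This paper discusses the key techniques of workstation cluster reconfiguration, including the reconfiguration model, saving and restoring system state, and fault detection. Drawing on ChaRM (Checkpoint-based Rollback Recovery and Migration System), a system we developed, it presents the design and implementation of the reconfiguration mechanism for workstation clusters.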

20.
We present the unequal loss protection (ULP) framework, in which unequal amounts of forward error correction are applied to progressive data to provide graceful degradation of image quality as packet losses increase. We develop a simple algorithm that can find a good assignment within the ULP framework. We use the set partitioning in hierarchical trees (SPIHT) coder in this work, but our algorithm can protect any progressive compression scheme. In addition, we promote the use of a probability mass function (PMF) of expected channel conditions so that our system can work with almost any model or estimate of packet losses. We find that when optimizing for an exponential packet loss model with a mean loss rate of 20% and using a total rate of 0.2 bits per pixel on the Lenna image, good image quality can be obtained even when 40% of transmitted packets are lost.
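The core quantity an ULP assignment trades off can be sketched as follows, under stated assumptions: segment i of the progressive stream is spread across all N packets with k[i] data symbols and N − k[i] parity symbols of a maximum-distance-separable code, so it is decodable exactly when at most N − k[i] packets are lost, and `loss_pmf[j]` is the probability that exactly j packets are lost. Because the data is progressive, a segment only helps if every earlier segment was decoded. The function name and the "expected decoded segments" objective are illustrative simplifications of an image-quality objective:

```python
from typing import Sequence


def expected_decoded_segments(k: Sequence[int], n_packets: int,
                              loss_pmf: Sequence[float]) -> float:
    """Expected number of leading segments recovered under a ULP assignment.

    Assumes segment i is decodable iff lost packets <= n_packets - k[i], and
    that loss_pmf[j] = P(exactly j of the n_packets are lost).
    """
    if len(loss_pmf) != n_packets + 1:
        raise ValueError("loss_pmf must cover 0..n_packets lost packets")
    expected = 0.0
    for lost, p in enumerate(loss_pmf):
        useful = 0
        for ki in k:
            if lost > n_packets - ki:
                break  # this segment is lost, so later segments are useless
            useful += 1
        expected += p * useful
    return expected
```

For N = 4 and a PMF of [0.5, 0.3, 0.2, 0, 0], protecting the first segment heavily (k = [2, 4]) gives 1.5 expected segments, while equal protection (k = [3, 3]) gives 1.6; an assignment algorithm searches over such k vectors for the best expected quality.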
