1.

An analysis of location record checkpointing interval for mobility database in PCS networks

Hung-Hsin Chang Ming-Feng Chang Chien-Chao Tseng 《Wireless Networks》2009,15(8):991-998

The mobility database, which stores users' location records, is essential for connecting calls to mobile users in personal communication networks. If the mobility database fails, calls to mobile users may not be set up in time. This paper studies failure restoration of the mobility database. We study per-user location record checkpointing schemes that checkpoint a user's record into non-volatile storage from time to time on a per-user basis. When the mobility database fails, the user location records can be restored from the backup storage. Numerical analysis is used to choose the optimum checkpointing interval so that the overall cost is minimized. The cost function that we consider includes the cost of checkpointing a user's location record and the cost of paging a user due to an invalid location record. Our results indicate that when user registration intervals are exponentially distributed, the user record should never be checkpointed if checkpointing costs more than paging. Otherwise, if paging costs more, the user record should always be checkpointed when a user registers.

2.

Failure restoration of mobility databases for personal communication networks (Total citations: 3; self-citations: 0; citations by others: 3)

Yi-Bing Lin 《Wireless Networks》1995,1(3):365-372

This paper studies failure restoration of mobility databases for personal communication networks (specifically, VLRs and HLRs). We model the VLR restoration with and without checkpointing. The optimal VLR checkpointing interval is derived to balance the checkpointing cost against the paging cost. We also model GSM periodic location updating (location confirmation) to quantify the relationship between the location confirmation frequency and the number of lost calls. The HLR failure restoration procedures for IS-41 and GSM are described. We show the number of lost calls in an HLR failure. Neither the IS-41 nor the GSM procedure can identify the VLRs that need to be accessed by the HLR after a failure. An algorithm is proposed to identify the VLRs, which can be used to aggressively restore an HLR after its failure.

3.

Efficient hardware task migration for heterogeneous FPGA computing using HDL-based checkpointing

《Integration, the VLSI Journal》2021

4.

Combining replication and checkpointing redundancies for reducing resiliency overhead

Hassan Motallebi 《ETRI Journal》2020,42(3):388-398

We herein propose a heuristic redundancy selection algorithm that combines resubmission, replication, and checkpointing redundancies to reduce the resiliency overhead in fault-tolerant workflow scheduling. The appropriate combination of these redundancies for workflow tasks is obtained in two consecutive phases. First, to compute the replication vector (number of task replicas), we apportion the set of provisioned resources among concurrently executing tasks according to their needs. Subsequently, we obtain the optimal checkpointing interval for each task as a function of the number of replicas and characteristics of tasks and computational environment. We formulate the problem of obtaining the optimal checkpointing interval for replicated tasks in situations where checkpoint files can be exchanged among computational resources. The results of our simulation experiments, on both randomly generated workflow graphs and real-world applications, demonstrated that both the proposed replication vector computation algorithm and the proposed checkpointing scheme reduced the resiliency overhead.

5.

Clustered checkpointing: Maximizing the level of confidence for non-equidistant checkpointing

《Integration, the VLSI Journal》2017

Employing fault tolerance often introduces a time overhead, which may cause a deadline violation in real-time systems (RTS). Therefore, for RTS it is important to optimize the fault tolerance techniques such that the probability of meeting the deadlines, i.e. the Level of Confidence (LoC), is maximized. Previous studies have focused on evaluating the LoC for equidistant checkpointing. However, no studies have addressed the problem of evaluating the LoC for non-equidistant checkpointing. In this work, we provide an expression to evaluate the LoC for non-equidistant checkpointing. Further, we detail an exhaustive search approach to find the distribution of a given number of checkpoints that results in the maximal LoC. Since the exhaustive search approach is very time-consuming, we propose the Clustered Checkpointing method, a heuristic that distributes checkpoints in a number of clusters with the goal of maximizing the LoC. The results show that the LoC can be improved when non-equidistant checkpointing is used. Further, the results indicate that the proposed Clustered Checkpointing method is able to find the distribution that results in the maximal LoC in a much shorter time than the exhaustive search approach, while considering only a few clusters.

6.

Statistical Estimation of Mean Signal Strength in a Rayleigh-Fading Environment (Total citations: 1; self-citations: 0; citations by others: 1)

Peritsky M. 《Communications, IEEE Transactions on》1973,21(11):1207-1213

A commonly used model for signal fading in many types of communication channels is that the amplitude of the received signal at a given time is a Rayleigh-distributed random variable. In this paper we show how classical statistical techniques may be applied to the problem of estimating the Rayleigh distribution parameter (i.e., the mean), given samples from the distribution. In particular, we first consider the problem of estimating the population mean, given a sequence of independent samples. We derive an unbiased maximum-likelihood estimator. We show that this estimator is unique, and since it is based on a sufficient statistic, it is therefore "best" in the Blackwell-Rao sense of minimizing expected loss. Using this estimator, we then develop confidence intervals whose length can be used as a guide in selecting the required sample size. We then consider the same estimation problem when the signal samples are obtained from the output of a logarithmic receiver. We derive an interval estimator which does not require taking the antilogs of the log samples, and which is not appreciably worse than the "best" estimator.
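
As a rough illustration of the estimation problem described in this abstract, the following Python sketch applies the standard textbook maximum-likelihood estimator of the Rayleigh scale parameter, sigma^2 = sum(x_i^2) / (2N), to simulated amplitude samples. It is not taken from the paper: the function names, the sample size, the seed, and the choice to estimate sigma^2 first (the mean amplitude then follows as sigma * sqrt(pi / 2)) are all assumptions made for illustration.

```python
import math
import random


def sample_rayleigh(sigma, n, rng):
    """Draw n Rayleigh(sigma) amplitudes by inverse-transform sampling."""
    # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


def estimate_sigma_squared(samples):
    """Textbook MLE of sigma^2 for Rayleigh data: sum(x^2) / (2N)."""
    return sum(x * x for x in samples) / (2.0 * len(samples))


def estimate_mean_amplitude(samples):
    """Plug-in estimate of the mean amplitude, sigma * sqrt(pi / 2)."""
    return math.sqrt(estimate_sigma_squared(samples) * math.pi / 2.0)


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
samples = sample_rayleigh(sigma=2.0, n=20000, rng=rng)
sigma2_hat = estimate_sigma_squared(samples)
mean_hat = estimate_mean_amplitude(samples)
```

With 20000 samples the estimate of sigma^2 should land close to the true value of 4.0; how close depends on the sample size, which is the question the confidence intervals in the abstract address.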

7.

Statistical estimation of mean signal strength in a Rayleigh-fading environment

《Vehicular Technology, IEEE Transactions on》1973,22(4):123-129

A commonly used model for signal fading in many types of communication channels is that the amplitude of the received signal at a given time is a Rayleigh-distributed random variable. In this paper we show how classical statistical techniques may be applied to the problem of estimating the Rayleigh distribution parameter (i.e., the mean), given samples from the distribution. In particular, we first consider the problem of estimating the population mean, given a sequence of independent samples. We derive an unbiased maximum-likelihood estimator. We show that this estimator is unique, and since it is based on a sufficient statistic, it is therefore "best" in the Blackwell-Rao sense of minimizing expected loss. Using this estimator, we then develop confidence intervals whose length can be used as a guide in selecting the required sample size. We then consider the same estimation problem when the signal samples are obtained from the output of a logarithmic receiver. We derive an interval estimator which does not require taking the antilogs of the log samples, and which is not appreciably worse than the "best" estimator.

8.

Selective checkpointing and rollbacks in multi-threaded object-oriented environment

Kasbekar M. Narayanan C. Das C.R. 《Reliability, IEEE Transactions on》1999,48(4):325-337

This paper presents selective checkpointing and rollback schemes for MT-OO (multithreaded, object-oriented) programs. There is a need for checkpointing mechanisms that are more sophisticated than the traditional process-level checkpointing. The program model, theoretical foundations, and an implementation of the selective checkpointing and rollback schemes are described. The usefulness of the schemes is demonstrated by implementing a higher level fault-tolerance scheme of conversations using them. The performance implications are studied on a prototype Internet e-commerce server. The use of the selective schemes in the prototype server showed an appreciable reduction in the loss of work in the presence of faults. Benefits are more pronounced for a larger level of concurrency in the server. The selective scheme usually outperforms the hypothetical zero-cost global scheme in the presence of faults, in terms of completion times. The experiments also show the vast difference between the sizes of selective checkpoints and global checkpoints. The concurrent sessions scheme (based on the concept of relaxed conversations) required 160 checkpoints in less than an hour. Traditionally, such a scheme would be considered outrageous, but the selective schemes still improve performance in the presence of faults.

9.

An optimal checkpointing-strategy for real-time control systems under transient faults

Seong Woo Kwak Byung Jae Choi Byung Kook Kim 《Reliability, IEEE Transactions on》2001,50(3):293-301

Real-time computer systems are often used in harsh environments, such as aerospace, and in industry. Such systems are subject to many transient faults while in operation. Checkpointing enables a reduction in the recovery time from a transient fault by saving intermediate states of a task in a reliable storage facility, and then, on detection of a fault, restoring from a previously stored state. The interval between checkpoints affects the execution time of the task. Whereas inserting more checkpoints and reducing the interval between them reduces the reprocessing time after faults, checkpoints have associated execution costs, and inserting extra checkpoints increases the overall task execution time. Thus, a trade-off between the reprocessing time and the checkpointing overhead leads to an optimal checkpoint placement strategy that optimizes certain performance measures. Real-time control systems are characterized by a timely, and correct, execution of iterative tasks within deadlines. The reliability is the probability that a system functions according to its specification over a period of time. This paper reports on the reliability of a checkpointed real-time control system, where any errors are detected at the checkpointing time. The reliability is used as a performance measure to find the optimal checkpointing strategy. For a single-task control system, the reliability equation over a mission time is derived using the Markov model. Detecting errors at the checkpointing time makes the reliability jitter with the number of checkpoints. This creates the need for other search algorithms to find the optimal number of checkpoints. By considering the properties of the reliability jittering, a simple algorithm is provided to find the optimal checkpoints effectively. Finally, the reliability model is extended to include multiple tasks by a task allocation algorithm.
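
The trade-off between reprocessing time and checkpointing overhead described in this abstract can be illustrated with Young's classic first-order approximation, which is a different and much simpler model than the Markov reliability model of the paper. The sketch assumes a fixed checkpoint cost C, a mean time between failures MTBF, and on average half an interval of lost work per failure, so the overhead per unit time is roughly C/T + T/(2 * MTBF), minimized at T = sqrt(2 * C * MTBF). The function names and numbers are illustrative assumptions.

```python
import math


def overhead_rate(interval, checkpoint_cost, mtbf):
    """First-order overhead per unit time for a given checkpoint interval."""
    checkpoint_term = checkpoint_cost / interval   # time spent taking checkpoints
    rework_term = interval / (2.0 * mtbf)          # expected work redone after faults
    return checkpoint_term + rework_term


def young_interval(checkpoint_cost, mtbf):
    """Interval that minimizes overhead_rate under the first-order model."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Hypothetical numbers: a checkpoint costs 2 time units, MTBF is 100 time units.
best = young_interval(checkpoint_cost=2.0, mtbf=100.0)
overhead_at_best = overhead_rate(best, 2.0, 100.0)
overhead_shorter = overhead_rate(best / 2.0, 2.0, 100.0)
overhead_longer = overhead_rate(best * 2.0, 2.0, 100.0)
```

Halving or doubling the interval around the optimum raises the modeled overhead in both directions, which is the shape of trade-off the abstract refers to; a deadline-driven reliability measure such as the one in the paper can lead to a different optimum.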

10.

Per-user checkpointing for mobility database failure restoration

Yi-Bing Lin 《Mobile Computing, IEEE Transactions on》2005,4(2):189-194

This paper studies the failure restoration of mobility database for Universal Mobile Telecommunications System (UMTS). We consider a per-user checkpointing approach for the home location register (HLR) database. In this approach, individual HLR records are saved into a backup database from time to time. When a failure occurs, the backup record is restored back to the mobility database. We first describe a commonly used basic checkpoint algorithm. Then, we propose a new checkpoint algorithm. An analytic model is developed to compare these two algorithms in terms of the checkpoint cost and the probability that an HLR backup record is obsolete. This analytic model is validated against simulation experiments. Numerical examples indicate that our new algorithm may significantly outperform the basic algorithm in terms of both performance measures.

11.

A maintenance inspection model for a single machine with general failure distribution

M.A. Hariga 《Microelectronics Reliability》1996,36(3):353-358

In this paper we develop a mathematical model for determining a periodic inspection schedule in a preventive maintenance program for a single machine subject to random failure. We formulate the problem as a profit maximization model with a general failure time distribution. We show that under certain conditions on the probability density function of failure, a unique optimal inspection interval can be obtained. When the failure times are exponentially distributed, we propose alternative optimal and heuristic procedures to find exact and approximate inspection intervals. Our heuristic solution method is shown numerically to be more efficient than an earlier published heuristic procedure. We also investigate the sensitivity of the optimal inspection interval and expected profit per unit of time with respect to changes in the two parameters of the Weibull time-to-failure distribution.

12.

Harvester-aware transient computing: Utilizing the mechanical inertia of kinetic energy harvesters for a proactive frequency-based power loss detection

《Integration, the VLSI Journal》2020

Power-neutral system design avoids energy buffers by powering the load directly from the energy harvester. In case of a power loss, checkpointing methods ensure forward progress by preserving the volatile system state using non-volatile memories. The timely detection of upcoming power losses is essential for a reliable checkpointing process. Moreover, various applications require early detection to, e.g., ensure the finalization of atomic operations. However, common voltage threshold-based methods only allow short-term detection. In this paper we propose a new methodology that allows early detection by exploiting physical characteristics of the harvester. To this end, small-scale kinetic energy harvesters are considered that employ rotatably mounted mechanical masses to drive electromagnetic generators. Due to the inertia of these masses, the power output does not stop abruptly, but gradually decays after the excitation of the harvester is over. We investigate the relationship between the initial excitation intensity as it is reflected in the output frequency, the load current, and the remaining period of power availability. Our results indicate that this relationship allows the power duration to be predicted from the output frequency of the harvester. We show that power losses can be detected up to one order of magnitude earlier with our frequency-based method than with state-of-the-art voltage-based methods.

13.

Pipelined compressed checkpointing for heterogeneous parallel computing systems

Liu Yongpeng Wang Feng Lu Kai Liu Yongyan 《Acta Electronica Sinica (电子学报)》2012,40(2):223-229

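In large-scale parallel computing systems, parallel checkpointing triggers a large number of nodes to save their computation state at the same time, which incurs a huge file storage overhead and places heavy access pressure on the communication and storage systems. Data compression can shrink checkpoint files, thereby reducing both the storage overhead and the access pressure on the communication and storage systems. However, it also introduces additional computation overhead for compression. Targeting heterogeneous parallel computing systems, this paper proposes a pipelined parallel compressed checkpointing technique that adopts a series of optimizations to reduce the computation latency introduced by compression, including a pipelined double write-buffer queue, merging of file write operations, a GPU-accelerated pipelined compression algorithm, and multi-process scheduling of GPU resources. The paper describes the implementation of this technique on the Tianhe-1 system and presents a comprehensive evaluation of the implemented checkpointing system. Experimental data show that the method is feasible, efficient, and practical in large-scale heterogeneous parallel computing systems.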

14.

Loss Tolerant Bandwidth Aggregation for Multihomed Video Streaming over Heterogeneous Wireless Networks

Jiyan Wu Yanlei Shang Bo Cheng Budan Wu Junliang Chen 《Wireless Personal Communications》2014,75(2):1265-1282

Bandwidth aggregation is a key research issue in integrating heterogeneous wireless networks, since it can substantially increase the throughput and reliability for enhancing streaming video quality. However, burst loss in unreliable wireless channels is a severely challenging problem which significantly degrades the effectiveness of bandwidth aggregation. Previous studies mainly address this problem by reactively increasing the forward error correction (FEC) redundancy. In this paper, we propose a loss tolerant bandwidth aggregation approach (LTBA), which proactively leverages the channel diversity in heterogeneous wireless networks to overcome the burst loss. First, we allocate the FEC packets according to the ‘loss-free’ bandwidth of each wireless network to the multihomed client. Second, we deliberately insert intervals between the FEC packets’ departures while still respecting the delay constraint. The proposed LTBA is able to reduce the consecutive packet loss under the burst-loss assumption. We carry out an analysis, based on the Gilbert loss model and a continuous-time Markov chain, to prove that the proposed LTBA outperforms the existing ‘back-to-back’ transmission schemes. We conduct the performance evaluation in Exata, and emulation results show that LTBA outperforms the existing approaches in improving the video quality in terms of PSNR (Peak Signal-to-Noise Ratio).
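
To make the burst-loss argument in this abstract concrete, the sketch below simulates a simplified, discrete-time two-state Gilbert-style channel in which every packet sent in the bad state is lost, and compares the mean run of consecutive lost packets when packets depart back to back versus several slots apart. It is only an illustration, not the LTBA scheme or the paper's continuous-time analysis; the transition probabilities, the spacing, the seed, and the function name are hypothetical.

```python
import random


def simulate_gilbert(num_packets, spacing, p_good_to_bad, p_bad_to_good, seed):
    """Return (loss_rate, mean_burst_length) for packets sent every `spacing` slots."""
    rng = random.Random(seed)
    bad = False          # start in the good (loss-free) state
    lost = 0
    bursts = []          # lengths of runs of consecutive lost packets
    run = 0
    for _packet in range(num_packets):
        # Advance the channel by `spacing` slots before the next departure.
        for _slot in range(spacing):
            if bad:
                bad = rng.random() >= p_bad_to_good   # stay bad unless recovering
            else:
                bad = rng.random() < p_good_to_bad    # fall into the bad state
        if bad:
            lost += 1
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return lost / num_packets, mean_burst


back_to_back = simulate_gilbert(100000, 1, 0.05, 0.25, seed=1)
spaced = simulate_gilbert(100000, 5, 0.05, 0.25, seed=1)
```

Under these assumed parameters both runs see about the same overall loss rate (the stationary probability of the bad state), but the spaced departures break long loss bursts into shorter ones, which is what lets FEC recover more packets.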

15.

Selection of a checkpoint interval in a critical-task environment

Geist R. Reynolds R. Westall J. 《Reliability, IEEE Transactions on》1988,37(4):395-400

The selection of an optimal checkpointing strategy has most often been considered in the transaction processing environment where systems are allowed unlimited repairs. In this environment an optimal strategy maximizes the time spent in the normal operating state and consequently the rate of transaction processing. This paper seeks a checkpoint strategy which maximizes the probability of critical-task completion on a system with limited repairs. These systems can undergo failure and repair only until a repair time exceeds a specified threshold, at which time the system is deemed to have failed completely. For such systems, a model is derived which yields the probability of completing the critical task when each checkpoint operation has fixed cost. The optimal number of checkpoints can increase as system reliability improves. The model is extended to include a constraint which enforces timely completion of the critical task.

16.

Coactive scheduling and checkpoint determination during high level synthesis of self-recovering microarchitectures

Orailoglu A. Karri R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(3):304-311

The growing trend towards VLSI implementation of crucial tasks in critical applications has increased both the demand for and the scope of fault-tolerant VLSI systems. In this paper, we present a self-recovering microarchitecture synthesis system. In a self-recovering microarchitecture, intermediate results are compared at regular intervals and, if correct, saved in registers (checkpointing). On the other hand, on detecting a fault, the self-recovering microarchitecture rolls back to a previous checkpoint and retries. The proposed synthesis system comprises a heuristic and an optimal subsystem. The heuristic synthesis subsystem has two components. Whereas the checkpoint insertion algorithm identifies good checkpoints by successively eliminating clock cycle boundaries that either have a high checkpoint overhead or violate the retry period constraint, the novel edge-based scheduler assigns edges to clock cycle boundaries, in addition to scheduling nodes to clock cycles. Also, checkpoint insertion and edge-based scheduling are intertwined using a flexible synthesis methodology. We additionally show an Integer Linear Programming model for the self-recovering microarchitecture synthesis problem. The resulting ILP formulation can minimize either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints.

17.

An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP

Lin Chi-Yi Wang Szu-Chi Kuo Sy-Yen 《Mobile Networks and Applications》2003,8(6):687-697

Time-based coordinated checkpointing protocols are well suited for mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computing systems over Mobile IP is proposed. The protocol reduces the number of checkpoints per checkpointing process to nearly the minimum, so that fewer checkpoints need to be transmitted through the costly wireless link. Our protocol also performs very well in minimizing the number and size of messages transmitted in the wireless network. In addition, the protocol is nonblocking because inconsistencies can be avoided by the piggybacked information in every message. Therefore, the protocol brings very little overhead to a mobile host with limited resources. Additionally, by taking advantage of reliable timers in mobile support stations, the time-based checkpointing protocol can adapt to wide area networks.

18.

Reliability MicroKernel: Providing Application-Aware Reliability in the OS

Long Wang Kalbarczyk Z. Weining Gu Iyer R.K. 《Reliability, IEEE Transactions on》2007,56(4):597-614

This paper describes the reliability MicroKernel (RMK) framework, a loadable kernel module (or a device driver) for providing application-aware reliability and dynamically configuring reliability mechanisms. Characteristics of application/system execution are exploited transparently through application-aware reliability techniques to achieve low-latency detection and low-overhead checkpointing. The RMK prototype is implemented in both Linux and Windows, and it supports detection of application/OS failures and transparent application checkpointing. Experimental results show that the system hang detection and application hang detection, which exploit characteristics of application and system behavior, can achieve high coverage (100% observed in our experiments) with a low false positive rate. Moreover, the performance overhead of RMK and its detection/checkpointing mechanisms is small: 0.6% for application hang detection, and 0.1% for transparent application checkpointing in the experiments.

19.

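Automatic reconfiguration mechanism for workstation cluster systems (Total citations: 7; self-citations: 0; citations by others: 7)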

Zhang Youhui Wang Dongsheng Zheng Weimin 《Acta Electronica Sinica (电子学报)》2000,28(5):13-16

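Workstation cluster systems have become one of the mainstream directions in parallel processing. As the application domains of cluster systems broaden and their scale keeps growing, the demands on their availability are rising. Designing a highly available cluster system requires a focus on system reconfiguration techniques. This paper discusses key techniques including the reconfiguration model for workstation cluster systems, the saving and restoration of system state, and fault detection. Drawing on the ChaRM (Checkpoint-based Rollback Recovery and Migration System) system that we developed, it also describes the design and implementation of a reconfiguration mechanism for workstation clusters.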

20.

Unequal loss protection: graceful degradation of image quality over packet erasure channels through forward error correction

Mohr A.E. Riskin E.A. Ladner R.E. 《Selected Areas in Communications, IEEE Journal on》2000,18(6):819-828

We present the unequal loss protection (ULP) framework in which unequal amounts of forward error correction are applied to progressive data to provide graceful degradation of image quality as packet losses increase. We develop a simple algorithm that can find a good assignment within the ULP framework. We use the set partitioning in hierarchical trees coder in this work, but our algorithm can protect any progressive compression scheme. In addition, we promote the use of a PMF of expected channel conditions so that our system can work with almost any model or estimate of packet losses. We find that when optimizing for an exponential packet loss model with a mean loss rate of 20% and using a total rate of 0.2 bits per pixel on the Lenna image, good image quality can be obtained even when 40% of transmitted packets are lost.