Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Real-time systems often have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be expensive and the cost is exacerbated for systems which utilize concurrent processes. The concurrency present in most real-time systems and the further difficulties introduced by timing constraints suggest that providing tolerance for software faults may be inordinately expensive or complex. We believe that this need not be the case, and propose a straightforward, pragmatic approach to software fault tolerance which we believe is applicable to many real-time systems. The approach takes advantage of the structure of real-time systems to simplify error recovery, and a classification scheme for errors is introduced. Responses to each type of error are proposed which allow service to be maintained.

2.
Large-scale video servers are typically based on disk arrays that comprise multiple nodes and many hard disks. Due to the large number of components, disk arrays are susceptible to disk and node failures that can affect the server reliability. Therefore, fault tolerance must be addressed already in the design of the video server. For fault tolerance, we consider parity-based as well as mirroring-based techniques with various distribution granularities of the redundant data. We identify several reliability schemes and compare them in terms of server reliability and per-stream cost. To compute the server reliability, we use continuous-time Markov chains that are evaluated using the SHARPE software package. Our study covers independent disk failures and dependent component failures. We propose a new mirroring scheme, called the Grouped One-to-One scheme, that achieves the highest reliability among all schemes considered. The results of this paper indicate that dividing the server into independent groups achieves the best compromise between server reliability and cost per stream. We further find that the smaller the group size, the better the trade-off between high server reliability and low per-stream cost.
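As a hedged illustration of the kind of continuous-time Markov chain analysis this abstract mentions, consider the smallest possible case: a single mirrored pair of disks. The symbols \lambda (per-disk failure rate), \mu (repair rate), and T_k (expected time to data loss starting with k working disks) are assumptions introduced here, not taken from the paper, whose actual models are evaluated with SHARPE and are more elaborate.

```latex
\begin{align*}
T_2 &= \frac{1}{2\lambda} + T_1, \\
T_1 &= \frac{1}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}\,T_2, \\
\Rightarrow\quad T_2 &= \frac{3\lambda+\mu}{2\lambda^{2}}
      \;\approx\; \frac{\mu}{2\lambda^{2}} \quad (\mu \gg \lambda).
\end{align*}
```

The first line says that, with both disks working, the first failure arrives at rate 2\lambda; the second says that a degraded pair either loses its remaining disk (rate \lambda) or is repaired (rate \mu) and returns to the fully working state.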

3.
Issues in the design of a storage server for video-on-demand   Total citations: 2, self-citations: 0, citations by others: 2
We examine issues related to the design of a storage server for video-on-demand (VOD) applications. The storage medium considered is magnetic disks or arrays of disks. We investigate disk scheduling policies, buffer management policies and I/O bus protocol issues. We derive the number of sessions that can be supported from a single disk or an array of disks and determine the amount of buffering required to support a given number of users. Furthermore, we propose a scheduling mechanism for disk accesses that significantly lowers the buffer-size requirements in the case of disk arrays. The buffer size required under the proposed scheme is independent of the number of disks in the array. This property allows for striping video content over a large number of disks to achieve higher concurrency in access to a particular video object. This enables the server to satisfy hundreds of independent requests to the same video object or to hundreds of different objects while storing only one copy of each video object. The reliability implications of striping content over a large number of disks are addressed and two solutions are proposed. Finally, we examine various policies for dealing with disk thermal calibration and the placement of videos on disks and disk arrays.

4.
Advances in networking and storage technology have made it possible to deliver on-demand services over networks, such as the emerging video-on-demand (VOD) applications. A variety of studies have focused on designing a video server suitable for VOD applications. However, the number of concurrent on-demand services supported by the server is often limited by the I/O bandwidth of the storage systems. This paper describes a discrete buffer sharing model which uses batching and buffer sharing techniques in video servers to support a large number of VOD services. Two operations, splitting and merging, enable the model to fully utilize system resources such as buffers and disk bandwidth. Moreover, this paper introduces the concept of imprecise video viewing, which assumes that a limited amount of quality loss is acceptable during video playback. Based upon this assumption, three shrinking strategies are explored to reduce buffer requirements. Finally, experimental results show that our methods perform better than traditional buffer management techniques for VOD systems.

5.
One of the main advantages of a redundant disk array architecture is that it provides fault tolerance against disk drive failures. This improvement in reliability can be further enhanced if spare drives are also added to the array since a failed drive can be expeditiously replaced. Furthermore, even though data can be reconstructed from the other drives of a redundant array in the event of a drive failure, performance is degraded substantially in this mode of operation. Clearly it is desirable to get out of this degraded mode of operation as quickly as possible. Again, having spare drives in the array will facilitate that. The purpose of this paper is to study some of the issues related to sparing in a redundant disk array. In particular, we will try to understand the effect on reliability of different sparing schemes. We will also examine the economic trade-offs of having spare drives in a system. Recommended by: M. Kitsuregawa

6.
A number of recent technological trends have made data intensive applications such as continuous media (audio and video) servers a reality. These servers store and retrieve large volumes of data using magnetic disks. Servers consisting of multiple nodes and large arrays of heterogeneous disk drives have become a fact of life for several reasons. First, magnetic disks might fail. Failed disks are almost always replaced with newer disk models because the current technological trend for these devices is one of annual increase in both performance and storage capacity. Second, storage requirements are ever increasing, forcing servers to be scaled up progressively. In this study, we present a framework to enable parity-based data protection for heterogeneous storage systems and to compute their mean lifetime. We describe the tradeoffs associated with three alternative techniques: independent subservers, dependent subservers, and disk merging. The disk merging approach provides a solution for systems that require highly available secondary storage in environments that also necessitate maximum flexibility.

7.
Recent advances in computer technologies have made it feasible to provide multimedia services, such as news distribution and entertainment, via high-bandwidth networks. The storage and retrieval of large multimedia objects (e.g., video) becomes a major design issue of the multimedia information system. While most other works on multimedia storage servers assume an on-line disk storage system, we consider a two-tier storage architecture with a robotic tape library as the vast near-line storage and an on-line disk system as the front-line storage. Magnetic tapes are cheaper, more robust, and have a larger capacity; hence, they are more cost effective for large scale storage systems (e.g., video-on-demand (VOD) systems may store tens of thousands of videos). We study in detail the design issues of the tape subsystem and propose some novel tape-scheduling algorithms which give faster response and require less disk buffer space. We also study the disk-striping policy and the data layout on the tape cartridge in order to fully utilize the throughput of the robotic tape system and to minimize the on-line disk storage space.

8.
Large-scale distributed applications such as online information retrieval and collaboration over computational elements demand an approach to self-managed computing systems with a minimum of human interference. However, large scales and full distribution often lead to poor system dependability and security, and increase the difficulty of managing and controlling redundancy for fault tolerance. In particular, fault tolerance schemes that let mobile agents survive agent server crash failures in an autonomic environment are complex, since developers normally have no control over remote agent servers. Some solutions inject a replica into stable storage upon its arrival at an agent server, but in the event of an agent server crash the replica is unavailable until the agent server recovers. In this paper we present a failure model and an exception handling framework for mobile agent systems. An exception handling scheme is developed for mobile agents to survive agent server crash failures. A replica mobile agent operates at the agent server visited prior to its master's current location. If a master crashes, its replica is available as a replacement. The proposed scheme is examined in comparison with a simple time-out scheme. Experimental evaluation is performed, and performance results show that the scheme adds some overhead to the round trip time when fault tolerance measures are exercised. However, the scheme offers the advantage that fault tolerance is provided during the mobile agent trip, i.e., in the event of an agent server crash the agent does not have to revisit all agent servers.

9.
In designing cost-effective video-on-demand (VOD) servers, efficient resource management and proper system sizing are of great importance. In addition to large storage and I/O bandwidth requirements, support of interactive VCR functionality imposes additional resource requirements on the VOD system in terms of storage space, as well as disk and network bandwidth. Previous works have used data sharing techniques (such as batching, buffering, and adaptive piggybacking) to reduce the I/O demand on the storage server. However, such data sharing techniques complicate the provision of VCR functions, which in turn diminishes the benefit that can be obtained from data sharing. The main contribution of this paper is a simple, yet powerful, analytical modeling approach which allows for analysis, system sizing, resource allocation, and parameter setting for a fairly general class of data sharing techniques used in conjunction with the provision of VCR-type functionality. Using this mathematical model, we can determine the proper amount of resources to be allocated for normal playback as well as for service of VCR functionality requests while satisfying predefined system performance requirements. To illustrate the usefulness of our model, we focus on a specific data sharing scheme which combines batching, buffering, and adaptive piggybacking and also allows for the use of VCR functions. We show how to utilize this mathematical model for system sizing and resource allocation purposes.

10.
The significant frame size variability exhibited by compressed videos poses a great challenge for network delivery. In this paper we propose an efficient flow control scheme, employed in the peer stations (i.e., servers and clients), for delivery of prestored compressed videos in a video-on-demand (VOD) system. This scheme relies on an off-line analysis of the video frame sizes and server properties to determine the necessary buffer space and network bandwidth. The server platform of particular interest obeys a cycle-based data-block retrieval discipline, which is an essential technique for reducing disk seek time and thereby raising disk throughput to support a large number of concurrent video accesses. Such a discipline is taken into account here to guarantee smooth delivery of variable-bit-rate videos. At run time a server-driven control model is used, in which a server performs the primary flow control task without relying on any feedback from clients. The scheme has been implemented in our prototype VOD system to support both unicast and multicast communication paradigms under an RSVP-enabled network.

11.
In a video-on-demand (VOD) environment, disk arrays are often used to meet the disk bandwidth requirement. A disk failure can then seriously reduce the available disk bandwidth. In this paper, we explore the approach of replicating frequently accessed movies to provide the high data bandwidth and fault tolerance required in a disk-array-based video server. An isochronous continuous video stream imposes different requirements from a random access pattern on databases or files. Specifically, we propose a new replica placement method, called rotational mirrored declustering (RMD), to support high data availability for disk arrays in a VOD environment. In essence, RMD is similar to conventional mirrored declustering in that replicas are stored in different disk arrays. However, it differs from the latter in that the replica placements in different disk arrays under RMD are properly rotated. Combining the merits of prior chained and mirrored declustering methods, RMD is particularly suitable for storing multiple movie copies to support VOD applications. To assess the performance of RMD, we conduct a series of experiments by emulating the storage and delivery of movies in a VOD system. Our results show that RMD consistently outperforms the conventional methods in terms of load-balancing and fault-tolerance capability after disk failure, and is deemed a viable approach to supporting replica placement in a disk-array-based video server.

12.
This paper investigates the fault tolerance and energy efficiency of server clusters in order to realize a reliable and energy-aware server cluster system. In information systems, a client issues a request to one server in a server cluster and the server sends a reply to the client. Once the server stops because of a fault, the client does not receive a reply to the request. Even if the request is performed on another server once the fault is detected, some QoS requirements such as response time may not be satisfied. Hence, each request has to be performed redundantly on multiple servers to tolerate server faults. Our previous studies discussed the redundant power consumption laxity-based (RPCLB) algorithm, in which multiple servers are selected to perform a request process redundantly and energy-efficiently. Since each application process is performed redundantly on more than one server, a larger amount of electric power is consumed. In this paper, we propose a novel, improved RPCLB (IRPCLB) algorithm to reduce the power consumption of servers: once a process successfully terminates on one server, the now meaningless redundant processes are forced to terminate on the other servers. In the evaluation, we show that the IRPCLB algorithm reduces the total power consumption of servers and the total execution time of processes in both homogeneous and heterogeneous clusters compared with the RPCLB and RR algorithms.

13.
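A survey of erasure codes in storage systems   Total citations: 5, self-citations: 0, citations by others: 5
As mass storage systems develop and are deployed in complex environments, their reliability faces serious challenges. Erasure codes, the main method of fault tolerance in storage systems, are receiving increasing attention. This survey first reviews the state of the art of typical and common erasure coding techniques, then compares and analyzes existing erasure codes in detail against the key metrics used to evaluate them. It identifies the shortcomings of different erasure codes in fault tolerance and disk requirements, space utilization, encoding efficiency, update efficiency, and reconstruction efficiency, and offers possible improvements. It also discusses the differing requirements that disk array systems, P2P storage systems, distributed storage systems, and archival storage systems place on erasure code performance, and points out open problems in current research and possible future directions. The analysis concludes that current erasure codes all have shortcomings, to varying degrees, in fault tolerance, computational efficiency, and storage utilization; how to balance these factors and design erasure codes with higher fault tolerance, higher computational efficiency, and higher storage utilization remains a problem worth studying in depth for a long time to come.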

14.
Complex real-time system design needs to address dependability requirements, such as safety, reliability, and security. We introduce a modelling and simulation based approach which allows for the analysis and prediction of dependability constraints. Dependability can be improved by making use of fault tolerance techniques. The de-facto example, in the real-time system literature, of a pump control system in a mining environment is used to demonstrate our model-based approach. In particular, the system is modelled using the Discrete EVent system Specification (DEVS) formalism, and then extended to incorporate fault tolerance mechanisms. The modularity of the DEVS formalism facilitates this extension. The simulation demonstrates that the employed fault tolerance techniques are effective. That is, the system performs satisfactorily despite the presence of faults. This approach also makes it possible to make an informed choice between different fault tolerance techniques. Performance metrics are used to measure the reliability and safety of the system, and to evaluate the dependability achieved by the design. In our model-based development process, modelling, simulation and eventual deployment of the system are seamlessly integrated.

15.
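Continuous media servers (such as VOD servers) manage large volumes of continuous media data (such as audio and video) and deliver continuous media service to users at a given rate. The disk arrays used as storage devices in such systems must therefore be highly reliable and offer some degree of fault tolerance. This paper proposes a parity-based data reconstruction and recovery algorithm which ensures that, when a single disk in the system fails, the server can reconstruct the data on the failed disk in time. The algorithm exploits an inherent property of media streams, the continuity of data during playback, and, compared with the standard failure recovery algorithms currently in use, greatly reduces the system overhead of data reconstruction after an online disk failure. Finally, analysis and comparison demonstrate the effectiveness of the algorithm.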
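The abstract does not spell out its reconstruction algorithm, so the following is only a minimal sketch of the general single-parity idea it builds on: the parity block is the byte-wise XOR of the data blocks in a stripe, and any one missing block equals the XOR of the surviving blocks and the parity. The function names and the sample stripe are hypothetical, and the paper's playback-continuity optimization is not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute the parity block for one stripe (hypothetical helper)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# One stripe spread over three data disks, plus a parity disk (sample data).
stripe = [b"vid0", b"vid1", b"vid2"]
parity = make_parity(stripe)

# Simulate the failure of disk 1 and rebuild its block.
lost = 1
survivors = [block for i, block in enumerate(stripe) if i != lost]
recovered = reconstruct(survivors, parity)
```

Because XOR is its own inverse, the same routine serves for both encoding and recovery; this only tolerates one failed disk per stripe, which matches the single-failure assumption stated in the abstract.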

16.
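Cache design for VOD servers based on cooperative caching   Total citations: 1, self-citations: 0, citations by others: 1
In recent years, with the development of computer network and multimedia technologies, video-on-demand services have gradually become a reality. Distributed VOD (Video On Demand) server systems were proposed to support a larger number of concurrent data streams; compared with a single server, such an architecture offers better efficiency, reliability, and scalability. Cooperative cache (CC) technology coordinates the memory of the individual servers to form a global cache. This structure not only takes full advantage of the distributed VOD server architecture, but also increases cache capacity and raises the global hit rate, thereby improving system efficiency. Building on cooperative caching and targeting the characteristics of streaming media and VOD systems, this paper proposes the GBB cache replacement algorithm. The algorithm takes the life cycle of data blocks as its starting point and fully considers the service requirements of both existing users and users requesting admission, improving memory utilization. The authors analyze the algorithm theoretically and show that it outperforms traditional cache replacement algorithms.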

17.
李静  罗金飞  李炳超 《计算机应用》2021,41(4):1113-1121
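Proactive fault tolerance significantly improves the reliability of a storage system by detecting soon-to-fail hard disks in advance and prompting the system to migrate and back up the endangered data ahead of time. To address the inability of existing research to accurately evaluate the reliability of replica storage systems with proactive fault tolerance, several state transition models for replica storage systems are proposed. The models are then implemented with a Monte Carlo simulation algorithm to simulate the operation of a replica storage system with proactive fault tolerance, and finally the expected number of data loss events during a given operating period is computed. Weibull distribution functions are used to model the time distributions of device failure and failure repair events, and the effects of the proactive fault tolerance mechanism, node failures, node failure repairs, disk failures, and disk failure repairs on storage system reliability are evaluated quantitatively. Experimental results show that when the accuracy of the prediction model reaches 50%, system reliability can be improved by a factor of 1 to 3; compared with two-replica systems, three-replica systems are more sensitive to system parameters. The proposed models can help system administrators compare and weigh system reliability levels under different fault tolerance schemes and system parameters, and thus build highly reliable and highly available storage systems.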
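To make the simulation idea concrete, here is a deliberately simplified Monte Carlo sketch, not the paper's state transition models. Assumptions made here: each replica sits on one disk with Weibull-distributed lifetimes, repairs take a fixed time, a correctly predicted failure is handled by migration and causes no downtime, node failures are ignored, and the estimate is the fraction of runs with at least one data loss rather than the expected number of loss events. All names and default parameters are hypothetical.

```python
import random


def down_intervals(rng, scale, shape, repair, horizon, predict_rate):
    """Generate (start, end) outage windows for one replica up to `horizon`."""
    t, windows = 0.0, []
    while True:
        # random.weibullvariate(alpha, beta): alpha is scale, beta is shape.
        t += rng.weibullvariate(scale, shape)
        if t >= horizon:
            return windows
        if rng.random() < predict_rate:
            continue  # failure predicted: data migrated in time, no outage
        windows.append((t, t + repair))
        t += repair  # the replacement disk starts once the repair completes


def loses_data(rng, replicas, scale, shape, repair, horizon, predict_rate):
    """True if all replicas are unavailable at the same moment in one run."""
    events = []
    for _ in range(replicas):
        for start, end in down_intervals(rng, scale, shape, repair,
                                         horizon, predict_rate):
            events.append((start, 1))   # replica goes down
            events.append((end, -1))    # replica comes back
    down = 0
    for _, delta in sorted(events):
        down += delta
        if down == replicas:
            return True
    return False


def loss_fraction(replicas=3, scale=100_000.0, shape=1.2, repair=24.0,
                  horizon=87_600.0, predict_rate=0.5, trials=2_000, seed=0):
    """Estimate the fraction of simulated runs that suffer data loss."""
    rng = random.Random(seed)
    losses = sum(loses_data(rng, replicas, scale, shape, repair,
                            horizon, predict_rate) for _ in range(trials))
    return losses / trials
```

Calling `loss_fraction(replicas=2)` and `loss_fraction(replicas=3)` with different `predict_rate` values gives a rough feel for the kind of comparison the abstract describes; the numbers from this toy model should not be read as reproducing the paper's results.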

18.
宋健  王培元  孙为 《微计算机信息》2007,23(33):246-248
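Using the view feature of Bind, the DNS server software for Linux, and the reverse proxy feature of the Apache web server software, this paper proposes a design for external access to servers over multiple ISP links. The design provides link load balancing and redundancy for access to internal servers and improves overall server availability and reliability.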

19.
《Information Sciences》1987,42(3):255-282
The paper proposes a technique for providing software fault tolerance in real-time applications demanding fast response and a high degree of reliability. It is shown that a reasonably flexible interprocess communication can be supported with only a small increase in complexity and overhead. The two most prominent features of the proposed scheme are (1) it attempts to exploit fault-avoidance techniques as much as possible to reduce the overhead of fault tolerance and (2) it controls the propagation of errors so as to enable efficient recovery. Formal proofs of the system operation are developed. Besides showing that the scheme works as expected, the arguments serve to highlight the assumptions needed for provably correct operation. Some issues relating to hardware fault tolerance are also considered.

20.
Emerging applications like C3I systems, real-time databases, data acquisition systems and multimedia servers require access to secondary storage devices under timing constraints. In this paper, we focus on operating system support needed for managing real-time disk traffic with hard deadlines. We present the design and implementation of a preemptive deadline-driven disk I/O subsystem suitable for real-time disk traffic management. Preemptibility is achieved with a granularity that is automatically controlled by the I/O subsystem according to current workload and filesystem data layout. An admission control test checks the current resource availability for a given workload. We show that contiguous layout is necessary to maintain hard real-time guarantees and a reasonable level of disk throughput. Finally, we show how buffering can be used to obtain utilization factors close to the maximum disk bandwidth possible.
