Similar Literature
20 similar documents found.
1.
Information security management has become an important research issue in distributed systems, and failure detection is a fundamental issue for fault tolerance in large distributed systems. Recently, many researchers have come to recognize that failure detection should be provided as a generic service, similar to IP address lookup. So far, however, this has not succeeded, partly because classical failure detectors were not designed to satisfy several application requirements simultaneously. More specifically, traditional failure detector implementations are often tuned for local networks and do not address several important problems found in wide-area distributed systems with large numbers of monitored components, such as Grid systems. In this paper, we study a security management scheme for failure detectors in distributed systems. We first identify some of the most important QoS problems that arise in large wide-area distributed systems. We then present a novel failure detector scheme, combined with self-tuning control theory, that can help solve or mitigate some of these problems. Furthermore, the paper discusses the design and analysis of a scalable failure detection service for such systems that dynamically adjusts the heartbeat streams so that they satisfy the bottleneck router's requirements. The basic z-transform stability test is used to establish the stability criterion, which ensures bounded rate allocation without steady-state oscillation. We further show how the online failure detector control algorithm can be used to design a controller, analyze the theoretical properties of the proposed algorithm, and verify that the analysis agrees with simulations in both the LAN and WAN cases.
Simulation results show that the scheme is efficient in terms of high bottleneck link utilization, fast response, and good stability of both the bottleneck router's buffer occupancy and the controlled sending rates. In conclusion, the new security management failure detector algorithm provides better QoS than the algorithms proposed by Stelling et al. (Proceedings of the 7th IEEE Symposium on High Performance Distributed Computing, pp. 268–278, 1998) and Foster et al. (Int J Supercomput Appl, 2001).
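The abstract does not give the self-tuning controller or its z-domain model, so the sketch below shows only the basic building block such a service tunes: a heartbeat detector whose timeout adapts to recently observed inter-arrival times. The window size and safety margin are assumed parameters, and this is a common adaptive-timeout technique, not the authors' rate-control algorithm (it does not adjust heartbeat sending rates).

```python
from collections import deque


class HeartbeatDetector:
    """Heartbeat failure detector with an adaptive timeout (illustrative sketch).

    The next heartbeat is expected at last_arrival + mean recent inter-arrival
    time; the process is suspected once that estimate plus a fixed safety
    margin passes without a new heartbeat.
    """

    def __init__(self, window=10, safety_margin=0.5):
        self.intervals = deque(maxlen=window)  # sliding window of inter-arrival times
        self.safety_margin = safety_margin     # seconds added to the estimate
        self.last_arrival = None

    def heartbeat(self, now):
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def deadline(self):
        if not self.intervals:
            raise ValueError("need at least two heartbeats to estimate a deadline")
        expected_gap = sum(self.intervals) / len(self.intervals)
        return self.last_arrival + expected_gap + self.safety_margin

    def suspect(self, now):
        return now > self.deadline()


det = HeartbeatDetector(window=4, safety_margin=0.5)
for t in (0.0, 1.0, 2.5, 3.0, 4.0):
    det.heartbeat(t)
print(det.deadline())                      # 5.5  (4.0 + mean gap 1.0 + 0.5)
print(det.suspect(5.0), det.suspect(6.0))  # False True
```

A larger safety margin lowers false suspicions on a jittery WAN at the cost of slower detection, which is the kind of QoS trade-off the abstract says must be tuned per deployment.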

2.
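When a distributed storage system stores data and one or more devices fail, not only does the data on those devices become unusable, but users can also no longer access the resources in full. To address this problem, a fault-tolerant storage scheme based on Reed-Solomon (RS) codes is proposed: as long as the number of failed devices in the system does not exceed m, the lost data can be recovered, providing fault tolerance. The scheme offers good security and execution efficiency, meets the fault-tolerance requirements of storage systems, and can be used to build storage systems with high reliability requirements.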
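To illustrate why an RS-coded layout survives up to m device failures, here is a minimal sketch of a systematic Reed-Solomon-style code built from polynomial interpolation. It assumes the prime field GF(257) for readability (production RS codes normally work in GF(2^8), and values here may exceed one byte); the function names are hypothetical and this is not the paper's implementation. Any k of the k + m stored symbols determine the degree < k polynomial, and therefore the data.

```python
P = 257  # prime field GF(257) for readability; production RS codes usually use GF(2^8)


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(data, m):
    """Systematic encoding: device i < k stores data[i]; m extra devices store parity."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [_interpolate(points, x) for x in range(k, k + m)]


def decode(survivors, k):
    """Rebuild the k data symbols from any k surviving (device index, symbol) pairs."""
    if len(survivors) < k:
        raise ValueError("more than m devices lost; data cannot be recovered")
    points = survivors[:k]
    return [_interpolate(points, x) for x in range(k)]


data = [10, 20, 30]         # k = 3 data symbols (each < P)
stored = encode(data, m=2)  # 5 devices in total
survivors = [(i, s) for i, s in enumerate(stored) if i not in (0, 2)]  # devices 0 and 2 fail
print(decode(survivors, k=3))  # [10, 20, 30]
```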

3.
In a Grid computing system, distributed scientific and engineering applications often require multi-institutional collaboration, large-scale resource sharing, and wide-area communication. Applications executing in such systems inevitably encounter different types of failures, such as hardware, program, and storage failures. One way of accounting for failures is to employ a reliable scheduling algorithm; however, most existing Grid scheduling algorithms do not adequately consider an application's reliability requirements. To address this problem, we design a hierarchical reliability-driven scheduling architecture that includes both a local scheduler and a global scheduler. The local scheduler aims to effectively measure the task reliability of an application in a Grid virtual node and incorporates the reliability overhead of precedence-constrained tasks into a heuristic scheduling algorithm. In the global scheduler, we propose a hierarchical reliability-driven scheduling algorithm based on a quantitative evaluation of independent application reliability. Our experiments, based on both randomly generated graphs and graphs of real applications, show that our hierarchical scheduling algorithm performs much better than existing scheduling algorithms in terms of system reliability, schedule length, and speedup.
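Reliability-driven schedulers commonly model a task's reliability on a node with a constant failure rate, R = exp(-λ·t). The sketch below assumes that model and applies a deliberately simple greedy rule (place each task where its own reliability is highest); it ignores precedence, communication, and schedule length, and it is not the hierarchical algorithm from the paper. Node names, rates, and times are hypothetical.

```python
import math


def task_reliability(failure_rate, exec_time):
    """Probability a task finishes without failure under a constant failure rate."""
    return math.exp(-failure_rate * exec_time)


def reliability_greedy(tasks, failure_rates):
    """Assign each task to the node that maximizes its own reliability.

    tasks: {task: {node: execution time}}; failure_rates: {node: failures per time unit}.
    Precedence constraints, communication, and schedule length are ignored here.
    """
    assignment, app_reliability = {}, 1.0
    for task, times in tasks.items():
        node = max(times, key=lambda n: task_reliability(failure_rates[n], times[n]))
        assignment[task] = node
        app_reliability *= task_reliability(failure_rates[node], times[node])
    return assignment, app_reliability


rates = {"A": 0.001, "B": 0.01}
tasks = {"t1": {"A": 100, "B": 20}, "t2": {"A": 50, "B": 40}}
print(reliability_greedy(tasks, rates))  # ({'t1': 'A', 't2': 'A'}, ~0.8607)
```

Task t1 runs five times faster on B but is still placed on A, because λ·t is 0.1 on A versus 0.2 on B; balancing that against schedule length is exactly what a real reliability-driven heuristic has to do.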

4.
Du Hongtao, Li Zhanhuai, Feng Yong. Microprocessors (微处理机), 2007, 28(5): 114-117
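Considering the types of disasters and the requirements they place on recovery techniques, and addressing the poor flexibility of current recovery techniques, this paper proposes a disaster recovery system that combines mirroring and snapshot technologies. It is a block-level storage application that provides a set of mechanisms for balancing data-protection capability against management overhead, and it is suitable for building advanced storage management policies.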
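The combination of mirroring and snapshots at block level can be sketched as follows: every write goes to both the primary and a mirror copy, and each snapshot preserves a block's old contents only on its first overwrite (copy-on-write). This is a minimal in-memory illustration under assumed names; the mirror list stands in for a remote device, and the paper's actual mechanisms are not specified in the abstract.

```python
class MirroredVolume:
    """Block volume sketch: every write is mirrored; snapshots are copy-on-write."""

    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.mirror = [b""] * nblocks  # stands in for a remote mirror device
        self.snapshots = []            # per snapshot: {block number: preserved data}

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, n, data):
        for saved in self.snapshots:
            # Copy the old block only on its first overwrite after each snapshot.
            saved.setdefault(n, self.blocks[n])
        self.blocks[n] = data
        self.mirror[n] = data

    def read_snapshot(self, sid, n):
        # Blocks not overwritten since the snapshot are shared with the live volume.
        return self.snapshots[sid].get(n, self.blocks[n])


vol = MirroredVolume(4)
vol.write(0, b"v1")
sid = vol.snapshot()
vol.write(0, b"v2")
print(vol.read_snapshot(sid, 0), vol.blocks[0], vol.mirror[0])  # b'v1' b'v2' b'v2'
```

The mirror protects against losing the device, while snapshots protect against logical errors that a mirror would faithfully replicate; taking snapshots more or less often is one knob for trading protection against overhead.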

5.
Cloud computing is widely used to provide today's Internet services. As its scope extends to a wide range of business applications, the security of network communications between clients and clouds is becoming important. Several cloud vendors support virtual private networks (VPNs) for connecting to their clouds. Unfortunately, cloud services become unavailable when a VPN failure occurs in a VPN gateway or in the network. We propose a transparent VPN failure recovery scheme that hides VPN failures from users and operating systems (OSs). The scheme recovers transparently from VPN failures by establishing VPN connections in a virtualization layer. When a VPN failure occurs, a client virtual machine monitor (VMM) automatically reconnects to an available VPN gateway; the gateways are geographically distributed and connected via leased lines in the clouds. IP address changes are hidden from client OSs and servers by a packet relay system implemented by a relay client in the client VMM and a relay server. We implemented a prototype based on BitVisor, a small client VMM supporting IPsec VPN, and evaluated it in a wide-area distributed Internet environment in Japan. Experimental results show that the scheme can maintain TCP connections across VPN failures, and that the virtualization layer adds about 0.6 ms of latency and 8% to 30% throughput overhead.

6.
With the advent of next-generation scientific applications, the workflow approach, which integrates various computing and networking technologies, has provided a viable solution for managing and optimizing large-scale distributed data transfer, processing, and analysis. This paper investigates the problem of mapping distributed scientific workflows for maximum throughput in faulty networks where nodes and links are subject to probabilistic failures. We formulate it as a bi-objective optimization problem that maximizes both throughput and reliability. By adapting and modifying a centralized fault-free workflow mapping scheme, we propose a new mapping algorithm that achieves high throughput for smooth data flow in a distributed manner while keeping the overall failure rate within a prespecified bound, guaranteeing a level of reliability. The superior performance of the proposed solution is demonstrated both by extensive simulation-based comparisons with existing algorithms and by experimental results from a real-life scientific workflow deployed in wide-area networks.
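The two objectives can be made concrete for a linear pipeline: throughput is the reciprocal of the busiest resource's time per data item, and the overall failure rate is one minus the product of the survival probabilities of every node and link used. The sketch below assumes that model and finds the best mapping by exhaustive search, which is only a small-scale baseline and not the paper's distributed mapping algorithm; all parameter names and values are hypothetical.

```python
import itertools


def evaluate(mapping, work, data, power, node_fail, bw, link_fail):
    """Throughput and failure rate of a linear pipeline; mapping[i] is module i's node."""
    busy = {}  # busy time per data item on each node (str) or link (tuple)
    for i, node in enumerate(mapping):
        busy[node] = busy.get(node, 0.0) + work[i] / power[node]
    for i in range(len(mapping) - 1):
        link = (mapping[i], mapping[i + 1])
        if link[0] != link[1]:  # co-located modules need no network transfer
            busy[link] = busy.get(link, 0.0) + data[i] / bw[link]
    survive = 1.0
    for res in busy:  # each used node and link fails independently
        survive *= 1 - (link_fail[res] if isinstance(res, tuple) else node_fail[res])
    return 1.0 / max(busy.values()), 1.0 - survive


def best_mapping(nodes, n_modules, bound, **params):
    """Exhaustive search: highest throughput whose failure rate stays within bound."""
    best = None
    for m in itertools.product(nodes, repeat=n_modules):
        throughput, fail = evaluate(m, **params)
        if fail <= bound and (best is None or throughput > best[1]):
            best = (m, throughput, fail)
    return best


params = dict(work=[4, 4], data=[2], power={"A": 2, "B": 2},
              node_fail={"A": 0.01, "B": 0.2},
              bw={("A", "B"): 4, ("B", "A"): 4},
              link_fail={("A", "B"): 0.01, ("B", "A"): 0.01})
print(best_mapping(["A", "B"], 2, bound=0.05, **params)[0])  # ('A', 'A')
print(best_mapping(["A", "B"], 2, bound=0.30, **params)[0])  # ('A', 'B')
```

Tightening the failure bound from 0.30 to 0.05 forces both modules onto the reliable node A, halving throughput; this is the throughput/reliability tension the bi-objective formulation captures.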

7.
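The widespread adoption of healthcare informatization has caused data to grow rapidly, and traditional hospital information storage schemes can no longer meet the storage requirements of large-scale medical data. A high-reliability backup system is used to ensure data security, and a "2+2" cluster mode is used to provide disaster tolerance for critical services. Practice has shown that this highly reliable, highly available storage scheme plays an important role in healthcare informatization.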

8.
In recent years, rapid advances in integrated circuit and networking technologies have made computers, devices, and networks highly pervasive, leading to the introduction, development, and deployment of the Internet of Things (IoT). The tiny identification devices and wearables in IoT have transformed daily life, and they generate, process, and store data whose volume is growing exponentially worldwide. Because of the heavy demand for data mining and analytics in IoT, secure and scalable mass storage systems are needed to aggregate data for efficient processing. In this paper, we propose such a secure and scalable IoT storage system based on a revised secret sharing scheme that supports scalability, flexibility, and reliability at both the data and system levels. Shamir's secret sharing scheme is applied to achieve data security without the complex key management associated with traditional cryptographic algorithms. The original secret sharing scheme is revised to use all coefficients of the polynomials, increasing data capacity at the data level. Flexible data insert and delete operations are supported. Moreover, a distributed IoT storage infrastructure is deployed to provide scalability and reliability at the system level: multiple IoT storage servers are aggregated for large storage capacity, while individual servers can join and leave freely for flexibility. Experimental results demonstrate the feasibility and benefits of the proposed system, as well as tangible performance gains.
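For reference, the unmodified (k, n) Shamir scheme that the system starts from can be sketched as below: the secret is the constant term of a random degree k-1 polynomial over a prime field, each share is one evaluation, and any k shares recover the secret by Lagrange interpolation at x = 0. The revision described in the abstract, where all coefficients carry data to raise capacity, is not implemented here; the field size and function names are assumptions.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; the secret must be smaller than P


def split(secret, k, n):
    """Create n shares; any k of them reconstruct the secret."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule over GF(P)
            acc = (acc * x + c) % P
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def combine(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret


shares = split(123456789, k=3, n=5)
print(combine(shares[:3]), combine(shares[2:]))  # 123456789 123456789
```

With random non-constant coefficients, fewer than k shares reveal nothing about the secret; putting data into those coefficients, as the revised scheme does, gains capacity but changes that guarantee, so the trade-off deserves care.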

9.
This paper considers modelling the performance and reliability of a secure electronic voting scheme. The scheme provides secure, verifiable blind voting; however, this level of security incurs considerable administration overhead. A Markovian process algebra is used to build a performance model of a basic system of n distributed voters. This model is shown to suffer from the familiar state-space explosion problem, so a simpler model is developed to allow larger and more practically relevant systems to be studied. The original model is then extended to include the possibility that voters may fail, and two modes of recovery are considered. The models are evaluated numerically using data measured from an implementation of the scheme, in order to determine the accuracy of the approximations.

10.
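In large-scale distributed storage systems, data must be stored with some redundancy to guarantee availability and reliability. When a node fails, the data it stored must be repaired to maintain data availability. However, because node failures are unpredictable, deciding when to repair is difficult. Many current systems repair immediately, but this places a large amount of unnecessary load on the system. Based on an analysis of node failure behavior and replica counts, a two-phase data repair strategy based on mean deviation is proposed. Experiments show that, while preserving replica availability, the strategy effectively reduces the load imposed by data repair and improves the stability of the cluster.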
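The abstract does not define its two phases or its mean-deviation statistic precisely, so the sketch below is only one hypothetical reading of a lazy repair policy: repair at once when the live replica count falls to a safety floor, and otherwise wait until a node has been offline longer than the mean past downtime plus its mean absolute deviation. Thresholds, units, and function names are assumptions, not the paper's strategy.

```python
from statistics import mean


def repair_timeout(downtimes):
    """Assumed wait limit: mean past downtime plus its mean absolute deviation."""
    mu = mean(downtimes)
    return mu + mean(abs(d - mu) for d in downtimes)


def repair_decision(live_replicas, min_replicas, offline_for, downtimes):
    # Urgent case: replicas at or below the safety floor, repair immediately.
    if live_replicas <= min_replicas:
        return "repair_now"
    # Otherwise wait, since the node may only be temporarily offline.
    if offline_for >= repair_timeout(downtimes):
        return "repair_now"
    return "wait"


history = [10, 20, 30]  # observed transient downtimes (minutes)
print(repair_decision(2, 1, 15, history))  # wait (timeout is about 26.7)
print(repair_decision(2, 1, 30, history))  # repair_now
print(repair_decision(1, 1, 0, history))   # repair_now
```

Waiting avoids copying data for nodes that come back on their own, which is the unnecessary load that immediate repair incurs; the replica floor keeps that delay from endangering availability.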

11.
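End-to-end checksumming is an effective means of detecting data integrity problems and provides a basic reliability guarantee for distributed storage systems. Glusterfs is a widely used stackable distributed file system, but it lacks an effective data integrity detection mechanism, so corrupted user data may go undetected and incorrect data may be returned to users. In some situations this risk can also spread, causing data loss in multi-replica, disaster-backup, or active-active deployments. To address this problem, this paper proposes a cost-effective end-to-end checksum scheme for Glusterfs, named Glusterfs-E2E, which effectively eliminates the data integrity risks in the Glusterfs file system. The scheme not only provides full-path protection with a performance overhead of only 2% to 8%, but also helps locate software faults.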
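The end-to-end idea can be sketched independently of Glusterfs: the client computes a checksum when it writes each block, stores it alongside the data, and verifies it on every read, so corruption anywhere along the path is caught before bad data reaches the user. The class names, block size, and use of CRC-32 (via Python's `zlib.crc32`) are assumptions for illustration; this is not the Glusterfs-E2E implementation or a Glusterfs API.

```python
import zlib


class ChecksumError(Exception):
    pass


class E2EClient:
    """Client-side end-to-end checksums: computed on write, verified on read."""

    BLOCK = 4096

    def __init__(self, backend):
        self.backend = backend  # stands in for the storage servers: (path, n) -> (data, crc)

    def write(self, path, data):
        for off in range(0, len(data), self.BLOCK):
            chunk = data[off:off + self.BLOCK]
            self.backend[(path, off // self.BLOCK)] = (chunk, zlib.crc32(chunk))

    def read_block(self, path, n):
        chunk, crc = self.backend[(path, n)]
        if zlib.crc32(chunk) != crc:
            raise ChecksumError(f"{path} block {n} failed verification")
        return chunk


store = {}
client = E2EClient(store)
client.write("/vol/file", b"a" * 5000)
chunk, crc = store[("/vol/file", 0)]
store[("/vol/file", 0)] = (b"b" + chunk[1:], crc)  # simulate silent corruption
try:
    client.read_block("/vol/file", 0)
except ChecksumError as e:
    print("detected:", e)
```

Because the checksum is produced and checked only at the client, every layer in between (network, translators, disks) is covered, and the block number in the error gives a starting point for fault localization.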

12.
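The State Grid 95598 customer service center is an important channel for State Grid Corporation's power supply services, and a geographically separated active-active database deployment, partitioned by area and domain, can effectively improve the reliability of the 95598 system. To protect data in the active-active system, the disaster recovery plan for the 95598 core business system adopts this partitioned, remote active-active database architecture. The database is restructured by splitting it and re-storing its data, and odd/even sequences combined with automatically generated primary keys are used to distinguish the two databases and allow fast storage. This improves data protection and ensures data consistency and reliability both in normal operation and in disaster recovery mode.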
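The odd/even key idea lets two active sites insert rows concurrently without key collisions or cross-site coordination: one site issues only odd keys and the other only even keys. A minimal sketch with hypothetical function names follows; in Oracle the same effect is typically obtained with two sequences created with `INCREMENT BY 2` and `START WITH 1` or `START WITH 2`, though the abstract does not say exactly how the 95598 system configures them.

```python
import itertools


def make_key_generator(site_index, n_sites=2):
    """Site 0 issues 1, 3, 5, ...; site 1 issues 2, 4, 6, ... (no cross-site coordination)."""
    return itertools.count(start=site_index + 1, step=n_sites)


def owning_site(key, n_sites=2):
    """Recover which site generated a key from its residue."""
    return (key - 1) % n_sites


site_a, site_b = make_key_generator(0), make_key_generator(1)
keys_a = [next(site_a) for _ in range(3)]  # [1, 3, 5]
keys_b = [next(site_b) for _ in range(3)]  # [2, 4, 6]
print(keys_a, keys_b, owning_site(6))      # [1, 3, 5] [2, 4, 6] 1
```

Because a key's parity identifies the site that created it, replicated rows can be told apart after failover, and the scheme generalizes to more sites by raising `n_sites`.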

13.
We have developed a distributed parallel storage system that uses the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies different schemes to enhance the reliability and availability of such network-based distributed storage systems. The general approach employs “erasure” error-correcting codes that can reconstruct missing information caused by hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. It also presents techniques for reducing the communication and computation overhead incurred when reconstructing missing data from the redundant information, including clustering, multidimensional coding, and full two-dimensional parity schemes. Finally, the paper considers the trade-offs among redundancy, fault tolerance, and the complexity of error recovery.
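A full two-dimensional parity scheme arranges data blocks in a grid and keeps an XOR parity block for every row and every column, so a lost block can be rebuilt from whichever of its row or column has no other loss. The sketch below assumes equally sized blocks and hypothetical function names, and shows only single-step recovery, not the paper's optimized algorithms.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_2d(grid):
    """Row parity and column parity for a grid of data blocks."""
    row_parity = [xor_blocks(row) for row in grid]
    col_parity = [xor_blocks(col) for col in zip(*grid)]
    return row_parity, col_parity


def recover(grid, r, c, row_parity, col_parity):
    """Rebuild lost block grid[r][c] (None marks a lost block) from its row, else its column."""
    row = grid[r]
    if sum(b is None for b in row) == 1:
        return xor_blocks([b for b in row if b is not None] + [row_parity[r]])
    col = [grid[i][c] for i in range(len(grid))]
    if sum(b is None for b in col) == 1:
        return xor_blocks([b for b in col if b is not None] + [col_parity[c]])
    raise ValueError("block cannot be rebuilt from a single row or column")


grid = [[b"\x01\x02", b"\x03\x04"],
        [b"\x05\x06", b"\x07\x08"]]
row_p, col_p = encode_2d(grid)
damaged = [[None, None], grid[1]]  # two blocks lost in the same row
print(recover(damaged, 0, 0, row_p, col_p))  # b'\x01\x02'
```

Row parity alone could not handle two losses in the same row; the column parities cover that case, and rebuilding from a short row or column is what keeps reconstruction traffic low.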

14.
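Given the importance of information, disaster-tolerant systems that safeguard information security have developed rapidly, and many disaster-tolerance schemes have been proposed. A typical disaster-tolerant system is built from a SAN and cluster technology and is suited to large systems. The Oracle distributed database has good data redundancy and data replication mechanisms that can ensure the security and consistency of global data. A disaster-tolerant system based on a distributed database combines the advantages of both, adapts to application systems of all sizes, and is easy to manage and simple to deploy.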

15.
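Cloud storage, a derivative of cloud computing, aims to provide an effective solution for storing massive amounts of network data, saving storage costs and system resources, and offering a complete data center for backup and disaster recovery while ensuring data security and fault tolerance. Current cloud disaster-backup models are limited to a small number of network locations and are implemented on local servers using virtualization. In contrast to these traditional models, this paper introduces a DHT-based cloud disaster-backup model that provides a general, data-level disaster-backup solution suitable for wide-area networks. Finally, the scheme is simulated on a local cloud computing cluster to verify the feasibility of the model.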
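DHT-based placement usually rests on consistent hashing: nodes and objects are hashed onto the same ring, and an object's backup replicas go to the first distinct nodes clockwise from its hash, so adding or removing a site moves only a small share of the data. The sketch below assumes SHA-1 hashing, virtual nodes, and hypothetical site names; the abstract does not say which DHT or placement rule the model uses.

```python
import bisect
import hashlib


def _hash(key):
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes for placing backup replicas."""

    def __init__(self, nodes, vnodes=50):
        self.points = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in self.points]

    def replicas(self, obj, r=2):
        """Return the first r distinct nodes clockwise from the object's hash."""
        start = bisect.bisect(self.hashes, _hash(obj))
        chosen = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == r:
                    break
        return chosen


ring = HashRing(["site-a", "site-b", "site-c"])
print(ring.replicas("backup/vol1/chunk42", r=2))
```

Placing replicas on distinct nodes that can sit in different wide-area locations is what lets a DHT spread backups beyond the few fixed sites of traditional cloud disaster-backup models.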

16.
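To address the low security of traditional disaster backup systems, a secure disaster backup system is proposed that uses error checking, secure transmission, and encrypted storage to protect backup data during transmission and storage. After a full synchronization completes, a volume filter driver monitors disk data changes in real time; the captured data is held in a high-speed cache and sent over the network to a remote backup center. The method achieves efficient, real-time, off-site backup while balancing security and efficiency.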

17.
Modern database systems employ Snapshot Isolation to implement concurrency control and isolation because it promises better query performance than lock-based alternatives. Furthermore, Snapshot Isolation never blocks readers, an important property for modern information systems with mixed workloads of heavy OLAP queries and short update transactions. This paper revisits the problem of implementing Snapshot Isolation in a distributed database system and makes three important contributions. First, a complete definition of Distributed Snapshot Isolation is given, extending existing definitions from the literature. Based on this definition, a set of criteria is proposed for efficiently implementing Snapshot Isolation in a distributed system. Second, the design space of alternative methods for implementing Distributed Snapshot Isolation is presented based on these criteria. Third, a new approach to implementing Distributed Snapshot Isolation is devised, which we call Incremental. Comprehensive performance experiments with the TPC-C benchmark show that the Incremental approach significantly outperforms every other known method from the literature. Furthermore, the Incremental approach requires no a priori knowledge of which nodes of a distributed system are involved in executing a transaction. It can also execute transactions that touch data on only a single node as efficiently as a centralized database system. In this way, the Incremental approach takes advantage of sharding and other ways of improving data locality: the cost of synchronizing transactions in a distributed system is paid only by transactions that actually involve data from several nodes. All these properties make the Incremental approach more practical than related methods proposed in the literature.
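The two rules that any Snapshot Isolation implementation, distributed or not, must enforce can be shown on a single-node multiversion store: a transaction reads the newest version committed at or before its start timestamp, and a commit is rejected if another transaction committed a write to the same key after that snapshot (first-committer-wins). The sketch below is a centralized illustration with hypothetical names; it does not implement the distributed timestamp coordination that the Incremental approach addresses.

```python
class MVStore:
    """Single-node multiversion store illustrating Snapshot Isolation rules."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending by commit_ts
        self.clock = 0      # last assigned commit timestamp

    def begin(self):
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:  # read your own writes
            return txn["writes"][key]
        visible = [v for ts, v in self.versions.get(key, []) if ts <= txn["start"]]
        return visible[-1] if visible else None  # no locks: readers never block

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort on a write-write conflict after our snapshot.
        for key in txn["writes"]:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn["start"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True


store = MVStore()
t0 = store.begin()
store.write(t0, "x", 1)
store.commit(t0)
t1, t2 = store.begin(), store.begin()
store.write(t2, "x", 2)
print(store.commit(t2))     # True
print(store.read(t1, "x"))  # 1  (t1 still sees its snapshot)
store.write(t1, "x", 3)
print(store.commit(t1))     # False (write-write conflict)
```

In a distributed setting, the hard part is agreeing on `start` and commit timestamps across nodes; the abstract's point is that the Incremental approach pays that cost only for transactions that actually span several nodes.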

18.
In this paper, we propose a practical disk error recovery scheme that tolerates multiple simultaneous disk failures in a typical RAID system, improving availability and reliability. The scheme consists of an encoding process and a decoding process. The encoding process generates one horizontal parity and a number of vertical parities. The decoding process defines a method for recovering data after multiple disk failures, including failures of the parity disks. The scheme is proven to correctly recover the original data after multiple simultaneous disk failures regardless of which disks fail. It uses only exclusive-OR operations and simple arithmetic, so it can easily be implemented on current RAID systems without hardware changes.

19.
Liu Chao, Zhang Ming'an. Software (软件), 2014, (3): 125-128
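The Oracle database system is currently one of the world's mainstream databases. It is a relational database characterized by high transaction volume, large storage capacity, and great flexibility. It runs on many hardware platforms and operating systems and plays an active role in many fields in China, including industry and commerce, the military, and aviation. It is responsible for storing and processing business data and is therefore critical to the whole application system. Reliability and availability are the primary requirements of a database system; to keep it running continuously, stably, and efficiently, high reliability and stability must be ensured. Security problems in computer systems (such as equipment failures, operating system failures, network outages, virus attacks, Trojans, spyware, suspicious code, port scans, and DoS/DDoS attacks) can cause abnormal database operations that lead to incomplete or lost data. Backup and disaster-tolerance mechanisms therefore need to be established to centrally manage, protect, and recover data. This paper discusses technologies for Oracle database backup and recovery and presents corresponding solutions.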

20.
Wu Zhijun, Liu Zhong, Hu Taotao. Computer Science (计算机科学), 2017, 44(Z11): 366-371
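System Wide Information Management (SWIM) is the infrastructure of the "Aeronautical Cloud" and is used to transmit and share information related to air transportation. The reliability and survivability of the SWIM system have a major impact on the safe operation of air transportation. This paper designs an elastic disaster recovery scheme aimed at the survivability of the SWIM system. The scheme adopts the Linux Virtual Server (LVS) architecture and improves the Weighted Least-Connection (WLC) scheduling algorithm used in LVS, enhancing the continuity of SWIM services. Experimental results show that the improved WLC algorithm effectively enhances the disaster recovery capability of the SWIM system and meets its elastic disaster recovery requirements.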
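The abstract does not describe how the WLC algorithm was improved, so the sketch below shows only the standard weighted least-connection rule the scheme builds on: send the next connection to the server with the smallest ratio of active connections to weight, skipping servers with weight 0. Server names and the tuple layout are hypothetical, and the LVS kernel scheduler additionally counts inactive connections in its per-server overhead estimate.

```python
def wlc_pick(servers):
    """servers: list of (name, weight, active_connections).

    Returns the name minimizing active/weight; weight 0 means the server is quiesced.
    """
    best = None
    for name, weight, conns in servers:
        if weight <= 0:
            continue
        # Cross-multiply instead of dividing: conns / weight < best_conns / best_weight
        if best is None or conns * best[1] < best[2] * weight:
            best = (name, weight, conns)
    return best[0] if best else None


pool = [("s1", 1, 10), ("s2", 3, 24), ("s3", 2, 30)]  # ratios 10, 8, 15
print(wlc_pick(pool))  # s2
```

Setting a failed or draining server's weight to 0 removes it from selection without disturbing the others, which is one simple way a scheduler of this kind keeps service continuous during recovery.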


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417