1.

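Li Junguo, Huang Gang, Zou Jian, Mei Hong 《计算机学报》 (Chinese Journal of Computers) 2007,30(10):1696-1704

This paper proposes a fault-tolerance management approach based on runtime software architecture, which lets developers and administrators customize suitable fault detection and recovery mechanisms for different middleware service failures. First, the runtime software architecture automatically constructs a component dependency view and an error propagation view, providing a global view for understanding and analyzing the reliability of the whole system. Then, fault-tolerance mechanisms are configured by manipulating the runtime software architecture. Finally, AOP techniques are used to instrument the middleware with these fault-tolerance mechanisms so that it has the specified fault-tolerance capability. The process is carried out semi-automatically with the help of a visual tool and has been validated on J2EE middleware.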

2.

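Analysis of the fault-tolerance capability and performance of software dual-redundancy fault-tolerant systems  Cited by: 1 (self-citations: 0, citations by others: 1)

Wu Bin, Gao Long 《计算机研究与发展》 (Journal of Computer Research and Development) 2009,46(Z2)

Dual redundancy is a commonly used redundancy-based fault-tolerant design method. A software dual-redundancy fault-tolerant system redundantly executes two software replicas that perform the same function, checks their results, and decides whether an error has occurred according to whether the two results agree. The paper builds a runtime model of software dual-redundancy fault-tolerant systems and introduces the concept of the fault-tolerance capability of such a system. Based on the model, it analyzes how the fault-tolerance capability of a single software replica affects the fault-tolerance capability and performance of the dual-redundancy system. The results show that improving the fault-tolerance capability of a single replica improves not only the fault-tolerance capability of the dual-redundancy system but also its performance. In extreme cases, however, the fault-tolerance capability of the dual-redundancy system may be lower than that of a single replica.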
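
Illustrative sketch (not taken from the paper): the compare-two-replicas check described in the abstract above can be written in a few lines. The names `run_dual` and `ReplicaMismatch` are hypothetical, and the paper's runtime model and capability metric are not reproduced here.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
R = TypeVar("R")


class ReplicaMismatch(Exception):
    """Raised when the two replicas return different results."""


def run_dual(replica_a: Callable[[A], R], replica_b: Callable[[A], R], arg: A) -> R:
    """Execute both replicas on the same input and compare their results.

    A disagreement is reported as a detected error. Agreement is accepted,
    which also means that two replicas failing in the same way go unnoticed.
    """
    result_a = replica_a(arg)
    result_b = replica_b(arg)
    if result_a != result_b:
        raise ReplicaMismatch(f"replicas disagree: {result_a!r} != {result_b!r}")
    return result_a
```

For example, `run_dual(lambda x: x * 2, lambda x: x + x, 3)` returns the agreed value, while two replicas that compute different answers raise `ReplicaMismatch`.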

3.

Fault tolerance in supervisory control systems: a knowledge-based approach

Dimitris Th. Askounis Vassilis Assimakopoulos John Psarras 《Journal of Intelligent Manufacturing》1994,5(5):323-331

Fault tolerance in computerized systems involved in production has become an ever more important requirement. Existing fault tolerance approaches, wherever used, deal mainly with hardware faults. Nevertheless, the vast majority of contemporary system failures are software related. This paper introduces a knowledge-based approach to handling software related faults occurring in supervisory control systems. These systems are event driven and use data, stored in complex databases, to react to events coming from different kinds of devices by identifying, scheduling, initiating and monitoring operations. Failure of part of the supervisory control system's software to behave rationally when unexpected events occur is called an application fault. The approach introduced in this paper is based on a supervisory control system reference model which reveals the set of all possible application faults together with the major functions of the recovery processes associated with each fault, and leads to a high-level knowledge-based system architecture capable of handling every fault-related condition. This system is called PROFIT (Intelligent PROduction systems Fault Tolerance) and consists of three main components: the fault diagnosis module, the instant fault correction module and the learning module, co-ordinated by a PROFIT meta-level module. The prototype version of PROFIT is analysed and the development as well as the run-time environment that prove the applicability and effectiveness of the system are presented.

4.

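Research on static testing of component software based on behavior protocols

Yu Suping, Yang Xunjie 《微机发展》 (Microcomputer Development) 2008,18(3):128-131

Static analysis of a system can find errors during design and development, thereby avoiding the negative impact that runtime error detection techniques impose during system execution. Based on the basic idea of a component testing strategy that avoids static errors as far as possible, the paper proposes a method for static testing of component-based software systems. A communication model is used to build an abstract model of a database service component system. Combined with behavior protocols, a formal method for describing interactions between the components of a component system, the correctness of component interactions is tested by verifying the consistency of the components' behavior protocols.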

5.

Self-adaptable media service architecture for guaranteeing reliable multimedia services

G. Maria Kalavathy N. Edison Rathinam P. Seethalakshmi 《Multimedia Tools and Applications》2012,57(3):633-650

The main objective of this paper is to design and develop a Self-Adaptable Media Service Architecture (SAMSA) for providing reliable dynamic composite multimedia service through policy-based actions. Multimedia services such as media retrieval, transcoding, scaling and display services are combined based on the preferences of the user to create a dynamic composite multimedia service called a Video-on-Demand service. Such distributed multimedia services deployed using Service Oriented Architecture (SOA) can be accessed in heterogeneous environments that are prone to changes during run-time. To provide reliable and adaptive multimedia services, a powerful self-adaptable architecture with dynamic compositions of multimedia services is necessary to adapt during run-time and react to the environment. The adaptability in this proposed architecture is achieved by enabling the service providers to Monitor, Analyze and Act on the defined policies that support customization of compositions of multimedia services and guarantee the Quality of Service (QoS) provisioning. The Media Service Monitor (MSM) observes the business and quality metrics associated with the multimedia services during run-time. The monitored results are analyzed by the Monitored Results Analyzer (MRA), which identifies the type and location of the fault. The Adaptive Media Service Manager (AMSM) takes corrective actions based on the monitored results, through the policies defined as an extension of WS-Policy (Web Services Policy Framework). The effectiveness of the proposed Self-Adaptable Media Service Architecture (SAMSA) has been evaluated on a Dynamic Composite Real-time Video-on-Demand Web Service (DCRVoDWS) for a maximum of 200 simultaneous client requests. The analysis of results shows that the proposed architecture improves reliability, response time and user satisfaction.

6.

Component-based tailorability: Enabling highly flexible software applications

《International journal of human-computer studies》2008,66(1):1-22

Component technologies are perceived as an important means to keep software architectures flexible. Flexibility offered by component technologies typically addresses software developers at design time. However, the design of software which should support social systems, such as work groups or communities, also demands ‘use-time’, or technically speaking, ‘run-time’ flexibility. In this paper, we summarize a decade of research efforts on component-based approaches to make groupware applications flexible at run-time. We address the user as a ‘casual programmer’ who develops and individualizes software for his work context. To deal with the challenges of run-time flexibility, we developed a design approach which covers three levels: software architecture, user interface, and collaboration support. With regard to the software architecture, a component model, called FlexiBeans, has been developed. The FreEvolve platform serves as an environment in which component-based applications can be tailored at run-time. Additionally, we have developed three different types of graphical user interfaces, enabling users to tailor their applications by recomposing components. To enable collaborative tailoring activities, we have integrated functions that allow sharing component structures among users. We also present different types of support techniques which are integrated into the user interface in order to enable users’ individual and collaborative tailoring activities. We conclude by elaborating on the notion of ‘software infrastructure’ which offers a holistic approach to support design activities of professional and non-professional programmers.

7.

Heuristics-based mediation for building smart architectures at run-time

《Computer Standards & Interfaces》2021

Smart architectures are increasingly being used in current software development. Smart user interfaces, smart homes, or smart buildings are becoming common examples in the new era of smart cities. Software architectures usually related to these domains need to be adapted and reconfigured at run-time, for example, to provide new services, react to user interaction, or due to changes decided from the business logic of the application. Component-based techniques are a suitable way to carry out this kind of adaptation, as dynamic reconfiguration operations can be applied to the architecture. In this paper, we address run-time generation of component-based applications, taking the abstract definitions of their architecture as a reference, in addition to a set of available components. The process calculates the best configuration of components from the abstract definition by applying a trading approach based on an adapted A* algorithm. This algorithm uses heuristics based on syntactic and semantic information obtained from the component definitions. A case study related to mashup user interfaces formed by coarse-grained components is also explained. In short, the results show the usefulness of heuristics and suitable execution times for building the best configurations.

8.

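Design and implementation of binary compatibility technology for middleware

Pei Rui, Chen Zhicheng, Yang Weikang, Zhang Suqin 《计算机工程与设计》 (Computer Engineering and Design) 2006,27(3):361-364

CAR component technology is a recently developed component-based programming technology. The paper describes in detail the design principles of the binary compatibility technology of the CAR component platform, focusing on the system architecture of the CAR component platform virtual machine running on the Linux operating system and on the implementation of its key technical aspects. By comparing it with related technologies such as Microsoft .NET and the Sun Java virtual machine in terms of cross-platform compatibility, the paper analyzes the characteristics of this binary-level compatibility technology and its practical significance for the software industry.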

9.

Software approaches for resilience of high performance computing systems: a survey

Jie JIA Yi LIU Guozhen ZHANG Yulin GAO Depei QIAN 《Frontiers of Computer Science》2023,17(4):174105

With the scaling up of high-performance computing systems in recent years, their reliability has declined steadily. Therefore, system resilience has been regarded as one of the critical challenges for large-scale HPC systems. Various techniques and systems have been proposed to ensure the correct execution and completion of parallel programs. This paper provides a comprehensive survey of existing software resilience approaches. Firstly, a classification of software resilience approaches is presented; then we introduce major approaches and techniques, including checkpointing, replication, soft error resilience, algorithm-based fault tolerance, fault detection and prediction. In addition, challenges exposed by system-scale and heterogeneous architecture are also discussed.

10.

An empirical comparison of software fault tolerance and fault elimination

Shimeall T.J. Leveson N.G. 《IEEE Transactions on Software Engineering》1991,17(2):173-182

The authors compared two major approaches to the improvement of software (software fault elimination and software fault tolerance) by examination of the fault detection (and tolerance, where applicable) of five techniques: run-time assertions, multiversion voting, functional testing augmented by structural testing, code reading by stepwise abstraction, and static data-flow analysis. The focus was on characterizing the sets of faults detected by the techniques and on characterizing the relationships between these sets of faults. Two categories of questions were investigated: (1) comparison between fault elimination and fault tolerance techniques and (2) comparisons among various testing techniques. The results provide information useful for making decisions about the allocation of project resources, show strengths and weaknesses of the techniques studied, and indicate directions for future research.

11.

Helenic fault tolerance for robots

George Toye Larry J. Leifer 《Computers & Electrical Engineering》1994,20(6):479-497

In robot applications where the consequences of system failure are unbearable, fault tolerance is mandatory. Fault tolerant robots continue to function correctly despite component failures. Fault tolerant robots can be designed using the Helenic architecture. This architecture uses non-homogeneous functional modular redundancy and a democratic dynamic weighted voting algorithm for redundancy management to achieve fault tolerance. The benefits offered are increased reliability, maintainability, common mode failure resistance, and significant cost reductions. To demonstrate the fault tolerance capabilities of this system architecture, a 5 wheel omnidirectional mobile robot with sensors, computing elements and actuators was designed and simulated. Simulation results verify the robot's ability to continue ‘correct’ operation despite internal subsystem failures.
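
Illustrative sketch (not the Helenic algorithm itself, whose details the abstract does not give): one generic way to realize dynamic weighted voting is to let each redundant module vote with a weight, accept the value with the largest total weight, and then lower the weight of every module that disagreed. The function names and the `decay` and `recover` parameters are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable


def weighted_vote(outputs: Dict[str, Hashable], weights: Dict[str, float]) -> Hashable:
    """Return the output value backed by the largest total module weight."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for module, value in outputs.items():
        totals[value] += weights[module]
    return max(totals, key=lambda value: totals[value])


def update_weights(outputs: Dict[str, Hashable], weights: Dict[str, float],
                   winner: Hashable, decay: float = 0.5,
                   recover: float = 1.1) -> Dict[str, float]:
    """Penalize modules that disagreed with the winner; slowly restore the rest.

    Weights are capped at 1.0. Modules that produced no output keep their weight.
    """
    new_weights = dict(weights)
    for module, value in outputs.items():
        if value == winner:
            new_weights[module] = min(1.0, weights[module] * recover)
        else:
            new_weights[module] = weights[module] * decay
    return new_weights
```

With three modules of equal weight where one reports a different value, the majority value wins and the dissenting module's weight is halved, so a repeatedly faulty module gradually loses influence over later votes.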

12.

Event-driven configuration of a neural network CMP system over an homogeneous interconnect fabric

M.M. Khan J. Navaridas X. Jin L.A. Plana M. Luján S. Temple C. Patterson D. Richards J.V. Woods J. Miguel-Alonso S.B. Furber 《Parallel Computing》2011,37(8):392-409

Configuring a million-core parallel system at boot time is a difficult process when the system has neither specialised hardware support for the configuration process nor a preconfigured default state that puts it in operating condition. The architecture of SpiNNaker, a parallel chip multiprocessor (CMP) system for neural network simulation, is in this class. To function as a universal neural chip, SpiNNaker uses an event-driven model with complete system virtualisation so that all components are generic and identical. Where most large CMP systems feature a sideband network to complete the boot process, SpiNNaker has a single homogeneous network interconnect for both application inter-processor communications and system control functions. This network improves fault tolerance and makes it easier to support dynamic run-time reconfiguration; however, it requires a boot process compatible with the application’s communications model. Here, we present such a boot loader, capable of bringing a generic, initially unconfigured parallel system into a working configuration. Since SpiNNaker uses event-driven asynchronous communications throughout, the loader operates with purely local control: there is no global synchronisation, state information, or transition sequence. A novel two-stage “unfolding” boot-up process efficiently configures the SpiNNaker hardware and loads the application using a high-speed flood-fill technique with support for run-time reconfiguration. SystemC simulation of a multi-CMP SpiNNaker system indicates an error-free CMP configuration time of ∼1.37 ms, while a high-level simulation of a full-scale system (64 K CMPs) indicates a mean application-loading time of ∼20 ms (for a 100 KB application), which is virtually independent of the size of the system. Further hardware-level Verilog simulation verified the cycle-accurate functionality of CMP configuration. The complete process illustrates a useful method for configuring large-scale event-driven parallel systems without having to provide dedicated hardware boot support or rely on system state assumptions.

13.

NCSWT: An integrated modeling and simulation tool for networked control systems

《Simulation Modelling Practice and Theory》2012

Networked Control Systems (NCS) are becoming increasingly ubiquitous in a growing number of applications, such as groups of unmanned aerial vehicles and industrial control systems. The evaluation of NCS properties such as stability and performance is very important given that these systems are typically deployed in critical settings. This paper presents the Networked Control Systems Wind Tunnel (NCSWT), an integrated modeling and simulation tool for the evaluation of Networked Control Systems (NCS). NCSWT integrates Matlab/Simulink and ns-2 for modeling and simulation of NCS using the High Level Architecture (HLA) standard. The tool is composed of two parts, the design-time models and the run-time components. The design-time models use Model Integrated Computing (MIC) to define HLA-based model constructs such as federates representing the simulators and interactions representing the communication between the simulators. MIC techniques facilitate the modeling and design of complex systems by using abstractions defined in domain-specific modeling languages (DSMLs) to describe the systems. The design-time models represent the control system dynamics and networking system behaviors in order to facilitate the run-time simulation of an NCS. The run-time components represent the main software components and interfaces for the actual realization of an NCS simulation using the HLA framework. Our implementation of the NCSWT based on HLA guarantees accurate time synchronization and data communication. Two case studies are presented to demonstrate the capabilities of the tool as well as evaluate the impact of network effects on NCS.

14.

Parallel software development in the DISC programming environment

G. Iannello A. Mazzeo C. Savy G. Ventre 《Future Generation Computer Systems》1990,5(4):365-372

This paper describes the architecture of DISC, a system for parallel software development. The system is designed for programming computer systems that consist of several autonomous units which do not share memory and are linked by a communication network.

The system consists of three parts: the concurrent programming language DISC (DIStributed C), an extension of the C language based on the concurrency mechanisms of the CSP computational model; the programming environment, designed to promote software engineering techniques in the development of distributed programs; and the language run-time support, which provides for the distributed execution of programs.

15.

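A graph-oriented method for dynamic configuration and fault tolerance of distributed software  Cited by: 1 (self-citations: 0, citations by others: 1)

Song Yi, Liu Yunchao 《计算机应用》 (Journal of Computer Applications) 2003,23(12):37-41

A new method is proposed that supports fault tolerance in component-based distributed software through dynamic configuration. The method adopts the graph-oriented GOP programming model and describes the architecture of the whole distributed software as a single logical graph; dynamic configuration of the system is carried out by executing a set of predefined operations on the graph. Performing this dynamic configuration when a fault or exception is detected supports fault tolerance of the system. The paper describes the basic model of the method, the system structure, and a CORBA-based prototype implementation.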

16.

Fault-tolerant grid architecture and practice  Cited by: 10 (self-citations: 0, citations by others: 10)

Jin Hai, Zou Deqing, Chen Hanhua, Sun Jianhua, Wu Song 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2003,18(4):0-0

Grid computing has emerged as an effective technology to couple geographically distributed resources and solve large-scale computational problems in wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems. Unreliable fault detection is one of the most effective techniques. Globus, as a grid middleware, manages resources in a wide area network. The Globus fault detection service uses the well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. A fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service is presented in this paper. The platform offers effective strategies in three aspects: grid key components, user tasks, and high-level applications.

17.

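HyDB: an efficient SaaS architecture integrating MapReduce and databases  Cited by: 1 (self-citations: 0, citations by others: 1)

Qin Zuoyan, Zhu Qing, Li Fu 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2012,33(3):512-518

With the rapid growth of data and the rise of cloud computing, software as a service (SaaS) marks the rise of on-demand applications of computer systems. Efficient and economical SaaS has led many enterprises to move large-scale data analysis services from high-end servers running parallel databases to cheaper clusters of low-end servers with a shared-nothing architecture. The paper proposes HyDB, an efficient and economical SaaS architecture that integrates MapReduce and databases to provide efficient query services over massive structured, semi-structured, and unstructured data. By studying the storage model and query model of the data, it proposes a complete data storage and query service scheme, presents a queue-based job scheduling algorithm, and supports a fast-response mode for simple data queries. Finally, scalability experiments show that the architecture has good loading performance, query performance, and fault tolerance, and can provide users with high-quality data services.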

18.

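A general software fault injection method for VxWorks systems

Fan Wenhao, Ma Jiezhong, Sun Jiangyan 《测控技术》 (Measurement & Control Technology) 2011,30(4):100-103

A general software-based fault injection method is proposed for systems running VxWorks, and the architecture of the fault injection tool and the fault model are given. The fault injection tool communicates with the target machine over TCP/IP, keeps a small resident program on the target machine, and uses the properties of interrupts to perform the fault injection. Experiments carried out on a VMware virtual machine show that the method can effectively inject faults into devices on the VxWorks platform, for use in system fault-tolerance performance...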

19.

Design and evaluation of a fault-tolerant mobile-agent system

Lyu M.R. Xinyu Chen Tsz Yeung Wong 《Intelligent Systems, IEEE》2004,19(5):32-38

The mobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer networks. In a distributed system, failures can occur in any software or hardware component. A mobile agent can get lost when its hosting server crashes during execution, or it can get dropped in a congested network. Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems. This fault tolerance approach deploys three kinds of cooperating agents to detect server and agent failures and recover services in mobile-agent systems. An actual agent is a common mobile agent that performs specific computations for its owner. Witness agents monitor the actual agent and detect whether it's lost. A probe recovers the failed actual agent and the witness agents. A peer-to-peer message-passing mechanism stands between each actual agent and its witness agents to perform failure detection and recovery through time-bounded information exchange; a log records the actual agent's actions. When failures occur, the system performs rollback recovery to abort uncommitted actions. Moreover, our method uses checkpointed data to recover the lost actual agent.
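
Illustrative sketch (not the authors' implementation): the combination of an action log, rollback of uncommitted actions, and recovery from checkpointed data can be shown on a single in-memory object. The class `CheckpointedAgent` and its methods are hypothetical, and the witness agents, probes, and message passing described in the abstract are omitted.

```python
import copy
from typing import Any, Dict, List, Tuple


class CheckpointedAgent:
    """Toy agent whose state can be rolled back to the last committed checkpoint."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self._checkpoint = copy.deepcopy(state)  # last committed state
        self.log: List[Tuple[str, Any]] = []     # actions since the last commit

    def act(self, key: str, value: Any) -> None:
        """Record the action in the log, then apply it to the live state."""
        self.log.append((key, value))
        self.state[key] = value

    def commit(self) -> None:
        """Make all logged actions durable by taking a new checkpoint."""
        self._checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def rollback(self) -> List[Tuple[str, Any]]:
        """Restore the checkpoint and return the uncommitted actions that were aborted."""
        aborted = list(self.log)
        self.state = copy.deepcopy(self._checkpoint)
        self.log.clear()
        return aborted
```

After a simulated failure, calling `rollback()` discards everything done since the last `commit()` and reports which actions were aborted, so a recovery component could decide whether to replay them.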

20.

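Fault prognosis and health management of large compressor unit equipment groups  Cited by: 1 (self-citations: 0, citations by others: 1)

Xu Guanghua, Gao Jianmin, Liu Dan, Zhang Qing, Liang Lin, Wen Guangrui 《控制工程》 (Control Engineering of China) 2010,(Z1)

The paper introduces research on fault prognosis and health-state management techniques and systems for compressor unit equipment groups, which are complex systems. By fusing early fault prognosis, comprehensive fault diagnosis, and remote expert diagnosis techniques, the authors study health-state management techniques for equipment groups and develop a fault diagnosis and health-state management system for compressor unit equipment groups. The system integrates fault diagnosis and health-state management of main and auxiliary machines, shares monitoring and control information, and maximizes the use of the knowledge resources of production enterprises and equipment manufacturers, providing petrochemical producers and equipment manufacturers with a full-life-cycle system solution for fault diagnosis and health management of compressor unit equipment groups.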