Similar Documents (20 results)
1.
A Survey of Fault Management in Wireless Sensor Networks
Wireless sensor networks are resource-constrained self-organizing systems that are often deployed in inaccessible and inhospitable environments in order to collect data about some outside-world phenomenon. For most sensor network applications, point-to-point reliability is not the main objective; instead, reliable delivery of events of interest to the server must be guaranteed (possibly with a certain probability). The nature of communication in sensor networks is unpredictable and failure-prone, even more so than in regular wireless ad hoc networks. It is therefore essential to provide fault-tolerant techniques for distributed sensor applications. Many recent studies in this area take drastically different approaches to addressing the fault tolerance issue in the routing, transport and/or application layers. In this paper, we summarize and compare existing fault-tolerant techniques that support sensor applications. We also discuss several interesting open research directions.

Lilia Paradis is currently a graduate student in the Department of Mathematical and Computer Sciences, Colorado School of Mines. She is also part of the Toilers Ad Hoc Networking research group. She is interested in distributed communication protocols for wireless sensor networks. Qi Han received the PhD degree in computer science from the University of California, Irvine in 2005. She is currently an assistant professor in the Department of Mathematical and Computer Sciences, Colorado School of Mines. Her research interests include distributed systems, middleware, mobile and pervasive computing, systems support for sensor applications, and dynamic data management. She is specifically interested in developing adaptive middleware techniques for next-generation distributed systems. She is a member of the IEEE and the ACM.

2.
Cloud Computing has evolved to become an enabler for delivering access to large-scale distributed applications running on managed, network-connected computing systems. This makes it possible to host Distributed Enterprise Information Systems (dEISs) in cloud environments while enforcing strict performance and quality-of-service requirements, defined using Service Level Agreements (SLAs). SLAs define the performance boundaries of distributed applications and are enforced by a cloud management system (CMS) that dynamically allocates the available computing resources to the cloud services. We present two novel VM-scaling algorithms focused on dEIS systems, which detect the most appropriate scaling conditions using performance models of distributed applications derived from constant-workload benchmarks, together with SLA-specified performance constraints. We simulate the VM-scaling algorithms in a cloud simulator and compare them against trace-based performance models of dEISs. We compare a total of three SLA-based VM-scaling algorithms (one using prediction mechanisms) based on a real-world application scenario involving a large, variable number of users. Our results show that it is beneficial to use autoregressive predictive SLA-driven scaling algorithms in cloud management systems for guaranteeing performance invariants of distributed cloud applications, as opposed to using only reactive SLA-based VM-scaling algorithms.
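The contrast between reactive and predictive SLA-driven scaling that this abstract refers to can be illustrated with a minimal Python sketch (assuming NumPy); this is not the paper's algorithm, and the per-VM capacity, SLA utilisation bound and workload trace below are hypothetical.

import numpy as np

def reactive_scale(current_load, capacity_per_vm, sla_util=0.7):
    # Reactive rule: provision enough VMs so current utilisation stays below the SLA bound.
    return int(np.ceil(current_load / (capacity_per_vm * sla_util)))

def predictive_scale(load_history, capacity_per_vm, sla_util=0.7):
    # Predictive rule: fit a one-step AR(1) forecast to recent load and scale for the forecast.
    y = np.asarray(load_history, dtype=float)
    x, y_next = y[:-1], y[1:]
    phi = float(np.dot(x, y_next) / np.dot(x, x))  # least-squares AR(1) coefficient
    forecast = phi * y[-1]
    return int(np.ceil(forecast / (capacity_per_vm * sla_util)))

if __name__ == "__main__":
    load = [120, 135, 150, 170, 190, 220]          # hypothetical requests per second
    print("reactive  :", reactive_scale(load[-1], capacity_per_vm=50))
    print("predictive:", predictive_scale(load, capacity_per_vm=50))

On a rising workload such as this one, the predictive rule provisions ahead of demand, which is the intuition behind preferring autoregressive SLA-driven scaling over purely reactive scaling.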

3.
4.
Delivering Internet-scale services and IT-enabled capabilities as computing utilities has been made feasible through the emergence of Cloud environments. While current approaches address a number of challenges such as quality of service, live migration and fault tolerance, an issue of increasing importance is the embedding of users' and applications' behaviour in the management processes of Clouds. The latter allows for accurate estimation of the resource provision (for certain levels of service quality) with respect to the anticipated users' and applications' requirements. In this paper we present a two-level generic black-box approach for behavioral-based management across the Cloud layers (i.e., Software, Platform, Infrastructure): it provides estimates for resource attributes at a low level by analyzing high-level information related to application terms (Translation level), while it predicts the anticipated user behaviour (Behavioral level). Patterns in the high-level information are identified through time series analysis and are afterwards translated to low-level resource attributes with the use of Artificial Neural Networks. We demonstrate the added value and effectiveness of the Translation level through different application scenarios, namely FFMPEG encoding, real-time interactive e-Learning and a Wikipedia-type server. For the latter, we also validate the combined two-level model through a trace-driven simulation for identifying the overall error of the two-level approach.
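To make the Translation level concrete, here is a rough Python sketch of mapping high-level application terms to low-level resource attributes with a small neural network; the chosen features, the synthetic training data and the use of scikit-learn are assumptions for illustration, not the authors' implementation.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical high-level application terms: concurrent users and request rate.
users = rng.uniform(10, 500, size=(200, 1))
rate = users * rng.uniform(0.8, 1.2, size=(200, 1))
X_high = np.hstack([users, rate])

# Hypothetical low-level resource attributes the translation level must estimate.
cpu_share = 0.002 * users + 0.001 * rate
memory_mb = 5.0 * users + rng.normal(0, 20, size=(200, 1))
y_low = np.hstack([cpu_share, memory_mb])

# Translation level: learn the high-level -> low-level mapping with a small ANN.
translator = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
translator.fit(X_high, y_low)

print(translator.predict([[250.0, 240.0]]))  # estimated (CPU share, memory MB)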

5.
Research on planning support systems (PSS) is increasingly paying attention to the added value that PSS applications have for planning practice. Whereas early studies tended to have a rather conceptual focus, recent studies have paid more attention to empirics. Although this is a step forward, there is still a notable gap in the literature: a dearth of empirical evaluations of PSS applications from a comparative perspective. This paper addresses this gap, based on an earlier published conceptual framework that identifies the potential added values of PSS applications. The paper also tentatively explores the effect of three explanatory factors: support capabilities of the PSS, usability, and the context. In doing so, it reports on research of four PSS applications in The Netherlands. The research method consisted of questionnaires completed directly after the session, open interviews and conversations with stakeholders, and observations. With regard to added value as perceived by the participants, the findings indicate that learning, both about the object and about others, was a key perceived added value in all four cases, despite differences in context, support capabilities and usability scores. Moreover, although usability perceptions of the PSS applications varied, overall they were relatively positive. Context appears to have a substantial effect on the perceived added value of the PSS application, making it hard to distil the exact effect of the support capabilities and usability perceptions. The effect of context is one of the topics that could be picked up in further studies into the added value of PSS. One way to accomplish this in future research is by comparing a larger number of different PSS applications in different contexts, resulting in a higher n in order to enable correlational analyses and cross-national comparisons to better grasp the influence of the institutional context.

6.
Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today's computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly referred to as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

7.
Characterizing and Predicting Resource Demand by Periodicity Mining
We present algorithms for characterizing the demand behavior of applications and predicting demand by mining periodicities in historical data. Our algorithms are change-adaptive, automatically adjusting to new regularities in demand patterns while maintaining low running time. They are intended for applications in scientific computing clusters, enterprise data centers, and Grid and Utility environments that exhibit periodic behavior and may benefit significantly from automation. A case study incorporating data from an enterprise data center is used to evaluate the effectiveness of our technique.

Artur Andrzejak received the PhD degree in computer science from the Swiss Federal Institute of Technology (ETH Zurich) in 2000. He is currently a researcher at Zuse-Institute Berlin, Germany. He was a postdoctoral researcher at Hewlett-Packard Labs in Palo Alto, California, from 2001 to 2002. His research interests include systems management and modeling, and Grids. Mehmet Ceyran is working toward his Master's degree in Computer Science at the Freie Universität Berlin, Germany. He has been employed as a student programmer at Zuse-Institute Berlin since 2003. His research interests include software engineering, systems management, and artificial intelligence.
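As a minimal illustration of periodicity mining (not the authors' change-adaptive algorithm), the following Python sketch estimates the dominant period of a demand trace via autocorrelation and uses it for a naive seasonal forecast; the synthetic hourly trace is hypothetical.

import numpy as np

def dominant_period(series, min_lag=2):
    # Crude periodicity mining: pick the lag with the highest autocorrelation.
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf = acf / acf[0]
    return int(np.argmax(acf[min_lag:x.size // 2]) + min_lag)

def periodic_forecast(series, period):
    # Naive seasonal forecast: repeat the value observed one period ago.
    return series[-period]

if __name__ == "__main__":
    t = np.arange(24 * 14)                          # two weeks of hourly samples (hypothetical)
    noise = np.random.default_rng(1).normal(0, 5, t.size)
    demand = 100 + 40 * np.sin(2 * np.pi * t / 24) + noise
    p = dominant_period(demand)
    print("estimated period:", p, "hours; next-hour forecast:",
          round(float(periodic_forecast(demand, p)), 1))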

8.
9.
Open environments like the Internet or corporate intranets enable a large number of interested enterprises to access, filter, process and present information on an as-needed basis. These environments support modern applications, such as virtual enterprises and interorganisational workflow management systems, which involve a number of heterogeneous resources, services and processes. However, without appropriate techniques to coordinate the various business processes, any execution of a virtual enterprise system would yield disjointed and error-prone behaviour. This paper reports on the design and implementation of a flexible agent-based framework for supporting the coordination of virtual enterprises and workflow management systems. The paper also shows, through a simple case study, how an agent coordination infrastructure based on social constraints can impact the engineering of highly dynamic virtual enterprises and workflow management systems.

10.
Modern automation systems have to cope with large amounts of sensor data to be processed, stricter security requirements, heterogeneous hardware, and an increasing need for flexibility. These challenges for tomorrow's automation systems require the software architectures of today's real-time controllers to evolve. This article presents FASA, a modern software architecture for next-generation automation systems. FASA provides concepts for scalable, flexible, and platform-independent real-time execution frameworks, which also provide advanced features such as software-based fault tolerance and high degrees of isolation and security. We show that FASA caters for robust execution of time-critical applications even in parallel execution environments such as multi-core processors. We present a reference implementation of FASA that controls a magnetic levitation device. This device is sensitive to any disturbance in its real-time control and thus provides a suitable validation scenario. Our results show that FASA can sustain its advanced features even in high-speed control scenarios at 1 kHz.

11.
Failures are the norm rather than the exception in cloud computing environments, and fault tolerance is one of the major obstacles to opening up a new era of highly serviceable cloud computing, as it plays a key role in ensuring cloud serviceability. Fault-tolerant service is an essential part of Service Level Objectives (SLOs) in clouds. To achieve a high level of cloud serviceability and to meet demanding SLOs, a foolproof fault tolerance strategy is needed. In this paper, the definitions of fault, error, and failure in a cloud are given, and the principles for high fault tolerance objectives are systematically analyzed by referring to fault tolerance theories suitable for large-scale distributed computing environments. Based on these principles and the semantics of cloud fault tolerance, a dynamic adaptive fault tolerance strategy, DAFT, is put forward. It includes: (i) analyzing the mathematical relationship between different failure rates and two fault tolerance strategies, namely checkpointing and data replication; (ii) building a dynamic adaptive checkpointing fault tolerance model and a dynamic adaptive replication fault tolerance model, and combining the two models to maximize serviceability and meet the SLOs; and (iii) evaluating the dynamic adaptive fault tolerance strategy under various conditions in large-scale cloud data centers, considering different system-centric parameters such as fault tolerance degree, fault tolerance overhead, and response time. Theoretical as well as experimental results conclusively demonstrate that the dynamic adaptive fault tolerance strategy DAFT has high potential, as it provides efficient fault tolerance enhancements, significant cloud serviceability improvement, and strong SLO satisfaction. It efficiently and effectively achieves a trade-off between fault tolerance objectives in cloud computing environments.
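The checkpointing-versus-replication trade-off that DAFT adapts to can be illustrated with a back-of-the-envelope Python sketch (not the paper's mathematical model): compare the expected checkpointing overhead, using Young's approximation for the optimal checkpoint interval, against the fixed cost of keeping extra replicas, and pick the cheaper strategy for the observed failure rate. The checkpoint cost, replica cost and failure rates below are hypothetical.

import math

def checkpoint_overhead(failure_rate, checkpoint_cost_s):
    # Expected fraction of time spent writing checkpoints, using Young's
    # approximation for the optimal interval: T = sqrt(2 * C / lambda).
    interval = math.sqrt(2.0 * checkpoint_cost_s / failure_rate)
    return checkpoint_cost_s / interval

def replication_overhead(replicas, per_replica_cost):
    # Relative overhead of keeping extra running copies.
    return (replicas - 1) * per_replica_cost

def choose_strategy(failure_rate, checkpoint_cost_s=30.0, replicas=2, per_replica_cost=0.15):
    c = checkpoint_overhead(failure_rate, checkpoint_cost_s)
    r = replication_overhead(replicas, per_replica_cost)
    return ("checkpointing", round(c, 4)) if c < r else ("replication", round(r, 4))

if __name__ == "__main__":
    for lam in (1e-6, 1e-4, 1e-2):                  # failures per second, hypothetical
        print(f"failure rate {lam}: {choose_strategy(lam)}")

At low failure rates checkpointing wins; as failures become frequent its overhead grows with sqrt(lambda) and replication becomes the cheaper option, which is the kind of switch an adaptive strategy exploits.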

12.
While offering many practical benefits for distributed applications, mobile agent systems pose some fundamental security challenges. In this paper, we present a new approach to mobile agent security which helps to address some of these challenges. We present a new technique, which we refer to as trust-enhanced security, and apply it to mobile agent-based systems; this new technique advocates a shift in security solutions from security-centric to trust-centric. It extends traditional security mechanisms by enabling trust decisions through explicit specification and management of security-related trust relationships. The integration of these trust decisions into the security decision-making process is what yields trust-enhanced security. A formal trust model is proposed and incorporated into the development of a novel trust management architecture, MobileTrust, for mobile agent-based applications. We have conducted detailed practical investigations to evaluate and validate the emergent properties of the trust-enhanced security technique. We present and discuss the key results in this paper.

13.
Pierre Sens, Bertil Folliot. Software, 1998, 28(10): 1079-1099
This paper presents the design, implementation and performance evaluation of a software fault manager for distributed applications. Dubbed Star, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. To improve the response time of fault-tolerant applications, Star implements non-blocking and incremental checkpointing to perform an efficient backup of process state. Moreover, Star is application-independent and highly configurable. Star runs on top of SunOS and is easily portable to UNIX™-like operating systems. The current implementation is based on independent checkpointing and message logging. Measurements show the efficiency and the limits of this implementation. The challenge is to show that a software approach to fault tolerance can be implemented efficiently in a standard networked environment. © 1998 John Wiley & Sons, Ltd.

14.
Montgomery, Jami. Real-Time Systems, 2004, 27(2): 169-189
Updating application software is a common occurrence for modern computing systems. Software updates stem from the need to correct coding errors or to enhance the functionality of an application. Updating an application typically requires taking the current application offline and restarting a new application. This method of updating an application is perfectly acceptable for many general-purpose computing environments. However, in real-time environments that require high availability and have stringent timing constraints, taking a process offline for updates may be unacceptable or pose unnecessary risks. Some examples of these environments include telecommunications, air traffic control, railway control and medical patient monitoring. We present a new method to dynamically update a real-time application without having to take it offline. Our new method, which we call dynamic update for real-time systems, can be used to update real-time applications that use rate-monotonic scheduling, while preserving the original deadline guarantees.
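One way to see how deadline guarantees can be preserved during an online update is an admission test: accept the new task version only if the updated task set still passes a rate-monotonic schedulability check. The Python sketch below uses the Liu and Layland utilisation bound as that check; it is an illustration of the idea, not the paper's dynamic update mechanism, and the task parameters are hypothetical.

def rm_schedulable(tasks):
    # Sufficient rate-monotonic test (Liu and Layland): total utilisation
    # must not exceed n * (2^(1/n) - 1). Each task is a (wcet, period) pair.
    n = len(tasks)
    utilisation = sum(c / t for c, t in tasks)
    return utilisation <= n * (2 ** (1.0 / n) - 1)

def admit_update(tasks, old_task, new_task):
    # Swap in the new task version only if the updated set is still schedulable,
    # so the original deadline guarantees are preserved.
    updated = [new_task if t == old_task else t for t in tasks]
    return (updated, True) if rm_schedulable(updated) else (tasks, False)

if __name__ == "__main__":
    task_set = [(1, 10), (2, 20), (3, 40)]            # (wcet, period) in ms, hypothetical
    print(admit_update(task_set, (3, 40), (6, 40)))   # modest update: accepted
    print(admit_update(task_set, (3, 40), (28, 40)))  # much heavier version: rejected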

15.
Ubiquitous systems and applications involve interactions between multiple autonomous entities—for example, robots in a mobile ad-hoc network collaborating to achieve a goal, communications between teams of emergency workers involved in disaster relief operations or interactions between patients’ and healthcare workers’ mobile devices. We have previously proposed the Self-Managed Cell (SMC) as an architectural pattern for managing autonomous ubiquitous systems that comprise both hardware and software components and that implement policy-based adaptation strategies. We have also shown how basic management interactions between autonomous SMCs can be realised through exchanges of notifications and policies, to effectively program management and context-aware adaptations. We present here how autonomous SMCs can be composed and federated into complex structures through the systematic composition of interaction patterns. By composing simpler abstractions as building blocks of more complex interactions it is possible to leverage commonalities across the structural, control and communication views to manage a broad variety of composite autonomous systems including peer-to-peer collaborations, federations and aggregations with varying degrees of devolution of control. Although the approach is more broadly applicable, we focus on systems where declarative policies are used to specify adaptation and on context-aware ubiquitous systems that present some degree of autonomy in the physical world, such as body sensor networks and autonomous vehicles. Finally, we present a formalisation of our model that allows a rigorous verification of the properties satisfied by the SMC interactions before policies are deployed in physical devices.

16.
Few distributed software-implemented fault tolerance (SIFT) environments have been experimentally evaluated using substantial applications to show that they protect both themselves and the applications from errors. We present an experimental evaluation of a SIFT environment used to oversee spaceborne applications as part of the Remote Exploration and Experimentation (REE) program at the Jet Propulsion Laboratory. The SIFT environment is built around a set of self-checking ARMOR processes running on different machines that provide error detection and recovery services to themselves and to the REE applications. An evaluation methodology is presented in which over 28,000 errors were injected into both the SIFT processes and two representative REE applications. The experiments were split into three groups of error injections, with each group successively stressing the SIFT error detection and recovery more than the previous group. The results show that the SIFT environment added negligible overhead to the application's execution time during failure-free runs. Correlated failures affecting a SIFT process and an application process are possible, but the division of detection and recovery responsibilities in the SIFT environment allows it to recover from these multiple-failure scenarios. Only 28 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigations showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.

17.
Handheld devices like smartphones and tablets have emerged as one of the most promising platforms for Augmented Reality (AR). The increased usage of these portable handheld devices has enabled handheld AR applications to reach end-users; hence, it is timely and important to seriously consider the user experience of such applications. AR visualizations for occluded objects enable an observer to look through objects. Such visualizations have predominantly been evaluated using Head-Worn Displays (HWDs); handheld devices have rarely been used. However, unless we gain a better understanding of the perceptual and cognitive effects of handheld AR systems, effective interfaces for handheld devices cannot be designed. Similarly, human perception of AR systems in outdoor environments, which provide a higher degree of variation than indoor environments, has only been insufficiently explored. In this paper, we present insights acquired from five experiments we performed using handheld devices in outdoor locations. We provide design recommendations for handheld AR systems equipped with visualizations for occluded objects. Our key conclusions are the following: (1) Use of visualizations for occluded objects improves the depth perception of occluded objects akin to non-occluded objects. (2) To support different scenarios, handheld AR systems should provide multiple visualizations for occluded objects to complement each other. (3) Visual clutter in AR visualizations reduces the visibility of occluded objects and deteriorates depth judgment; depth judgment can be improved by providing clear visibility of the occluded objects. (4) Similar to virtual reality interfaces, both egocentric and exocentric distances are underestimated in handheld AR. (5) Depth perception will improve if handheld AR systems can dynamically adapt their geometric field of view (GFOV) to match the display field of view (DFOV). (6) Large handheld displays are hard to carry and use; however, they enable users to better grasp the depth of multiple graphical objects presented simultaneously.

18.
This paper studies the performance of Peer-to-Peer storage and backup systems (P2PSS). These systems are based on three pillars: data fragmentation and dissemination among the peers, redundancy mechanisms to cope with peer churn, and repair mechanisms to recover lost or temporarily unavailable data. Usually, redundancy is achieved either by using replication or by using erasure codes. A new class of network coding (regenerating codes) has been proposed recently. We therefore adapt our work to these three redundancy schemes. We introduce two mechanisms for recovering lost data and evaluate their performance by modeling them through absorbing Markov chains. Specifically, we evaluate the quality of service provided to users in terms of durability and availability of stored data for each recovery mechanism and deduce the impact of its parameters on the system performance. The first mechanism is centralized and based on the use of a single server that can recover multiple losses at once. The second mechanism is distributed: reconstruction of lost fragments is iterated sequentially on many peers until the required level of redundancy is attained. The key assumptions made in this work, in particular those on the recovery process and the peer on-time distribution, are in agreement with the analyses in [1] and [2] respectively. The models are thereby general enough to be applicable to many distributed environments, as shown through numerical computations. We find that, in stable environments such as local area or research institute networks where machines are usually highly available, the distributed-repair scheme in erasure-coded systems offers a reliable, scalable and cheap storage/backup solution. In highly dynamic environments, the distributed-repair scheme is in general inefficient, in particular for maintaining high data availability, unless the data redundancy is high. Using regenerating codes overcomes this limitation of the distributed-repair scheme. P2PSS with a centralized-repair scheme are efficient in any environment but have the disadvantage of relying on a centralized authority. However, the analysis of the overhead cost (e.g. computation, bandwidth and complexity) resulting from the different redundancy schemes with respect to their advantages (e.g. simplicity) is left for future work.
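The durability analysis mentioned here can be illustrated with a small absorbing Markov chain over the number of live fragments of one erasure-coded block: fragments fail at a rate proportional to how many are alive, repairs restore one fragment at a time, and the block is lost once fewer than k of n fragments remain. The Python sketch below is a generic textbook mean-time-to-absorption computation under exponential loss/repair assumptions, not the models of the paper, and the rates are hypothetical.

import numpy as np

def mean_time_to_data_loss(n, k, loss_rate, repair_rate):
    # Mean time until fewer than k of n erasure-coded fragments survive, starting
    # with all n alive. Transient states are i = k..n live fragments; losses occur
    # at rate i * loss_rate, repairs at rate repair_rate, and dropping below k is
    # the absorbing "data lost" state.
    states = list(range(k, n + 1))
    m = len(states)
    Q = np.zeros((m, m))                      # generator restricted to transient states
    for idx, i in enumerate(states):
        down = i * loss_rate
        up = repair_rate if i < n else 0.0
        if idx > 0:
            Q[idx, idx - 1] = down            # losing a fragment but staying at >= k
        if i < n:
            Q[idx, idx + 1] = up
        Q[idx, idx] = -(down + up)            # the loss from state k leads to absorption
    t = np.linalg.solve(Q, -np.ones(m))       # expected times to absorption
    return t[-1]                              # start state: all n fragments alive

if __name__ == "__main__":
    # Hypothetical rates: mean fragment lifetime 100 days, mean repair time 1 day.
    print(round(mean_time_to_data_loss(n=12, k=8, loss_rate=0.01, repair_rate=1.0)), "days")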

19.
Data-intensive applications process large volumes of data using a parallel processing method. MapReduce is a programming model designed for data-intensive applications operating on massive data sets, and an execution framework for large-scale data processing on clusters of commodity servers. Fault tolerance, an easy programming structure, and high scalability are considered strong points of MapReduce; however, its configuration parameters must be fine-tuned to the specific deployment, which makes configuration and performance tuning complex. This paper explains tuning of the Hadoop configuration parameters, which directly affect MapReduce's job workflow performance under various conditions, to achieve maximum performance. On the basis of the empirical data we collected, it became apparent that three main methodologies can affect the execution time of MapReduce running on cluster systems. Therefore, in this paper, we present a model that consists of three main modules: (1) extending a data redistribution technique in order to find the high-performance nodes, (2) utilizing the number of map/reduce slots in order to reduce execution time, and (3) developing a new hybrid routing schedule for the shuffle phase in order to define the scheduler task while the memory management level is reduced.
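To illustrate why the number of map/reduce slots is one of the levers such tuning adjusts, here is a crude back-of-the-envelope Python model of the map phase as a sequence of "waves"; it is a generic illustration, not the paper's model, and the cluster size, split size and task time are hypothetical.

import math

def map_phase_time(input_gb, split_mb, map_slots_per_node, nodes, avg_task_s):
    # Rough model: the map phase runs in "waves", so its duration is roughly
    # ceil(map tasks / total map slots) times the average map task time.
    map_tasks = math.ceil(input_gb * 1024 / split_mb)
    total_slots = map_slots_per_node * nodes
    waves = math.ceil(map_tasks / total_slots)
    return waves * avg_task_s

if __name__ == "__main__":
    for slots in (2, 4, 8):                   # hypothetical map slots per node
        t = map_phase_time(input_gb=100, split_mb=128, map_slots_per_node=slots,
                           nodes=10, avg_task_s=45)
        print(f"{slots} map slots/node -> about {t} s of map phase")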

20.
We present an implementation of a policy-based management architecture for emerging communications and computing paradigms such as Active Networks and the Grid. To manage such open, highly distributed and decentralized environments, an approach based on policy concepts is adopted, allowing support for active, dynamic adaptability in network elements, services and end-user applications, as well as achieving decentralization and distribution. We present our flexible, extensible policy and event specifications in XML, and describe our management architecture. One key feature of our approach is the distributed infrastructure: the Directory and the Management Information Distribution system. The second feature is the Resource and Security Management elements residing on the multi-node managed systems. These combine to provide a light-weight, self-organizing management architecture. As an applications example, we describe the implementation of our management system applied to the Application Level Active Networking (ALAN) environment, implemented in the European Commission Information Society Technologies (IST) project ANDROID.

