Found 20 similar documents (search time: 117 ms)
1.
In many domains, the previous decade was characterized by increasing data volumes and growing complexity of data analyses, creating new demands for batch processing on distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as execution duration, required memory, or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i.e., methods that can be applied to any workload without modification, since the workload is usually a black box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, limitations and challenges, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field.
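As a minimal illustration of predicting a performance metric from past observations while treating the workload as a black box, the sketch below fits a straight line to previously observed (input size, runtime) pairs and extrapolates to a new job. The linear model, the function names `fit_linear` and `predict_runtime`, and the sample numbers are assumptions made for this example; they are not taken from the survey.

```python
def fit_linear(sizes, runtimes):
    """Least-squares fit of runtime = intercept + slope * size.

    Hypothetical helper: uses only past (size, runtime) observations,
    so the job itself needs no instrumentation.
    """
    if len(sizes) != len(runtimes) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("all input sizes are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_runtime(model, size):
    """Predict the runtime of a future job of the given input size."""
    intercept, slope = model
    return intercept + slope * size


# Illustrative observations (made up): input size in GB, runtime in seconds.
model = fit_linear([1, 2, 3], [12, 14, 16])
estimate = predict_runtime(model, 4)
```

A linear fit is only one simple choice; real workloads often need richer models and more features than input size alone.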
2.
3.
Alberto Núñez Javier Fernández Rosa Filgueira Félix García Jesús Carretero 《Simulation Modelling Practice and Theory》2012,20(1):12-32
In this paper we propose a new simulation platform called SIMCAN, for analyzing parallel and distributed systems. This platform is intended for testing parallel and distributed architectures and applications. The main characteristics of SIMCAN are flexibility, accuracy, performance, and scalability. Accordingly, the proposed platform has a modular design that eases the integration of different basic systems on a single architecture. Its design follows a hierarchical schema that includes simple modules, basic systems (computing, memory management, I/O, and networking), physical components (nodes, switches, …), and aggregations of components. New modules may also be incorporated to include new strategies and components. In addition, a graphical configuration tool has been developed to help untrained users with the task of modelling new architectures. Finally, a validation process and some evaluation tests have been performed to evaluate the SIMCAN platform.
4.
5.
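Software rejuvenation theory holds that the depletion of system resources during operation is the main factor affecting the performance of a computing system. This work designs a performance monitoring system that collects and analyzes resource usage and releases depleted resources at appropriate times, which can effectively sustain high system performance. The monitoring system adopts a client/server (C/S) architecture to reduce the load on the monitored node and keep it lightweight, while also enabling asynchronous monitoring of that node. Based on data analysis with a self-organizing map network, the system adaptively adjusts the monitoring parameters of the monitored node. It provides several mathematical models for analyzing and predicting changes in system performance, and a simple, effective decision method is designed to support system restart control. Finally, experiments show that the adaptive collection strategy effectively reduces the volume of data collected and transmitted, keeps the monitored node lightweight and lightly loaded, and minimizes the impact of the monitoring system itself on the monitored system.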
6.
《International journal of human-computer studies》2014,72(1):126-139
Emergency planning is an ongoing activity in which a multidisciplinary group of experts intermittently collaborate to define the most appropriate response to risks. One of the most important tasks of emergency planning is the review of plans as a way of maintaining, refining, and improving them. This review of plans is based on exchanging knowledge and experiences in order to take into account different perspectives and generate alternative solutions. An exploratory case study carried out within municipal organizations has disclosed how the application of rigid plan reviewing practices hinders team creativity and, consequently, effective decision-making. This paper presents a computer-based collaborative environment aimed at supporting unstructured team discussion during the post-hoc review of emergency plans. This collaborative environment allows emergency planning team members to share their views freely by interacting with user interface components distributed across several input and output dimensions. The usage of the environment has shown how the application of new interactive technologies can create more dynamic work settings, fostering team creativity.
7.
A wireless sensor network (WSN) can be construed as an intelligent, largely autonomous, instrument for scientific observation at fine temporal and spatial granularities and over large areas. The ability to perform spatial analyses over sensor data has often been highlighted as desirable in areas such as environmental monitoring. Whilst there exists research on computing topological changes of dynamic phenomena, existing proposals do not allow for more expressive in-network spatial analysis. This paper addresses the challenges involved in using WSNs to identify, track and report topological relationships between dynamic, transient spatial phenomena and permanent application-specific geometries focusing on cases where the geometries involved can be characterized by sets of nodes embedded in a finite 2-dimensional space. The approach taken is algebraic, i.e., analyses are expressed as algebraic expressions that compose primitive operations (such as Adjacent, or AreaInside). The main contributions are distributed algorithms for the operations in the proposed algebra and an empirical evaluation of their performance in terms of bit complexity, response time, and energy consumption.
8.
Rakesh Kushwaha 《Performance Evaluation》1993,18(3):189-204
This paper describes an accurate and efficient method to model and predict the performance of distributed/parallel systems. Various performance measures, such as the expected user response time, the system throughput and the average server utilization, can be easily estimated using this method. The methodology is based on known product form queueing network methods, with some additional approximations. The method is illustrated by evaluating performance of a multi-client multi-server distributed system. A system model is constructed and mapped to a probabilistic queueing network model which is used to predict its behavior. The effects of user think time and various design parameters on the performance of the system are investigated by both the analytical method and computer simulation. The accuracy of the former is verified. The methodology is applied to identify the bottleneck server and to establish proper balance between clients and servers in distributed/parallel systems.
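To make the quantities named in this abstract concrete, the following sketch applies exact Mean Value Analysis (MVA), a standard technique for closed, single-class, product-form queueing networks with user think time. It is not claimed to be the paper's method, which adds its own approximations; the function name `mva` and the service demands in the usage lines are illustrative assumptions.

```python
def mva(service_demands, think_time, n_clients):
    """Exact MVA for a closed single-class product-form network.

    service_demands: total service demand per server (seconds per request).
    think_time:      mean user think time between requests (seconds).
    n_clients:       number of clients circulating in the system (>= 1).
    Returns (response_time, throughput, per-server utilizations).
    """
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    queue = [0.0] * len(service_demands)  # mean queue lengths with 0 clients
    response = throughput = 0.0
    for n in range(1, n_clients + 1):
        # Arrival theorem: an arriving request sees the queue lengths
        # of the same network with one client fewer.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)  # response time law
        queue = [throughput * r for r in residence]  # Little's law
    utilization = [throughput * d for d in service_demands]
    return response, throughput, utilization


# Hypothetical system: two servers, 1 s think time, 50 clients.
resp, thr, util = mva([0.1, 0.2], think_time=1.0, n_clients=50)
bottleneck = max(range(len(util)), key=util.__getitem__)
```

The server with the highest utilization is the bottleneck, and throughput can never exceed the reciprocal of its service demand.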
9.
The network weather service: a distributed resource performance forecasting service for metacomputing (cited by: 39; self-citations: 0; citations by others: 39)
The goal of the Network Weather Service is to provide accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources. Providing a ubiquitous service that can both track dynamic performance changes and remain stable in spite of them requires adaptive programming techniques, an architectural design that supports extensibility, and internal abstractions that can be implemented efficiently and portably. In this paper, we describe the current implementation of the NWS for Unix and TCP/IP sockets and provide examples of its performance monitoring and forecasting capabilities.
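One common way to forecast a changing performance measurement adaptively is to run several simple predictors side by side and report the one whose past predictions have had the lowest cumulative error. The sketch below illustrates that general idea only; the predictor set, the window width, and the name `adaptive_forecast` are assumptions for this example and do not reproduce the NWS implementation or its API.

```python
from statistics import mean, median


def last_value(history):
    """Predict that the next measurement equals the most recent one."""
    return history[-1]


def running_mean(history):
    """Predict the mean of all measurements seen so far."""
    return mean(history)


def window_median(history, width=5):
    """Predict the median of the most recent `width` measurements."""
    return median(history[-width:])


# Hypothetical predictor set; any function of the history would do.
FORECASTERS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_median": window_median,
}


def adaptive_forecast(measurements):
    """Return (name of best predictor so far, its forecast of the next value)."""
    if not measurements:
        raise ValueError("need at least one measurement")
    errors = {name: 0.0 for name in FORECASTERS}
    # Replay the series: each predictor forecasts point t from points 0..t-1.
    for t in range(1, len(measurements)):
        history = measurements[:t]
        for name, forecaster in FORECASTERS.items():
            errors[name] += abs(forecaster(history) - measurements[t])
    best = min(errors, key=errors.get)  # ties go to the first predictor listed
    return best, FORECASTERS[best](measurements)


# Illustrative series (made up), e.g. available bandwidth samples.
name, forecast = adaptive_forecast([1, 2, 3, 4, 5])
```

Because the choice is re-evaluated on every call, the selected predictor can change as the behavior of the measured resource changes.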
10.
11.
Large-scale plant-wide processes have become more common and monitoring of such processes is imperative. This work focuses on establishing a distributed monitoring scheme incorporating multivariate statistical analysis and a Bayesian method for large-scale plant-wide processes. First, the necessity of distributed monitoring is demonstrated by theoretical analysis of the impact of process decomposition on multivariate statistical process monitoring performance. Second, a performance-driven process decomposition method based on a stochastic optimization algorithm is proposed, which aims to achieve the best possible monitoring performance from the process decomposition perspective. Based on the obtained sub-blocks, local monitors are established to characterize local process behaviors, and then a Bayesian fault diagnosis system is established to identify the underlying process status of the entire process. The proposed distributed monitoring scheme is applied to a numerical example and the Tennessee Eastman benchmark process. Comparisons with some state-of-the-art methods indicate its efficiency and feasibility.
12.
Monitoring resource consumption is fundamental in software engineering, e.g., in validation of quality requirements, performance engineering, or adaptive software systems. However, resource monitoring does not come for free as it typically leads to overhead in the observed program. Minimizing this overhead and increasing the reliability of the monitored data is a major goal in realizing resource monitoring tools. Typically, this is achieved by limiting capabilities, e.g., supported resources, granularity of the monitoring focus, or runtime access to results. Thus, in practice often several approaches must be combined to obtain relevant information. We describe SPASS-meter, a novel resource monitoring approach for Java and Android Apps, which combines these conflicting capabilities with low overhead. SPASS-meter supports a large set of resources, flexible configuration of the monitoring scope even for user-defined semantic units (components), runtime analysis and online access to monitoring results in a platform-independent way. We discuss the concepts of SPASS-meter, its architecture, realization and validation, the latter in terms of case studies and an overhead analysis based on performance experiments with SPASS-meter, OpenCore and Kieker. SPASS-meter provides a detailed view of the runtime resource consumption at reasonable overhead of less than 3% processing power and 0.5% memory consumption in our experiments.
13.
14.
James R. Larus 《Software》1990,20(12):1241-1258
Many areas of computer performance analysis require detailed traces of events that occur during a program's execution. Collecting traces is expensive. The additional code required to record events greatly slows a program's execution. In addition, the resulting trace files can grow unmanageably large. This paper describes a technique called abstract execution that alleviates both problems. Abstract execution records a small set of events during the traced program's execution. These events serve as input to an abstract version of the program that generates a full trace by re-executing selected portions of the original program. This process greatly reduces both the cost of tracing the original program and the size of the trace files. The cost of regenerating a trace is insignificant in comparison to the cost of applications that use it. This paper also describes a system called AE that implements Abstract Execution. The paper contains measurements that demonstrate that AE can efficiently trace large programs.
15.
T. Fahringer P. Blaha A. Hssinger J. Luitz E. Mehofer H. Moritsch B. Scholz 《Concurrency and Computation》2001,13(10):841-868
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. The first uses a high-level programming paradigm, which dramatically reduces the time needed to create a parallel program, though sometimes at the cost of reduced performance; a source-to-source compiler has been employed to automatically compile programs written in a high-level programming paradigm into message-passing code. The second is manual program development using a low-level programming paradigm such as message passing, which enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and guide application development by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments with SCALA applied to real-world applications. These experiments are conducted on a NEC Cenju-4 distributed-memory machine and a cluster of heterogeneous workstations and networks. Copyright © 2001 John Wiley & Sons, Ltd.
16.
In this paper we propose a new control performance monitoring method based on subspace projections. We begin with a state space model of a generally non-square process and derive the minimum variance control (MVC) law and minimum achievable variance in a state feedback form. We derive a multivariate time delay (MTD) matrix for use with our extended state space formulation, which implicitly is equivalent to the interactor matrix. We show how the minimum variance output space can be considered an optimal subspace of the general closed-loop output space and propose a simple control performance calculation which uses orthogonal projection of filtered output data onto past closed-loop data. Finally, we propose a control performance monitoring technique based on the output covariance and diagnose the cause of suboptimal control performance using generalized eigenvector analysis. The proposed methods are demonstrated on a few simulated examples and an industrial wood waste burning power boiler.
17.
Communication protocol for monitoring a large number of remotely distributed hazardous material detection devices (cited by: 2; self-citations: 0; citations by others: 2)
Modern wireless communication technologies have opened up new avenues of data collection from remotely distributed environmental sensors. Global System for Mobile Communications (GSM) and satellite telephone services currently cover almost all parts of the world. With this development, it became feasible to place and collect data from remote sensors at locations which were previously inaccessible due to distance or extremely restrictive conditions. Although data collection through wireless devices is becoming more popular by the day, there is no unified protocol for sending and receiving information from remote devices. In this study, a communication protocol is developed for monitoring a large number of remotely distributed environmental devices. The protocol is being implemented as a part of a project which aims to place a large number of environmental monitoring devices throughout the United Arab Emirates (UAE).
18.
Accurate, continuous resource monitoring and profiling are critical for enabling performance tuning and scheduling optimization. In desktop grid systems that employ sandboxing, these issues are challenging because (1) subjobs inside sandboxes are executed in a virtual computing environment and (2) the state of this virtual environment within the sandboxes is reset to an initial empty state after a subjob completes. DGMonitor is a monitoring tool which builds a global, accurate, and continuous view of real resource utilization for desktop grids with sandboxing. Our monitoring tool measures performance unobtrusively and reliably, uses a simple performance data model, and is easy to use. Our measurements demonstrate that DGMonitor can scale to large desktop grids (up to 12000 PCs) with low monitoring overhead in terms of resource consumption (less than 0.1% per machine). Though we originally developed DGMonitor with the Entropia DCGrid platform, our tool is easily ported to and integrated into other desktop grid systems. In all of these systems, DGMonitor data can support existing and novel information services, particularly for performance tuning and scheduling. In this paper, the high scalability and monitoring power of DGMonitor are demonstrated with the Entropia DCGrid platform and the BOINC platform respectively.
19.
Xuehai Zhang Jeffrey L. Freschl Jennifer M. Schopf 《Journal of Parallel and Distributed Computing》2007
Monitoring and information system (MIS) implementations provide data about available resources and services within a distributed system, or Grid. A comprehensive performance evaluation of an MIS can aid in detecting potential bottlenecks, advise in deployment, and help improve future system development. In this paper, we analyze and compare the performance of three implementations in a quantitative manner: the Globus Toolkit® Monitoring and Discovery Service (MDS2), the European DataGrid Relational Grid Monitoring Architecture (R-GMA), and the Condor project's Hawkeye. We use the NetLogger toolkit to instrument the main service components of each MIS and conduct four sets of experiments to benchmark their scalability with respect to the number of users, the number of resources, and the amount of data collected. Our study provides quantitative measurements comparable across all systems. We also find performance bottlenecks and identify how they relate to the design goals, underlying architectures, and implementation technologies of the corresponding MIS, and we present guidelines for deploying MISs in practice.
20.
Deadlock detection in distributed database systems: a new algorithm and a comparative performance analysis (cited by: 4; self-citations: 0; citations by others: 4)
Natalija Krivokapić Alfons Kemper Ehud Gudes 《The VLDB Journal The International Journal on Very Large Data Bases》1999,8(2):79-100
This paper attempts a comprehensive study of deadlock detection in distributed database systems. First, the two predominant deadlock models in these systems and the four different distributed deadlock detection approaches are discussed. Afterwards, a new deadlock detection algorithm is presented. The algorithm is based on dynamically creating deadlock detection agents (DDAs), each being responsible for detecting deadlocks in one connected component of the global wait-for-graph (WFG). The DDA scheme is a “self-tuning” system: after an initial warm-up phase, dedicated DDAs will be formed for “centers of locality”, i.e., parts of the system where many conflicts occur. A dynamic shift in locality of the distributed system will be responded to by automatically creating new DDAs while the obsolete ones terminate. In this paper, we also compare the most competitive representative of each class of algorithms suitable for distributed database systems based on a simulation model, and point out their relative strengths and weaknesses. The extensive experiments we carried out indicate that our newly proposed deadlock detection algorithm outperforms the other algorithms in the vast majority of configurations and workloads and, in contrast to all other algorithms, is very robust with respect to differing load and access profiles.
Received December 4, 1997 / Accepted February 2, 1999
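The wait-for-graph (WFG) mentioned in this abstract has a directed edge from each blocked transaction to the transaction it waits for, and a deadlock corresponds to a cycle in that graph. The sketch below is a centralized, single-process cycle search over such a graph, meant only to illustrate the underlying check; it is not the paper's distributed DDA algorithm, and the function name `find_deadlock` and the transaction labels are assumptions.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    wait_for maps each transaction to the transactions it is waiting for.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    nodes = set(wait_for) | {t for targets in wait_for.values() for t in targets}
    state = {node: UNSEEN for node in nodes}
    path = []  # transactions on the current depth-first search path

    def visit(node):
        state[node] = ACTIVE
        path.append(node)
        for nxt in wait_for.get(node, ()):
            if state[nxt] == ACTIVE:
                # Back edge to a transaction on the current path: cycle found.
                return path[path.index(nxt):]
            if state[nxt] == UNSEEN:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in sorted(nodes):  # sorted only to make the result deterministic
        if state[node] == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical example: T1 waits for T2, T2 for T3, T3 for T1; T4 waits for T1.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]})
```

In the example, T4 is blocked but is not part of the cycle, so a resolver would only need to abort one of T1, T2, or T3.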