首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 171 毫秒
1.
Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance especially in hybrid systems using accelerators. Processor-cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contains wave-front processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundary data downstream and whose cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional steps in the parallel computation and higher use of on-chip communications. This tradeoff is explored using a performance model. An implementation using the reverse-acceleration programming model on the petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in communication performance exists.  相似文献   

2.
Being a popular runtime infrastructure in the era of Internet, middleware provides more and more services to support the development, deployment and management of distributed systems. At the same time, the reliability of middleware services has a significant impact on the overall reliability of the system. Different services have different impacts and different service fault-tolerance solutions have different costs and risks. Therefore, the identification of the services that greatly affect the whole system reliability is the major obstacle to achieving reliable middleware-based systems. In this paper, we present an analytical framework to automatically reason and quantify such impacts when deploying the target system. In this framework, faults are represented by exceptions in modern programming languages; service failures are simulated by software fault injection; reliability impacts are measured by scenarios. This framework is demonstrated on multiple JEE application servers, including JBoss, JonAS and PKUAS. The experiments on two JEE blueprint applications, namely JPS and ECperf, show the feasibility, the applicability and the usability of this framework.  相似文献   

3.
Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in two ways: first, several previous efforts automate locating bottlenecks, but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but these efforts do not focus on locating performance bottlenecks or uncovering their root causes.The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on the basis of the rough set theory, we propose an innovative approach to automatically uncover root causes of bottlenecks; third, on the cluster systems with two different configurations, we use two production applications, written in Fortran 77, and one open source code—MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.  相似文献   

4.
Heating, Ventilation and Air-Conditioning (HVAC) systems account for more than 15% of the total energy consumption in the US. In order to improve the energy efficiency of HVAC systems, researchers have developed hundreds of algorithms to automatically analyze their performance. However, the complex information, such as configurations of HVAC systems, layouts and materials of building elements and dynamic data from the control systems, required by these algorithms inhibits the process of deploying them in real-world facilities. To address this challenge, we envision a framework that automatically integrates the required information items and provides them to the performance analysis algorithms for HVAC systems. This paper presents an approach to identify and document the information requirements from the publications that describe these algorithms. We extend the Information Delivery Manual (IDM) approach so that the identified information requirements can be mapped to multiple information sources that use various formats and schemas. This paper presents the extensions to the IDM approach and the results of using it to identify information requirements for performance analysis algorithms of HVAC systems.  相似文献   

5.
Fault Management in Distributed Systems: A Policy-Driven Approach   总被引:1,自引:0,他引:1  
Managing the availability and performance of a distributed system involves monitoring the behavior of the system, identifying system problems, and correcting those problems. Each of these tasks requires some expertise, such as an understanding of the mechanics of the underlying system components. As the size and complexity of these systems increases, and the number of distributed applications executing on these systems increases, managing the availability and performance of distributed systems becomes more difficult. Little research has focused on embedding systems management expertise into a management application for a distributed system. In this paper we describe a rule-based management application for a commercially available distributed computing environment that is capable of monitoring the distributed system, detecting system service-related performance and availability problems, and generating corrective actions to correct the problems.  相似文献   

6.
This paper describes an approach to carry out performance analysis of parallel embedded applications. The approach is based on measurement, but in addition, the idea of driving the measurement process (application instrumentation and monitoring) by a behavioral model is introduced. Using this model, highly comprehensible performance information can be collected. The whole approach is based on this behavioral model, one instrumentation method and two tools, one for monitoring and the other for visualization and analysis. Each of these is briefly described, and the steps to carry out performance analysis using them are clearly defined. They are explained by means of a case study. Finally, one method to evaluate the intrusiveness of the monitoring approach is proposed, and the intrusiveness results for the case study are presented.  相似文献   

7.
With Moore’s law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multi-core to exploit this high transistor density for high performance. However, the optimal layout of these multiple cores along with the memory subsystem (caches and main memory) to satisfy power, area, and stringent real-time constraints is a challenging design endeavor. The short time-to-market constraint of embedded systems exacerbates this design challenge and necessitates the architectural modeling of embedded systems to reduce the time-to-market by expediting target applications to device/architecture mapping. In this paper, we present a queueing theoretic approach for modeling multi-core embedded systems that provides a quick and inexpensive performance evaluation both in terms of time and resources as compared to the development of multi-core simulators and running benchmarks on these simulators. We verify our queueing theoretic modeling approach by running SPLASH-2 benchmarks on the SuperESCalar simulator (SESC). Results reveal that our queueing theoretic model qualitatively evaluates multi-core architectures accurately with an average difference of 5.6% as compared to the architectures’ evaluations from the SESC simulator. Our modeling approach can be used for performance per watt and performance per unit area characterizations of multi-core embedded architectures, with varying number of processor cores and cache configurations, to provide a comparative analysis.  相似文献   

8.
Parallel/distributed systems are continuously growing. This allows and enables the scalability of the applications, either by considering bigger problems in the same period of time or by solving the problem in a shorter time. In consequence, the methodologies, approaches and tools related to parallel paradigm should be brought up to date to support the increasing requirements of the applications and the users. MATE (Monitoring, Analysis and Tuning Environment) provides automatic and dynamic tuning for parallel/distributed applications. The tuning decisions are made according to performance models, which provide a fast means to decide what to improve in the execution. However, MATE presents some bottlenecks as the application grows, due to the fact that the analysis process is made in a full centralized manner. In this work, we propose a new approach to make MATE scalable. In addition, we present the experimental results and the analysis to validate the proposed approach against the original one.  相似文献   

9.
In this paper, we demonstrate the impact of using a dynamic (on-the-fly) performance prediction tool-set, PACE, for optimising application execution on dynamic systems. The need for steering the application execution arises from the ever-growing use of distributed and GRID systems. The unquestionable aim to overcome bottleneck problems, allocation, and performance degradation due to shared CPU time has prompted many investigations into the best way in which the performance of an application can be enhanced. In this work, we present a novel approach to dynamically optimise the performance of an application.

An example application, the FFTW (the fastest Fourier transform in the west), is used to illustrate the approach which itself is a novel method that optimises the execution of an FFT. It is shown that performance prediction can provide the same quality of information as a measurement process for application optimisation but in a fraction of the time and thus improving the overall application performance.  相似文献   


10.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号