Query returned 20 similar documents (search time: 15 ms)
1.
Haibo Mi Huaimin Wang Yangfan Zhou Michael Rung-Tsong Lyu Hua Cai Gang Yin 《Frontiers of Computer Science》2013,7(3):431-445
The growing scale and complexity of component interactions in cloud computing systems pose great challenges for operators trying to understand the characteristics of system performance. Profiling has long been proven an effective approach to performance analysis; however, existing approaches confront new challenges that emerge in cloud computing systems. First, the efficiency of profiling becomes a critical concern; second, service-oriented profiling should be considered to support separation-of-concerns performance analysis. To address these issues, we present P-Tracer, an online performance profiling tool specifically tailored for cloud computing systems. First, P-Tracer constructs a specific search engine that proactively processes performance logs and generates a particular index for fast queries; second, for each service, P-Tracer retrieves a statistical insight into performance characteristics from multiple dimensions and provides operators with a suite of web-based interfaces for querying the critical information. We evaluate P-Tracer in terms of tracing overhead, data-preprocessing scalability and querying efficiency. Three real-world case studies from the Alibaba cloud computing platform demonstrate that P-Tracer helps operators understand software behaviors and localize the primary causes of performance anomalies effectively and efficiently.
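The abstract does not show P-Tracer's actual indexing pipeline, but the core idea of pre-building an index over performance logs so that queries are fast can be sketched as a tiny inverted index. This is an illustrative sketch only; the log lines, service names and function names are hypothetical, not from the paper.

```python
from collections import defaultdict

def build_index(log_lines):
    """Inverted index: token -> set of line ids. Building it once ahead of
    time is what makes later queries fast (the P-Tracer idea in miniature)."""
    index = defaultdict(set)
    for i, line in enumerate(log_lines):
        for token in line.lower().split():
            index[token].add(i)
    return index

def query(index, *tokens):
    """Return ids of lines containing all tokens (posting-set intersection)."""
    sets = [index.get(t.lower(), set()) for t in tokens]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical performance-log lines.
logs = ["svcA latency 120ms", "svcB latency 30ms", "svcA ok 5ms"]
print(query(build_index(logs), "svcA", "latency"))  # -> [0]
```

A real implementation would also index structured fields (service, host, time bucket) rather than whitespace tokens, but the query path is the same intersection of postings.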
2.
Tomorrow's systems will need high-bandwidth and dense communication paths at various levels. Commercial high-performance computers are now beginning to use optical interconnections at the inter-cabinet level. These connections usually consist of optical fiber ribbons, with each fiber carrying signals at 1 to 2 Gbit/s over distances of 200 to 300 m. The aggregate bandwidth is as much as 30 Gbit/s. The authors propose integrating suitable optoelectronic devices with silicon electronics, which will allow designers to use optical communication channels to transfer data on and off chips. The authors describe an optically interconnected architecture for high-speed computation, image processing and robotic vision systems. They conclude that optoelectronic parallel processing systems will overcome some of the interconnection problems facing conventional electronic technology, allowing high-speed computers powerful enough for vision and image processing applications.
3.
Haoqiang Jin Dennis Jespersen Piyush Mehrotra Rupak Biswas Lei Huang Barbara Chapman 《Parallel Computing》2011,37(9):562-575
The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems - distributed memory across nodes and shared memory with non-uniform memory access within each node - poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems - a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
4.
PSEE (Parallel System Evaluation Environment) is a software tool that provides a multiprocessor system for research into alternative architectural decisions and experimentation with such issues as selection, design, tuning, scheduling, clustering and routing policies. PSEE facilitates simulation and performance evaluation as well as a prediction environment for the design and tuning of parallel systems. These tasks involve cycles through programming, simulation, measurement, visualization and modification of parallel system parameters. PSEE includes a parallel programming tool, a simulator for link-oriented parallel systems, BOLAS, and a performance evaluation tool, GRAPH. These PSEE modules support the above tasks in user-friendly, interactive and animated graphical form. PSEE provides quantitative information in a tailored graphical form. This numerical/graphical output helps the user make decisions about his/her particular development.
5.
MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multi-computers, limiting the appeal of RPC-based languages in the parallel computing community. MRPC combines the efficient control and data transfer provided by Active Messages (AM) with a minimal multithreaded runtime system that extends AM with the features required to support MPMD. This approach introduces only the necessary RPC overheads for an MPMD environment. MRPC has been integrated into Compositional C++ (CC++), a parallel extension of C++ that offers an MPMD programming model. Basic performance of MRPC is within a factor of two of that of Split-C, a highly tuned SPMD language, and other messaging layers. CC++ applications perform within a factor of two to six of comparable Split-C versions, which represents an order of magnitude improvement over previous CC++ implementations. Copyright © 1999 John Wiley & Sons, Ltd.
6.
Wei-Jen Wang Yue-Shan Chang Cheng-Hui Wu Wei-Xiang Kang 《The Journal of supercomputing》2012,61(1):67-83
Many scientific disciplines use maximum likelihood evaluation (MLE) as an analytical tool. As the volume of data to be analyzed grows, MLE demands more parallelism to improve analysis efficiency. Unfortunately, it is difficult for scientists and engineers to develop their own distributed/parallelized MLE applications. In addition, self-adaptability is an important characteristic of computing-intensive applications for improving efficiency. This paper presents a self-adaptive and parallelized MLE framework that consists of a master process and a set of worker processes in a distributed environment. The workers are responsible for computing tasks, while the master merges the computing results, initiates or terminates another computing iteration, and decides how to re-distribute the computing tasks to workers. The proposed approach uses neither a monitoring mechanism to collect system state nor a load-balancing decision mechanism to balance the workload. Instead, it measures the performance of each worker while computing an iteration, and uses that information to adjust the workload of the workers accordingly. The experimental results show that the proposed framework can not only adapt to environmental changes but is also effective; even in a stable environment dedicated to a single application, the framework still demonstrates a significant improvement from self-adaptability. The improvement is most significant when the workload of the computing machines is unbalanced.
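The measure-then-redistribute idea described above (no system monitor, just per-iteration timings) can be sketched in a few lines: give each worker a share of the next iteration inversely proportional to its measured per-item time. This is a minimal sketch of the general technique, not the paper's code; the function name and numbers are hypothetical.

```python
def rebalance(total_tasks, per_item_times):
    """Split total_tasks across workers in inverse proportion to each
    worker's measured per-item time from the previous iteration, so
    faster workers receive more tasks in the next iteration."""
    speeds = [1.0 / t for t in per_item_times]
    total_speed = sum(speeds)
    shares = [int(total_tasks * s / total_speed) for s in speeds]
    # Hand any rounding remainder to the fastest worker.
    shares[speeds.index(max(speeds))] += total_tasks - sum(shares)
    return shares

# Worker 0 took 10 ms/item, worker 1 took 20 ms/item, so worker 0
# is twice as fast and gets two thirds of the 90 tasks.
print(rebalance(90, [0.01, 0.02]))  # -> [60, 30]
```

Repeating this after every iteration is what makes the scheme self-adaptive: a worker that slows down (e.g. due to external load) automatically sheds tasks next round.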
7.
We report on practical experience using the Oxford BSP Library to parallelize a large electromagnetic code, the British Aerospace finite-difference time-domain code EMMA T:FD3D. The Oxford BSP Library is one of the first realizations of the Bulk Synchronous Parallel computational model to be targeted at numerically intensive scientific (typically Fortran) computing. The BAe EMMA code is one of the first large-scale applications to be parallelized using this library, and it is an important demonstration of the cost effectiveness of the BSP approach. We illustrate how BSP cost-modelling techniques can be used to predict and optimize performance for single-source programs across different parallel platforms. We provide predicted and observed performance figures for an industrial-strength, single-source parallel code for a variety of real parallel architectures: shared memory multiprocessors, workstation clusters and massively parallel platforms.
8.
9.
10.
11.
The Paradyn parallel performance measurement tool
Miller B.P. Callaghan M.D. Cargille J.M. Hollingsworth J.K. Irvin R.B. Karavanic K.L. Kunchithapadam K. Newhall T. 《Computer》1995,28(11):37-46
Paradyn is a tool for measuring the performance of large-scale parallel programs. Our goal in designing a new performance tool was to provide detailed, flexible performance information without incurring the space (and time) overhead typically associated with trace-based tools. Paradyn achieves this goal by dynamically instrumenting the application and automatically controlling this instrumentation in search of performance problems. Dynamic instrumentation lets us defer insertion until the moment it is needed (and remove it when it is no longer needed); Paradyn's Performance Consultant decides when and where to insert instrumentation.
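Paradyn instruments compiled binaries at runtime, which is well beyond a short example, but the insert-when-needed/remove-when-done pattern can be illustrated with Python's built-in profiling hook. This is only an analogy to the technique, not Paradyn's mechanism; the `work` function is hypothetical.

```python
import sys

call_counts = {}

def profiler(frame, event, arg):
    """Count Python-level function calls while instrumentation is active."""
    if event == "call":
        name = frame.f_code.co_name
        call_counts[name] = call_counts.get(name, 0) + 1

def work():
    return sum(range(10))

work()                    # not counted: instrumentation not yet inserted
sys.setprofile(profiler)  # "insert" instrumentation at runtime
work(); work()
sys.setprofile(None)      # "remove" it once no longer needed
work()                    # not counted again
print(call_counts.get("work", 0))  # -> 2
```

The point mirrors Paradyn's cost argument: while the hook is removed, the program runs at full speed, so measurement overhead is paid only during the window where data is actually wanted.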
12.
13.
Zeyao MO Aiqing ZHANG Xiaolin CAO Qingkai LIU Xiaowen XU Hengbin AN Wenbing PEI Shaoping ZHU 《Frontiers of Computer Science in China》2010,4(4):480-488
The exponential growth of computer power in the last 10 years is now creating a great challenge for parallel programming toward achieving realistic performance in the field of scientific computing. To improve on the traditional programs for numerical simulations of laser fusion in inertial confinement fusion (ICF), the Institute of Applied Physics and Computational Mathematics (IAPCM) initiated a software infrastructure named J Adaptive Structured Meshes applications INfrastructure (JASMIN) in 2004. The main objective of JASMIN is to accelerate the development of parallel programs for large-scale simulations of complex applications on parallel computers. Now, JASMIN has released version 1.8 and has achieved its original objectives. Tens of parallel programs have been reconstructed or developed on thousands of processors. JASMIN promotes a new paradigm of parallel programming for scientific computing. In this paper, JASMIN is briefly introduced.
14.
Huanliang Xiong Guosun Zeng Yuan Zeng Wei Wang Canghai Wu 《The Journal of supercomputing》2014,68(2):652-671
Scalability is an important performance metric of parallel computing, but each of the traditional scalability metrics reflects the scalability of parallel computing from only one side, which makes it difficult to fully measure overall performance. This paper studies scalability metrics comprehensively. From the many performance parameters of parallel computing, a group of key ones is chosen and normalized. Further, the area of a Kiviat graph is used to characterize the overall performance of parallel computing. Thereby a novel iso-area scalability metric for parallel computing is proposed, and the relationship between the new metric and the traditional ones is analyzed. Finally, the novel metric is applied to analyze the scalability of Cannon's matrix multiplication algorithm under the LogP model. The proposed metric is significant for improving parallel computing architectures and for tuning parallel algorithm design.
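The Kiviat-area idea can be made concrete: place k normalized metrics on equally spaced radar axes and sum the areas of the triangles between consecutive axes. This is a generic sketch of that geometric construction, not the paper's exact formulation or metric set.

```python
import math

def kiviat_area(metrics):
    """Area of the polygon formed by placing normalized metric values
    on k equally spaced Kiviat (radar) axes: the polygon decomposes
    into triangles with area (1/2) * r_i * r_{i+1} * sin(2*pi/k)."""
    k = len(metrics)
    theta = 2 * math.pi / k
    return 0.5 * math.sin(theta) * sum(
        metrics[i] * metrics[(i + 1) % k] for i in range(k)
    )

# Four metrics all at their maximum 1.0 form a square of area 2.
print(kiviat_area([1.0, 1.0, 1.0, 1.0]))  # -> 2.0
```

A single area collapses several normalized parameters into one number, which is what lets an "iso-area" condition stand in for "same overall performance" as the system is scaled up.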
15.
We present the free open source plugin execution framework ViennaX for modularizing and parallelizing scientific simulations. In general, functionality is abstracted by the notion of a task, which is implemented as a plugin. The plugin system facilitates the utilization of both already available functionality and new implementations. Each task can define arbitrary data dependencies, which ViennaX uses to build a task graph. The framework supports the execution of this dependence graph, based on the Message Passing Interface, in either a serial or a parallel fashion. The applied modular approach allows for defining highly flexible simulations, as plugins can be easily exchanged. The framework's general design as well as implementation details are discussed. Applications based on the Mandelbrot set and the solution of a partial differential equation are investigated, and performance results are shown.
16.
A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems
M. Mezmaz N. Melab Y. Kessaci Y.C. Lee E.-G. Talbi A.Y. Zomaya D. Tuyttens 《Journal of Parallel and Distributed Computing》2011,71(11):1497-1508
In this paper, we investigate the problem of scheduling precedence-constrained parallel applications on heterogeneous computing systems (HCSs) like cloud computing infrastructures. This kind of application has been studied and used in many research works. Most of these works propose algorithms to minimize the completion time (makespan) without paying much attention to energy consumption. We propose a new parallel bi-objective hybrid genetic algorithm that takes into account not only makespan but also energy consumption. We particularly focus on the island parallel model and the multi-start parallel model. Our new method is based on dynamic voltage scaling (DVS) to minimize energy consumption. In terms of energy consumption, the obtained results show that our approach outperforms previous scheduling methods by a significant margin. In terms of completion time, the obtained schedules are also shorter than those of other algorithms. Furthermore, our study demonstrates the potential of DVS.
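The reason DVS creates a makespan/energy trade-off follows from the standard dynamic CMOS power model, P = C·V²·f: since execution time is cycles/f, energy per task is C·V²·cycles, so lowering voltage (which forces a lower frequency) cuts energy quadratically while stretching runtime only linearly. A sketch with hypothetical capacitance, frequency and voltage values:

```python
def energy_and_time(work_cycles, freq, volt, cap=1e-9):
    """Dynamic CMOS model: power P = C * V^2 * f, time T = cycles / f,
    energy E = P * T = C * V^2 * cycles (independent of f)."""
    time = work_cycles / freq
    power = cap * volt ** 2 * freq
    return power * time, time

# Halving both frequency and voltage: 4x less energy, only 2x slower.
e_hi, t_hi = energy_and_time(1e9, freq=2e9, volt=1.2)
e_lo, t_lo = energy_and_time(1e9, freq=1e9, volt=0.6)
print(round(e_hi / e_lo, 1), round(t_lo / t_hi, 1))  # -> 4.0 2.0
```

This asymmetry is exactly what a bi-objective scheduler exploits: tasks with slack can run at a lower voltage/frequency pair at little or no cost to the overall makespan.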
17.
Cámara Jesús Cano José-Carlos Cuenca Javier Saura-Sánchez Mariano 《The Journal of supercomputing》2022,78(15):17231-17246
PARCSIM is a parallel software simulator that allows a user to capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With...
18.
The performance achieved by a parallel architecture over a complete application is determined by the combination of its hardware and software modules. By hardware we mean node processing power and network parameters, while software entails everything from the optimization capabilities of the compiler to the high-level programming model. They interact in a non-simple way, delivering variable results for different problem sizes and making the task of predicting performance a very difficult one. Performance is predictable once, given an algorithm, you can parameterize it in terms of the floating-point operations needed, bandwidth and latency requirements, the granularity of the problem itself and a few parameters that are obviously machine-dependent. We attack the issue of predicting performance for a large class of regular synchronous problems on rectangular grids (only 2D in this paper). The aim of the paper is to determine, by means of dedicated small benchmarking kernels, all the machine-dependent parameters. These are used to predict and compare, over a very wide range of data set sizes, the performance of the Connection Machine CM-5, the Cray T3D and the IBM SP2 for a simple but complete application: the Conjugate Gradient solution of the Poisson equation. We show that the parameterization can be done quite accurately for all of the studied platforms, thus predicting, from measurements performed on extremely simple kernels and some algorithmic understanding, the behavior of an MPP over a very wide range of parameters. We argue in favor of adopting this methodology to produce meaningful benchmarks of MPP platforms. © 1998 John Wiley & Sons, Ltd.
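The parameterization described above reduces, in its simplest form, to a linear machine model: predicted time = flops/flop-rate + messages x latency + bytes/bandwidth, where the three machine parameters are measured with small kernels. This is a generic sketch of that modelling style, not the paper's exact model; the machine numbers are hypothetical.

```python
def predict_time(flops, msgs, bytes_moved, flop_rate, latency, bandwidth):
    """Predicted runtime under a simple machine model: compute time,
    plus per-message latency, plus data volume over bandwidth. The last
    three arguments are the machine-dependent parameters that would be
    measured with dedicated benchmarking kernels."""
    return flops / flop_rate + msgs * latency + bytes_moved / bandwidth

# Hypothetical machine: 100 Mflop/s nodes, 50 us latency, 100 MB/s links.
t = predict_time(flops=1e8, msgs=100, bytes_moved=1e6,
                 flop_rate=1e8, latency=50e-6, bandwidth=1e8)
print(round(t, 3))  # -> 1.015
```

Given an algorithm's flop, message and byte counts as functions of problem size, the same three measured constants yield predictions across the whole range of data set sizes, which is what makes cross-platform comparison meaningful.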
19.
Adrian K. Clear Thomas Holland Simon Dobson Aaron Quigley Ross Shannon Paddy Nixon 《Pervasive and Mobile Computing》2010,6(5):575-589
Pervasive systems are large-scale systems consisting of many sensors capturing numerous types of information. As this data is highly voluminous and dimensional, data analysis tasks can be extremely cumbersome and time-consuming. Enabling computers to recognise real-world situations is an even more difficult problem, involving not only data analysis but also consistency checking. Here we present Situvis, an interactive visualisation tool for representing sensor data and creating higher-level abstractions from the data. This paper builds on previous work (Clear et al. (2009) [8]) through evolved tool functionality and an evaluation of Situvis. A user trial consisting of 10 participants shows that Situvis can be used to complete the key tasks in the development process of situation specifications in over 50% less time than an improvised alternative toolset.
20.
Zomaya A.Y. Ward C. Macey B. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(8):795-812
Task scheduling is essential for the proper functioning of parallel processor systems. Scheduling of tasks onto networks of parallel processors is an interesting problem that is well-defined and documented in the literature. However, most of the available techniques are based on heuristics that solve certain instances of the scheduling problem very efficiently and in reasonable amounts of time. This paper investigates an alternative paradigm, based on genetic algorithms, to efficiently solve the scheduling problem without the need to apply any restricted problem-specific assumptions, as is the case when using heuristics. Genetic algorithms are powerful search techniques based on the principles of evolution and natural selection. The performance of the genetic approach is compared to the well-known list scheduling heuristics. The conditions under which a genetic algorithm performs best are also highlighted, accompanied by a number of examples and case studies.
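The genetic encoding typically used for this problem is direct: a chromosome maps each task to a processor, and fitness is the resulting makespan. The toy sketch below (independent tasks, no precedence constraints, so far simpler than the paper's problem; all parameters hypothetical) shows the selection/crossover/mutation loop in miniature.

```python
import random

def makespan(assign, times, n_procs):
    """Completion time: the load of the most heavily loaded processor."""
    loads = [0.0] * n_procs
    for task, proc in enumerate(assign):
        loads[proc] += times[task]
    return max(loads)

def ga_schedule(times, n_procs, pop=30, gens=100, seed=0):
    """Toy GA: chromosome = task-to-processor map; keep the better half
    each generation, refill with one-point crossover plus point mutation."""
    rng = random.Random(seed)
    n = len(times)
    popn = [[rng.randrange(n_procs) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda c: makespan(c, times, n_procs))
        elite = popn[: pop // 2]
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] = rng.randrange(n_procs)  # mutate
            children.append(child)
        popn = elite + children
    return min(popn, key=lambda c: makespan(c, times, n_procs))

times = [5, 3, 8, 2, 7, 4, 6, 1]
best = ga_schedule(times, n_procs=3)
print(makespan(best, times, 3))  # near the lower bound 36/3 = 12
```

Note how no problem-specific rule appears anywhere in the search: the same loop works for any cost function, which is the generality argument the abstract makes against heuristics.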