1.

HPC on the Grid: The Theophys Experience

Roberto Alfieri Silvia Arezzini Alberto Ciampa Roberto De Pietri Enrico Mazzoni 《Journal of Grid Computing》2013,11(2):265-280

The Grid Virtual Organization (VO) “Theophys”, associated with the INFN (Istituto Nazionale di Fisica Nucleare), is a theoretical physics community with various computational demands, ranging across serial, SMP, MPI and hybrid jobs. Over the past 20 years this has led to the use of the Grid infrastructure for serial jobs, while multi-threaded, MPI and hybrid jobs have been executed on several small-to-medium-size clusters installed at different sites, with access through standard local submission methods. This work analyzes the support for parallel jobs in the scientific Grid middlewares, then describes how the community unified the management of most of its computational needs (both serial and parallel) using the Grid through the development of a specific project which integrates serial and parallel resources in a common Grid-based framework. A centralized national cluster is deployed inside this framework, providing “Wholenodes” reservations, CPU affinity, and other new features supporting our High Performance Computing (HPC) applications in the Grid environment. Examples of the cluster performance for relevant parallel applications in theoretical physics are reported, focusing on the different kinds of parallel jobs that can be served by the new features introduced in the Grid.

2.

P-GRADE: A Grid Programming Environment

P. Kacsuk G. Dózsa J. Kovács R. Lovas N. Podhorszki Z. Balaton G. Gombás 《Journal of Grid Computing》2003,1(2):171-197

P-GRADE provides a high-level graphical environment to develop parallel applications transparently both for parallel systems and the Grid. P-GRADE supports the interactive execution of parallel programs as well as the creation of a Condor, Condor-G or Globus job to execute parallel programs in the Grid. In P-GRADE, the user can generate either PVM or MPI code according to the underlying Grid where the parallel application should be executed. PVM applications generated by P-GRADE can migrate between different Grid sites and as a result P-GRADE guarantees reliable, fault-tolerant parallel program execution in the Grid. The GRM/PROVE performance monitoring and visualisation toolset has been extended towards the Grid and connected to a general Grid monitor (Mercury) developed in the EU GridLab project. Using the Mercury/GRM/PROVE Grid application monitoring infrastructure any parallel application launched by P-GRADE can be remotely monitored and analysed at run time even if the application migrates among Grid sites. P-GRADE supports workflow definition and co-ordinated multi-job execution for the Grid. Such workflow management can provide parallel execution at both inter-job and intra-job level. Automatic checkpoint mechanism for parallel programs supports the migration of parallel jobs inside the workflow providing a fault-tolerant workflow execution mechanism. The paper describes all of these features of P-GRADE and their implementation concepts.

3.

LogGPO: An accurate communication model for performance prediction of MPI programs

WenGuang Chen JiDong Zhai Jin Zhang WeiMin Zheng 《中国科学F辑(英文版)》2009,52(10):1785-1791

Message passing interface (MPI) is the de facto standard in writing parallel scientific applications on distributed memory systems. Performance prediction of MPI programs on current or future parallel systems can help to find system bottleneck or optimize programs. To effectively analyze and predict performance of a large and complex MPI program, an efficient and accurate communication model is highly needed. A series of communication models have been proposed, such as the LogP model family, which assume th...

4.

Application-oriented ping-pong benchmarking: how to assess the real communication overheads

Timo Schneider Robert Gerstenberger Torsten Hoefler 《Computing》2014,96(4):279-292

Moving data between processes has often been discussed as one of the major bottlenecks in parallel computing—there is a large body of research, striving to improve communication latency and bandwidth on different networks, measured with ping-pong benchmarks of different message sizes. In practice, the data to be communicated generally originates from application data structures and needs to be serialized before communicating it over serial network channels. This serialization is often done by explicitly copying the data to communication buffers. The message passing interface (MPI) standard defines derived datatypes to allow zero-copy formulations of non-contiguous data access patterns. However, many applications still choose to implement manual pack/unpack loops, partly because they are more efficient than some MPI implementations. MPI implementers on the other hand do not have good benchmarks that represent important application access patterns. We demonstrate that the data serialization can consume up to 80 % of the total communication overhead for important applications. This indicates that most of the current research on optimizing serial network transfer times may be targeted at the smaller fraction of the communication overhead. To support the scientific community, we extracted the send/recv-buffer access patterns of a representative set of scientific applications to build a benchmark that includes serialization and communication of application data and thus reflects all communication overheads. This can be used like traditional ping-pong benchmarks to determine the holistic communication latency and bandwidth as observed by an application. It supports serialization loops in C and Fortran as well as MPI datatypes for representative application access patterns. Our benchmark, consisting of seven micro-applications, unveils significant performance discrepancies between the MPI datatype implementations of state of the art MPI implementations. 
Our micro-applications aim to provide a standard benchmark for MPI datatype implementations to guide optimizations similarly to the established benchmarks SPEC CPU and Livermore Loops.
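The manual pack/unpack loops that the abstract contrasts with MPI derived datatypes can be sketched as follows. This is only an illustration of what "serialization" means for a non-contiguous access pattern (one column of a row-major matrix); it is not the authors' benchmark and uses no MPI. The names `pack_column`, `unpack_column` and `timed_ping_pong` are hypothetical, and `transfer` is an assumed stand-in for the real send/receive round trip.

```python
import time


def pack_column(matrix, rows, cols, col):
    """Copy one strided column of a row-major flat list into a contiguous buffer."""
    return [matrix[r * cols + col] for r in range(rows)]


def unpack_column(buffer, matrix, rows, cols, col):
    """Scatter a contiguous buffer back into the strided column of the matrix."""
    for r in range(rows):
        matrix[r * cols + col] = buffer[r]


def timed_ping_pong(matrix, rows, cols, col, transfer):
    """Time one round trip and return (serialization_seconds, total_seconds).

    `transfer` is a hypothetical callable standing in for the network
    exchange: it takes the packed buffer and returns the echoed buffer.
    """
    t0 = time.perf_counter()
    buffer = pack_column(matrix, rows, cols, col)      # serialization (pack)
    t1 = time.perf_counter()
    echoed = transfer(buffer)                          # stand-in for send/recv
    t2 = time.perf_counter()
    unpack_column(echoed, matrix, rows, cols, col)     # serialization (unpack)
    t3 = time.perf_counter()

    serialization = (t1 - t0) + (t3 - t2)
    total = serialization + (t2 - t1)
    return serialization, total
```

Dividing the first returned value by the second gives the share of the round trip spent on serialization, which is the quantity the abstract reports as reaching up to 80 % for some applications.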

5.

A comparative study of Java and C performance in two large‐scale parallel applications

Aamir Shafi Bryan Carpenter Mark Baker Aftab Hussain 《Concurrency and Computation》2009,21(15):1882-1906

In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi‐threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full‐scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread‐safe Java messaging system, MPJ Express. The first application is the Gadget‐2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite‐difference time‐domain (FDTD) method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

6.

Performance Modeling and Evaluation of MPI

《Journal of Parallel and Distributed Computing》2001,61(2):202-223

Users of parallel machines need to have a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: Cray-Research T3D, Convex Exemplar 1600SP, and a network of workstations (NOW). We develop a simple set of communication benchmarks to extract the LogGP parameters. Our objective in this is to compare the performance of MPI communication on several platforms and to identify a performance model suitable for MPI performance characterization. In particular, two problems are addressed: how LogGP quantifies MPI performance and what extra features are required for modeling MPI, and how MPI performance compares on the three computing platforms: Cray Research T3D, Convex Exemplar 1600SP, and workstation clusters.
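As background for the LogGP parameters mentioned above, the standard LogGP estimate for one point-to-point message of k bytes is o + (k - 1)G + L + o, where L is the network latency, o the per-message processor overhead, and G the gap per byte. The sketch below evaluates that formula and recovers G from two timings. It is not the paper's benchmark suite; the function names and the two-point fit are assumptions made for illustration.

```python
def loggp_message_time(k_bytes, L, o, G):
    """Estimated end-to-end time of one k-byte message under LogGP.

    Sender overhead o, then (k - 1) * G to push the remaining bytes,
    then latency L, then receiver overhead o.
    """
    if k_bytes < 1:
        raise ValueError("message must contain at least one byte")
    return o + (k_bytes - 1) * G + L + o


def estimate_gap_per_byte(k1, t1, k2, t2):
    """Estimate G as the slope between two (message size, time) measurements.

    Assumes both timings are one-way times for large messages, so the
    constant terms (L and 2 * o) cancel in the difference.
    """
    if k1 == k2:
        raise ValueError("need two different message sizes")
    return (t2 - t1) / (k2 - k1)
```

For example, with L = 5.0, o = 1.0 and G = 0.01 (all in microseconds), a 1-byte message is estimated at 7.0 microseconds, and the per-byte term dominates once messages reach several kilobytes.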

7.

A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects

Mohammad J. Rashti Ahmad Afsahi 《International journal of parallel programming》2009,37(2):223-246

Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. Message Passing Interface (MPI) is a widely used message passing standard for high performance computing. One of the most important factors in achieving a good level of overlap is the MPI ability to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to effectively improve communication progress and consequently the overlap ability. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet reveal a significant (80–100%) improvement in receiver side overlap and progress ability. We have also observed up to 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism is used to stop the speculation to effectively reduce the protocol overhead.

8.

G‐BLAST: a Grid‐based solution for mpiBLAST on computational Grids

Chao‐Tung Yang Tsu‐Fen Han Heng‐Chuan Kan 《Concurrency and Computation》2009,21(2):225-255

Over the past few years, research and development in bioinformatics (e.g. genomic sequence alignment) has grown with each passing day, fueling continuing demands for vast computing power to support better performance. This trend usually requires solutions involving parallel computing techniques because cluster computing technology reduces execution times and increases genomic sequence alignment efficiency. One example, mpiBLAST, is a parallel version of NCBI BLAST that combines NCBI BLAST with message passing interface (MPI) standards. However, as most laboratories cannot build up powerful cluster computing environments, Grid computing framework concepts have been designed to meet the need. Grid computing environments coordinate the resources of distributed virtual organizations and satisfy the various computational demands of bioinformatics applications. In this paper, we report on designing and implementing a BioGrid framework, called G‐BLAST, that performs genomic sequence alignments using Grid computing environments and accessible mpiBLAST applications. G‐BLAST is also suitable for cluster computing environments with a server node and several client nodes. G‐BLAST is able to select the most appropriate work nodes, dynamically fragment genomic databases, and self‐adjust according to performance data. To enhance G‐BLAST capability and usability, we also employ a WSRF Grid Service Portal and a Grid Service GUI desk application for general users to submit jobs and host administrators to maintain work nodes. Copyright © 2008 John Wiley & Sons, Ltd.

9.

End-to-End QoS Support for a Medical Grid Service Infrastructure

Siegfried Benkner Gerhard Engelbrecht Stuart E. Middleton Ivona Brandic Rainer Schmidt 《New Generation Computing》2007,25(4):355-372

Quality of Service support is an important prerequisite for the adoption of Grid technologies for medical applications. The GEMSS Grid infrastructure addressed this issue by offering end-to-end QoS in the form of explicit timeliness guarantees for compute-intensive medical simulation services. Within GEMSS, parallel applications installed on clusters or other HPC hardware may be exposed as QoS-aware Grid services for which clients may dynamically negotiate QoS constraints with respect to response time and price using Service Level Agreements. The GEMSS infrastructure and middleware is based on standard Web services technology and relies on a reservation based approach to QoS coupled with application specific performance models. In this paper we present an overview of the GEMSS infrastructure, describe the available QoS and security mechanisms, and demonstrate the effectiveness of our methods with a Grid-enabled medical imaging service.

10.

GridBLAST: a Globus‐based high‐throughput implementation of BLAST in a Grid computing framework

Arun Krishnan 《Concurrency and Computation》2005,17(13):1607-1623

Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

11.

Programming environments for high-performance Grid computing: the Albatross project

Thilo Henri E. Jason Rob Lionel Rutger Kees 《Future Generation Computer Systems》2002,18(8)

The aim of the Albatross project is to study applications and programming environments for computational Grids. We focus on high-performance applications, running in parallel on multiple clusters or MPPs that are connected by wide-area networks (WANs). We briefly present three Grid programming environments developed in the context of the Albatross project: the MagPIe library for collective communication with MPI, the replicated method invocation (RepMI) mechanism for Java, and the Java-based Satin system for running divide-and-conquer programs on Grid platforms. A major challenge in investigating the performance of such applications is the actual WAN behavior. Typical wide-area links are just part of the Internet and thus shared among many applications, making runtime measurements irreproducible and thus scientifically hardly valuable. To overcome this problem, we developed a WAN emulator as part of Panda, our general-purpose communication substrate. The WAN emulator allows us to run parallel applications on a single (large) parallel machine with only the wide-area links being emulated. The Panda emulator is highly accurate and configurable at runtime. We present a case study in which Satin runs across various emulated WAN scenarios.

12.

A Grid-based Virtual Reactor: Parallel performance and adaptive load balancing

Vladimir V. Korkhov Valeria V. Krzhizhanovskaya P.M.A. Sloot 《Journal of Parallel and Distributed Computing》2008

We address the problem of porting parallel distributed applications from static homogeneous cluster environments to dynamic heterogeneous Grid resources. We introduce a generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using a case study application: a Virtual Reactor for simulation of plasma chemical vapour deposition. This application has a modular architecture with a number of loosely coupled components suitable for distribution over the Grid. It requires large parameter space exploration that allows using Grid resources for high-throughput computing. The Virtual Reactor contains a number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the Grid. In this paper we study the performance of one of the parallel solvers, apply the technique developed for adaptive load balancing, evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous Grid resources for high-performance parallel computing.
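The abstract does not spell out the load-balancing technique, so the following is only a generic illustration of the underlying idea: on heterogeneous resources, work is divided in proportion to each worker's measured speed, and the speeds are re-measured after every round. It should not be read as the authors' algorithm; `split_work` and `update_speeds` are hypothetical names.

```python
def split_work(total_items, speeds):
    """Divide total_items among workers in proportion to their speeds.

    Uses the largest-remainder rule so that the integer shares always
    sum to total_items.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total_speed = sum(speeds)
    exact = [total_items * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]            # round every share down
    leftover = total_items - sum(shares)        # items lost to rounding
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - shares[i],
                          reverse=True)
    for i in by_remainder[:leftover]:           # hand them out one by one
        shares[i] += 1
    return shares


def update_speeds(items_done, seconds):
    """Re-estimate each worker's speed (items per second) from the last round."""
    return [n / t for n, t in zip(items_done, seconds)]
```

In an adaptive loop, the output of `update_speeds` from one round would feed `split_work` for the next, so a node that slows down under external load receives a smaller share.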

13.

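Comparison of Shared Memory and Message Passing on PC Clusters (cited by: 7; self-citations: 0; citations by others: 7)

Longbing Zhang Shaogang Wu Fei Cai Weiwu Hu 《Journal of Software》2004,15(6):842-849

Shared memory and message passing are currently the two mainstream parallel programming models. It is generally held that message passing is less programmer-friendly than shared memory. OpenMP is the de facto industry standard for shared-memory programming. Cluster OpenMP systems provide an OpenMP programming environment on clusters and are easy to program and scalable, but their performance has long been a focus of attention. Taking the cluster OpenMP system OpenMP/JIAJIA and a typical message-passing sys...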

14.

Model‐based MPI‐IO tuning with Periscope tuning framework

Weifeng Liu Michael Gerndt Bin Gong 《Concurrency and Computation》2016,28(1):3-20

For many parallel applications, I/O performance is a major bottleneck. MPI‐IO, defined by the MPI forum, can help parallel applications overcome the performance and portability limitations of existing parallel I/O interfaces. Although autotuning has been used to improve the performance of computing kernels, MPI‐IO autotuning has rarely been studied. To automate MPI‐IO performance tuning, we designed and implemented an automatic tuner. The tuner relies on the Periscope tuning framework for transparently passing hints to the MPI‐IO library and for automatically collecting performance data. Unlike computational code, each MPI‐IO function takes a relatively long time to complete. Thus, exhaustively searching through the entire parameter space is impractical. So we developed a performance model that can direct us to shorten the tuning time. Copyright © 2015 John Wiley & Sons, Ltd.

15.

Algorithms and the Grid

Geoffrey C. Fox Mehmet S. Aktas Galip Aydin Harshawardhan Gadgil Shrideep Pallickara Marlon E. Pierce Ahmet Sayar 《Computing and Visualization in Science》2009,12(3):115-124

We review the impact of Grid Computing and Web Services on scientific computing, stressing the importance of the “data-deluge” that is driven by deployment of new instruments, sensors and satellites. This implies the need to integrate the naturally distributed data sources with large simulation engines offering parallel low latency communication and so to integrate parallel and Grid computing paradigms. We start with an overview of these and the evolving service architectures. We illustrate the identified areas of interest for Algorithms and the Grid with the specific example of SERVOGrid that supports earthquake science research. We comment on the appropriate messaging infrastructure for Grids and data assimilation and contrast it with MPI.

16.

CORBA and MPI code coupling

S. P. Kopysov I. V. Krasnopyorov V. N. Rychkov 《Programming and Computer Software》2006,32(5):276-283

Coupling of application programs designed for multiprocessor computing systems requires simultaneous use of several paradigms implemented as communication middleware. In this paper, we propose a method of integration of MPI, which is widely used in scientific parallel computations, and CORBA, which is designed for the development of object-oriented applications. This makes it possible to assemble integrated software systems for interdisciplinary computations on heterogeneous multiprocessor systems with the reuse of available application software. An example of inclusion of an MPI linear algebra package into a CORBA-based distributed object-oriented model for solving systems of equations is presented.

17.

The design and implementation of Visuel performance monitoring and analysis toolkit for cluster and grid environments

Kuan-Ching Li Hsun-Chang Chang 《The Journal of supercomputing》2007,40(3):299-317

The computing power provided by high-performance, low-cost PC-based clusters with Grid platforms is attractive, and it equals or exceeds that of supercomputers and mainframes. In this paper, we present the implementation and design rationale of the Visuel toolkit for MPI parallel program performance measurement and analysis in cluster and grid environments. Most performance visualization tools available today for high-performance platforms show solely system performance data (e.g., CPU load, memory usage, network bandwidth, server average load), and are thus suitable only for visualizing computing system activity. The Visuel toolkit (named after the French word “visuel”, which the authors gloss as “to visualize”) is a web-based interface designed to show the performance activities of all computing nodes of a distributed environment involved in the execution of an MPI parallel program, such as the CPU load level and memory usage of each computing node. In addition, the toolkit can display comparative performance data charts of MPI parallel applications and of multiple executions under investigation. Use of the toolkit shows that it is particularly effective at easing the investigation of parallel applications.

18.

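Research and Implementation of MPI on the CELL Broadband Engine Architecture (cited by: 1; self-citations: 0; citations by others: 1)

Zhen Xu Jizhou Sun Ce Yu Dazhi Qi Xuming Zhang 《Application Research of Computers》2010,27(7):2526-2529

This paper studies the feasibility of porting the MPI message-passing programming model and standard interface to the CBEA, and implements a set of commonly used MPI programming interfaces with IBM CELL SDK 3.0. Experimental results show that this set of MPI interfaces meets the data-transfer performance requirements of application development on the CBEA, and that its performance is close to that of the existing DMA data-transfer mode. The interfaces give CELL application developers a general-purpose programming interface solution.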

19.

Efficient scheduling of MPI applications on networks of workstations

M.A.R. Dantas E.J. Zaluska 《Future Generation Computer Systems》1998,13(6):489-499

The availability of a large number of workstations connected through a network can represent an attractive option for high-performance computing for many applications. The message-passing interface (MPI) software environment is an effort from many organisations to define a de facto message-passing standard. In other words, the original specification was not designed as a comprehensive parallel programming environment and some researchers agree that the standard should be kept as simple and clean as possible. Nevertheless, a software environment such as MPI should somehow have a scheduling mechanism for the effective submission of parallel applications on networks of workstations. This paper presents an alternative lightweight approach called Selective-MPI (S-MPI), which was designed to enhance the efficiency of the scheduling of applications on an MPI implementation environment.

20.

Raising the level of abstraction for developing message passing applications

Arora Ritu Bangalore Purushotham Mernik Marjan 《The Journal of supercomputing》2012,59(2):1079-1100

Message Passing Interface (MPI) is the most popular standard for writing portable and scalable parallel applications for distributed memory architectures. Writing efficient parallel applications using MPI is a complex task, mainly due to the extra burden on programmers to explicitly handle all the complexities of message-passing (viz., inter-process communication, data distribution, load-balancing, and synchronization). The main goal of our research is to raise the level of abstraction of explicit parallelization using MPI such that the effort involved in developing parallel applications is significantly reduced in terms of the reduction in the amount of code written manually while avoiding intrusive changes to existing sequential programs. In this research, generative programming tools and techniques are combined with a domain-specific language, Hi-PaL (High-Level Parallelization Language), for automating the process of generating and inserting the required code for parallelization into the existing sequential applications. The results show that the performance of the generated applications is comparable to the manually written versions of the applications, while requiring no explicit changes to the existing sequential code.