Similar Documents
20 similar documents found (search time: 31 ms)
1.
A finite element code with a polycrystal plasticity model for simulating deformation processing of metals has been developed for parallel computers using High Performance Fortran (HPF). The conversion of the code from an original implementation on the Connection Machine systems using CM Fortran is described. The sections of the code requiring minimal inter-processor communication are easily parallelized by changing only the syntax for specifying data layout. However, the solver routine, based on the conjugate gradient method, required additional modifications, which are discussed in detail. The performance of the code on a massively parallel distributed-memory Intel Paragon supercomputer is evaluated through timing statistics.
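The solver is the interesting case here. A minimal serial sketch of the conjugate gradient method (not the paper's HPF code) makes the communication pattern visible: the matrix-vector product and the two dot products in each iteration are exactly the operations that require halo exchange and global reductions once the arrays are distributed.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A.

    In a distributed implementation, the matrix-vector product
    (A @ p) and the two dot products are the steps that force
    inter-processor communication, which is why this routine
    needed work beyond simple data-layout directives.
    """
    x = np.zeros_like(b)
    r = b - A @ x           # residual
    p = r.copy()            # search direction
    rs_old = r @ r          # global reduction in parallel
    for _ in range(max_iter):
        Ap = A @ p          # needs neighbor communication in parallel
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r      # another global reduction
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small SPD demo system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # ~ [0.0909, 0.6364]
```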

2.
GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method, which recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based fault-tolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC: by 73.5% on average when errors occur earlier during execution and by 74.6% when they occur later. In addition, PartialRC reduces the error detection overheads incurred by FullRC during fault recovery while adding negligible performance overhead when no fault occurs.
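The paper's implementation is compiler-directed CUDA; the control flow of the recovery idea, though, fits in a few lines. A schematic Python sketch (all names hypothetical, not the authors' code) of checkpoint-then-partial-recompute: snapshot state before each sub-region, and on a detected error roll back only to the most recent checkpoint rather than to the start of the whole region.

```python
import copy

def run_with_partial_recompute(state, subregions, detect_error):
    """Schematic PartialRC-style recovery: checkpoint before each
    sub-region; on a detected error, roll back to the most recent
    checkpoint and re-execute only that sub-region, rather than
    restarting the whole region (FullRC)."""
    for step in subregions:
        checkpoint = copy.deepcopy(state)   # snapshot before sub-region
        while True:
            step(state)                     # execute sub-region
            if not detect_error(state):
                break                       # advance to next sub-region
            state.clear()                   # discard corrupted work
            state.update(copy.deepcopy(checkpoint))  # partial rollback
    return state

# Toy demo: a transient fault on the first attempt of sub-region 2.
attempts = {"count": 0}
def region1(s): s["x"] = s.get("x", 0) + 1
def region2(s):
    attempts["count"] += 1
    s["x"] += 100 if attempts["count"] == 1 else 10  # first try is faulty
def detect(s): return s.get("x", 0) > 50             # stand-in error check

print(run_with_partial_recompute({}, [region1, region2], detect))  # {'x': 11}
```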

3.
Parallel matrix multiplication is one of the most important basic operations in linear algebra and a cornerstone of many scientific applications. As High Performance Computing (HPC) moves toward exascale, communication accounts for an ever larger share of the cost of parallel matrix multiplication. Reducing this communication overhead and improving the scalability of parallel matrix multiplication is one of the current research hotspots. This paper proposes a new distributed parallel dense matrix multiplication algorithm: a 2.5D version of PUMMA (Parallel...
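The abstract is cut off, but the block structure such algorithms distribute is standard. A serial NumPy sketch of the blocked product (an illustration, not the paper's algorithm): 2.5D variants replicate the inputs across c layers of the process grid and split the inner k-loop among the layers, trading c-fold extra memory for roughly a sqrt(c) reduction in per-process communication volume.

```python
import numpy as np

def blocked_matmul(A, B, block):
    """Serial sketch of the block decomposition that PUMMA/SUMMA-style
    algorithms distribute: each (i, j) block of C accumulates products
    of A's row blocks with B's column blocks. In a 2.5D variant, the
    inner k-loop is partitioned across c replicated process layers."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):   # split over layers in 2.5D
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block])
    return C

# Quick check against NumPy's reference product.
A, B = np.random.rand(8, 8), np.random.rand(8, 8)
assert np.allclose(blocked_matmul(A, B, 4), A @ B)
```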

4.
INTBIS is a well-tested software package which uses an interval Newton/generalized bisection method to find all numerical solutions to nonlinear systems of equations. Since INTBIS uses interval computations, its results are guaranteed to contain all solutions. To efficiently solve very large nonlinear systems on a parallel vector computer, it is necessary to effectively utilize the architectural features of the machine. In this paper, we report our implementations of INTBIS for large nonlinear systems on the Cray Y-MP supercomputer. We first present the direct implementation of INTBIS on a Cray. Then, we report our work on optimizing INTBIS on the Cray Y-MP.
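A one-dimensional sketch conveys why the results are guaranteed (the real INTBIS handles systems of equations and extended interval division, so this is only an illustration): the interval Newton image N(X) = m - f(m)/F'(X) is intersected with X, and since every root of f in X lies in that intersection, discarded regions provably contain no solution; when the step makes no progress, the box is bisected.

```python
def interval_newton(f, df_interval, lo, hi, tol=1e-12):
    """1-D sketch of the interval Newton/generalized bisection idea.
    Assumes df_interval(a, b) returns an enclosure of f' over [a, b].
    Boxes where the derivative enclosure contains zero are bisected
    (the full method uses extended interval division instead)."""
    work, roots = [(lo, hi)], []
    while work:
        a, b = work.pop()
        if b - a < tol:
            if f(a) * f(b) <= 0:            # sign change confirms a root
                roots.append((a, b))
            continue
        m = 0.5 * (a + b)
        d_lo, d_hi = df_interval(a, b)
        if d_lo <= 0.0 <= d_hi:             # derivative may vanish: bisect
            work += [(a, m), (m, b)]
            continue
        fm = f(m)
        # Newton image N(X) = m - f(m)/[d_lo, d_hi] via endpoint arithmetic
        q = sorted([fm / d_lo, fm / d_hi])
        n_lo, n_hi = m - q[1], m - q[0]
        new_a, new_b = max(a, n_lo), min(b, n_hi)
        if new_a > new_b:                   # empty intersection: no root here
            continue
        if (new_b - new_a) > 0.9 * (b - a):  # slow progress: bisect
            mid = 0.5 * (new_a + new_b)
            work += [(new_a, mid), (mid, new_b)]
        else:
            work.append((new_a, new_b))
    return roots

# Roots of x^2 - 2 on [-3, 3]; f'(x) = 2x, so F'([a, b]) = [2a, 2b].
# Prints tiny boxes bracketing -sqrt(2) and +sqrt(2).
print(interval_newton(lambda x: x * x - 2,
                      lambda a, b: (2 * a, 2 * b), -3.0, 3.0))
```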

5.
Offline processing of acoustic emission (AE) signal waveforms recorded during a long-term AE monitoring session is a challenging problem in the AE testing area. Today's AE systems can work with up to hundreds of channels and are able to process tens of thousands of AE events per second, so the amount of data recorded during a session is very large. This paper proposes a way to accelerate signal processing methods for acoustic emission, and in particular similarity calculation, using the Graphics Processing Unit (GPU). GPU-based accelerators are an affordable High Performance Computing (HPC) solution which can be used in any industrial workstation or laptop, and they are therefore suitable for onsite AE monitoring. Our implementation, which is based on the Compute Unified Device Architecture (CUDA), shows that the GPU can achieve 30 times faster processing than the CPU for AE signal preprocessing; similarity calculation is accelerated by up to 80 times. These results demonstrate that the GPU is a powerful and low-cost accelerator for AE signal processing algorithms.
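The paper's CUDA kernels are not reproduced here, but the shape of the workload is easy to show. A NumPy sketch of a typical waveform-similarity computation (an illustration, not the authors' code): pairwise normalized correlation between recorded hits, written as one batched matrix product, which is precisely the kind of arithmetic that maps well to a GPU (e.g. via CuPy, whose array API mirrors NumPy's).

```python
import numpy as np
# import cupy as np   # hypothetical drop-in: CuPy mirrors the NumPy API,
#                     # moving the same batched math onto the GPU

def similarity_matrix(waveforms):
    """Pairwise normalized correlation between AE hit waveforms (rows).
    One matmul instead of N^2 explicit loops: exactly the workload
    shape that GPU accelerators favor."""
    X = waveforms - waveforms.mean(axis=1, keepdims=True)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T          # entry (i, j) in [-1, 1]

hits = np.random.randn(1000, 2048)   # 1000 recorded hits, 2048 samples each
S = similarity_matrix(hits)
print(S.shape, S[0, 0])              # (1000, 1000), diagonal ~ 1.0
```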

6.
Implementation of Parallel Reservoir Simulation Software and Its Application on Domestic High-Performance Computers
This paper describes the application of fine-grained reservoir numerical simulation at the million-grid-point scale on domestically produced high-performance parallel computers and PC cluster systems. For several sets of real million-grid-point data from domestic oil fields, run results under a variety of domestic parallel machine environments are presented, analyzed, and evaluated. On this basis, the key techniques involved in the efficient implementation of parallel reservoir simulation software are discussed, along with the bottleneck problems frequently encountered when parallelizing large-scale software and possible improvements.

7.
The lack of a debugger standard is a major reason why it is difficult to evaluate the quality of a debugger. This paper introduces the first debugging standard aimed at solving this problem: the High Performance Debugger Standard Version 1 (HPDV1), developed by the High Performance Debugging Forum (HPDF). It focuses on parallel debugging concepts and the parallel behavior of debuggers.

8.
The battle for the desktop has been won by workstations and PCs. Offering computational capacity adequate for most applications, and superior user interfaces, they also incorporate the user’s link to a global information base via the World Wide Web. By contrast, High Performance Computing facilities tend to be increasingly isolated by such deterrents as geographical remoteness, architectural individuality, and the non-uniform operational policies of autonomous centres. The future of such centralised Supercomputing facilities and large scale data resources may depend to a large extent on the development of interfaces for accessing their resources from the user’s desktop in a uniform and user-friendly manner; otherwise, High Performance Computing may fall short of its full potential, becoming increasingly specialised and less competitive. In the most pessimistic scenario, the volume of the HPC market could fall below the threshold required for its economic survival in the free marketplace. The Uniform Interface to Computing Resources (UNICORE) project addresses these issues using the mechanisms of the World Wide Web (WWW).

9.
MapReduce is increasingly becoming a popular programming model. However, the widely used implementation, Apache Hadoop, uses the Hadoop Distributed File System (HDFS), which is currently not directly applicable to a majority of existing HPC environments, such as TeraGrid and NERSC, that support other distributed file systems. On such resourceful High Performance Computing (HPC) infrastructures, the MapReduce model can rarely make use of the full resources: special circumstances must be created for its adoption, or limited resources must be isolated to the same end. This paper not only presents a MapReduce implementation directly suitable for such environments, but also exposes the design choices that yield better performance in those settings. By leveraging the functions of the existing distributed file systems, and abstracting them away from its MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) both allows the use of the model in an expanding number of HPC environments and shows better performance in such settings. This paper identifies the components and trade-offs necessary for this model, and quantifies the performance gains exhibited by our approach over Apache Hadoop in a data-intensive setting at the National Energy Research Scientific Computing Center (NERSC).
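MARIANE's code is not shown in the abstract, but the model itself reduces to three phases. A toy in-process word-count sketch (for orientation only, not MARIANE's implementation) of map, shuffle, and reduce; the point of MARIANE-style designs is that the input splits live on the cluster's existing shared file system (e.g. GPFS or Lustre), so no HDFS layer is required to place data near workers.

```python
from collections import defaultdict

def map_phase(chunk):
    """map: emit (key, value) pairs from one input split."""
    for word in chunk.split():
        yield word.lower(), 1

def mapreduce(chunks, mapper, reducer):
    """Toy in-process MapReduce: map each split, shuffle by key,
    then reduce each key's value list."""
    groups = defaultdict(list)            # shuffle: group values by key
    for chunk in chunks:
        for key, value in mapper(chunk):
            groups[key].append(value)
    return {k: reducer(vs) for k, vs in groups.items()}

splits = ["the quick brown fox", "the lazy dog", "the fox"]
print(mapreduce(splits, map_phase, sum))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```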

10.
From the perspective of China's National High-Tech Research and Development Program (863 Program), this paper describes the series of strategic initiatives China has carried out over more than a decade in high-performance computing infrastructure, application support, and service environments, covering how these initiatives were implemented, the achievements made, and their impact, and looks ahead to the future development of high-performance computing in China.

11.
We present a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores, but at large scale, in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology, but the first to use Cray's Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors, connected in many possible topologies. The performance characteristics of the three vary vastly, and the way in which nodes are allocated in each type of system can significantly impact achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. We also examine the differences in performance variability observed on each system and quantify the lost performance using a combination of empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.

12.
The Sunway TaihuLight high-performance computer system is the world's first supercomputer with a peak speed exceeding 100 petaflops (10^17 operations per second). HPSEPS (High Performance Symmetric Eigenproblem Solvers) is an independently developed parallel solver for large-scale symmetric dense matrix eigenproblems, including parallel methods for the standard symmetric dense eigenproblem, and it performs well on large-scale problems. In this work the solver was ported to both the Chinese Academy of Sciences' "Era" supercomputer and the Sunway TaihuLight supercomputer, and the system performance of the two machines was compared. On Sunway TaihuLight, the solver was linked against the xMath math library, which suits the machine's heterogeneous many-core architecture, and against the MKL math library, and the solver's behavior with the two libraries was tested and analyzed.
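HPSEPS itself is not public in this listing, but the kernel it distributes is the standard symmetric dense eigenproblem A v = λ v. A minimal NumPy stand-in (not HPSEPS) showing the node-level computation that libraries such as xMath or MKL provide tuned routines for:

```python
import numpy as np

# Standard symmetric dense eigenproblem A v = lambda v: the kernel that
# HPSEPS parallelizes and that xMath/MKL supply node-level routines for.
n = 500
M = np.random.randn(n, n)
A = (M + M.T) / 2                      # symmetrize

eigvals, eigvecs = np.linalg.eigh(A)   # LAPACK-backed symmetric solve

# Residual check: ||A v - lambda v|| should be near machine precision.
v, lam = eigvecs[:, 0], eigvals[0]
print(np.linalg.norm(A @ v - lam * v))
```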

13.
14.
HPP: An Architecture Supporting High Performance and Utility Computing
To meet both the technical challenges of petaflops-scale high-performance computing and the needs of utility computing, the principal future application model of data centers, this paper proposes a high-performance computer architecture called HPP (Hyper Parallel Processing). The main features of HPP are a global address space and hyper nodes with a single operating system image. HPP combines the scalability of MPP, the efficient communication of DSM, and the ubiquity of clusters, offering many opportunities for innovative research in both high-performance computing and utility computing. Based on the HPP architecture, a prototype system of the Dawning 5000 high-performance computer was implemented, preliminarily verifying the architecture's feasibility.

15.
The Journal of Supercomputing - Quantum computing (QC) is one of the most promising new technologies for High Performance Computing. Its potential use in High Energy Physics has led CERN, one of...

16.
Design by contract is a well-known software design methodology which enhances the quality of software through assertions expressed at the interface level. The overhead-averse nature of High-Performance Computing (HPC) applications often precludes the use of design by contract due to its potential overhead, especially when handling the large data sizes common in HPC. Our approach is to reduce the overhead of design by contract by postponing (or offloading) the enforcement of contracts across time and space. We argue that the semantic implications of this approach are not significant, while it leads to a large potential reduction in overhead. A reduced-overhead strategy for contract implementation may be necessary for wider acceptance of this useful software engineering primitive. We apply our approach to contracts developed for software components based on the CCA (Common Component Architecture) model, which targets HPC applications.
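A minimal sketch of the postponement idea (hypothetical names, not the paper's CCA-based implementation): the instrumented call records just enough context to evaluate its pre- and postconditions later, and the checks run in a batch outside the timed region. Note that arguments are captured by reference rather than copied, which is the kind of semantic relaxation the authors argue is acceptable.

```python
import functools

class DeferredContracts:
    """Sketch of postponed contract enforcement: calls record thunks
    for their pre/postconditions instead of evaluating them inline;
    enforce() runs them later, off the critical path (or, in
    principle, on another processor entirely)."""

    def __init__(self):
        self.pending = []

    def contract(self, pre=None, post=None):
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args):        # positional args only, for brevity
                result = fn(*args)
                # Arguments captured by reference, not copied.
                if pre:
                    self.pending.append(lambda a=args: pre(*a))
                if post:
                    self.pending.append(lambda a=args, r=result: post(r, *a))
                return result
            return wrapper
        return decorate

    def enforce(self):
        """Run all deferred checks, e.g. at a synchronization point."""
        for check in self.pending:
            assert check(), "deferred contract violated"
        self.pending.clear()

contracts = DeferredContracts()

@contracts.contract(pre=lambda xs: len(xs) > 0,
                    post=lambda r, xs: min(xs) <= r <= max(xs))
def mean(xs):
    return sum(xs) / len(xs)

print(mean([1.0, 2.0, 3.0]))   # 2.0, computed without inline checks
contracts.enforce()            # contracts checked outside the timed region
```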

17.
Research on Heterogeneous Distributed Data Mining Based on the SPRINT Classification Algorithm
Classification algorithms are among the most important techniques in data mining. With the rapid development of networks and the growing prevalence of distributed environments, distributed data mining has become a hot topic in data mining in recent years. Since most present-day databases are heterogeneously distributed, this paper proposes using the SPRINT algorithm for classification research in a distributed environment. It first briefly introduces the SPRINT algorithm and then, using a concrete example, discusses in detail the preprocessing at sub-sites, the computation of the best split, the decision tree generation at the central site, and the concrete design and implementation of the algorithm.
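SPRINT chooses splits by the Gini index over pre-sorted attribute lists. A single-attribute Python sketch (an illustration; SPRINT's actual data structures are attribute lists carrying record IDs): sweep the sorted values, maintain class counts below each candidate threshold, and keep the split with the lowest size-weighted Gini. In the distributed setting, each site computes such class counts locally and the central site combines them to pick the global best split.

```python
def gini(counts):
    """Gini impurity of a class-count vector."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_split(records, n_classes=2):
    """Best threshold for one numeric attribute: sort once, then sweep
    candidate split points, tracking class counts on each side."""
    records = sorted(records)                  # (value, label) pairs
    total = [0] * n_classes
    for _, label in records:
        total[label] += 1
    below = [0] * n_classes
    best = (float("inf"), None)
    for i, (value, label) in enumerate(records[:-1]):
        below[label] += 1
        above = [t - b for t, b in zip(total, below)]
        n_lo, n_hi = i + 1, len(records) - i - 1
        score = (n_lo * gini(below) + n_hi * gini(above)) / len(records)
        best = min(best, (score, (value + records[i + 1][0]) / 2))
    return best                                # (weighted gini, threshold)

data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
print(best_split(data))   # (0.0, 2.5): threshold 2.5 separates the classes
```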

18.
Initiated by the Department of Defense (DOD) High Performance Computing Modernization Program (HPCMP), the Data Analysis and Assessment Center (DAAC) serves the needs of DOD HPCMP scientists by facilitating the analysis of an ever-increasing volume and complexity of data [1]. A research scientist and HPCMP user ran nanoscale molecular dynamics simulations using the Large-scale Atomic/Molecular Massively Parallel Simulator code (LAMMPS) from Sandia National Labs. The largest simulation contained over 70 million atoms (Fig. 7). Data sets this large are required to study crack propagation and failure mechanisms that span multiple length scales with atomic resolution. The DAAC developed new methods to visualize the time evolution of data sets this large. The size and complexity of the molecular dynamics simulations and the analytics required the use of DOD HPCMP High Performance Computing (HPC) resources.

19.
The energy consumption of High Performance Computing (HPC) systems, which are the key technology for many modern computation-intensive applications, is rapidly increasing in parallel with their performance improvements. This increase leads HPC data centers to focus on three major challenges: the reduction of overall environmental impact, which is driven by policy makers; the reduction of operating costs, which are increasing due to rising system density and electrical energy costs; and the 20 MW power consumption boundary for exascale computing systems, which represent the next thousandfold increase in computing capability beyond the currently existing petascale systems. Energy efficiency improvements will play a major part in addressing these challenges. This paper presents a toolset, called Power Data Aggregation Monitor (PowerDAM), which collects and evaluates data from all aspects of the HPC data center (e.g. environmental information, site infrastructure, information technology systems, resource management systems, and applications). The aim of PowerDAM is not to improve the HPC data center's energy efficiency by itself, but to collect the energy-relevant data for analysis, without which energy efficiency improvements would be non-trivial and incomplete. Thus, PowerDAM represents a first step towards a truly unified energy efficiency evaluation toolset for improving the overall energy efficiency of HPC data centers.
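The most basic quantity derived from such monitoring data is energy, the time integral of power. A small sketch (hypothetical sensor data, not PowerDAM's API) of turning time-stamped power samples into kilowatt-hours with the trapezoidal rule:

```python
def energy_kwh(samples):
    """Integrate time-stamped power samples into energy.

    samples: list of (unix_seconds, watts) in time order, as a
    monitoring toolset like PowerDAM might collect from one sensor.
    Trapezoidal rule; returns kilowatt-hours."""
    joules = sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
    return joules / 3.6e6   # 1 kWh = 3.6e6 J

# One hour of samples at 10-minute intervals from a ~100 kW rack row.
samples = [(i * 600, 100_000 + 500 * (i % 2)) for i in range(7)]
print(f"{energy_kwh(samples):.2f} kWh")   # 100.25 kWh
```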

20.
Optical flow estimation is a recurrent problem in several disciplines and assumes primary importance in a number of application fields such as medical imaging [12], computer vision [6], production process control [4], etc. In this paper, a differential method for optical flow evaluation is presented. It employs a new error formulation that ensures a more than satisfactory image reconstruction at those points which are free of motion discontinuity. A dynamic scheme of brightness-sample processing is used to regularise the motion field. A technique based on the concurrent processing of sequences with multiple pairs of images has also been developed to improve the detection and resolution of mobile objects in the scene, if they exist. This approach permits the detection of motions ranging from a fraction of a pixel to a few pixels per frame. Good results can be achieved even on noisy sequences, without the need for a filtering pre-processing stage. The intrinsic structure of the method can be exploited for favourable implementation on multi-processor systems with a scalable degree of parallelism. Several sequences, some noisy and presenting various types of motion, have been used to evaluate the performance and effectiveness of the method. Carmelo Lodato received his Dr. Ing. degree in Civil Engineering from the University of Palermo, Italy, in 1987. He is a researcher at the High Performance Computing and Networking Institute (ICAR) of the Italian National Research Council (CNR). His current research interests include computer vision, image processing, motion analysis, optimization and stochastic algorithms. Salvatore Lopes received his Dr. Ing. degree (summa cum laude) in Nuclear Engineering from the University of Palermo, Italy, in 1988. He is a researcher at the High Performance Computing and Networking Institute (ICAR) of the Italian National Research Council (CNR). His current research interests include computer vision, image processing, motion analysis, optimization and stochastic algorithms.
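The paper's own error formulation is not given in the abstract, so as orientation here is the classic differential baseline such methods build on: the brightness-constancy equation Ix·u + Iy·v + It = 0, solved in the least-squares (Lucas-Kanade) sense over a small window. A minimal NumPy sketch (an illustration, not the authors' method):

```python
import numpy as np

def lucas_kanade(I0, I1, x, y, win=7):
    """Differential optical-flow estimate at one point: solve the
    overdetermined system Ix*u + Iy*v = -It over a small window in
    the least-squares sense (the classic Lucas-Kanade baseline)."""
    Iy, Ix = np.gradient(I0.astype(float))      # spatial derivatives
    It = I1.astype(float) - I0.astype(float)    # temporal derivative
    h = win // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v   # estimated motion in pixels per frame

# Synthetic test: a smooth blob shifted 1 pixel to the right.
yy, xx = np.mgrid[0:64, 0:64]
I0 = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 50.0)
I1 = np.exp(-((xx - 33) ** 2 + (yy - 32) ** 2) / 50.0)
print(lucas_kanade(I0, I1, 32, 32))   # u close to 1.0, v close to 0.0
```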
