
1.

Zheng Wenxu, Pan Xiaodong, Ma Di, Wang Hao 《Computer Engineering & Science》2019,41(9):1526-1533

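As demand for high-performance computing (HPC) from scientific research and commercial applications keeps growing, the performance and scale of HPC systems have grown rapidly. However, sharply rising power consumption severely constrains the design and use of HPC systems, making low-power techniques a key technology in the field. As a core component of the whole system, the job scheduling system maps user-submitted applications onto limited system resources, and its energy efficiency plays a vital role in controlling and regulating the energy consumption of the whole HPC system. This paper first introduces the main energy-efficiency techniques and commonly used job scheduling policies, then analyzes the energy efficiency of current HPC job scheduling, and discusses the challenges it faces and its future directions.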

2.

Monitoring Power Data: A first step towards a unified energy efficiency evaluation toolset for HPC data centers

《Environmental Modelling & Software》2014

The energy consumption of High Performance Computing (HPC) systems, which are the key technology for many modern computation-intensive applications, is rapidly increasing in parallel with their performance improvements. This increase leads HPC data centers to focus on three major challenges: the reduction of overall environmental impacts, which is driven by policy makers; the reduction of operating costs, which are increasing due to rising system density and electrical energy costs; and the 20 MW power consumption boundary for Exascale computing systems, which represent the next thousandfold increase in computing capability beyond the currently existing petascale systems. Energy efficiency improvements will play a major part in addressing these challenges. This paper presents a toolset, called Power Data Aggregation Monitor (PowerDAM), which collects and evaluates data from all aspects of the HPC data center (e.g. environmental information, site infrastructure, information technology systems, resource management systems, and applications). The aim of PowerDAM is not to improve the HPC data center's energy efficiency, but to collect energy-relevant data for analysis, without which energy efficiency improvements would be non-trivial and incomplete. Thus, PowerDAM represents a first step towards a truly unified energy efficiency evaluation toolset needed for improving the overall energy efficiency of HPC data centers.

3.

Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers (cited 4 times in total: 0 self-citations, 4 by others)

Saurabh Kumar Garg Chee Shin Yeo 《Journal of Parallel and Distributed Computing》2011,71(6):732-749

The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. HPC users need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based on what they use. However, the growing demand drastically increases the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high energy cost which will reduce the profit margin of Cloud providers, but also high carbon emissions which are not environmentally sustainable. Hence, there is an urgent need for energy-efficient solutions that can address the high increase in the energy consumption from the perspective of not only the Cloud provider, but also from the environment. To address this issue, we propose near-optimal scheduling policies that exploit heterogeneity across multiple data centers for a Cloud provider. We consider a number of energy efficiency factors (such as energy cost, carbon emission rate, workload, and CPU power efficiency) which change across different data centers depending on their location, architectural design, and management system. Our carbon/energy based scheduling policies are able to achieve on average up to 25% of energy savings in comparison to profit based scheduling policies leading to higher profit and less carbon emissions.

4.

Distributed parallel computing support library technology for finite-difference discrete models

Zhu Xiaoqian 《Computer Engineering & Science》2009,31(Z1)

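This paper presents the design and implementation of YHLIB, a distributed parallel computing support library for finite-difference discrete models. YHLIB is built on the MPI message-passing interface. Through a parallel computing interface for finite-difference discrete models, it supports domain decomposition, inter-domain communication, intra-domain communication, loop index translation, distributed I/O, and dynamic load balancing. It encapsulates the details of the parallel implementation and improves the efficiency of parallel program development. An implementation of an abstract model and tests with real applications show that YHLIB achieves high parallel efficiency.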

5.

A first look at GPGPU programming techniques

Lin Mao, Dong Yumin, Zou Jie, Yang Min, Zhang Jinnan 《Computer Programming Skills & Maintenance》2010,(2):15-17,23

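As GPGPU computing technology continues to develop, the architecture of high-performance computing (HPC) systems is quietly undergoing a transformation that opens a new direction for HPC. CUDA is a C-language programming platform provided by NVIDIA for developing parallel applications on GPGPUs. With it, the high-performance computing capability of specific graphics cards can be used for large-scale HPC workloads, effectively improving the utilization of computer systems. This paper mainly introduces the current state of GPU development and how to use CUDA programming to develop parallel software.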

6.

Model‐based MPI‐IO tuning with Periscope tuning framework

Weifeng Liu Michael Gerndt Bin Gong 《Concurrency and Computation》2016,28(1):3-20

For many parallel applications, I/O performance is a major bottleneck. MPI‐IO, defined by the MPI Forum, can help parallel applications overcome the performance and portability limitations of existing parallel I/O interfaces. Although autotuning has been used to improve the performance of computing kernels, MPI‐IO autotuning has rarely been studied. To automate MPI‐IO performance tuning, we designed and implemented an automatic tuner. The tuner relies on the Periscope tuning framework for transparently passing hints to the MPI‐IO library and for automatically collecting performance data. Unlike computational code, each MPI‐IO function takes a relatively long time to complete, so exhaustively searching the entire parameter space is impractical. We therefore developed a performance model that guides the search and shortens the tuning time. Copyright © 2015 John Wiley & Sons, Ltd.

7.

A Computational Science IDE for HPC Systems: Design and Applications

David E. Hudak Neil Ludban Ashok Krishnamurthy Vijay Gadepally Siddharth Samsi John Nehrbass 《International journal of parallel programming》2009,37(1):91-105


8.

Accelerating big data analytics on HPC clusters using two-level storage

《Parallel Computing》2017

Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS or using parallel file systems available on HPC clusters to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former renders memory-speed high I/O performance and the latter renders consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights on system design for big data analytics on HPC clusters.

9.

Evaluating ARM HPC clusters for scientific workloads

Jahanzeb Maqbool Sangyoon Oh Geoffrey C. Fox 《Concurrency and Computation》2015,27(17):5390-5410

The power consumption of modern high‐performance computing (HPC) systems that are built using power hungry commodity servers is one of the major hurdles for achieving Exascale computation. Several efforts have been made by the HPC community to encourage the use of low‐powered system‐on‐chip (SoC) embedded processors in large‐scale HPC systems. These initiatives have successfully demonstrated the use of ARM SoCs in HPC systems, but there is still a need to analyze the viability of these systems for HPC platforms before a case can be made for Exascale computation. The major shortcomings of current ARM‐HPC evaluations include a lack of detailed insights about performance levels on distributed multicore systems and performance levels for benchmarking in large‐scale applications running on HPC. In this paper, we present a comprehensive evaluation of results that covers major aspects of server and HPC benchmarking for ARM‐based SoCs. For the experiments, we built an unconventional cluster of ARM Cortex‐A9s that is referred to as Weiser and ran single‐node benchmarks (STREAM, Sysbench, and PARSEC) and multi‐node scientific benchmarks (High‐performance Linpack (HPL), NASA Advanced Supercomputing (NAS) Parallel Benchmark, and Gadget‐2) in order to provide a baseline for performance limitations of the system. Based on the experimental results, we claim that the performance of ARM SoCs depends heavily on the memory bandwidth, network latency, application class, workload type, and support for compiler optimizations. During server‐based benchmarking, we observed that when performing memory intensive benchmarks for database transactions, x86 performed 12% better for multithreaded query processing. However, ARM performed four times better for performance to power ratios for a single core and 2.6 times better on four cores. We noticed that emulated double precision floating point in Java resulted in three to four times slower performance as compared with the performance in C for CPU‐bound benchmarks. Even though Intel x86 performed slightly better in computation‐oriented applications, ARM showed better scalability in I/O bound applications for shared memory benchmarks. We incorporated the support for ARM in the MPJ‐Express runtime and performed comparative analysis of two widely used message passing libraries. We obtained similar results for network bandwidth, large‐scale application scaling, floating‐point performance, and energy‐efficiency for clusters in message passing evaluations (NPB and Gadget‐2 with MPJ‐Express and MPICH). Our findings can be used to evaluate the energy efficiency of ARM‐based clusters for server workloads and scientific workloads and to provide a guideline for building energy‐efficient HPC clusters. Copyright © 2015 John Wiley & Sons, Ltd.

10.

An analytical model for energy optimization in heterogeneous parallel systems

Wang Guibin, Yang Xuejun, Tang Tao, Xu Xinhai 《Journal of Software》2012,23(6):1382-1396

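As processor power consumption keeps growing, power has gradually become the primary concern in the design and implementation of high-performance computer systems. Heterogeneous systems are now one of the development trends in high-performance computers. Compared with traditional homogeneous architectures, heterogeneous architectures offer higher theoretical peak performance and energy efficiency, but fully exploiting that energy-efficiency advantage while meeting application performance requirements remains a challenge. By abstracting an application as a general program model composed of serial and parallel segments, the paper builds an energy optimization model for heterogeneous parallel systems. It analytically derives, first for a parallel segment and then for the whole program (multiple segments), the relationship the processors must satisfy at the energy optimum, and for each case gives a processor frequency selection algorithm that minimizes energy under a time constraint. Finally, the method is validated with eight typical applications on a CPU-GPU heterogeneous platform.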

11.

Data placement strategies in MapReduce cluster environments

Xun Yaling, Zhang Jifu, Qin Xiao 《Journal of Software》2015,26(8):2056-2073

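MapReduce is an effective programming model for large-scale data-intensive applications. It is simple to program, easy to scale, and fault tolerant, and it has been widely and successfully applied in parallel and distributed computing. Because MapReduce spreads computation across large clusters of machines, sensible placement of the data being processed is one of the key factors affecting the performance of a MapReduce cluster system (including energy consumption, resource utilization, communication and I/O cost, response time, system reliability, and throughput). The paper first analyzes the default data placement strategy of Hadoop, the typical implementation of the MapReduce programming model, and discusses the key issues to consider when designing a data placement strategy under the MapReduce framework and the criteria for evaluating one. It then surveys and analyzes research progress on optimizing data placement strategies in MapReduce cluster environments. Finally, it analyzes and summarizes directions for future work on data placement in MapReduce cluster environments.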

12.

Dynamic-CoMPI: dynamic optimization techniques for MPI parallel applications

Rosa Filgueira Jesús Carretero David E. Singh Alejandro Calderón Alberto Núñez 《The Journal of supercomputing》2012,59(1):361-391

This work presents an optimization of MPI communications, called Dynamic-CoMPI, which uses two techniques in order to reduce the impact of communications and non-contiguous I/O requests in parallel applications. These techniques are independent of the application and complementary to each other. The first technique is an optimization of the Two-Phase collective I/O technique from ROMIO, called Locality aware strategy for Two-Phase I/O (LA-Two-Phase I/O). In order to increase the locality of the file accesses, LA-Two-Phase I/O employs the Linear Assignment Problem (LAP) for finding an optimal I/O data communication schedule. The main purpose of this technique is the reduction of the number of communications involved in the I/O collective operation. The second technique, called Adaptive-CoMPI, is based on run-time compression of MPI messages exchanged by applications. Both techniques can be applied to any application, because both of them are transparent to users. Dynamic-CoMPI has been validated by using several MPI benchmarks and real HPC applications. The results show that, for many of the considered scenarios, important reductions in the execution time are achieved by reducing the size and the number of the messages. Additional benefits of our approach are the reduction of the total communication time and the network contention, thus enhancing not only performance but also scalability.
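The abstract above mentions the Linear Assignment Problem without stating it. As a rough illustration only, the sketch below solves a tiny LAP by brute force: given a cost matrix, it finds the one-to-one assignment with the lowest total cost. The interpretation of `cost[i][j]` as the amount of remote data that I/O process `i` would need to exchange if it handled file region `j` is an assumption made for this example, not a detail taken from the paper, and the exhaustive search stands in for whatever LAP solver the authors use.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def solve_lap(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return (assignment, total_cost) minimizing sum(cost[i][assignment[i]]).

    Brute force over all permutations: fine for a handful of rows,
    far too slow for real workloads (a Hungarian-style solver would be used).
    """
    n = len(cost)
    best_perm: List[int] = []
    best_cost = float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = list(perm), total
    return best_perm, best_cost


# Hypothetical cost matrix: rows are I/O processes, columns are file regions.
example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = solve_lap(example_cost)
```

Here `assignment[i]` is the file region given to process `i`. A lower total corresponds, under the stated assumption, to more local accesses and fewer exchanged messages.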

13.

Hybrid MPI and OpenMP programming based on Docker

Zhao Boying, Xiao Peng, Zhang Li 《Computer and Modernization》2018,(5):60

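To address the complexity and time cost of setting up a parallel cluster system, the paper proposes building the parallel system on Docker. It introduces the core concepts and basic architecture of Docker, a lightweight virtualization technology, and uses Docker to set up a cluster parallel development environment on Linux. It briefly explains the idea of parallel computing and the basic concepts and characteristics of MPI and OpenMP, builds a hybrid MPI and OpenMP programming model for a parallel matrix multiplication algorithm, compares the performance of the hybrid model with the pure MPI and pure OpenMP models, and analyzes the causes of the differences. Using the hybrid model, it then compares the parallel efficiency of parallel systems built with Docker and on traditional physical machines.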

14.

Design and implementation of an I/O file transfer scheduler for cluster environments

Liu Bing, Duan Fu 《Computer Development & Applications》2012,25(6):33-35

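In recent years researchers have studied parallel I/O in high-performance computing in depth, but this work has mainly targeted MPP systems, and parallel I/O in cluster computer systems has received little attention. Parallel I/O systems for cluster computing are therefore an important research topic. This paper studies the efficiency of parallel I/O transfer scheduling in cluster computing and designs a file transfer scheduler that aims for the fastest file transfers and maximum use of node resources, significantly improving I/O node throughput and response time. Tests and experiments on a large amount of data demonstrate the scheduler's effectiveness and applicability.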

15.

A comparative study of Java and C performance in two large‐scale parallel applications

Aamir Shafi Bryan Carpenter Mark Baker Aftab Hussain 《Concurrency and Computation》2009,21(15):1882-1906

In the 1990s the Message Passing Interface Forum defined MPI bindings for Fortran, C, and C++. With the success of MPI these relatively conservative languages have continued to dominate in the parallel computing community. There are compelling arguments in favour of more modern languages like Java. These include portability, better runtime error checking, modularity, and multi‐threading. But these arguments have not converted many HPC programmers, perhaps due to the scarcity of full‐scale scientific Java codes, and the lack of evidence for performance competitive with C or Fortran. This paper tries to redress this situation by porting two scientific applications to Java. Both of these applications are parallelized using our thread‐safe Java messaging system, MPJ Express. The first application is the Gadget‐2 code, which is a massively parallel structure formation code for cosmological simulations. The second application uses the finite‐difference time‐domain method for simulations in the area of computational electromagnetics. We evaluate and compare the performance of the Java and C versions of these two scientific applications, and demonstrate that the Java codes can achieve performance comparable with legacy applications written in conventional HPC languages. Copyright © 2009 John Wiley & Sons, Ltd.

16.

Scaling to a million cores and beyond: Using light-weight simulation to understand the challenges ahead on the road to exascale

《Future Generation Computer Systems》2014

As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices for high-performance computing (HPC) hardware/software co-design is crucial. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2²⁷ simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

17.

Implementation and evaluation of a massive data processing platform for high-performance computers

Huang He, Yi Xiaodong, Li Shanshan, Liao Xiangke 《Journal of Computer Research and Development》2012,(Z1):357-361

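High-performance computers are mainly used in traditional scientific computing, but in the cloud computing era data-intensive applications have become a major new class of applications of growing importance. This paper explores how to process massive data efficiently on high-performance computers, so that they can support data-intensive applications well alongside scientific computing and broaden their range of applications. After analyzing the feasibility of implementing and deploying the MapReduce model on high-performance computers, the authors ran experiments in a high-performance computing environment. The results show that the main bottleneck preventing efficient operation is that the parallel I/O capability of the storage system cannot be fully exploited. This bottleneck is caused by contention and conflicts over cluster file system resources under high concurrency. Finally, several schemes for resolving resource conflicts in the cluster file system are proposed as directions for future research.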

18.

Retrieval of remote sensing catalog information under a hybrid MPI and OpenMP parallel model


Qu Haicheng, Liang Xuejian, Liu Wanjun, Ji Ruiqing 《Journal of Image and Graphics》2015,20(11):1552-1560

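Objective: Spatial location retrieval is a key step in remote sensing image retrieval. To further improve the efficiency of locating and retrieving massive remote sensing image catalog data and to reduce the false detection rate, a multi-level parallel implementation of the ray casting method based on a hybrid MPI and OpenMP programming model is proposed. Method: First, the traditional ray casting method is refined to handle points lying on a polygon edge and rays passing through an edge endpoint. Second, MPI provides program-level parallelism across machines and OpenMP provides algorithm-level multithreaded parallelism within a single machine: multiple threads process the vertices of a polygon simultaneously and determine whether each lies inside another polygon. Result: Global computing speed peaks when the total number of threads started on all nodes equals the optimal thread count of the master node. Compared with the serial algorithm, the hybrid parallel algorithm cuts retrieval time by more than 50%. Conclusion: Hybrid MPI+OpenMP parallel execution of the spatial location retrieval algorithm is significantly more efficient than ordinary serial execution, pure MPI, or pure OpenMP. The scheme is generally applicable to parallel programs in cluster environments and can be extended to other image processing algorithms.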
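For readers unfamiliar with the ray casting test named in the abstract above, here is a minimal serial sketch. It is not the authors' implementation: the function names, the tolerance `eps`, and the convention that boundary points count as inside are assumptions made for the example. It covers the two special cases the abstract mentions, a point lying on an edge and a ray passing through an edge endpoint, and it leaves out the MPI and OpenMP layers entirely.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def on_segment(p: Point, a: Point, b: Point, eps: float = 1e-12) -> bool:
    """True if p lies on the closed segment a-b (collinear and within its box)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Cast a horizontal ray to the right of p and count edge crossings.

    Assumption for this sketch: points on the boundary count as inside.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if on_segment(p, a, b):
            return True
        # Half-open rule: count an edge only if exactly one endpoint is
        # strictly above the ray, so a vertex shared by two edges is not
        # counted twice and horizontal edges are skipped.
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > p[0]:
                inside = not inside
    return inside


def vertices_inside(inner: List[Point], outer: List[Point]) -> bool:
    """Check every vertex of `inner` against `outer`, one vertex at a time."""
    return all(point_in_polygon(v, outer) for v in inner)


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

Each vertex test in `vertices_inside` is independent of the others, which is what makes the per-vertex multithreading described in the abstract possible. Note that all vertices lying inside a non-convex outer polygon does not by itself guarantee full containment.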

19.

An intelligent I/O scheduling algorithm based on reinforcement learning


Li Qiong, Guo Yufeng, Jiang Yanhuang 《Computer Engineering & Science》2010,32(7):58-61

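Using machine learning to solve technical problems in storage is currently an active research topic in the storage field. Reinforcement learning is a special machine learning method that takes environmental feedback as input and adapts to the environment. By observing changes in the environment state and evaluating the effect of control decisions on system performance, it can select the optimal control policy, so intelligent RAID control based on reinforcement learning has significant research value. Targeting the characteristics of HPC applications, this paper introduces reinforcement learning into the RAID controller and proposes RL-scheduler, an intelligent I/O scheduling algorithm based on reinforcement learning that uses a Q-learning strategy to implement an autonomous scheduling policy for parallel applications. RL-scheduler jointly considers scheduling fairness, disk seek time, and the I/O access efficiency of MPI applications, and proposes a cross-organization method for multiple Q-tables to improve Q-table update efficiency. Experimental results show that RL-scheduler shortens the average I/O service time of parallel applications and improves the I/O throughput of large-scale parallel computing systems.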
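The abstract above relies on Q-learning, so a generic tabular Q-learning sketch may help. It shows only the standard update rule and an epsilon-greedy choice. The class name, the state and action labels, and the parameter values are hypothetical, and the multi-Q-table organization and the reward design of RL-scheduler are not reproduced here.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


class TabularQLearner:
    """Plain tabular Q-learning; a stand-in, not the paper's RL-scheduler."""

    def __init__(self, actions: Sequence[Hashable], alpha: float = 0.5,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.q: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def choose(self, state: Hashable) -> Hashable:
        """Epsilon-greedy: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: Hashable, action: Hashable,
               reward: float, next_state: Hashable) -> None:
        """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical labels: the state is a coarse queue description, the action is
# which request class to serve next.
learner = TabularQLearner(["serve_sequential", "serve_random"], epsilon=0.0)
learner.update("long_queue", "serve_sequential", reward=1.0, next_state="short_queue")
learner.update("short_queue", "serve_random", reward=0.0, next_state="long_queue")
```

In a scheduler of the kind the abstract describes, the reward would presumably combine the fairness, seek-time, and I/O efficiency terms it lists. Here the reward is simply supplied by the caller.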

20.

Tuple switching network—When slower may be better

Justin Y. Shi Moussa Taifi Abdallah Khreishah Jie Wu 《Journal of Parallel and Distributed Computing》2012

This paper reports an application dependent network design for extreme scale high performance computing (HPC) applications. Traditional scalable network designs focus on fast point-to-point transmission of generic data packets. The proposed network focuses on the sustainability of high performance computing applications by statistical multiplexing of semantic data objects. For HPC applications using data-driven parallel processing, a tuple is a semantic object. We report the design and implementation of a tuple switching network for data parallel HPC applications in order to gain performance and reliability at the same time when adding computing and communication resources. We describe a sustainability model and a simple computational experiment to demonstrate the sustainability of extreme scale applications with decreasing system mean time between failures (MTBF). Assuming three times slowdown of statistical multiplexing and 35% time loss per checkpoint, a two-tier tuple switching framework would produce sustained performance and energy savings for extreme scale HPC applications using more than 1024 processors or less than 6 hour MTBF. Higher processor counts or higher checkpoint overheads accelerate the benefits.