1.

Programming for scientific computing on peta-scale heterogeneous parallel systems

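Canqun Yang, Qiang Wu, Tao Tang, Feng Wang, Jingling Xue. Journal of Central South University of Technology (English Edition), 2013, 20(5): 1189-1203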

2.

A Hybrid Circular Queue Method for Iterative Stencil Computations on GPUs

Yang Yang, Huimin Cui, Xiaobing Feng, Jingling Xue. Journal of Computer Science and Technology, 2012, 27(1): 57-74

In this paper, we present a hybrid circular queue method that can significantly boost the performance of stencil computations on GPUs by carefully balancing the usage of registers and shared memory. Unlike ea...

3.

PartialRC: A Partial Recomputing Method for Efficient Fault Recovery on GPGPUs

Xinhai Xu, Xuejun Yang, Jingling Xue, Yufei Lin, Yisong Lin. Journal of Computer Science and Technology, 2012, 27(2): 240-255

GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world’s fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerance mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based, compiler-directed partial recomputing method that achieves efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method, which recovers from errors detected in a code region by partially recomputing the region, describe a checkpoint-based fault-tolerance framework built on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, on average by 73.5% when errors occur earlier during execution and by 74.6% when errors occur later. In addition, PartialRC reduces the error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault occurs.

4.

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization (total citations: 1; self-citations: 1; citations by others: 0)

Wei Mi. Journal of Computer Science and Technology, 2009, 24(6): 1086-1097

DRAM row buffer conflicts can increase memory access latency significantly. This paper presents a new page-allocation-based optimization that works seamlessly with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations shows that our method can reduce row buffer miss rates by up to 76% (with an average of 37.4%). This reduction in row buffer miss rates translates into performance speedups of up to 15% (with an average of 5%).

5.

OpenMC: Towards Simplifying Programming for TianHe Supercomputers

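Xiangke Liao, Canqun Yang, Tao Tang, Huizhan Yi, Feng Wang, Qiang Wu, Jingling Xue. Journal of Computer Science and Technology, 2014, (3): 532-546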

Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing their programming challenge. We introduce a directive-based intra-node programming model, OpenMC, and show that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to more uniformly and adequately exploit the potential offered by multiple CPUs and accelerators in a compute node. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and facilitating the exploitation of asynchronous task parallelism on the workers. We present an overview of OpenMC, a prototype implementation, and results from some initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.

6.

NET: A File Transfer and Remote Command Execution System

Jingling Xue, Yongning Wang, Qihua Zheng. Mini-Micro Systems (小型微型计算机系统), 1985, (12)

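This paper describes the NET system, which implements file transfer between an RD11 computer and a dual68000 computer and partially implements remote command execution from the RD11 machine on the dual68000 machine.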