1.

Jia Xun, Wu Guiming, Xie Xianghui 《Computer Engineering & Science》2018,40(2):224-230

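Energy consumption is currently a major obstacle to improving the performance of high-performance computing systems. Heterogeneous computing, in which a host processor is coupled with accelerators, can effectively improve system energy efficiency and is therefore widely used in the design of current high-performance computing systems. At the same system scale, however, the Linpack efficiency of heterogeneous systems is generally lower than that of homogeneous systems. To address this problem, the paper takes an architectural perspective and, based on the design parameters and performance data of real systems, analyzes the main factors that limit the Linpack efficiency of large-scale heterogeneous high-performance computing systems and the architectural requirements they imply. It then builds a Linpack performance model for heterogeneous systems to validate the conclusions of the analysis. The results offer guidance for optimizing Linpack performance on heterogeneous systems and for designing efficient heterogeneous architectures in the future.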

2.

Dawning4000A high performance computer

Sun Ninghui Meng Dan 《Frontiers of Computer Science in China》2007,1(1):20-25

Dawning4000A is an AMD Opteron-based Linux cluster with 11.2 Tflops peak performance and 8.06 Tflops Linpack performance. It was developed for the Shanghai Supercomputer Center (SSC) as one of the computing power stations of the China National Grid (CNGrid) project. The Massively Cluster Computer (MCC) architecture is proposed to add value to the industry-standard system. Several grid-enabling components are developed to support the running environment of the CNGrid. It is an achievement for a high performance computer built with a low-cost approach.
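The two figures quoted in this abstract are enough to work out the machine's Linpack efficiency, the ratio of measured Linpack performance to theoretical peak that several other entries in this list discuss. The following is a minimal sketch; the function name `linpack_efficiency` is an illustrative assumption, not something taken from the paper.

```python
def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Return Linpack efficiency as a fraction: measured Linpack / theoretical peak."""
    if rpeak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_tflops / rpeak_tflops


# Figures quoted in the abstract: 8.06 Tflops Linpack, 11.2 Tflops peak.
efficiency = linpack_efficiency(8.06, 11.2)
print(f"Dawning4000A Linpack efficiency: {efficiency:.1%}")
```

With the quoted figures the ratio comes out at roughly 72%.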

3.

Design of the System Software of the Dawning 2000 Supercomputer  Total citations: 10; self-citations: 3; citations by others: 7

Sun Ninghui, Xu Zhiwei 《Chinese Journal of Computers》2000,23(1):9-20

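The Dawning 2000 supercomputer adopts a scalable cluster architecture. It is a general-purpose parallel supercomputer that supports scientific and engineering computing, network services, and data processing applications. This paper introduces the SUMA technical approach adopted in the design of the Dawning 2000 system software: manageability is reflected in the design of the communication software, the scalable file system, and the server "取信" [sic], while usability is reflected in the design of the single system image, the integrated parallel environment, and the foolproof user interface. The paper describes the system software design and its key technologies in detail, including the communication system, the COSMOS scalable file system, the management software, and the user interface.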

4.

Petaflop supercomputers of China

Guoliang CHEN 《Frontiers of Computer Science in China》2010,4(4):427

After ten years of development, high performance computing (HPC) in China has made remarkable progress. In November, 2010, the NUDT Tianhe-1A and the Dawning Nebulae respectively claimed the 1st and 3rd places in the Top500 Supercomputers List; this recognizes internationally the level that China has achieved in high performance computer manufacturing.

5.

Research on Key Issues in the Design of the Dawning 3000 Superserver  Total citations: 1; self-citations: 0; citations by others: 1

Sun Ninghui, Meng Dan 《Chinese Journal of Computers》2002,25(11):1121-1132

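The Dawning 3000 superserver is a general-purpose computer system based on an SMP cluster architecture, characterized by scalability, usability, manageability, and high availability. This paper focuses on several key issues in the design of the Dawning 3000 system, including scalability issues related to the SMP cluster architecture, important usability design in the system software, support for multiple applications in the low-level communication layer, and cross-platform support in the cluster management system. It also discusses open problems in superserver design and the authors' views on them.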

6.

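Design, Implementation, and Evaluation of a Scalable Single-Image File System  Total citations: 1; self-citations: 0; citations by others: 1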

Wang Jianyong, Zhu Mingfa, Xu Zhiwei, Zhu Ningning, Zhang Chi 《Journal of Computer Research and Development》1999,36(12):1502-1509

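The Dawning superserver is a typical cluster system, and COSMOS is a scalable single-image file system developed for it. This paper describes the design, implementation, and evaluation of the COSMOS prototype. It focuses on key techniques such as dual-granularity cooperative caching, distributed metadata management, and grouping of network disk storage, and it evaluates the prototype file system with I/O benchmarks. The test results show that the prototype scales well while preserving a single system image.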

7.

A Parallel Debugger Based on a Cluster Operating System  Total citations: 2; self-citations: 0; citations by others: 2

Yan Chao, Liu Taoying, Chen Guoliang 《Journal of Computer Research and Development》2004,41(4):630-636

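Designing parallel debugging tools is a notable difficulty in the research and development of tools for parallel computing environments. This paper presents DCDB3.0, a parallel debugger implemented on the Dawning 3000. The debugger is part of the future Dawning 4000 cluster operating system, and this is its first runnable version on the Dawning 3000. It uses a typical client/server model. The client's user interface visualizes the verbose debugging information and operations. The client can be located far from the mainframe that provides the service; its remote communication relies on DRPC and task management in the cluster operating system. The former provides remote method invocation, and the latter lets the client start the corresponding tasks on the server. The server side of DCDB3.0 handles debugging tasks and exchanges information with the client. The functionality of DCDB3.0 is scalable, so the platform can be used to study the implementation of advanced parallel debugging techniques. The work improves on existing approaches, implements replay, and plans to add further advanced parallel debugging techniques.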

8.

Wormhole Routing Mechanism and Its Chip Design  Total citations: 4; self-citations: 0; citations by others: 4

Zeng Rong, Dong Xiangjun 《Chinese Journal of Computers》1997,20(5):404-411

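Wormhole routing is a key technology adopted by the "Dawning 1000" massively parallel computer. This paper discusses the wormhole routing mechanism and describes in detail the wormhole routing algorithm of the "Dawning 1000", the structural design of the routing chip, and the processor interconnection network built from that chip.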

9.

Design of the Barrier Network of the Dawning 5000 High-Performance Computer  Total citations: 1; self-citations: 0; citations by others: 1

Cao Zheng, Wang Dawei, Liu Xinchun, Sun Ninghui 《Chinese Journal of Computers》2008,31(10)

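To optimize the performance of Barrier operations and improve the execution efficiency of large-scale parallel applications on the Dawning 5000 system, this paper proposes a hardware-based Barrier acceleration design. The design uses a tree-based Barrier algorithm and, by enhancing the functionality of the Dawning 5000 interconnection network switch chip, implements a low-latency, scalable, highly reliable, and manageable Barrier network. The network supports 16 concurrent Barrier operations and achieves low Barrier latency in a Fat-Tree topology. Compared with existing implementations, it is a design better suited to Fat-Tree topologies. Under ideal conditions, a synchronization across 1024 nodes completes within 1.7 μs. Based on the characteristics of the reduction and distribution phases of a Barrier operation, the design uses two mechanisms, request-acknowledge and timeout-prompting respectively, to ensure the reliability of Barrier operations. A prototype Barrier network based on this design has been verified on FPGA.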
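The reduction and distribution phases of a tree-based barrier can be illustrated with a small software simulation. This is only a sketch of the general technique, not the Dawning 5000 hardware design: the function name `simulate_tree_barrier` and the fan-out value are assumptions chosen for illustration, and the hop count it returns is a count of tree levels, not a latency estimate.

```python
from typing import List, Tuple


def simulate_tree_barrier(arrived: List[bool], fanout: int) -> Tuple[bool, int]:
    """Simulate one barrier over a fanout-ary tree.

    Reduction: each tree node reports "arrived" to its parent only when
    all of its children have arrived. Distribution: the root's release
    signal travels back down the same number of levels.
    Returns (released, total levels traversed up and down).
    """
    if not arrived or fanout < 2:
        raise ValueError("need at least one participant and fanout >= 2")
    level = list(arrived)
    up_hops = 0
    while len(level) > 1:
        # Combine each group of `fanout` children into one parent flag.
        level = [all(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        up_hops += 1
    released = level[0]
    return released, 2 * up_hops


# 1024 participants with a hypothetical fan-out of 4.
print(simulate_tree_barrier([True] * 1024, fanout=4))

# A single late participant keeps the whole barrier from releasing.
late = [True] * 1024
late[17] = False
print(simulate_tree_barrier(late, fanout=4))
```

The number of levels grows logarithmically with the number of participants, which is why tree-based barriers are a common choice when scalability matters.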

10.

Design and implementation of communication system of the Dawning 6000 supercomputer

Qiang Li Bo Li Zhigang Huo Ninghui Sun 《Frontiers of Computer Science in China》2010,4(4):466-474

An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such a design is beneficial for scalability and power, but on the other hand, heterogeneity brings new challenges in communication systems to connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intra-node layer between heterogeneous components. To efficiently connect heterogeneous components, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; using an InfiniBand network, it also provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC), with a modified compiler based on the global address space. Also, a special collective network is implemented for collective operations. Results obtained from a prototype system prove these features to be both feasible and efficient.

11.

Phoenix: An Integrated Cluster Operating System  Total citations: 8; self-citations: 0; citations by others: 8

Meng Dan, Zhan Jianfeng, Wang Lei, Tu Bibo, Zhang Zhihong 《Journal of Computer Research and Development》2005,42(6):979-986

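From an operating system perspective, this paper gives a complete definition of the architecture of Phoenix, an integrated cluster software suite. It divides the cluster operating system into three layers: heterogeneous resources, the cluster operating system kernel, and the user environment. Drawing together the core requirements of the user environment, it defines the structure of the cluster operating system kernel and uses group services to ensure the kernel's fault tolerance and scalability. User environments that meet the needs of different users are built on top of the kernel. Phoenix has been deployed on the Dawning 4000A high-performance computer system.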

12.

Design and implementation of communication system of the Dawning 6000 supercomputer

Qiang LI Bo LI Zhigang HUO Ninghui SUN 《Frontiers of Computer Science》2010,4(4):466

An increasing number of supercomputers adopt a heterogeneous architecture, consisting of both general purpose CPUs and specialized accelerators. Such a design is beneficial for scalability and power, but on the other hand, heterogeneity brings new challenges in communication systems to connect heterogeneous components and provide support for programming. The communication system of the Dawning 6000 connects two kinds of heterogeneous processors, Loongson and AMD, and adopts a three-layer architecture with an intra-node layer between heterogeneous components. To efficiently connect heterogeneous components, the system forms a global address space and provides a mechanism for message transmission via an in-node global store; using an InfiniBand network, it also provides an OS-bypassing virtualization method to share an InfiniBand card between nodes. To facilitate programming on heterogeneous processors, it supports Unified Parallel C (UPC), with a modified compiler based on the global address space. Also, a special collective network is implemented for collective operations. Results obtained from a prototype system prove these features to be both feasible and efficient.

13.

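Performance Study of the COSMOS Cluster File System Based on a High-Speed Communication Protocol  Total citations: 4; self-citations: 0; citations by others: 4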

He Jin, Xu Zhiwei, Meng Dan, Ma Jie, Feng Jun 《Journal of Computer Research and Development》2002,39(2):129-135

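The COSMOS cluster file system is an important component of the Dawning 3000 superserver. Using COSMOS, this paper explores in depth the protocols, structure, and performance optimization of cluster file systems. It first describes the implementation of the COSMOS file system on BCL-3, the high-speed communication protocol of the Dawning 3000 cluster. It then introduces concurrent bandwidth utilization to describe the degree to which communication and I/O affect cluster file system performance. Finally, it presents the performance experiments and explains their results.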

14.

Challenges and possible approaches: towards the petaflops computers

Depei QIAN Danfeng ZHU 《Frontiers of Computer Science》2009,3(3):273

In parallel with the R&D efforts in the USA and Europe, China's National High-tech R&D program has set the goal of developing petaflops computers. Researchers and engineers worldwide are looking for appropriate methods and technologies to achieve the petaflops computer system. Based on a discussion of important design issues in developing the petaflops computer, this paper raises the major technological challenges, including the memory wall, low power system design, interconnects, and programming support. Current efforts in addressing some of these challenges and in pursuing possible solutions for developing the petaflops systems are presented. Several existing systems are briefly introduced as examples, including Roadrunner, Cray XT5 Jaguar, Dawning 5000A/6000, and Lenovo DeepComp 7000. Architectures proposed by Chinese researchers for implementing the petaflops computer are also introduced. Advantages of the architecture as well as the difficulties in its implementation are discussed. Finally, future research directions in the development of high productivity computing systems are discussed.

15.

Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

Wang Feng, Yang Canqun, Du Yunfei, Chen Juan, Yi Huizhan, Xu Weixia 《Journal of Computer Science and Technology》2011,26(5):854-865

In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system ever attempted before. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to explore the task parallelism, thread parallelism and data parallelism of Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using the two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of communication between the CPU and GPU, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result obtained with the vendor's library. On the full configuration of TianHe-1 our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November, 2009.
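The abstract does not spell out the two-level adaptive method, so the sketch below shows only the general idea of adaptive load distribution: measure how fast each device processed its share in the last step, then split the next step in proportion to those rates. It is a generic feedback scheme, not the authors' algorithm, and all names and device rates (`next_gpu_fraction`, `RATE_GPU`, `RATE_CPU`) are hypothetical.

```python
def measured_rate(work: float, seconds: float) -> float:
    """Throughput observed for one device in the previous step."""
    if work <= 0 or seconds <= 0:
        raise ValueError("work and time must be positive to measure a rate")
    return work / seconds


def next_gpu_fraction(work_gpu: float, time_gpu: float,
                      work_cpu: float, time_cpu: float) -> float:
    """Share of the next step to give the GPU, proportional to measured rates."""
    rate_gpu = measured_rate(work_gpu, time_gpu)
    rate_cpu = measured_rate(work_cpu, time_cpu)
    return rate_gpu / (rate_gpu + rate_cpu)


def step_time(total_work: float, gpu_fraction: float,
              rate_gpu: float, rate_cpu: float) -> float:
    """A step finishes when the slower of the two devices finishes."""
    return max(total_work * gpu_fraction / rate_gpu,
               total_work * (1.0 - gpu_fraction) / rate_cpu)


# Hypothetical device rates (work units per second) and workload size.
RATE_GPU, RATE_CPU, TOTAL = 150.0, 50.0, 1000.0

fraction = 0.5  # start with an even split
for step in range(3):
    work_gpu = TOTAL * fraction
    work_cpu = TOTAL - work_gpu
    elapsed = step_time(TOTAL, fraction, RATE_GPU, RATE_CPU)
    print(f"step {step}: gpu share {fraction:.2f}, step time {elapsed:.2f} s")
    fraction = next_gpu_fraction(work_gpu, work_gpu / RATE_GPU,
                                 work_cpu, work_cpu / RATE_CPU)
```

In this toy setting the rates are constant, so the split settles after one adjustment; on real hardware the rates vary with problem size, which is the reason for adapting the split repeatedly.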

16.

Challenges and possible approaches: towards the petaflops computers

Depei Qian Danfeng Zhu 《Frontiers of Computer Science in China》2009,3(3):273-289

Roadrunner Hardware and Software Overview

17.

System-Level Functional Verification Platform for the Dawning 5000 Chipset

Liu Tao, Wang Kai, Li Xiaomin, An Xuejun 《Computer Engineering & Science》2009,31(11)

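The Dawning 5000 chipset is the system controller in a Dawning 5000 compute unit; it connects two CPUs through HT interfaces and provides high-speed network communication. To ensure the functional correctness of the Dawning 5000 chipset, we designed SVP, a system-level functional verification platform for it. SVP models the system in layers and, by simulating the system software behavior and hardware platform functions of the local compute unit and the network behavior of remote compute units, provides a verification environment close to the real system. During verification of the Dawning 5000 chipset, SVP found and eliminated most of the functional errors in the logic design, and parallel verification accelerated the convergence of verification coverage.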

18.

Grid-Enabling Features of the Dawning 4000A High-Performance Computer  Total citations: 1; self-citations: 0; citations by others: 1

Meng Dan, Sun Ninghui, Xu Zhiwei 《Journal of Computer Research and Development》2004,41(12):2079-2087

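The vision of grid computing is Internet-based resource sharing and collaboration; it is the next wave of Internet development after the WWW. High-performance computers (superservers) are the main providers of shared resources in a grid, and the grid will in turn become the main application environment for high-performance computers. The grid has therefore become an important driver of high-performance computer development, and the design of such machines must take grid requirements into account and provide the necessary support. The Dawning 4000A is a grid-oriented high-performance computer recently developed by the Institute of Computing Technology, Chinese Academy of Sciences. Its development was supported by the National High-Tech R&D Program (the "863" Program), and it is deployed at the Shanghai Supercomputer Center as a main node of the China National Grid. The paper discusses in detail the main grid-enabling features of the Dawning 4000A system. These features support the grid at the level of architecture, system hardware, and software, and they represent an active exploration of grid-enabling technology from the perspective of high-performance computer development.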

19.

Optimization of HPL on Complex Heterogeneous Computing Systems

Li Leisheng, Yang Wenhao, Ma Wenjing, Zhang Ya, Zhao Hui, Zhao Haitao, Li Huiyuan, Sun Jiachang 《Journal of Software》2021,32(8):2307-2318

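Today's mainstream supercomputers increasingly use heterogeneous systems with accelerators. As the floating-point performance of accelerators keeps improving, the CPU, memory, bus, network, and system architecture of compute nodes must all keep pace. HPL (High Performance Linpack) is the traditional benchmark for evaluating high-performance computers, and complex heterogeneous systems bring many opportunities and challenges for HPL evaluation. For heterogeneous supercomputer systems with GPUs, the paper proposes a new scheme for dividing computing tasks between the CPU and the accelerators, together with a balance point theory to guide HPL performance optimization. To optimize HPL, it proposes a look-ahead algorithm in which the CPU and accelerators cooperate and a continuous pipelined row-swapping algorithm, achieving a high degree of parallelism among accelerators, CPU, network, and other components. In addition, new implementations of panel factorization and row swapping are designed for systems with accelerators, improving accelerator utilization. On a system with four GPUs per node, single-node HPL efficiency reaches 79.51%.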

20.

Optimization and Analysis of HPL on a Domestic Heterogeneous Architecture System

Shui Chaoyang, Yu Xianzhi, Wang Yinshan, Tan Guangming 《Journal of Software》2020,31(7)

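As heterogeneous systems become an important choice for building supercomputers, getting CPUs and accelerators to work together so as to exploit the full computing performance of a heterogeneous system is of great importance. HPL is the most important benchmark in high-performance computing. The traditional approach, which takes an HPL algorithm designed for CPU-only systems and uses the accelerator to speed up matrix multiplication, can no longer achieve good performance. To address this problem, this paper proposes a new HPL performance model for a new heterogeneous system built from a domestic processor and a domestic accelerator, and designs a new multithreaded fine-grained heterogeneous HPL algorithm. We implemented HPCX, a lightweight cross-platform heterogeneous acceleration framework, to realize the cross-platform HPL algorithm. Our performance model accurately predicts HPL performance on similar heterogeneous systems. On an NVIDIA GPU platform, our multithreaded fine-grained heterogeneous HPL algorithm outperforms nvhpl, NVIDIA's official closed-source program and currently the best performer on that platform, by 9%. On 512 nodes of the domestic processor and accelerator platform, our new HPL algorithm achieves a measured peak performance of 2.3 PFLOPS and a floating-point efficiency of 71.1%.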