1.

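Xiuli LEI Ting ZHANG Yang ZHAO Jinghua FENG Bin XU Xiangfei MENG Xiaoqian ZHU 《Computer Engineering & Science》2012,34(8):176-183

The Tianhe-1 supercomputer system installed at the National Supercomputing Center in Tianjin (hereafter the Tianjin Supercomputing Center) is currently the fastest supercomputer in the world. It has been widely used in many areas of high-performance computing and has produced a series of internationally influential application results. This paper presents the latest application results of Tianhe-1 in petroleum exploration data processing, bioinformatics and biomedicine, environmental science, engineering simulation, and magnetic confinement fusion. The results show that Tianhe-1 offers good scalability and parallel efficiency in these fields and gives strong support to independent scientific and technological innovation and to industrial technology upgrading.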

2.

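Applications of Tianhe-1 in life science research

Xiaodong JIAN Yang LI Jinghua FENG Xiangfei MENG Xiaoqian ZHU 《Computer Engineering & Science》2012,34(8):171-175

This paper first gives a brief overview of high-performance computing and the Tianhe-1 supercomputer. It then describes in detail two major applications of high-performance computing in the life sciences. The first is molecular dynamics simulation of biological macromolecules, for which the paper presents application results obtained by users on Tianhe-1 together with application performance tests. The second is bioinformatics research, with a focus on the good results BGI obtained in GPU parallel software tests on Tianhe-1. Finally, the paper looks ahead to development trends for high-performance computing in the life sciences.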

3.

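Research and design of a memory-based temporary file transfer system

《Computer Applications and Software》2018,(4)

Because the speed gap between very fast CPU cores and comparatively slow memory keeps widening, the memory system may become the main bottleneck limiting system performance. Supercomputers such as Tianhe-2 provide powerful computing capacity for software systems that need large-scale computation. However, for systems that generate large numbers of small temporary files, Tianhe's centralized storage severely degrades overall performance. To address this problem, the paper proposes and implements a memory-based temporary file transfer system built on Linux message queues, socket network programming, file I/O, and related techniques. Comparative tests on a vulnerability mining system running on the Tianhe-2 supercomputer gave good results. The work offers guidance for relieving the centralized storage bottleneck of the Tianhe-2 supercomputer.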

4.

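Installation of China's first petaflops supercomputer completed

《Office Automation》2010,(21):62

According to Xinhua News Agency, Tianhe-1, China's first petaflops supercomputer, recently completed installation of all its equipment at the National Supercomputing Center in Tianjin. Once system commissioning is finished, the center will be China's first supercomputing center with petaflops computing capability and will provide supercomputing services to the public. Supercomputers represent the highest level of contemporary information technology and are an important indicator of a country's scientific and technological strength. Tianhe-1 is the first petaflops supercomputer system developed independently by China's National University of Defense Technology, and it makes China the second country, after the United States, able to build a petaflops supercomputer. Among the world's top…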

5.

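One hour of computation on Tianhe-2 equals 1.3 billion people working on calculators for 1,000 years

Peng JIA 《Computer & Network》2014,(13):18-19

Recently, at the 2013 International Supercomputing Conference held in Leipzig, Germany, the TOP500 organization officially released the 41st ranking of the world's 500 fastest supercomputers. The Tianhe-2 supercomputer developed by the National University of Defense Technology took first place, with a peak speed of 5.49 × 10^16 (54.9 quadrillion) and a sustained speed of 3.39 × 10^16 (33.9 quadrillion) double-precision floating-point operations per second. This is the second time, after Tianhe-1, that a Chinese supercomputer has ranked first in the world. Tianhe-2 has become the fastest supercomputer in the world today, and in overall technology…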

6.

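The secret behind petaflops computing: supercomputer technology as seen through Tianhe-1

Gemin HAN 《MicroComputer》2010,(3):118-125

At the end of last October, at the China annual conference on high-performance computing held in Changsha, the petaflops supercomputer Tianhe-1 developed by the National University of Defense Technology became the focus of attention. It is the supercomputer with the highest computing capability in China, and it marks the successful leap of China's supercomputer research and development capability to petaflops computing.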

7.

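Petaflops supercomputer system Tianhe-1 successfully developed

《Computer Development & Applications》2009,22(12)

Tianhe-1, China's first petaflops supercomputer system, was recently developed successfully by the National University of Defense Technology. The workload Tianhe-1 completes in 24 hours would take a full 160 years on today's most advanced dual-core high-performance personal computer.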

8.

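A parallel finite element computing method based on smoothed aggregation algebraic multigrid

Liwei WU Jianfei ZHANG Qian ZHANG 《Computer Aided Engineering》2017,26(6):16-22

A preconditioned conjugate gradient solver for parallel structural finite element computation is implemented on the basis of the smoothed aggregation algebraic multigrid method. The computational domain is divided uniformly into subdomains, which are assigned to individual processes so that the element stiffness matrices are computed simultaneously and then assembled into a global equilibrium equation held in distributed storage. The global equilibrium equation is solved in parallel by the conjugate gradient method preconditioned with smoothed aggregation algebraic multigrid. Numerical experiments on the Tianhe-2 supercomputer analyze how the main algebraic multigrid parameters affect algorithm performance and test the parallel performance of the program. The results show that the method has good parallel performance and scalability and is suitable for large-scale practical applications.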
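
The preconditioned conjugate gradient iteration named in this abstract can be sketched as follows. This is only a serial illustration under stated assumptions: a simple Jacobi (diagonal) preconditioner stands in for the smoothed aggregation algebraic multigrid preconditioner, the test matrix is a small 1D stiffness-like tridiagonal matrix, and the names `pcg`, `tridiagonal_matvec`, and `jacobi` are hypothetical, not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(matvec, b, apply_preconditioner, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite system.

    matvec(v) returns A @ v; apply_preconditioner(r) returns an
    approximation of A^{-1} @ r (here Jacobi, in the paper multigrid).
    """
    x = [0.0] * len(b)
    r = list(b)                          # residual b - A x for the zero start vector
    z = apply_preconditioner(r)
    p = list(z)                          # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for iteration in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz               # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def tridiagonal_matvec(v):
    """Multiply by the tridiagonal matrix with 2 on the diagonal and -1 beside it."""
    n = len(v)
    out = []
    for i in range(n):
        s = 2.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


def jacobi(r):
    """Jacobi preconditioner: divide by the diagonal entry (2 everywhere)."""
    return [ri / 2.0 for ri in r]


# Right-hand side chosen so that the exact solution is a vector of ones.
b = tridiagonal_matvec([1.0] * 8)
solution, iterations = pcg(tridiagonal_matvec, b, jacobi)
```

In a distributed setting each process would hold only its own rows, so `matvec` and `dot` would need communication between processes; that part is omitted here.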

9.

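Heterogeneous parallel simulation and performance optimization of a high-order accurate CFD application on the Tianhe-2 system

Yongxian WANG Lilun ZHANG Yonggang CHE Chuanfu XU Wei LIU Xinghua CHENG 《Journal of Computer Research and Development》2015,(4):833-842

Running efficient parallel numerical simulations of very large-scale computational fluid dynamics (CFD) applications on today's mainstream many-core heterogeneous high-performance computers still poses a series of challenging technical problems, and it is one of the active research topics in the field. Targeting the Tianhe-2 heterogeneous parallel computing platform, this paper explores efficient parallelization of a high-order accurate CFD flow field simulation code. It focuses on performance optimization strategies that match the characteristics of CFD applications to those of many-core heterogeneous platforms, and it proposes a series of performance improvement techniques covering task decomposition, exploitation of parallelism, multithreading optimization, SIMD vectorization, and cooperative optimization of CPUs and accelerators. Several test cases were simulated on Tianhe-2. The largest CFD case reached 122.8 billion grid points and used about 590,000 CPU+MIC processor cores in total. The tests show that the ported and optimized program runs about 2.6 times faster and scales well.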

10.

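Large-scale high-performance LBM multiphase flow simulation based on Python

Chuanfu XU Xi WANG Shu LIU Shizhao CHEN Yu LIN 《Computer Science》2020,47(1):17-23

Thanks to its rich third-party libraries and high development efficiency, Python has become one of the most popular programming languages in application areas such as data science and intelligent science. Python places emphasis on support for scientific and engineering computing and has accumulated a rich set of libraries and tools for it. For example, mathematical libraries such as SciPy and NumPy provide efficient multidimensional array operations and rich numerical functionality. In the past, Python served mainly as a scripting language, acting as the "glue" that connects the preprocessing, solver, and postprocessing stages of a numerical simulation and so raising the level of automation. In recent years, researchers abroad have tried implementing the solver computation itself in Python and have carried out very large-scale parallel computing studies on high-performance computers with good results. Because of Python's own characteristics, implementing and optimizing efficient large-scale numerical simulation in Python differs considerably from traditional numerical simulation based on C/C++ and Fortran. This paper implements PyLBMFlow, described as the first large-scale parallel three-dimensional lattice Boltzmann multiphase flow simulation code written entirely in Python, and explores methods for large-scale high-performance computing and performance optimization in Python. First, the LBM flow field data structures and typical computational kernels are designed and implemented with NumPy multidimensional arrays and universal functions. A series of performance optimizations and a restructured LBM boundary treatment algorithm greatly improve Python's computational efficiency: relative to the baseline implementation, optimized serial performance improves by two orders of magnitude. On this basis, a three-dimensional domain decomposition of the flow field is adopted, and MPI+OpenMP hybrid parallelism is implemented with mpi4py and Cython. Gas-liquid two-phase flow based on the D3Q19 discretization and the Shan-Chen BGK collision model was simulated successfully on the Tianhe-2 supercomputer, with a problem size on the order of ten billion grid cells, a parallel scale of 1024 nodes, and a parallel efficiency above 90%.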
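
To illustrate how an LBM kernel can be written with NumPy multidimensional arrays and universal functions, here is a minimal sketch of one collide-and-stream step. It is not code from PyLBMFlow. It assumes a much simpler setting than the paper: a single-phase two-dimensional D2Q9 lattice with BGK collision and periodic boundaries, instead of D3Q19 with the Shan-Chen multiphase model. The names `equilibrium` and `lbm_step` and the relaxation time `tau=0.8` are illustrative choices.

```python
import numpy as np

# D2Q9 lattice: nine discrete velocities and their weights.
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions, shape (9, nx, ny)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def lbm_step(f, tau=0.8):
    """One BGK collision followed by periodic streaming."""
    rho = f.sum(axis=0)                                   # density
    ux = (C[:, 0, None, None] * f).sum(axis=0) / rho      # velocity, x component
    uy = (C[:, 1, None, None] * f).sum(axis=0) / rho      # velocity, y component
    f = f - (f - equilibrium(rho, ux, uy)) / tau          # relax toward equilibrium
    for i, (cx, cy) in enumerate(C):
        # Shift each population one cell along its lattice velocity.
        f[i] = np.roll(f[i], shift=(cx, cy), axis=(0, 1))
    return f


# Fluid at rest with a small density bump in the middle of a 16 x 16 grid.
rho0 = np.ones((16, 16))
rho0[8, 8] = 1.1
f = equilibrium(rho0, np.zeros_like(rho0), np.zeros_like(rho0))
initial_mass = f.sum()
for _ in range(10):
    f = lbm_step(f)
```

Every operation acts on whole arrays, so the Python interpreter runs only a short loop over the nine directions per step. Both collision and periodic streaming conserve total mass, which gives a simple sanity check on `f.sum()`.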

11.

Hybrid hierarchy storage system in MilkyWay-2 supercomputer

Weixia XU Yutong LU Qiong LI Enqiang ZHOU Zhenlong SONG Yong DONG Wei ZHANG Dengping WEI Xiaoming ZHANG Haitao CHEN Jianying XING Yuan YUAN 《Frontiers of Computer Science》2014,8(3):367-377

With the rapid improvement of computation capability in high performance supercomputer systems, the performance imbalance between the computation subsystem and the storage subsystem has become more and more serious, especially when the data produced range from tens of gigabytes up to terabytes. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer system, with a peak performance of 54.9 Pflops, clearly has this kind of requirement for its storage system. This paper mainly introduces the storage system in the MilkyWay-2 supercomputer, including the hardware architecture and the parallel file system. The storage system in the MilkyWay-2 supercomputer exploits a novel hybrid hierarchy storage architecture to enable high scalability of I/O clients, I/O bandwidth and storage capacity. To fit this architecture, a user-level virtualized file system, named H²FS, is designed and implemented; it combines local storage and shared storage into a dynamic single namespace to optimize I/O performance in I/O-intensive applications. The evaluation results show that the storage system in the MilkyWay-2 supercomputer can satisfy the critical requirements of a large-scale supercomputer, such as performance and scalability.

12.

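A hybrid parallel streamline generation system for heterogeneous clusters

Jun LIU Yang GAO Guihua SHAN Xuebin CHI 《Computer Systems & Applications》2021,30(3):60-69

Streamlines are one of the main methods of flow field visualization. Generating streamlines for large-scale flow fields is computationally expensive, so it usually requires a parallel computing environment such as a high-performance computer, combined with parallel algorithms, to accelerate the computation. As heterogeneous computing systems become more common, this paper aims to make full use of the computing power of parallel heterogeneous environments and to generate streamlines in parallel more efficiently. It adopts an architecture that combines data-parallel primitives with distributed message passing and designs a hybrid parallel streamline generation system for heterogeneous clusters. On this basis, it designs the data partitioning, data redundancy, and inter-process communication strategies, and it proposes and implements a parallel particle tracing algorithm. The system was deployed on a domestic supercomputing platform, and experiments were carried out on visualizing result data from large-scale CFD flow field simulations. The paper reports the experimental results and analyzes the speed, scalability, and load balance of the core parallel algorithm, demonstrating the effectiveness and scalability of the system and the algorithm.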

13.

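Research on large-scale parallel finite-difference time-domain electromagnetic computation

Shugang JIANG Yu ZHANG Xunwang ZHAO 《Frontiers of Data and Computing》2015,6(4):29-38

The performance and applications of the large-scale parallel finite-difference time-domain (FDTD) method are studied on Chinese supercomputer platforms. The parallel performance of the parallel FDTD method was tested on Magic Cube, China's first 100-teraflops supercomputer, on Sunway BlueLight, a supercomputer with domestically produced CPUs, and on Tianhe-2, currently ranked first in the world. The tests exceeded parallel scales of 10,000, 100,000, and 300,000 CPU cores on the three machines respectively. The parallel efficiency of the algorithm was above 50% at every tested scale, which shows that the parallel algorithm scales well. Simulations of the radiation characteristics of several microstrip antenna arrays and of the scattering characteristics of a large aircraft show that the method can perform accurate and efficient electromagnetic simulation of complex problems on supercomputers with different architectures.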

14.

On the energy footprint of I/O management in Exascale HPC systems

《Future Generation Computer Systems》2016

The advent of unprecedentedly scalable yet energy hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. However, the details of how these approaches affect energy consumption have not been studied yet. Therefore, this paper aims to explore how much energy a supercomputer consumes while running scientific simulations when adopting various I/O management approaches. In particular, we closely examine three radically different I/O schemes including time partitioning, dedicated cores, and dedicated nodes. To do so, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results obtained on the French Grid’5000 platform highlight the differences among these three approaches and illustrate in which way various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of an HPC simulation under different I/O approaches. Our proposed model gives hints to pre-select the most energy-efficient I/O approach for a particular simulation on a particular HPC system and therefore provides a step towards energy-efficient HPC simulations in Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.

15.

A parallel two-level method for simulating blood flows in branching arteries with the resistive boundary condition

Yuqi Wu Xiao-Chuan Cai 《Computers & Fluids》2011,45(1):92-102

Computer modeling of blood flows in the arteries is an important and very challenging problem. In order to understand, computationally, the sophisticated hemodynamics in the arteries, it is essential to couple the fluid flow and the elastic wall structure effectively and specify physiologically realistic boundary conditions. The computation is expensive and the parallel scalability of the solution algorithm is a key issue of the simulation. In this paper, we introduce and study a parallel two-level Newton–Krylov–Schwarz method for simulating blood flows in compliant branching arteries by using a fully coupled system of linear elasticity equation and incompressible Navier–Stokes equations with the resistive boundary condition. We first focus on the accuracy of the resistive boundary condition by comparing it with the standard pressure type boundary condition. We then show the parallel scalability results of the two-level approach obtained on a supercomputer with a large number of processors and on problems with millions of unknowns.

16.

Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer

Bin Fang Glenn Martyna 《Computer Physics Communications》2007,176(8):531-538

QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Two techniques have been used to increase its parallel performance: simultaneous multi-dimensional communication and communication-and-computation overlapping. Benchmarking experiments suggest that 3D FFTs of size 128×128×128 can scale well on such platforms up to 4096 nodes. Our performance results suggest stronger scalability on QCDOC than on the IBM BlueGene/L supercomputer.

17.

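A scalable numerical algorithm for solving automotive external flow problems

Zhengzheng YAN Rongliang CHEN Yubo ZHAO Xiaochuan CAI 《Journal of Integration Technology》2015,4(1):25-36

Because of complex body shapes and high Reynolds numbers, numerical computation of the external flow field around an automobile is very large in scale and hard to solve accurately. Developing efficient parallel algorithms that use supercomputing platform resources to solve external flow problems numerically has become an active research topic in the field. This article proposes a fully implicit, scalable parallel Newton-Krylov-Schwarz algorithm and applies it to the external flow field of a real automobile. The correctness of the algorithm is verified by comparison with wind tunnel tests and with results from mainstream computational fluid dynamics software. The parallel numerical results show that the algorithm still has very good parallel scalability at a scale of several thousand processors.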

18.

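Research on alarm correlation in distributed communication networks based on association rules (total citations: 3; self-citations: 0; citations by others: 3)

Jian WU Xingming LI 《Computer Science》2009,36(11):204-207

This paper describes alarm correlation analysis for communication networks based on data mining. Applying sequential algorithms directly to a distributed database is very inefficient, because it requires a large amount of extra communication. The paper therefore proposes an effective distributed association rule mining algorithm, EDMA, which uses local pruning and global pruning to minimize the number of candidate itemsets and the communication volume. At each local site, a compressed association matrix, CMatrix, is used to count the local support of itemsets. In addition, item pruning and transaction pruning are used together to reduce scan time. Finally, simulation verifies that EDMA has higher computational efficiency, lower communication overhead, and better scalability than other classical distributed algorithms.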
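
The general idea of local and global pruning in distributed frequent itemset mining can be sketched as follows. This is not the EDMA algorithm and does not use CMatrix. It is a simplified single-process illustration that relies on one known property: an itemset that is frequent over all sites must be frequent at one site or more. The function names, the alarm labels, and the 50% support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-itemset that occurs in one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(transaction), k):
            counts[itemset] += 1
    return counts


def distributed_frequent_itemsets(sites, k, min_support):
    """Return globally frequent k-itemsets; min_support is a fraction in (0, 1]."""
    per_site = [local_counts(transactions, k) for transactions in sites]

    # Local pruning: a site proposes only itemsets frequent on its own data,
    # so far fewer candidates would have to be sent over the network.
    candidates = set()
    for transactions, counts in zip(sites, per_site):
        threshold = min_support * len(transactions)
        candidates |= {s for s, c in counts.items() if c >= threshold}

    # Global pruning: sum the counts of the surviving candidates across
    # all sites and keep those that reach the global threshold.
    total = sum(len(transactions) for transactions in sites)
    frequent = {}
    for itemset in candidates:
        global_count = sum(counts[itemset] for counts in per_site)
        if global_count >= min_support * total:
            frequent[itemset] = global_count
    return frequent


# Two sites, each holding sets of alarms that were raised together.
site_a = [{"link_down", "los"}, {"link_down", "los", "ais"}, {"ais"}]
site_b = [{"link_down", "los"}, {"power"}, {"power", "ais"}]
result = distributed_frequent_itemsets([site_a, site_b], k=2, min_support=0.5)
print(result)  # {('link_down', 'los'): 3}
```

Here only site A proposes the pair `('link_down', 'los')`, and the global count of 3 out of 6 transactions meets the 50% threshold, so it is the single pair that survives both pruning stages.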

19.

Coupling multibody dynamics and computational fluid dynamics on 8192 processor cores

J. Götz K. Iglberger C. Feichtinger S. Donath U. Rüde 《Parallel Computing》2010,36(2-3):142-151

This paper describes a method for the fully resolved simulation of particle laden flows. For this purpose, we discuss the parallelization of large scale coupled fluid structure interaction with up to 37 million geometrically modeled moving objects incorporated in the flow. The simulation is performed using a 3D lattice Boltzmann solver for the fluid flow and a so-called rigid body physics engine for the treatment of the objects. The numerical algorithms and the parallelization are discussed in detail. Furthermore, performance results are presented for test cases on up to 8192 processor cores running on an SGI Altix supercomputer. The approach enables a detailed simulation of large scale particulate flows that are relevant for many industrial applications.