1.

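The 华信神威 numerical weather prediction operational system is installed at the Wuxi Meteorological Bureau operations center: high-performance computing applications reach "ordinary households"

《计算机与网络》2004,(18):26-26

On September 9, 2004, the Wuxi Municipal Party Committee and Municipal Government held a handover and commissioning ceremony at the Wuxi Meteorological Bureau. The "神威新世纪-16P" (Sunway New Century-16P) cluster computer system, custom-built for the bureau by the National Research Center of Parallel Computer Engineering and Technology, was formally handed over and put into service. The "华信神威" numerical weather prediction operational system, jointly developed by the national parallel computing center and Huayun Information Technology Engineering Company, also entered trial use at the bureau's operations center. High-performance computers, which represent advanced technology, have once again entered "ordinary households".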

2.

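Design and implementation of a global file system

蒋金虎陈左宁黄文政《计算机工程》2005,31(1):71-72,F003

A global file system (GFS) is a key technology for massively parallel processing computer systems and cluster systems. This paper systematically describes the design and implementation of a new global file system on the Sunway (神威) computer system, focusing on the measures the system adopts for high performance and high reliability. It closes with an outlook on next-generation global file systems.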

3.

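Real-time cluster node control based on the User Datagram Protocol (UDP) and its implementation. Cited by: 4 (self-citations: 1; citations by others: 4)

向建军左继章白欣《计算机工程与应用》2002,38(19):48-50

Cluster computing is a current research focus in high-performance parallel computer systems. Using the User Datagram Protocol (UDP), the paper controls in real time the nodes of a cluster that stand in a mutual trust relationship, achieving a single system image for a real-time cluster. A real-time cluster computer system is built from general-purpose commercial components, extending cluster computers into real-time applications.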

4.

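Reliability and availability design and analysis of Sunway TaihuLight (神威太湖之光)

高剑刚胡晋龚道永方燕飞刘骁何王全金利峰郑方李宏亮《计算机研究与发展》2021,58(12):2696-2707

As system scale and integration density grow rapidly, reliability and availability have become major challenges in building exascale computer systems. This paper comprehensively analyzes the reliability and availability design and implementation of the Sunway TaihuLight supercomputer. First, it outlines the system architecture of Sunway TaihuLight. Second, it systematically presents the machine's reliability enhancement techniques together with active and passive fault-tolerance techniques such as fault prediction, proactive migration, and partial task degradation, which form a multi-level, coordinated active-passive fault-tolerance system. Third, based on system failure statistics, it analyzes the failure distribution and the main failure sources, and fits the distribution of the time between failures against three typical lifetime distributions: exponential, lognormal, and Weibull. Maximum likelihood estimation and the Kolmogorov-Smirnov (K-S) test show that the lognormal distribution fits the empirical failure data best; on that basis a failure distribution model is established and the system's mean time between failures is calculated. Using operational statistics and tests with real applications, the paper analyzes fault prediction accuracy and the time overhead and effectiveness of fault-tolerance techniques such as proactive migration and partial degradation. Finally, building on this analysis, it offers recommendations for developing highly reliable and highly available exascale systems.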
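The fitting step described in this abstract can be illustrated with a minimal sketch. It assumes nothing about the paper's actual code or data: the samples are synthetic, all function names are hypothetical, and only the exponential and lognormal fits are shown because their maximum likelihood estimates have closed forms (a Weibull fit needs numerical root-finding and is omitted).

```python
import math
import random

def fit_exponential(xs):
    """MLE of the exponential rate: 1 / sample mean."""
    return len(xs) / sum(xs)

def fit_lognormal(xs):
    """MLE of lognormal parameters: mean and (biased) std of the log-data."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def exponential_cdf(x, lam):
    return 1.0 - math.exp(-lam * x)

def lognormal_cdf(x, mu, sigma):
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF and a fitted CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

# Synthetic time-between-failure samples in hours (made up, not TaihuLight data).
rng = random.Random(0)
tbf_hours = [rng.lognormvariate(3.0, 0.5) for _ in range(500)]

lam = fit_exponential(tbf_hours)
mu, sigma = fit_lognormal(tbf_hours)
ks_exp = ks_statistic(tbf_hours, lambda x: exponential_cdf(x, lam))
ks_logn = ks_statistic(tbf_hours, lambda x: lognormal_cdf(x, mu, sigma))

# Mean of the fitted lognormal, used here as the mean time between failures.
mtbf_lognormal = math.exp(mu + sigma ** 2 / 2.0)
```

A smaller K-S statistic means a closer fit. Because the synthetic samples are drawn from a lognormal distribution, the lognormal fit wins here by construction; with real failure logs the comparison is what selects the model.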

5.

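Research on building a freely expandable high-performance cluster with PBS

赵广鹏冯冰洁《计算机光盘软件与应用》2013,(1):123+135

The high-performance computing clusters currently common in China generally have fixed resources, and expanding them means purchasing a fixed set of additional machines. Meanwhile, university computer labs sit idle much of the time, which wastes large amounts of computing resources. By analyzing the PBS system and its internal principles and implementation, this paper integrates and uses those idle computing resources, with an in-depth study and discussion based on the high-performance computing platform of 河南理工大学 (Henan Polytechnic University).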

6.

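Design and implementation of a high-performance communication interface based on RapidIO. Cited by: 1 (self-citations: 0; citations by others: 1)

梁基金亨科徐炜民郑衍衡沈文枫《计算机应用与软件》2009,26(7):43-45

In high-performance computing, cluster systems place ever higher demands on the high-speed interconnect. This work studies the use of RapidIO as the high-speed interconnect of a cluster system and designs and implements a RapidIO-based high-performance communication interface. The interface is highly optimized for MPI and takes full advantage of RapidIO's hardware features. It offers priority-based multiple streams and reliable, in-order packet delivery, and uses a pool of free buffers to improve performance. Experimental data show that the dedicated interface outperforms the original interface in both bandwidth and latency.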

7.

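Current status and challenges of cluster systems. Cited by: 8 (self-citations: 0; citations by others: 8)

郑纬民《计算机教育》2004,(6):23-23

1. Cluster systems have become the mainstream way to build high-performance computing systems. Thanks to their low cost, high performance, and good scalability, cluster systems have increasingly become the main way to build high-performance computing systems. In the TOP500 list released in November 2003, cluster systems account for 41.6% of the systems by count and 49.8% of the list's Linpack performance. Among the top 10 systems, 7 are clusters. Cluster systems have thus become mainstream for building very large-scale computing systems. An important characteristic of cluster systems is that they use commodity components wherever possible to reduce cost. The components used to build a cluster, including compute nodes and the communication network, are readily available on the market and need no custom manufacturing. And the use of open-source…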

8.

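Research on cluster-based parallel image processing

李学锋《电脑开发与应用》2007,20(5):13-15

The paper introduces cluster computer systems and PVM. By analyzing the characteristics of cluster computer systems, it proposes a parallel image processing system based on a cluster and analyzes the system's structure, basic functions, and key technologies. The system can deliver better image processing performance and has good practical value.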

9.

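A start/stop mechanism for high-performance cluster nodes based on task load monitoring

曹宗雁曹荣强戴志辉朱鹏迟学斌《计算机应用研究》2011,28(12):4663-4665

This paper studies how a high-performance computing cluster can save energy effectively by shutting down idle nodes during operation, and designs and implements a node start/stop mechanism based on statistical monitoring of task load. From records and analysis of running and queued jobs, a load factor that reflects the state of queued tasks is designed through parameter estimation. Policies are then built around the load factor and combined with the job system's queue settings and resource allocation rules to start and stop idle nodes automatically. Simulation experiments show that the mechanism effectively starts and stops idle nodes automatically, reducing system power consumption while having essentially no effect on the overall completion time of jobs.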
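As a rough illustration of a load-factor-driven policy of this kind, the sketch below smooths the observed demand into a load factor and derives a power-on or power-off count from it. The abstract does not give the paper's estimator or rules, so everything here is an assumption: the exponential smoothing, the `reserve` margin, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClusterState:
    powered_on: int  # nodes currently powered on
    busy: int        # powered-on nodes that are running jobs
    total: int       # nodes installed in the cluster

def update_load_factor(previous, queued_nodes, busy_nodes, total_nodes, alpha=0.3):
    """Exponentially smoothed demand (running + queued) relative to cluster size."""
    demand = (busy_nodes + queued_nodes) / total_nodes
    return alpha * demand + (1.0 - alpha) * previous

def plan_power_action(state, load_factor, reserve=2):
    """Return a signed node count: positive = power on, negative = power off."""
    wanted = max(state.busy, round(load_factor * state.total)) + reserve
    target = min(state.total, wanted)
    if target > state.powered_on:
        return target - state.powered_on
    idle = state.powered_on - state.busy
    # Only idle nodes are ever switched off; busy nodes are left alone.
    return -min(idle, state.powered_on - target)

# Example tick: 60 of 100 nodes are on, 20 are busy, jobs needing 10 nodes wait.
state = ClusterState(powered_on=60, busy=20, total=100)
load = update_load_factor(previous=0.5, queued_nodes=10, busy_nodes=20, total_nodes=100)
action = plan_power_action(state, load)
```

The smoothing keeps a short burst of queued jobs from powering nodes on and off in quick succession, and the reserve keeps a few idle nodes ready so that newly submitted jobs do not wait for a boot.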

10.

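An energy-efficient scheduling scheme for node-heterogeneous GPU clusters

《计算机应用与软件》2013,(3)

GPU clusters have become a mainstream component in high-performance computing (HPC). As processing units evolve and cluster nodes are added, GPU clusters will become heterogeneous at the node level. This paper proposes an energy-efficient scheduling scheme for heterogeneous tasks on node-heterogeneous GPU clusters. It formally describes the task and resource models and the energy consumption evaluation model. A specific node selection strategy reduces the energy lost in the idle state. Task type classification, combined allocation, and DVFS increase CPU resource utilization. The scheme works at the system level and is compatible with existing optimization methods at the algorithm and instruction levels.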

11.

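OSCAR cluster technology

王璟张云泉《计算机工程与设计》2004,25(11):1872-1875

Cluster systems are currently the most widely adopted solution for high-performance computer systems. Installing a high-performance computing cluster requires coordinated installation and configuration across multiple nodes, which is often a tedious process for clusters with as many as several hundred compute nodes. OSCAR (Open Source Cluster Application Resource), currently the most popular cluster installation software package, solves this problem well. The paper describes in detail how each functional component of OSCAR works and how to use it, and summarizes the installation procedure for an OSCAR cluster.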

12.

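Current status and trends of reliability techniques for high-performance computers. Cited by: 5 (self-citations: 0; citations by others: 5)

黄永勤金利峰刘耀《计算机研究与发展》2010,47(4)

As the performance of high-performance computer systems keeps rising and their hardware scale keeps growing, achieving reliable operation is a major technical challenge in developing high-performance computers, especially petascale machines. Starting from the reliability requirements of high-performance computers, the paper surveys the current state of reliability techniques in their hardware design, including fault avoidance, static redundancy, dynamic redundancy, and online replacement, and analyzes in detail how these techniques are applied in representative machines. Finally, it discusses in depth the development trends of reliability techniques for high-performance computers, including reliability design for multi-core processors, comprehensive memory protection, and blade-based redundant architectures.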

13.

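Feasibility study of distributed mandatory access control for HPC clusters

霍建同李云春杨秀梅《计算机科学与探索》2014,(5):543-549

High performance computing (HPC) clusters have the characteristics of both a single system and a distributed system, which poses new challenges for cluster security. Based on the current security status and requirements of HPC clusters, the paper proposes a distributed mandatory access control model suited to them. Following the model, it designs an access control framework for HPC clusters built on SE Linux, a single-node mandatory access control system, and sets up a prototype system. Finally, it analyzes and verifies the feasibility of mandatory access control for HPC clusters. The analysis shows that distributed mandatory access control functionally meets the security requirements of HPC clusters and that its computation and bandwidth overhead is within an acceptable range.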

14.

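A hybrid parallel genetic algorithm based on the Sunway (申威) many-core processor

赵瑞祥郑凯刘垚王肃刘艳沈焕学周谦豪《计算机应用》2017,37(9):2518-2523

When a traditional genetic algorithm solves compute-intensive tasks, the execution time of the fitness function grows quickly, so the algorithm converges very slowly as the population size or the number of generations increases. To address this, a "coarse-grained plus master-slave" hybrid parallel genetic algorithm (HBPGA) is designed and implemented on the Sunway TaihuLight supercomputer, ranked first on the TOP500 list at the time. The algorithm uses a two-level parallel architecture that combines the MPI and Athread programming models. Compared with genetic algorithms implemented on a single core or on multi-core clusters with one level of parallelism, it achieves two-level parallelism on the Sunway many-core processor, with better performance and a higher speedup. In experiments with 16×64 slave cores, the maximum speedup reaches 544 and the slave-core speedup exceeds 31.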
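The two-level structure named in this abstract can be sketched in a few lines. This is not the paper's HBPGA: islands stand in for the coarse-grained level (MPI processes in the paper) and a worker pool stands in for the master-slave fitness evaluation (Athread slave cores in the paper). The OneMax objective, the ring migration rule, and all names and parameters are assumptions made for illustration, and Python threads only show the structure, since they do not run CPU-bound work in parallel.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(bits):
    """Toy objective (OneMax): number of 1 bits. Stands in for a costly function."""
    return sum(bits)

def evolve_island(pop, rng, pool, mutation_rate=0.02):
    """One generation; the master farms fitness evaluation out to workers."""
    scores = list(pool.map(fitness, pop))
    ranked = [ind for _, ind in
              sorted(zip(scores, pop), key=lambda p: p[0], reverse=True)]
    parents = ranked[: len(pop) // 2]
    children = [ranked[0][:]]  # elitism: carry the best individual over unchanged
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # one-point crossover
        child = a[:cut] + b[cut:]
        children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
    return children

def migrate(islands):
    """Ring migration: each island's worst is replaced by its neighbor's best."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = bests[i - 1][:]

def run(num_islands=4, pop_size=20, length=32, generations=40, interval=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)]
                for _ in range(pop_size)] for _ in range(num_islands)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for gen in range(1, generations + 1):
            # Islands are stepped one after another here; in a real
            # coarse-grained setup each island would be its own process.
            islands = [evolve_island(pop, rng, pool) for pop in islands]
            if gen % interval == 0:
                migrate(islands)
    return max(fitness(ind) for pop in islands for ind in pop)

best = run()
```

Migration is infrequent on purpose: islands evolve mostly independently, which keeps communication low, while the occasional exchange spreads good individuals between them.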

15.

Job migration in HPC clusters by means of checkpoint/restart

Rodríguez-Pascual Manuel, Cao Jiajun, Moríñigo José A., Cooperman Gene, Mayo-García Rafael 《The Journal of supercomputing》2019,75(10):6517-6541

Until now, jobs running on HPC clusters were tied to the node where their execution started. We have removed that limitation by integrating a user-level checkpoint/restart library into a resource manager, fully transparent to both the user and running application. This opens the door to a whole new set of tools and scheduling possibilities based on the fact that jobs can be migrated, checkpointed, and restarted on a different place or in a different moment, while providing fault tolerance for every job running on the cluster. This is of utmost importance in the future generation of exascale HPC clusters, where an increasing degree and complexities of efficient scheduling make it challenging to obtain the required degree of parallelism demanded by the applications.


16.

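Application of Sugon (曙光) high-performance computers in numerical prediction models

王俊超彭涛冯光柳《计算机技术与发展》2014,(10):178-181

The paper first describes the current use of high-performance computers and the operational model systems at the Wuhan Institute of Heavy Rain of the China Meteorological Administration. To meet the higher computing demands of finer-grained weather prediction models, the institute upgraded its existing cluster system with a Sugon high-performance computer cluster; after the upgrade, the CPUs of the compute nodes provide 11.40 TFlops of double-precision floating-point performance. Second, it discusses the current state of several key technologies of the upgraded system and gives an outlook on their future. Finally, taking the WRF model as an example, it analyzes the performance of the upgraded system and obtains a good speedup. The results show that the upgraded cluster will greatly reduce the run time of regional high-resolution numerical prediction models and help turn research results into operational use more efficiently.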

17.

Tibidabo: Making the case for an ARM-based HPC system

《Future Generation Computer Systems》2014

It is widely accepted that future HPC systems will be limited by their power consumption. Current HPC systems are built from commodity server processors, designed over years to achieve maximum performance, with energy efficiency being an after-thought. In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, over time designed for maximum energy efficiency, which now show promise for competitive performance. We introduce the architecture of Tibidabo, the first large-scale HPC cluster built from ARM multicore chips, and a detailed performance and energy efficiency evaluation. We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores. Based on our experience with the prototype, we perform simulations to show that a theoretical cluster of 16-core ARM Cortex-A15 chips would increase the energy efficiency of our cluster by 8.7×, reaching an energy efficiency of 1046 MFLOPS/W.
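The two figures at the end of this abstract imply a baseline that the abstract does not state. The short calculation below derives it and, as a purely hypothetical extrapolation, converts both efficiencies into the power a 1 PFLOPS machine would draw; the 1 PFLOPS target and all names are assumptions, not claims from the paper.

```python
# Figures quoted in the abstract.
projected_mflops_per_watt = 1046.0  # projected 16-core Cortex-A15 cluster
improvement_factor = 8.7            # gain over the Tibidabo prototype

# Derived, not stated in the abstract: efficiency of the prototype itself.
implied_prototype_mflops_per_watt = projected_mflops_per_watt / improvement_factor

def megawatts_for(pflops, mflops_per_watt):
    """Power in MW needed to sustain `pflops` at a given efficiency."""
    mflops = pflops * 1e9  # 1 PFLOPS = 1e9 MFLOPS
    return mflops / mflops_per_watt / 1e6

# Hypothetical extrapolation to a 1 PFLOPS machine.
mw_prototype = megawatts_for(1.0, implied_prototype_mflops_per_watt)
mw_projected = megawatts_for(1.0, projected_mflops_per_watt)
```

Under these assumptions the prototype's efficiency comes out at roughly 120 MFLOPS/W, and the same sustained performance would need about 8.3 MW at the prototype's efficiency versus just under 1 MW at the projected one.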

18.

An energy management system for cluster infrastructures

Carlos de Alfonso, Miguel Caballer, Fernando Alvarruiz, Vicente Hernández 《Computers & Electrical Engineering》2013

This paper presents a general energy management system for High Performance Computing (HPC) clusters and cloud infrastructures that powers off cluster nodes when they are not being used, and conversely powers them on when they are needed. This system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Management Systems, and can also use different mechanisms for powering on and off the computing nodes. The presented system makes it possible to implement different energy-saving policies depending on the priorities and particularities of the cluster. It also provides a hook system to extend the functionality, and a sensor system in order to take into account environmental information.

19.

PMSS: A programmable memory system and scheduler for complex memory patterns

Tassadaq Hussain, Amna Haider, Eduard Ayguadé 《Journal of Parallel and Distributed Computing》2014

The HPC industry demands more computing units on FPGAs to enhance performance through task and data parallelism. FPGAs can deliver their best performance on certain kernels by customizing the hardware for the application. However, applications are getting more complex, with multiple kernels and complex data arrangements, which generates overhead in scheduling and managing system resources. For this reason, all classes of multi-threaded machines, from minicomputers to supercomputers, require an efficient hardware scheduler and memory manager that improves the effective bandwidth and latency of the DRAM main memory. Such an architecture could be a very competitive choice for supercomputing systems that must meet the parallelism demands of HPC benchmarks. In this article, we propose a Programmable Memory System and Scheduler (PMSS), which provides high-speed complex data access patterns to a multi-threaded architecture. The proposed PMSS is implemented and tested on a Xilinx ML505 evaluation FPGA board. The performance of the system is compared with a microprocessor-based system that has been integrated with the Xilkernel operating system. Results show that the modified PMSS-based multi-accelerator system consumes 50% less hardware resources and 32% less on-chip power, and achieves approximately a 19x speedup compared to the MicroBlaze-based system.