Similar documents
1.
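Gromacs is a large-scale molecular dynamics simulation package. The National Supercomputing Center in Jinan operates two high-performance computing systems: a high-performance computing cluster built on Intel CPUs (theoretical peak above 100 TFlops), and an MPP supercomputing system built on the domestic SW1600 CPU (theoretical peak above 1 PFlops). This paper describes how the Gromacs package was ported to and deployed on both platforms, and reports molecular dynamics simulation tests on both platforms using biological macromolecules as examples.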

2.
In recent years, heterogeneous systems and cooperative computing have become popular research directions in high performance computing. As high performance computer systems scale rapidly, problems such as power consumption and reliability have come to the forefront. The aim of high performance computing has therefore shifted from merely seeking peak performance to comprehensively pursuing high efficiency, which takes into account many factors including performance, cost, power, and reliability. A heterogeneous computing system consisting of general-purpose CPU(s) and special-purpose accelerator(s) offers high performance, low power consumption, and low cost, and such systems have become mainstream in high performance computing. However, they still face many challenges, for example in programmability and reliability. In this paper, we first analyze the main challenges facing heterogeneous computing systems. We then introduce the architecture of the first petaflop computing system in China, the Tianhe-1 (TH-1) heterogeneous system, including its hardware/software interface and interconnect network. Finally, we present our research into solutions to several challenges encountered during development of the TH-1 system.

3.
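The Tianhe-1 supercomputer installed at the National Supercomputing Center in Tianjin (hereafter the Tianjin Supercomputing Center) is currently the fastest supercomputer in the world. It has been widely used across many high-performance computing fields and has produced a series of internationally influential application results. This paper mainly presents the latest application results of Tianhe-1 in petroleum exploration data processing, bioinformatics and biomedicine, environmental science, engineering simulation, and magnetic confinement fusion. The results show that Tianhe-1 achieves good scalability and parallel efficiency in these fields and gives strong support to indigenous scientific and technological innovation and to the upgrading of industrial technology.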

4.
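This paper first gives a brief overview of high-performance computing and the Tianhe-1 supercomputer. It then describes in detail two major applications of high-performance computing in the life sciences. The first is dynamics simulation of biological macromolecules, for which it presents application results that users obtained on Tianhe-1 and the application performance tests they carried out. The second is bioinformatics research, with a focus on the good results that BGI obtained when testing GPU-parallel software on Tianhe-1. Finally, it discusses future trends of high-performance computing in the life sciences.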

5.
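The NEMO model system was tested on the Tianhe-1 supercomputer. When a large number of NEMO processes store data concurrently, overall application performance drops. To address this, a grouped-output optimization for concurrent processes is proposed. The many concurrently storing processes are divided into suitable groups that write their output in turn. This alleviates the loss of storage efficiency caused by contention for storage resources when many processes read and write files at the same time. Tests show that with grouped output, the storage performance of NEMO's GYRE012 test case improves by up to more than 33%, and overall run-time performance improves by up to more than 28%.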
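The grouped-output idea can be sketched as a schedule that splits the writing ranks into groups and lets one group write per round. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes consecutive rank numbering and a fixed group size. The helper names and the `group_size` parameter are hypothetical.

```python
def grouped_output_schedule(num_procs, group_size):
    """Split ranks 0..num_procs-1 into consecutive groups of at most
    group_size ranks (hypothetical parameter). Group g writes in round g,
    so at most group_size processes access the file system at once."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [list(range(start, min(start + group_size, num_procs)))
            for start in range(0, num_procs, group_size)]


def write_round(rank, group_size):
    """Round in which a given rank is allowed to write."""
    return rank // group_size


if __name__ == "__main__":
    for rnd, ranks in enumerate(grouped_output_schedule(10, 4)):
        print(f"round {rnd}: ranks {ranks}")
```

In an MPI program, each rank would loop over the rounds and write only when `write_round(rank, group_size)` equals the current round. Before the next round it would synchronize with a barrier, for example `comm.Barrier()` in mpi4py. How NEMO's own I/O layer would be changed is not shown here.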

6.
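Molecular dynamics simulation programs often run very inefficiently on modern high-performance computers and reach only a few percent of system peak performance. This paper optimizes the performance of the parallel molecular dynamics program PMD3D on the Lenovo DeepComp 6800 supercomputer. Performance analysis showed that interdependent floating-point operations in the particle interaction force calculation severely limit the processor's instruction-level parallelism. We therefore apply a computation-caching approach: the many irregular floating-point computations are buffered, and once enough of them have accumulated they are evaluated in vectorized form. After optimization, single-machine performance improves more than fourfold and reaches 32.3% of the processor's 5.2 GFlops peak. Finally, parallel performance tests on 256 CPUs in 64 nodes of DeepComp 6800 reached 27% of the 1.3 TFlops peak.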
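Below is a rough sketch of the computation-caching idea. A pair force is not evaluated as soon as the pair is found. Instead, pairs are buffered and evaluated in one NumPy batch once the buffer is full. The Lennard-Jones potential in reduced units, the hypothetical `PairForceCache` class, and its `capacity` parameter are all assumptions for illustration. The abstract does not describe PMD3D's actual potential or data layout.

```python
import numpy as np


class PairForceCache:
    """Buffer irregular pair-force work, then evaluate it as one vectorized
    batch. Assumes a Lennard-Jones potential with epsilon = sigma = 1
    (illustrative only)."""

    def __init__(self, n_atoms, capacity=1024):
        self.forces = np.zeros((n_atoms, 3))
        self.capacity = capacity
        self.i, self.j, self.d = [], [], []

    def add_pair(self, i, j, dvec):
        # Defer the arithmetic: only record the pair and r_i - r_j.
        self.i.append(i)
        self.j.append(j)
        self.d.append(dvec)
        if len(self.i) >= self.capacity:
            self.flush()

    def flush(self):
        if not self.i:
            return
        d = np.asarray(self.d, dtype=float)       # shape (m, 3)
        r2 = np.einsum("ij,ij->i", d, d)          # squared distances
        inv6 = 1.0 / r2**3                        # r^-6
        # LJ force on i: 24 * (2 r^-12 - r^-6) / r^2 * (r_i - r_j)
        scale = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2
        f = scale[:, None] * d
        # np.add.at accumulates correctly when an atom index repeats.
        np.add.at(self.forces, np.asarray(self.i), f)
        np.add.at(self.forces, np.asarray(self.j), -f)
        self.i, self.j, self.d = [], [], []
```

Call `flush()` once more after the last pair of a time step, so that no buffered pairs are left unevaluated.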

7.
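Parallel computing software on large-scale clusters must tolerate failures of some nodes, network links, and other components. It must also offer services that are easy to manage, maintain, port, and scale. A parallel computing framework was designed and developed for the star computing model. Inside the scheduling node it uses a variable-granularity decomposer, associated queues, and related techniques, which give whole-system fault tolerance together with good usability, portability, and scalability. The system can currently run continuously for more than 150 h at a computing capability of 300 TFlops and can be scaled further.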
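A small single-process simulation can illustrate fault tolerance in a star (master/worker) framework. The master decomposes the work into chunks and hands one chunk to each worker per round. If a worker fails, the master requeues its chunk after re-splitting it at a finer granularity. The failure model, the halving rule, and all names here are hypothetical. The abstract does not specify how the framework's own variable-granularity decomposer and associated queues work.

```python
from collections import deque


def decompose(start, end, grain):
    """Split [start, end) into consecutive chunks of at most `grain` items."""
    return deque((s, min(s + grain, end)) for s in range(start, end, grain))


def run_star(total, grain, workers, failures):
    """Simulate a master/worker (star) run over items 0..total-1.

    failures: set of (round, worker) pairs whose chunk is lost in that
    round (a hypothetical failure model).
    Returns (set of completed items, number of rounds)."""
    pending = decompose(0, total, grain)
    done, rnd = set(), 0
    while pending:
        # The master hands at most one chunk to each worker per round.
        assigned = {}
        for w in workers:
            if pending:
                assigned[w] = pending.popleft()
        for w, (s, e) in assigned.items():
            if (rnd, w) in failures:
                # Lost chunk: re-split it more finely and requeue at the front.
                finer = decompose(s, e, max(1, (e - s) // 2))
                pending.extendleft(reversed(finer))
            else:
                done.update(range(s, e))
        rnd += 1
    return done, rnd
```

Requeueing the lost chunk in smaller pieces means a second failure loses less work. That is one plausible reason to vary granularity, though the paper may use different criteria.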

8.
Complex global climate problems lead researchers from different scientific disciplines to couple multiphysics simulations, called models, for integrated modeling of climate change, using a software framework called earth system modeling (ESM). As a critical component of this framework, the coupler is in charge of connections and interactions among models. As next-generation models advance, greater data transfer volumes and higher coupling frequencies are expected to put a heavy performance burden on the coupler, so highly efficient coupling techniques are required. In this paper, we propose the sub-domain mapping method to improve parallel coupling, which consists of data transfer and data transformation. By using a dedicated interpolation-oriented communication routing, communication operations that were originally scattered across various steps can be combined and executed together. This reduces redundant communication and the associated synchronization costs. Tests on the Tianhe-1A (TH-1A) supercomputer show that our method achieves 1.1- to 4.9-fold performance improvements. We also present a further optimization for multi-interpolation cases. Test results show that our method achieves up to a 3.4-fold speedup over the original coupling execution of the current climate system.
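A routing table for coupling can be built by intersecting the source and destination decompositions once and reusing the result for every transfer. This 1-D sketch assumes block decompositions. It models an interpolation stencil by a hypothetical `halo` width around each destination sub-domain. It illustrates precomputed routing in general and is not the paper's sub-domain mapping method.

```python
def block_decomposition(n, parts):
    """Contiguous 1-D block decomposition of n points over `parts`
    processes, returned as half-open ranges [lo, hi)."""
    base, extra = divmod(n, parts)
    ranges, lo = [], 0
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def build_router(src, dst, halo=0):
    """Map each (source rank, destination rank) pair to the index range the
    source must send. Each destination sub-domain is widened by `halo`
    points (hypothetical stand-in for an interpolation stencil)."""
    router = {}
    for s, (slo, shi) in enumerate(src):
        for d, (dlo, dhi) in enumerate(dst):
            lo, hi = max(slo, dlo - halo), min(shi, dhi + halo)
            if lo < hi:
                router[(s, d)] = (lo, hi)
    return router
```

Because the table is computed once, each coupling step can issue all its sends from it directly. Recomputing the needed ranges separately in every data-transfer or interpolation phase is avoided.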

9.
This paper proposes a formal approach to protocol performance testing based on an extended concurrent TTCN. To meet the needs of protocol performance testing, concurrent TTCN is extended, and the operational semantics of the extended concurrent TTCN is defined in terms of an Input-Output Labeled Transition System. The architecture of a protocol performance test system is described, and an example test case and its test results are given.

10.
The authors of this paper previously proposed the global virtual data space system (GVDS). It aggregates the scattered and autonomous storage resources in China's national supercomputer grid (National Supercomputing Center in Guangzhou, National Supercomputing Center in Jinan, National Supercomputing Center in Changsha, Shanghai Supercomputing Center, and Computer Network Information Center in Chinese Academy of Sciences) into a storage system spanning the wide area network (WAN), realizing unified management of global storage resources in China. The GVDS has been successfully deployed in the China National Grid environment. However, when remote data are accessed and shared over the WAN, the GVDS transmits data redundantly and wastes a lot of network bandwidth. In this paper, we propose an edge cache system as a supplement to the GVDS to improve the performance of upper-level applications that access and share remote data. Specifically, we first design the architecture of the edge cache system and then study its key technologies: an edge cache index mechanism based on double-layer hashing, an edge cache replacement strategy based on the GDSF algorithm, request routing based on consistent hashing, and cluster membership maintenance based on the SWIM protocol. Experimental results show that, functionally, the edge cache system implements the relevant operations (read, write, deletion, modification, etc.) and is compatible with the POSIX interface. In terms of performance, when the accessed file is located in the edge cache system, it greatly reduces the amount of data transmitted and increases data access bandwidth, bringing performance close to that of a network file system in the local area network (LAN).
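GDSF (Greedy-Dual-Size-Frequency) gives each cached object the priority L + F*C/S. Here F is the object's access count, C its retrieval cost, S its size, and L an inflation clock that is raised to the priority of each evicted object. A minimal sketch follows. The uniform default cost of 1 and the linear-scan eviction are simplifying assumptions, and the class is hypothetical. It does not reproduce the edge cache system's own implementation.

```python
class GDSFCache:
    """Greedy-Dual-Size-Frequency replacement over byte-sized objects:
    priority = clock + freq * cost / size; on eviction the clock is set
    to the evicted object's priority ("inflation")."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0.0
        self.items = {}  # key -> [size, freq, cost, priority]

    def _priority(self, size, freq, cost):
        return self.clock + freq * cost / size

    def access(self, key, size, cost=1.0):
        """Return True on a hit; on a miss, admit the object and return False."""
        if key in self.items:
            entry = self.items[key]
            entry[1] += 1
            entry[3] = self._priority(entry[0], entry[1], entry[2])
            return True
        if size > self.capacity:
            return False  # object can never fit
        while self.used + size > self.capacity:
            victim = min(self.items, key=lambda k: self.items[k][3])
            self.clock = self.items[victim][3]
            self.used -= self.items.pop(victim)[0]
        self.items[key] = [size, 1, cost, self._priority(size, 1, cost)]
        self.used += size
        return False
```

The clock term makes objects that were popular long ago eventually fall below newly admitted ones. The F/S term favors small, frequently used files, which suits a cache whose goal is to cut WAN transfers.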

11.
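Intel's new-generation KNL processor is a many-core processor with very strong computing capability. It has 16 GB of high-speed on-package memory (MCDRAM), up to 72 physical cores, and a double-precision peak of 3 TFlops per CPU, which gives strong performance support to highly parallel workloads. Mainstream parallel software packages have been adopting KNL's many-core and high-speed memory technologies. Because LAMMPS (large-scale atomic/molecular massively parallel simulator) is widely used in materials science and computational chemistry, optimizing LAMMPS on KNL nodes has become an active research topic in related fields in recent years. Using the KNL cluster at the Zhengzhou Supercomputing Center as the platform, this paper optimizes LAMMPS with two approaches: MCDRAM and third-party extension packages. MCDRAM speeds up CPU data access, while the third-party extension packages optimize conditional branches in the program at the source-code level. Experimental results show that the optimized LAMMPS runs markedly faster, with a speedup of up to 49x, which is 5.5 times the speedup obtained on the CPU platform.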

12.
An analysis of real-world operational data from the Tianhe-1A (TH-1A) supercomputer system shows that chilled water data not only reflect the status of the chiller system but are also related to supercomputer load. This study proposes AquaSee, a method that predicts the load and cooling system faults of supercomputers from chilled water pressure and temperature data. The method is validated on real-world operational data of the TH-1A supercomputer system at the National Supercomputer Center in Tianjin. Datasets of various compositions are used to construct the prediction model, which is also built with different prediction sequence lengths. Experimental results show that combining pressure and temperature data is more effective than using only pressure or only temperature data. The best inference sequence length is two points. Furthermore, an anomaly monitoring system based on chilled water data has been set up to help engineers detect chiller system anomalies.
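Changing the prediction sequence length amounts to changing how many past readings make up each model input. The helper below is a hypothetical sketch that builds such samples from chilled-water pressure and temperature series. The next-step target and the helper name are assumptions, and the prediction model itself is not shown.

```python
def make_windows(pressure, temperature, target, seq_len=2):
    """Build supervised samples: each input is the last `seq_len` readings of
    (pressure, temperature); the label is the target value at the next step."""
    if not (len(pressure) == len(temperature) == len(target)):
        raise ValueError("series must have equal length")
    X, y = [], []
    for t in range(seq_len, len(target)):
        window = [(pressure[k], temperature[k]) for k in range(t - seq_len, t)]
        X.append(window)
        y.append(target[t])
    return X, y
```

Passing only pressure (or only temperature) in each tuple would give the single-variable dataset compositions that the abstract compares against.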

13.

Scientists at the Mississippi State University Diagnostic Instrumentation and Analysis Laboratory and the Idaho National Engineering and Environmental Laboratory (INEEL) have developed an expert system for noninvasive characterization of containerized radiological waste. Characterizing the containers is necessary to determine their proper disposition. Three prototypes were developed, each using a different method of handling uncertainty: a fuzzy system, a Bayesian network system, and a neural network system. The performance of each expert system was assessed to determine how well it modeled the decisions made by the INEEL domain expert. The prototype systems were also analyzed to measure the agreement among their decisions, the domain expert's decisions, and the decisions made by two additional experts. The neural network prototype was further analyzed to determine how consistent its assessments were. This paper describes the analysis of the performance of the three expert system prototypes.

14.
In this paper, we present the PolyMAX module, which extends network simulator 2 (ns-2), the most popular network simulator in academia, into one of the most complete simulation tools for evaluating the performance of Mobile WiMAX networks. PolyMAX is based on the National Institute of Standards and Technology (NIST) module. Our specific contributions consist of the design and implementation of the Quality of Service (QoS) classes and QoS management messages, the uplink access grant-request mechanisms, Adaptive Modulation and Coding (AMC), and a scheduler handling all five WiMAX QoS classes. We also present validation results for the different components of our module, along with typical WiMAX simulation scenarios illustrating its flexibility and some of its features. The PolyMAX module is an important tool that lets researchers easily implement their Mobile WiMAX scheduling and AMC algorithms and accurately evaluate their performance in realistic scenarios.

15.
LIU Fangfang, YANG Chao, YUAN Xinhui, WU Changmao, AO Yulong. Journal of Software (《软件学报》), 2018, 29(12): 3921-3932
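Sunway TaihuLight, the world's first supercomputer with a peak performance above 100 PFlops, has been completed. It uses the domestic Sunway heterogeneous many-core processor. Unlike existing pure-CPU, CPU-MIC, and CPU-GPU architectures, this processor adopts a master-core/slave-core architecture, with a single-processor peak of 3 TFlops and a memory bandwidth of 130 GB/s. Sparse matrix-vector multiplication (SpMV) is a very important kernel in scientific and engineering computing. It is well known to be bandwidth-bound and to involve indirect memory accesses, which makes an efficient SpMV on the domestic Sunway processor very challenging. For the Sunway processor, this paper proposes a general heterogeneous many-core parallel algorithm for SpMV in CSR format. The algorithm is carefully designed in terms of task partitioning and LDM space partitioning. It introduces a dynamic/static buffer caching mechanism to raise the hit rate of accesses to vector x, and a dynamic/static task scheduling method to achieve load balance. In addition, several key factors that affect SpMV performance in the algorithm are analyzed, and adaptive optimization is applied to improve performance further. Tests on 16 representative sparse matrices from the Matrix Market collection show a maximum speedup of about 10x over the master-core version, with an average speedup of 6.51. Measured against the memory traffic of the master-core CSR SpMV, the test matrices reach up to 86% of the processor's measured bandwidth, and 47% on average.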
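A plain CSR SpMV shows where the indirect access to x comes from (`x[col_idx[k]]`). A static split of rows by nonzero count shows one basic way to balance work across cores. Both are generic textbook sketches in Python, and the function names are hypothetical. They do not model the Sunway LDM buffers or the paper's dynamic/static scheduling.

```python
from bisect import bisect_left


def csr_spmv(row_ptr, col_idx, values, x):
    """y = A @ x for A in CSR format: the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i+1]], with columns in col_idx."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect access to x
        y[i] = acc
    return y


def partition_rows_by_nnz(row_ptr, parts):
    """Static split of rows into `parts` contiguous blocks holding roughly
    equal numbers of nonzeros (a simple load-balancing heuristic)."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, parts):
        # First row boundary whose prefix nonzero count reaches the target.
        r = bisect_left(row_ptr, nnz * p / parts)
        bounds.append(min(max(r, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return list(zip(bounds[:-1], bounds[1:]))
```

For the 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]], the CSR arrays are `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 1, 0, 2]`, and `values = [4, 1, 2, 3, 5]`. Multiplying by x = [1, 2, 3] gives [7, 4, 18]. A two-way nonzero-balanced split assigns rows 0-1 (3 nonzeros) to one core and row 2 (2 nonzeros) to the other.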

16.
Mixture analysis is a necessary component for capturing sub-pixel heterogeneity in the characterization of land cover from remotely sensed images. Mixture analysis approaches in remote sensing vary from conventional linear mixture models to nonlinear neural network mixture models. Linear mixture models are fairly simple and generally result in poor mixture analysis accuracy. Neural network models can achieve much higher accuracy, but typically lack interpretability. In this paper we present a mixture discriminant analysis (MDA) model for inferring land cover fractions within forest stands from Landsat Thematic Mapper images. Specifically, individual class distributions are modeled as mixtures of subclasses of Gaussian distributions, and land cover fractions are estimated using the corresponding posterior probabilities. Compared to a benchmark study on accuracy of mixture models with Plumas National Forest data, this MDA model easily outperforms traditional linear mixture models and parallels the performance of the ARTMAP neural network mixture model. In other words, the MDA model is observed to successfully combine the performance characteristics of more complex neural network models (due to the nonlinear nature of its classification rules), with the ease of interpretation associated with linear mixture models (due to its relatively simple structure). MDA models therefore offer an attractive alternative for addressing the mixture modeling problem in remote sensing.
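The posterior-fraction idea can be shown for a single spectral feature. Each class density is a weighted mixture of Gaussian subclasses that share one variance, as in standard MDA, and the estimated fraction of each cover type is its posterior probability. The one-dimensional feature, the hand-set parameters, and the function names are simplifying assumptions. Real Thematic Mapper data are multiband, and the parameters would be fitted, typically with EM.

```python
import math


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mda_fractions(x, classes, sigma):
    """classes: {name: (prior, [(weight, mean), ...])}, with subclass weights
    summing to 1 within each class; sigma is the standard deviation shared by
    all subclasses. Returns each class's posterior probability, used here as
    the estimated cover fraction for the pixel value x."""
    scores = {}
    for name, (prior, subclasses) in classes.items():
        density = sum(w * gaussian_pdf(x, mu, sigma) for w, mu in subclasses)
        scores[name] = prior * density
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}
```

Because a class may have several subclasses, its decision region can be nonlinear. The model still stays interpretable, since every parameter is a prior, a weight, a mean, or a variance.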

17.
Recent advances in the parallelizability of fast N-body algorithms and the programmability of graphics processing units (GPUs) have opened a new path for particle-based simulations. For the simulation of turbulence, vortex methods can now be considered an interesting alternative to finite difference and spectral methods. The present study focuses on the efficient implementation of the fast multipole method and the pseudo-particle method on a cluster of NVIDIA GeForce 8800 GT GPUs, and applies it to a vortex method calculation of homogeneous isotropic turbulence. The results of the present vortex method agree quantitatively with those of the reference calculation using a spectral method. We achieved a maximum speed of 7.48 TFlops using 64 GPUs, with a cost performance of about $9.4/GFlops. The vortex method calculation took 4120 s on 64 GPUs, while the spectral method took 4910 s on 32 CPUs.

18.
We present a dense coding network based on a continuous-variable graph state, along with its corresponding protocol. To realize the dense coding network, we provide a scheme to distill bipartite entanglement between two arbitrary modes in a graph state. We also analyze the capacity of network dense coding and provide a method to calculate its maximum mutual information. As an application, we analyze the performance of dense coding in a square lattice graph state network. The results show that the mutual information of the dense coding is not strongly affected by the complexity of the network. We conclude that the performance of the dense coding network is very promising.

19.
The performance of the interprocessor communication architecture of the CM-2 is analyzed. A discrete-time Markov chain model of its network architecture is developed to compute the message delay that the network architecture introduces. Because the network operates by synchronous time-division multiplexing, it is amenable to discrete-time Markov chain modeling. The analysis yields formulas for response time and several other related performance measures, showing how the performance of the network degrades with the message arrival rate and other parameters. Since communication delays affect interprocess communication, knowing how sensitive the delays are to the parameters can help in designing a high performance parallel system. To keep the analysis tractable, an approximate Markov model is used whose solution requires fixed-point iteration. Validation against a simulation study shows that the analysis predicts the performance of the network with high accuracy.
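As a generic illustration of the numerical side of such an analysis: the stationary distribution of a discrete-time Markov chain satisfies the fixed-point equation pi = pi P, and it can be found by iterating that equation. This sketch does not reproduce the CM-2 network model or its approximation. The function name and the tolerance are assumptions.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Solve pi = pi P for a discrete-time Markov chain by fixed-point
    (power) iteration. P is a row-stochastic matrix given as a list of rows."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("fixed-point iteration did not converge")
```

For example, with P = [[0.9, 0.1], [0.5, 0.5]], balance requires 0.1 * pi0 = 0.5 * pi1, so the iteration converges to pi = [5/6, 1/6]. Delay measures would then be computed from such state probabilities.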

20.
An alternator is a network of concurrent processes that satisfies the following conditions. (1) If one process executes the critical step, no neighbor of that process executes the critical step in the same computing step. (2) Along any infinite sequence of computing steps, each process executes the critical step infinitely often. (3) An alternator is self-stabilizing to the above conditions. An alternator is said to be 1-fair if condition (2) is changed to: a process can execute the critical step twice only if all other processes execute the critical step at least once. In this paper, we propose an alternator for rings of odd size. The design has the snap property, in the sense that it satisfies condition (1) even when transient faults occur. Once stabilized, the alternator allows each process to execute the critical step once every three steps. The design is optimal 1-fair, in the sense that no other 1-fair design can have better performance. Building on this design, we fine-tune the alternator to achieve maximal performance: our final alternator is a maximal alternator, in which a process is allowed to execute the critical step whenever neither of its two neighbors executes the critical step. Received: September 2001. Accepted: December 2002. This research was supported in part by the National Science Council of the Republic of China under Contracts NSC 89-2213-E-007-140 and 90-2213-E008-054.
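Conditions (1) and (2) can be checked on a simple schedule that is not self-stabilizing. An odd ring admits a proper 3-coloring. If the processes of color c execute the critical step exactly at the steps t with t mod 3 = c, each process gets one critical step every three steps and no two neighbors are active together. This only illustrates the conditions and is not the paper's alternator: it is neither self-stabilizing nor maximal. The helper names are hypothetical.

```python
def ring_coloring(n):
    """Proper 3-coloring of a ring with an odd number n >= 3 of processes:
    alternate colors 0/1 and give the last process color 2."""
    if n < 3 or n % 2 == 0:
        raise ValueError("expects an odd ring size n >= 3")
    return [i % 2 for i in range(n - 1)] + [2]


def critical_set(colors, step):
    """Processes allowed to execute the critical step at `step`."""
    return {i for i, c in enumerate(colors) if c == step % 3}


def satisfies_exclusion(n, active):
    """Condition (1): no two ring neighbours are critical in the same step."""
    return all(not (i in active and (i + 1) % n in active) for i in range(n))
```

With n = 7, only process 6 has color 2, so it executes alone every third step while several other processes could have joined it safely. The paper's fine-tuning toward a maximal alternator removes that kind of slack.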
