Similar Literature
 Found 20 similar documents; search took 109 ms
1.
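In high-performance computing, hardware-supported multicast operations have a critical impact on application performance. Of the two existing classes of multicast routing algorithms for InfiniBand networks, MINIHOP-MC ignores routing balance, which leads to a very large link edge forwarding index (EFI) and severely degrades multicast message performance; SSSP-MC partly accounts for routing balance, but its running time is very long and it cannot meet...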
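To make the EFI metric mentioned above concrete, the following is a minimal sketch, not the algorithm of either MINIHOP-MC or SSSP-MC. It assumes the common definition of the edge forwarding index as the maximum number of routes crossing any single link, and it simplifies multicast trees to plain unicast paths. The function name `edge_forwarding_index` and the small switch topology are hypothetical.

```python
from collections import Counter


def edge_forwarding_index(routes):
    """Return (EFI, per-link load) for a set of routes.

    Each route is a list of node names; every consecutive pair of nodes
    counts as one directed link used by that route. The EFI is taken here
    as the largest load on any link (an assumed, commonly used definition).
    """
    load = Counter()
    for path in routes:
        for u, v in zip(path, path[1:]):
            load[(u, v)] += 1
    efi = max(load.values()) if load else 0
    return efi, load


# Hypothetical example: the same three flows routed in two different ways.
unbalanced = [
    ["a", "s1", "s2", "d"],
    ["b", "s1", "s2", "d"],
    ["c", "s1", "s2", "d"],
]
balanced = [
    ["a", "s1", "s2", "d"],
    ["b", "s1", "s3", "d"],
    ["c", "s1", "s4", "d"],
]

efi_unbalanced, _ = edge_forwarding_index(unbalanced)
efi_balanced, _ = edge_forwarding_index(balanced)
```

In this toy case the unbalanced routing pushes all three flows over the link from `s1` to `s2`, while the balanced routing spreads them over three intermediate switches, which is the kind of difference a balance-aware routing algorithm tries to achieve.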

2.
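As multicore processors develop and computing demand keeps growing, high-performance computing systems continue to grow in scale. Simulating such systems plays an important role in system design and optimization, and interconnection network simulation is an indispensable part of that work. This paper designs and implements a large-scale InfiniBand interconnection network simulation system based on OMNeT++. The system drives the network simulation with MPI messages recorded from parallel programs and can simulate...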

3.
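This paper introduces MPU, a new network topology dedicated to high-performance computers, covering its mathematical model, technical implementation, and routing algorithm. It analyzes the performance of MPU theoretically and compares it mathematically with advanced networks such as the 3-D Torus; most of MPU's performance metrics are better than those of the 3-D Torus network. The paper also implements MPUS, a large parallel simulator developed for MPU, describes its architecture, implementation, and workflow, and finally gives simulation results. The experiments show that the MPU design is correct and that MPUS scales well.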

4.
5.
6.
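A Parallel Fast Fourier Transform Algorithm Based on the Star Interconnection Network   Total citations: 6, self-citations: 0, citations by others: 6
The star interconnection network is a topology well suited to large-scale parallel computing. Exploiting the many ways in which the star interconnection network can be recursively decomposed, this paper proposes a method for implementing a parallel fast Fourier transform algorithm on it. The method effectively reduces the communication overhead between processor nodes during computation. The proposed mapping between star-graph nodes and data, and the approach to implementing the parallel FFT, can be extended to other parallel algorithms on the star interconnection network, such as solving systems of linear equations and matrix multiplication.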

7.
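Research on New Interconnection Communication Techniques for Massively Parallel Processing Systems   Total citations: 2, self-citations: 0, citations by others: 2
This paper surveys research on massively parallel processing systems and points out that the research focus and key technology is achieving efficient interconnection communication. It concentrates on the research topics in this field: node structure, network interfaces, switching techniques, topologies, routing algorithms, communication mechanisms, communication protocols, and computational models.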

8.
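A Matrix Theory of Interconnection Functions for Multistage Interconnection Networks   Total citations: 3, self-citations: 1, citations by others: 3
Multistage interconnection networks are the main interconnection structure used in massively parallel processing systems and large ATM switches.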

9.
Network Interconnection

10.
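Network Modeling and Simulation Based on Hybrid Petri Nets   Total citations: 1, self-citations: 0, citations by others: 1
Starting from an identified model of the transmission characteristics of network switching nodes, this paper proposes a node simulation model based on hybrid Petri nets and a method for building network models with it. It analyzes the simulation accuracy of the hybrid Petri net model of a switching node and gives the simulation conditions that guarantee conservation of information volume. For errors that cannot be eliminated, it analyzes their dynamic range during simulation and concludes that the model compensates for them automatically. A hybrid Petri net model of a network is built, and the information transmission process is simulated with Visual Object Net 2.0. The simulation results show that the network model reproduces the dynamic process of information transmission well and reflects the nonlinear transmission characteristics of network switching nodes.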

11.
Sep: A Fixed Degree Regular Network for Massively Parallel Systems   Total citations: 4, self-citations: 0, citations by others: 4
We propose a family of regular Cayley network graphs of degree three based on permutation groups for the design of massively parallel systems. These graphs are shown to be based on the shuffle exchange operations, to have logarithmic diameter in the number of vertices, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose a simple routing algorithm. These graphs are shown to be able to efficiently simulate or embed other permutation group based graphs; thus they seem to be very attractive for VLSI implementation and for applications requiring a bounded number of I/O ports, as well as for running existing applications written for other permutation group based network architectures.
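As an illustration of a degree-three Cayley graph on permutations, here is a small sketch. The abstract does not list the generators, so the three used here (exchange of the first two symbols, cyclic left shift, cyclic right shift) are an assumption chosen to match the "shuffle exchange" description; the function names are hypothetical, and the code is not the paper's routing algorithm.

```python
from collections import deque


def neighbors(p):
    """Apply the three assumed generators to a permutation given as a tuple."""
    swap = (p[1], p[0]) + p[2:]   # exchange the first two symbols
    left = p[1:] + p[:1]          # cyclic shift left by one position
    right = p[-1:] + p[:-1]       # cyclic shift right by one position
    return [swap, left, right]


def explore(n):
    """Breadth-first search from the identity permutation on n symbols.

    Returns (number of vertices reached, largest BFS distance found).
    """
    start = tuple(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in neighbors(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return len(dist), max(dist.values())


reached, farthest = explore(4)
```

Because the swap is its own inverse and the two shifts are inverses of each other, every link can be traversed in both directions, so the graph is undirected and each vertex has exactly three neighbors for n of at least 3. Since a Cayley graph looks the same from every vertex, the largest distance from the identity is also the diameter under these assumed generators.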

12.
In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of parallel programs, so that execution adjusts to the changing parallelism of the program and performance improves. Based on this execution model we analyze the performance gains obtainable with this new system. We show that substantial performance gains can be obtained by using the HeMM system.

13.
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks that are controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data implementation is described and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data implementation is also described and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

14.
15.
Molecular dynamics simulation is a class of applications that require reducing the execution time of fixed-size problems. This reduction in execution time is important to drug design and protein interaction studies. Many implementations of parallel molecular dynamics have been developed, but very little work has addressed issues related to the use of machines with 50,000 processors for modest-sized problems in the range of 50,000 atoms. Current massively parallel machines present a major obstacle to achieving good performance: communication overhead. In this paper we quantify the communication latency and network bandwidth necessary to achieve 30–40% efficiency on future message-passing machines with sizes on the order of tens of thousands of processors, for executing molecular dynamics problems with the same order of atoms. We derive an analytical model of a benchmark application that simulates a system of helium atoms executing on the Intel Touchstone Delta using an interaction decomposition method. This model is validated and used to extrapolate information on the startup time and network bandwidth. The results indicate that for an MPP with a four-dimensional mesh topology using 400 MHz processors, the communication startup time must be at most 30 clock cycles and the network bandwidth at least 2.3 GB/s. This configuration results in 30–40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
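The thresholds quoted above can be put into time units with a short calculation. The sketch below converts 30 clock cycles at 400 MHz into seconds and plugs the figures into the standard linear cost model (startup time plus message size divided by bandwidth). That model is an assumption used for illustration; the abstract does not state the paper's own analytical model, and the function names and the reading of GB as 10^9 bytes are likewise assumptions.

```python
CLOCK_HZ = 400e6                 # 400 MHz processors, from the abstract
STARTUP_CYCLES = 30              # maximum startup time, from the abstract
BANDWIDTH_BYTES_PER_S = 2.3e9    # 2.3 GB/s, assuming 1 GB = 1e9 bytes


def startup_seconds(cycles=STARTUP_CYCLES, clock_hz=CLOCK_HZ):
    """Convert a startup cost in clock cycles to seconds."""
    return cycles / clock_hz


def message_seconds(n_bytes, startup=None, bandwidth=BANDWIDTH_BYTES_PER_S):
    """Assumed linear model: time = startup + n_bytes / bandwidth."""
    if startup is None:
        startup = startup_seconds()
    return startup + n_bytes / bandwidth


startup = startup_seconds()          # 30 cycles / 400 MHz = 75 ns
small_msg = message_seconds(2300)    # 75 ns startup + 1 microsecond transfer
```

Under these assumptions the startup budget is 75 ns, so for messages of a few hundred bytes or less the startup term is comparable to or larger than the transfer term, which is consistent with the abstract singling out communication overhead as the main obstacle.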

16.
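Using the MPI message-passing model, this paper develops in-house parallel computing software for the numerical simulation of hypersonic flows. The software solves laminar flow problems with the three-dimensional Navier-Stokes equations as the governing equations, discretizes the computational domain with a finite volume method on structured grids, computes the convective fluxes with the AUSMPW+ scheme, obtains high-order accuracy through MUSCL interpolation, and advances in time with the LU-SGS method to accelerate convergence for steady flows. Large-scale parallel computations of different hypersonic flows on high-performance computers show that the CFD software achieves high parallel efficiency and provides an efficient tool for accurately predicting the aerodynamic forces and heating of hypersonic vehicles.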

17.
18.
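In massively parallel systems, the design of the system-level interconnection network is critical. InfiniBand, a high-performance switched network, is widely used in massively parallel processing systems. Compared with the fat-tree topology commonly used in InfiniBand networks today, the mesh/torus topology offers better performance and scalability. Nevertheless, studies have found that building an InfiniBand interconnection network with a conventional mesh/torus topology raises many problems. This paper analyzes the shortcomings of conventional network topologies and proposes a multi-link mesh/torus interconnection network based on InfiniBand. By making full use of multiple links between switches, the improved topology achieves higher bandwidth than a conventional mesh/torus network. An efficient routing algorithm designed for this topology is also given. Finally, the proposed algorithm is evaluated by network simulation, and the experimental results show that it offers better performance and scalability than other routing algorithms.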

19.
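Simulation and Implementation of a New Network-on-Chip Interconnection Structure   Total citations: 2, self-citations: 0, citations by others: 2
Taking performance and hardware implementation into account, this paper proposes MLR (Multi-Layer Router), a hierarchical routing structure that serves as an interconnection topology for networks-on-chip. The structure reduces the network diameter through hierarchical design and has good symmetry and scalability. Network modeling, simulation, and hardware implementation results show that, under different network loads and different numbers of IP core nodes, MLR improves network performance parameters such as packet loss rate, communication latency, and throughput by up to 50%-70% compared with conventional structures. In addition, by sharing routers, it reduces chip area by more than 20% and dynamic power by more than 40%, effectively lowering the hardware cost of the interconnection structure.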

20.
This paper addresses the use of traces taken from MPI applications for simulation-based performance studies of parallel computing systems. Different mechanisms to capture traces are discussed, pointing out important limitations of some of them. One of these limitations is the invisibility of message interchanges in collective operations, which is circumvented by modifying a trace-capturing library. During a simulation, trace records must be simulated in causal order, to fully comply with application semantics. Alternatives to follow this order, and the risks of not following it, are presented and discussed. The techniques introduced in this paper have been implemented in an in-house developed simulation environment, which is used in two example studies to show its usefulness: an evaluation of alternatives for interconnection network design, and a performance prediction study in which traces from one machine are used to estimate the execution times of applications running on a different machine.
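The requirement that trace records be replayed in causal order can be illustrated with a toy replayer. This is a minimal sketch under simplifying assumptions, not the paper's simulation environment: traces are hypothetical per-rank lists of `("send", peer)` and `("recv", peer)` records, sends never block, and a receive may only be replayed after its matching send.

```python
from collections import defaultdict


def replay(traces):
    """Replay per-rank trace records in an order that respects causality.

    traces maps a rank to a list of (kind, peer) records, where kind is
    "send" or "recv". Returns the global order as (rank, kind, peer) tuples.
    """
    pc = {rank: 0 for rank in traces}   # next record index for each rank
    in_flight = defaultdict(int)        # (src, dst) -> sent but not yet received
    order = []
    progressed = True
    while progressed:
        progressed = False
        for rank in sorted(traces):
            while pc[rank] < len(traces[rank]):
                kind, peer = traces[rank][pc[rank]]
                if kind == "recv":
                    if in_flight[(peer, rank)] == 0:
                        break           # matching send not replayed yet
                    in_flight[(peer, rank)] -= 1
                elif kind == "send":
                    in_flight[(rank, peer)] += 1
                order.append((rank, kind, peer))
                pc[rank] += 1
                progressed = True
    if any(pc[rank] < len(traces[rank]) for rank in traces):
        raise RuntimeError("deadlock: a receive has no matching send")
    return order


# Hypothetical ping-pong: rank 0 waits for rank 1 before replying.
example = {
    0: [("recv", 1), ("send", 1)],
    1: [("send", 0), ("recv", 0)],
}
order = replay(example)
```

Replaying rank 0's records first without this check would process a receive before the message exists, which is the kind of semantic violation the abstract warns about.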
