
1.

陈淑平, 周慧霖, 何王全, 漆锋滨 《计算机工程与应用》 2022, 58(5): 138-147

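In high-performance computing, hardware-supported multicast has a critical impact on application performance. Of the two existing classes of multicast routing algorithms for Infiniband networks, MINIHOP-MC ignores routing balance, which leads to a very large edge forwarding index (EFI) on links and severely degrades multicast message performance; SSSP-MC partially accounts for routing balance, but its running time is very long and it cannot meet...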
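The routing-balance concern can be made concrete with a minimal sketch. It assumes the common definition in which the EFI of a link is the number of routes crossing it and the EFI of a routing is the maximum over all links. The switch names and the two route sets are hypothetical and are not taken from the paper.

```python
from collections import Counter

def edge_forwarding_index(routes):
    """Return (max_load, per_link_load) for a set of routes.

    Each route is a list of node names; every consecutive pair is one
    directed link. The load of a link is the number of routes crossing it.
    """
    load = Counter()
    for path in routes:
        for u, v in zip(path, path[1:]):
            load[(u, v)] += 1
    max_load = max(load.values()) if load else 0
    return max_load, dict(load)

# Hypothetical routes toward switch s4 (illustration only).
unbalanced = [["s1", "s2", "s4"], ["s1", "s2", "s4"], ["s3", "s2", "s4"]]
balanced = [["s1", "s2", "s4"], ["s1", "s3", "s4"], ["s3", "s4"]]

print(edge_forwarding_index(unbalanced)[0])  # all three routes share s2 -> s4
print(edge_forwarding_index(balanced)[0])    # load is spread over two paths
```

A balanced routing lowers the maximum, so no single link becomes the bottleneck for all traffic.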

2.

A Large-Scale InfiniBand Interconnection Network Simulation System Based on OMNet++

汪鑫, 林放, 刘轶, 钱德沛 《计算机工程与科学》 2021, 43(5): 792-798

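With the development of multi-core processors and the continuing growth of computing demand, high-performance computing systems keep growing in scale. Simulating such systems plays an important role in their design and optimization, and interconnection network simulation is an indispensable part of it. This work designs and implements a large-scale InfiniBand interconnection network simulation system based on OMNet++. The system drives the network simulation with recorded MPI messages of parallel programs and can simulate...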

3.

A New Interconnection Network for High-Performance Computers and Its Parallel Simulation

李晖, 吴俊敏, 陈国良 《小型微型计算机系统》 2010, 31(9)

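This paper introduces MPU, a new dedicated network topology for high-performance computers, including its mathematical model, technical implementation, and routing algorithm. The performance of MPU is analyzed theoretically and compared mathematically with advanced networks such as the 3-D Torus; MPU outperforms the 3-D Torus on most performance metrics. The paper also implements MPUS, a large parallel simulator developed for MPU, describes its architecture, implementation, and workflow, and finally presents simulation results. Experiments show that the MPU design is correct and that MPUS scales well.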

4.

Interconnecting NOVELL and DECNET Network Systems

董玲, 徐亚非 《计算机网络》 1993, (2): 49-50


5.

An OSI-Based Interconnection Network for Heterogeneous Systems

吉逸, 许明明 《小型微型计算机系统》 1993, 14(7): 24-30


6.

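A Parallel Fast Fourier Transform Algorithm Based on the Star Interconnection Network (total citations: 6; self-citations: 0; citations by others: 6)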

史云涛, 侯紫峰, 宋建平 《计算机研究与发展》 2002, 39(5): 625-630

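The star interconnection network is a topology well suited to large-scale parallel computing. Exploiting the fact that a star network can be recursively decomposed in several different ways, this paper proposes a method for implementing a parallel fast Fourier transform on it. The method effectively reduces the communication overhead between processor nodes during computation. The proposed mapping between star-graph nodes and data, and the idea behind the parallel FFT implementation, can be extended to other parallel algorithms on star interconnection networks, such as solving systems of linear equations and matrix multiplication.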
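As a minimal sketch of the structure this abstract relies on, the code below builds the standard star graph (nodes are permutations of n symbols, and each edge swaps the first symbol with one other position) and shows one recursive decomposition into smaller stars. The function names are hypothetical, and the paper's actual node-to-data mapping is not reproduced here.

```python
from itertools import permutations

def star_neighbors(perm):
    """Neighbors of a node in the star graph: swap position 0 with position i."""
    out = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        out.append(tuple(p))
    return out

def sub_star(nodes, position, symbol):
    """Nodes with a fixed symbol at a non-first position form a smaller star."""
    if position == 0:
        raise ValueError("position 0 is the swap position and cannot be fixed")
    return [p for p in nodes if p[position] == symbol]

nodes = list(permutations(range(4)))           # 4-star: 4! = 24 nodes
print(len(nodes), len(star_neighbors(nodes[0])))  # node count and degree n - 1
print(len(sub_star(nodes, 3, 0)))                 # one of four 3-star copies
```

Because any of the n - 1 non-first positions can be fixed, the same network admits several decompositions, which is the diversity the abstract refers to.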

7.

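Research on New Technologies for Interconnection Communication in Massively Parallel Processing Systems (total citations: 2; self-citations: 0; citations by others: 2)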

郑世荣, 李晓峰 《计算机研究与发展》 1996, 33(6): 402-407, 447

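This paper surveys research on massively parallel processing systems and points out that the research focus and key technology is efficient interconnection communication. It concentrates on the research topics in this field: node architecture, network interfaces, switching techniques, topologies, routing algorithms, communication mechanisms, communication protocols, and computation models.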

8.

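Matrix Theory of Interconnection Functions for Multistage Interconnection Networks (total citations: 3; self-citations: 1; citations by others: 3)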

艾军 《小型微型计算机系统》 1998, 19(9): 7-11

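Multistage interconnection networks are the main interconnection structure used in massively parallel processing systems and large ATM switches.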

9.

Network Interconnection

张保栋 《计算机与通信》 1995, (12): 4-7, 44


10.

Network Modeling and Simulation Based on Hybrid Petri Nets (total citations: 1; self-citations: 0; citations by others: 1)

李勇, 曹广益, 朱新坚 《计算机仿真》 2005, 22(6): 88-91

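Based on an identification model of the transmission characteristics of network switching nodes, this paper proposes a node simulation model built on hybrid Petri nets and a method for modeling networks with it. It analyzes the simulation accuracy of the hybrid Petri net model of a switching node and gives the simulation conditions that guarantee conservation of information volume. For errors that cannot be eliminated, it analyzes their dynamic range during simulation and concludes that the model compensates for them automatically. A hybrid Petri net model of a network is built, and the information transmission process is simulated with Visual Object Net 2.0. The results show that the network model simulates the dynamics of information transmission well and captures the nonlinear transmission characteristics of switching nodes.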

11.

Sep: A Fixed Degree Regular Network for Massively Parallel Systems (total citations: 4; self-citations: 0; citations by others: 4)

Latifi Shahram Srimani Pradip K. 《The Journal of supercomputing》1998,12(3):277-291

We propose a family of regular Cayley network graphs of degree three based on permutation groups for design of massively parallel systems. These graphs are shown to be based on the shuffle exchange operations, to have logarithmic diameter in the number of vertices, and to be maximally fault tolerant. We investigate different algebraic properties of these networks (including fault tolerance) and propose a simple routing algorithm. These graphs are shown to be able to efficiently simulate or embed other permutation group based graphs; thus they seem to be very attractive for VLSI implementation and for applications requiring bounded number of I/O ports as well as to run existing applications for other permutation group based network architectures.

12.

A Heterogeneous Mixed-Mode Execution Model for Massively Parallel Systems

《Journal of Parallel and Distributed Computing》1999,56(1):2-16

In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data parallel manner. The other component is composed of a few (or even one) fast processors which operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of parallel programs, so that execution adjusts to the changing parallelism of the program to enhance performance. Based on this execution model we analyze the gains in performance that are obtainable by this new system. We show that substantial performance gains can be obtained by using the HeMM system.

13.

A Sweep Algorithm for Massively Parallel Simulation of Circuit-Switched Networks

《Journal of Parallel and Distributed Computing》1993,18(4):484-500

A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks that are controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data implementation is described and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data implementation is also described and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

14.

Interconnection of Computer Networks

徐光贤, 林仲卯 《计算机研究与发展》 1993, 30(5): 50-54


15.

Parallel Molecular Dynamics: Implications for Massively Parallel Machines

Valerie E. Taylor Rick L. Stevens Kathryn E. Arnold 《Journal of Parallel and Distributed Computing》1997,45(2):159

Molecular dynamics simulation is a class of applications that require reducing the execution time of fixed-size problems. This reduction in execution time is important to drug design and protein interaction studies. Many implementations of parallel molecular dynamics have been developed, but very little work has addressed issues related to the use of machines with 50,000 processors for modest-sized problems in the range of 50,000 atoms. Current massively parallel machines present a major obstacle to achieving good performance: communication overhead. In this paper we quantify the communication latency and network bandwidth necessary to achieve 30–40% efficiency on future message-passing machines with sizes on the order of tens of thousands of processors, for executing molecular dynamics problems with the same order of atoms. We derive an analytical model of a benchmark application that simulates a system of helium atoms executing on the Intel Touchstone Delta using an interaction decomposition method. This model is validated and used to extrapolate information on the startup time and network bandwidth. The results indicate that for an MPP with a four-dimensional mesh topology using 400 MHz processors, the communication startup time must be at most 30 clock cycles and the network bandwidth at least 2.3 GB/s. This configuration results in 30–40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.
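To put the quoted requirements in absolute terms, the short calculation below converts them to wall-clock and per-cycle figures. It uses only the numbers stated in the abstract and assumes 1 GB = 10^9 bytes; the variable names are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 400e6             # 400 MHz processors
STARTUP_CYCLES = 30          # maximum communication startup time, in cycles
BANDWIDTH_BYTES_PER_S = 2.3e9  # 2.3 GB/s, assuming 1 GB = 10**9 bytes
ATOMS = 50_000
PROCESSORS = 50_000

# Startup budget in nanoseconds: cycles divided by clock rate.
startup_ns = STARTUP_CYCLES / CLOCK_HZ * 1e9
# Bytes the network must deliver per processor clock cycle.
bytes_per_cycle = BANDWIDTH_BYTES_PER_S / CLOCK_HZ
# Work granularity of the target configuration.
atoms_per_processor = ATOMS / PROCESSORS

print(f"startup budget: {startup_ns:.1f} ns")
print(f"bandwidth: {bytes_per_cycle:.2f} bytes per cycle")
print(f"granularity: {atoms_per_processor:.0f} atom(s) per processor")
```

The startup budget comes to roughly 75 ns and the bandwidth to roughly 5.75 bytes per cycle, with one atom per processor, which shows how fine-grained the target configuration is.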

16.

Large-Scale Parallel Numerical Simulation of Hypersonic Flows on Structured Grids

丁国昊, 李桦, 潘沙, 高洪贺 《计算机工程与科学》 2012, 34(8): 154-159

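Using the MPI message-passing model, this work develops in-house parallel software for the numerical simulation of hypersonic flows. The software takes the three-dimensional Navier-Stokes equations as the governing equations for laminar flow problems, discretizes the computational domain with a finite volume method on structured grids, computes convective fluxes with the AUSMPW+ scheme, obtains high-order accuracy through MUSCL interpolation, and uses LU-SGS time iteration to accelerate convergence for steady flows. Large-scale parallel computations of several hypersonic flows on high-performance computers show that the CFD software achieves high parallel efficiency and provides an efficient tool for accurately predicting the aerodynamic forces and heating of hypersonic vehicles.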

17.

TOD: A Description Language for Interconnection Network Topologies of Parallel Machines

李强国, 于洋, 何凯, 李涛, 杨愚鲁 《计算机应用研究》 2006, 23(3): 76-78


18.

An InfiniBand-Based Multi-Link Mesh/Torus Interconnection Network for Massively Parallel Systems

夏晓爽, 刘轶, 王允彬, 钱德沛 《计算机研究与发展》 2012, 49(1): 76-82

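In massively parallel systems, the design of the system-level interconnection network is critical. InfiniBand, a high-performance switched network, is widely used in massively parallel processing systems. Compared with the fat-tree topology now common in InfiniBand networks, the mesh/torus topology offers better performance and scalability. Nevertheless, the study finds that building an InfiniBand interconnection network with a conventional mesh/torus topology raises many problems. The paper analyzes the shortcomings of the conventional topology and proposes an InfiniBand-based multi-link mesh/torus interconnection network. By making full use of multiple links between switches, the improved topology achieves higher bandwidth than a conventional mesh/torus network. An efficient routing algorithm matched to the topology is also given. Finally, the algorithm is evaluated by network simulation; the results show that the proposed routing algorithm has better performance and scalability than other routing algorithms.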
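The two topology properties at play can be sketched briefly: wrap-around links shorten minimal paths in a torus relative to a mesh, and parallel links between neighboring switches multiply the bandwidth of each hop. The functions and the example figures below are illustrative assumptions; they do not reproduce the paper's routing algorithm or its measured results.

```python
def mesh_hops(a, b):
    """Minimal hop count between coordinates a and b in a mesh."""
    return sum(abs(x - y) for x, y in zip(a, b))

def torus_hops(a, b, dims):
    """Minimal hop count in a torus, where each dimension wraps around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))

def neighbor_bandwidth(link_gbps, links_per_neighbor):
    """Aggregate bandwidth between two adjacent switches joined by parallel links."""
    return link_gbps * links_per_neighbor

# Hypothetical 4 x 4 network, corner to corner.
print(mesh_hops((0, 0), (3, 3)))           # no wrap-around links
print(torus_hops((0, 0), (3, 3), (4, 4)))  # wrap-around shortens the path
# Hypothetical link rate and link count, for illustration only.
print(neighbor_bandwidth(link_gbps=40, links_per_neighbor=3))
```

Using the extra links in practice also needs a routing scheme that spreads traffic across them, which is the role of the routing algorithm the abstract mentions.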

19.

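Simulation and Implementation of a Novel Network-on-Chip Interconnect Architecture (total citations: 2; self-citations: 0; citations by others: 2)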

陈芳露, 陆雯青, 虞志益, 周晓方 《小型微型计算机系统》 2010, 31(5)

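Considering both performance and hardware implementation, this paper proposes MLR (Multi-Layer Router), a hierarchical routing structure for network-on-chip interconnect topologies. Its hierarchical design reduces the network diameter and gives it good symmetry and scalability. Network modeling, simulation, and hardware implementation results show that, under different network loads and different numbers of IP core nodes, MLR improves packet loss rate, communication latency, and network throughput by up to 50%-70% compared with conventional structures. In addition, by sharing routers it reduces chip area by more than 20% and dynamic power by more than 40%, effectively lowering the hardware cost of the interconnect.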

20.

Interconnection Network Simulation Using Traces of MPI Applications

J. Miguel-Alonso J. Navaridas F. J. Ridruejo 《International journal of parallel programming》2009,37(2):153-174

This paper addresses the utilization of traces taken from MPI applications to do simulation-based performance studies of parallel computing systems. Different mechanisms to capture traces are discussed, pointing out important limitations of some of them. One of these limitations is the invisibility of message interchanges in collective operations, which is circumvented by modifying a trace-capturing library. During a simulation, trace records must be simulated in causal order, to fully comply with application semantics. Alternatives to follow this order, and the risks of not following it, are presented and discussed. The techniques introduced in this paper have been implemented in an in-house developed simulation environment, which is used in two example studies to show its usefulness: an evaluation of alternatives for interconnection network design, and a performance prediction study in which traces from one machine are used to estimate the execution times of applications running in a different machine.
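The causal-order requirement can be illustrated with a minimal replay sketch: a receive record may only be simulated once the matching send has been simulated, whatever order the ranks are visited in. The record format, the `replay` function, and the fixed-latency network are all assumptions made for illustration; they are not the paper's simulation environment.

```python
from collections import defaultdict, deque

def replay(traces, latency=1.0):
    """Replay per-rank trace records in causal order.

    traces: {rank: [("compute", seconds) | ("send", dst) | ("recv", src)]}
    Returns the finish time of each rank under a fixed-latency network.
    """
    queues = {r: deque(recs) for r, recs in traces.items()}
    clock = {r: 0.0 for r in traces}
    in_flight = defaultdict(deque)  # (src, dst) -> arrival times, FIFO
    progressed = True
    while progressed:
        progressed = False
        for rank, q in queues.items():
            while q:
                kind, arg = q[0]
                if kind == "compute":
                    clock[rank] += arg
                elif kind == "send":
                    in_flight[(rank, arg)].append(clock[rank] + latency)
                else:  # "recv"
                    pending = in_flight[(arg, rank)]
                    if not pending:
                        break  # matching send not simulated yet; try later
                    clock[rank] = max(clock[rank], pending.popleft())
                q.popleft()
                progressed = True
    if any(queues.values()):
        raise RuntimeError("unmatched recv: trace cannot be replayed causally")
    return clock

# Hypothetical two-rank ping-pong trace.
traces = {
    0: [("compute", 2.0), ("send", 1), ("recv", 1)],
    1: [("recv", 0), ("compute", 3.0), ("send", 0)],
}
print(replay(traces))
```

Replaying records by their recorded timestamps instead would pin each receive to the timing of the traced machine, so a faster or slower simulated network could not shift dependent events as it should.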