Similar documents
1.
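One design goal of massively parallel supercomputers is to keep computing power and I/O performance matched and balanced. With the development of large-scale parallel processing and the demands of modern scientific and engineering research, the mismatch between a supercomputer's computing power and its I/O processing capability has become increasingly prominent and is more and more a constraint on the development of such machines. This paper analyzes why I/O systems have lagged behind, discusses several I/O system architectures for massively parallel supercomputers, and explores how to improve and fully exploit parallel I/O performance in terms of system architecture, interconnection technology, I/O service node selection, and software design.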

2.
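Interconnecting heterogeneous wireless networks is a current research focus. To address the problem of a general architecture for such interconnection, this paper proposes an interconnection and interworking strategy based on an IP-integrated switching mechanism. It focuses on key technologies such as the overall interconnection architecture and the design of the core interconnection modules, and verifies the effectiveness of the strategy by analyzing performance simulation results for the interconnection strategy, IP mapping, and IP encapsulation.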

3.
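This paper discusses the architecture of local-network internetworking. It first outlines the development of local-network internetworking and its architectures, then focuses on the Pup internetwork architecture model and compares it with two other typical interconnection architectures, ISO and DOD.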

4.
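Chip-to-chip interconnect rates have reached the GHz range, and compared with low-speed interconnects, testing high-speed interconnects poses new challenges. This paper examines the difficulties of high-speed interconnect testing and the shortcomings of traditional interconnect test methods, then introduces the structure and method of Interconnect Built-In Self-Test (IBIST), and finally presents an implementation of IBIST in an FPGA.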

5.
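Starting from the OSI layered reference model, this paper discusses the meaning of interconnection and the methods and implementation of interconnection, and finally discusses the types of network interconnection.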

6.
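Performance analysis of a dual-ring optical interconnection network for cluster systems   Cited by: 2 (self-citations: 0; citations by others: 2)
The bandwidth demands of high-performance computer networks have made providing high-bandwidth interconnection networks, and making full use of their inherent bandwidth, a research focus. Based on the Linux operating system and using a gigabit optical interconnect interface card as the network interface, a dual-ring optical interconnection network for cluster systems was designed and implemented. The paper describes the basic structure of the optical interconnect interface card, the design of the card's driver software, and the topology and characteristics of the dual-ring optical interconnection network. It analyzes and tests the network's communication performance and identifies the key factors affecting overall system performance.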

7.
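As computer systems demand ever higher interconnection network performance, traditional electrical interconnects face many problems they cannot overcome, and optical interconnection has emerged in response. This paper designs a new inter-processor optical interconnection structure based on high-speed optical switches, called PIBOS, and on that basis proposes link arbitration and routing algorithms for single-stage and multistage PIBOS. Simulation results show that the PIBOS interconnection structure reduces optical-electrical conversions during data transmission, improves network throughput, lowers system latency, and scales the interconnection system well.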

8.
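Torus-connected Petersen graph interconnection network and routing algorithms   Cited by: 3 (self-citations: 0; citations by others: 3)
Scalability and short diameter are two important factors in designing interconnection networks for massively parallel computer systems. Building on the short diameter and regularity of the Petersen graph and the scalability of the Torus topology, a new interconnection network topology is proposed, called the Torus-connected Petersen graph interconnection network. The topology has short diameter, regularity, symmetry, and good scalability. Network nodes use a hybrid encoding scheme, which keeps routing algorithm design simple. Unicast and broadcast routing algorithms based on the hybrid encoding are designed. Analysis shows that the proposed interconnection network has good topological properties.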
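
The two Petersen graph properties this abstract relies on, regularity and short diameter, can be checked directly. The following is a minimal sketch of the plain Petersen graph only; it does not reproduce the paper's Torus-connected construction or its hybrid node encoding, and the vertex numbering (0 to 4 outer cycle, 5 to 9 inner pentagram) and function names are assumptions made for illustration.

```python
from collections import deque


def petersen_graph():
    """Return the Petersen graph as an adjacency dict on vertices 0..9."""
    adj = {v: set() for v in range(10)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(5):
        link(i, (i + 1) % 5)          # outer 5-cycle
        link(5 + i, 5 + (i + 2) % 5)  # inner pentagram
        link(i, 5 + i)                # spoke joining outer and inner vertex
    return adj


def diameter(adj):
    """Longest shortest-path length, found by a BFS from every vertex."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    g = petersen_graph()
    print({len(neighbors) for neighbors in g.values()})  # set of vertex degrees
    print(diameter(g))
```

Every vertex has degree 3 and any two vertices are at most 2 hops apart, which is why the graph is attractive as a building block for low-diameter networks.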

9.
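Research on optical interconnection computer architecture   Cited by: 1 (self-citations: 0; citations by others: 1)
This paper first reviews the current state of optical interconnection technology and parallel computing technology. Optical interconnection offers high interconnect bandwidth, low power consumption, and high interconnect density, characteristics that will let it play an important role in improving system performance, reducing system size, and lowering system cost. A computer architecture model using free-space optical interconnection is proposed. The model can fully exploit the high interconnect density and high interconnect bandwidth of free-space optical interconnection and can compute in a data-driven manner, giving relatively high parallel efficiency; simulations using the benchmark programs used with the RAW architecture demonstrate this efficiency. The device technologies and processes underlying free-space optical interconnection are increasingly mature, for example the use of flip-chip bonding…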

10.
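The interconnection structure among the IP cores in an SoC is a key factor determining on-chip system performance. In recent years, configuring and optimizing on-chip interconnect communication architectures has become a focus of SoC communication synthesis research. Existing methods, however, simulate slowly, support design automation poorly, and use single-objective optimization algorithms that cannot resolve conflicts among multiple performance goals. To address these shortcomings, a method is proposed for automatically configuring and optimizing on-chip interconnect communication architectures under throughput and latency constraints. The method defines an on-chip bus interconnect communication architecture template and uses transaction-level communication simulation and a multi-objective evolutionary algorithm to explore the multi-objective Pareto space under throughput and latency constraints. Compared with the existing state-of-the-art Srinivasan method, it improves throughput by 10% and reduces transmission latency by 17%, effectively improving the optimization quality of SoC interconnect communication architectures.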
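
To make the notion of a Pareto space over throughput and latency concrete, here is a minimal sketch of a non-dominated filter. It is not the paper's evolutionary algorithm; the configuration names and the throughput and latency figures are hypothetical values chosen only for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives and strictly
    better on one. Tuples are (name, throughput, latency): higher throughput
    is better, lower latency is better."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Hypothetical bus configurations: (name, throughput, latency).
    configs = [("A", 100, 20), ("B", 120, 25), ("C", 90, 30), ("D", 120, 22)]
    for name, throughput, latency in pareto_front(configs):
        print(name, throughput, latency)
```

A single-objective optimizer would have to collapse the two figures into one score; keeping the whole front leaves the trade-off between throughput and latency visible to the designer.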

11.
Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms. We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.

12.
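Wang Yilin, Cai Ping, Mei Jidan. Computer Engineering (《计算机工程》), 2008, 34(10): 259-260
In parallel processing systems, communication overhead between processing nodes is the main bottleneck limiting processor performance. This paper proposes a parallel processing system built around the TMS320C641X digital signal processor and designs several parallel interconnection networks, including a PCI bus, serial ports, and a packet-switched network, so that input, output, control, and other data streams are separated and each is carried on a suitable network. This can improve transmission efficiency and combines high-performance DSPs with a high-performance interconnection system.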

13.
Performance of multiprocessor interconnection networks   Cited by: 1 (self-citations: 0; citations by others: 1)
A tutorial is provided on the performance evaluation of multiprocessor interconnection networks, to guide system designers in their design process. A classification of parallel/distributed systems is followed by a classification of multiprocessor interconnection networks. Basic terminology for performance evaluation is presented. The performance of crossbar interconnection networks, multistage interconnection networks, and multiple-bus systems is then addressed, and a comparison is made among them.

14.
High-radix switches reduce network cost and improve network performance, especially in large switch-based interconnection networks. However, there are some problems related to the integration scale to implement such switches in a single chip. An interesting alternative for building high-radix switches consists of combining several current smaller single-chip switches to obtain switches with a greater number of ports. A key design issue of this kind of high-radix switches is the internal switch configuration, specifically, the correspondence between the ports of these high-radix switches and the ports of their smaller internal single-chip switches. In this paper we use artificial intelligence and data mining techniques in order to obtain the optimal internal configuration of all the switches in the network of large supercomputers running parallel applications. Simulation results show that, using the resultant switch configurations, it is possible to achieve performance similar to that of single-chip switches with the same radix, which would be unfeasible with the current integration scale.

15.
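Shuai Dianxun, Feng Xiang, Zhao Hongbin, Wang Xing. Chinese Journal of Computers (《计算机学报》), 2004, 27(11): 1441-1450
The authors previously proposed the principle and parallel algorithms of the generalized cellular automaton (GCA) and applied them to dynamic optimization problems such as fast packet switching in networks. This paper further discusses the architecture of this new generalized cellular automaton, the hardware implementation of its algorithms, and the circuit design, all of which matter for practical application of GCA. The GCA structure differs from the Hopfield neural network (HNN) and the cellular neural network (CNN): GCA is a tower-shaped structure composed of multi-level, multi-granularity macro-cells and exhibits multi-granularity macro-cell dynamics. Macro-cells of the same granularity do not interact, but macro-cells of different granularities interact or feed back to some degree. Analysis and experiments show that the GCA structure and hardware implementation presented here have many advantages over HNN and CNN in solution optimality, real-time performance, and hardware implementation complexity.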

16.
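Sensitivity estimation and optimization algorithms for closed queueing networks based on parallel simulation   Cited by: 2 (self-citations: 0; citations by others: 2)
Based on Markov performance potential theory, an effective parallel simulation algorithm is established for sensitivity estimation and optimization of a class of closed queueing networks. Common random numbers are used so that all processors follow the same sample path, reducing communication time between processors. A simulation example on an SPMD parallel computer shows that the parallel simulation algorithm significantly speeds up the optimization of closed queueing networks.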

17.
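In massively parallel systems, the design of the system-level interconnection network is critical. InfiniBand, a high-performance switched network, is widely used in massively parallel processing systems. Compared with the fat-tree topology now commonly used in InfiniBand networks, the mesh/torus topology offers better performance and scalability. Nevertheless, studies have found many problems in building an InfiniBand interconnection network with a traditional mesh/torus topology. This paper analyzes the shortcomings of the traditional topology and proposes a multi-link mesh/torus interconnection network based on InfiniBand. By fully exploiting multiple links between switches, the improved topology achieves higher bandwidth than a traditional mesh/torus network. An efficient routing algorithm matched to the topology is also given. Finally, the proposed algorithm is evaluated by network simulation; the results show that the proposed routing algorithm has better performance and scalability than other routing algorithms.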
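
For readers unfamiliar with routing on a torus, the sketch below shows textbook dimension-order routing on a 2D torus with wraparound links. This is a generic baseline, not the multi-link routing algorithm proposed in the paper; the function names and the coordinate convention are assumptions made for illustration.

```python
def ring_step(src, dst, size):
    """Direction (+1, -1 or 0) of the shorter way around a ring of `size` nodes."""
    forward = (dst - src) % size
    if forward == 0:
        return 0
    # Ties between the two directions are resolved in favor of +1.
    return 1 if forward <= size - forward else -1


def torus_route(src, dst, width, height):
    """Dimension-order route on a width x height torus.

    Corrects the x coordinate first, then y, using wraparound links
    whenever they give the shorter way. Returns the list of visited nodes.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + ring_step(x, dst[0], width)) % width
        path.append((x, y))
    while y != dst[1]:
        y = (y + ring_step(y, dst[1], height)) % height
        path.append((x, y))
    return path


if __name__ == "__main__":
    # On a 4 x 4 torus, going from x=0 to x=3 takes one wraparound hop.
    print(torus_route((0, 0), (3, 1), 4, 4))
```

The wraparound links are what distinguish a torus from a mesh: on a mesh the same source and destination would be four hops apart rather than two.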

18.
This paper extends research into rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As a foundation for a shared-memory non-uniform access bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture where the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching parallel application communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped to the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of processor direct to cache (or memory) over a bus. The latency of communication between parallel threads does not degrade parallel performance or limit the graininess of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture which is designed to obtain maximum advantage from the overlapping-cluster characteristic, and also allows dynamic workload migration without moving the instructions or data. Scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload allocation schemes that take advantage of the overlapping-cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and justify their use as a foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bi-directional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies to rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth modeling technique, and are compared with the results of simulation.

19.
Polymorphic Torus is a novel interconnection network for SIMD massively parallel computers, able to support effectively both local and global communication. Thanks to this characteristic, Polymorphic Torus is highly suitable for computer vision applications, since vision involves local communication at the low-level stage and global communication at the intermediate- and high-level stages. In this paper we evaluate the performance of Polymorphic Torus in the computer vision domain. We consider a set of basic vision tasks, namely convolution, histogramming, connected component labeling, Hough transform, extreme point identification, diameter computation, and visibility, and show how they can take advantage of the Polymorphic Torus communication capabilities. For each basic vision task we propose a Polymorphic Torus parallel algorithm, give its computational complexity, and compare such a complexity with the complexity of the same task in mesh, tree, pyramid, and hypercube interconnection networks. In spite of the fact that Polymorphic Torus has the same wiring complexity as mesh, the comparison shows that in all of the vision tasks under examination it achieves complexity lower than or at most equal to hypercube, which is the most powerful among the interconnection networks considered.

20.
A new parallel implementation of Strassen’s matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D, all-port torus interconnection networks. The proposed algorithm employs a special conflict-free routing pattern for better scalability and is able to yield a performance rate very close to the theoretical bound for many practical network and matrix sizes. It effectively scales up to very large networks, typically containing hundreds of thousands of processors, where petaflop or exaflop processing rates are sought.
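
As background for this abstract, the following is a minimal sequential sketch of Strassen's algorithm, which replaces the eight block products of the ordinary recursive method with seven. It says nothing about the paper's parallel mapping or its conflict-free routing on the torus; it assumes square matrices whose size is a power of two, stored as lists of lists, and the helper names are chosen for illustration.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two (assumed, not checked)."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # The seven recursive block products.
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    # Recombine the products into the quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In practice implementations switch to the ordinary algorithm below some block size, since the extra additions outweigh the saved multiplication for small matrices.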
