1.

朱成阳柴燕涛董德尊张鹤颖庞征斌《计算机工程与科学》2016,38(2):240-248

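Because of unbalanced load on a high-speed interconnection network, some network nodes become hot spots, which can congest nodes or links and severely degrade network performance. SRP, an existing reservation-based congestion avoidance technique, avoids congestion proactively and greatly mitigates the negative effects of hot spots. Under hot-spot traffic, however, most router resources at the non-hot-spot nodes sit idle. To make fuller use of network resources and improve performance, this paper proposes IRP, an intermediate-node buffering technique that improves on SRP. IRP adapts to the topology, for example a fat tree, to exploit the router resources of the hot spot's neighboring nodes: it first uses the fat tree's multiple paths to send packets to idle routers and, once the destination router becomes available, forwards the buffered packets to the destination, reducing network latency.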

2.

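Parallel fast Fourier transform algorithm based on the star interconnection network Total citations: 6, self-citations: 0, citations by others: 6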

史云涛侯紫峰宋建平《计算机研究与发展》2002,39(5):625-630

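The star interconnection network is a topology well suited to large-scale parallel computing. Exploiting the variety of ways in which the star network can be recursively decomposed, this paper proposes a method for implementing a parallel fast Fourier transform on it. The method effectively reduces the communication overhead between processor nodes during computation. The proposed mapping between star-graph nodes and data, and the approach to implementing the parallel FFT, can be extended to other parallel algorithms on star interconnection networks, such as solving systems of linear equations and matrix multiplication.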

3.

High-radix router architecture based on the Clos network

施得君李宏亮胡舒凯《计算机工程与科学》2023,(12):2099-2112

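The router is a key component of a high-performance interconnection network; high-radix routers allow flexible construction of topologies with low network diameter, rich path diversity, and high fault tolerance. The hierarchical structure splits the router into multiple small sub-crossbars; in a typical implementation the number of sub-crossbars equals the number of router ports, with each sub-crossbar serving one input/output port. Because every sub-crossbar has buffers at its inputs and outputs, a hierarchical router contains a large number of internal buffers, which limits scalability. The network structure implements on chip a network topology of the kind used to build systems: smaller switches are connected as, for example, a mesh, a fully connected network, or a fat tree, integrated into one router, and presented externally as a single high-radix router. The network structure is low in cost, but once the system network is built, routing inside the router itself must be considered in addition to the performance of the system topology. This paper proposes a hierarchical router based on the Clos network that combines the high performance of the traditional hierarchical structure with the low cost of the network structure, together with two scheduling algorithms for the Clos network. Under uniform traffic it achieves close to 100% bandwidth, and RTL synthesis shows an area reduction of up to 25.9%.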

4.

Parallel region-partitioning packet classification algorithm

颜天信王永纲石江涛冯海涛《小型微型计算机系统》2005,26(11):1898-1902

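Network packet classification is a key technology for next-generation routers, firewalls, QoS guarantee mechanisms, network information inspection, and similar equipment. The parallel region-partitioning packet classification algorithm, built on the region-partitioning idea and implemented in an FPGA, is a high-speed packet classification algorithm based on shared memory and parallel processing units. It has two main parts: a memory-mapping method derived from region partitioning, and a two-stage, multi-channel parallel processing technique.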

5.

MCIM: a memory-centered interconnection mechanism for parallel systems Total citations: 2, self-citations: 1, citations by others: 1

李三立戈弋武剑峰《计算机学报》1999,22(4):395-402

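The interconnection network (IN) between compute nodes has long been a research focus in parallel architecture. Over the past 30 years many IN structures and their properties have been studied, but all of them are based on logic circuits. This paper proposes an interconnection mechanism for parallel systems centered on multiport fast static memory, called MCIM. Unlike shared memory, MCIM has a small capacity, is divided into multiple communication mailboxes for message passing, and connects 8 to 16 compute nodes through each port's port access interface (PAI). Commonly used four-port memories can make up a parallel ... of 32 to 64 compute nodes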

6.

Pipelined channel: a high-speed interconnect for MPP systems

刘燕徐炜遐杨晓东《计算机学报》1998,21(11):995-1002

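In traditional massively parallel processing (MPP) systems, the interconnection network and routers operate in a strongly synchronous mode: a message transfer between adjacent routers must complete within one clock cycle. Interconnect length therefore becomes a major factor limiting the network clock frequency, and it also restricts system scalability. A pipelined channel can carry several data items on one channel at the same time, making the network clock frequency independent of wire length and raising the transfer rate. This paper introduces the idea and studies techniques for implementing a pipelined-channel interconnection network, focusing on several key issues: source-synchronous transmission, switching techniques, and flow-control strategies.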

7.

Network memory and machine interconnection

万志坤张江陵《小型微型计算机系统》1998,19(1):59-61

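This paper proposes the concept of network memory for network computing environments and describes its principle, its implementation, and its support for system interconnection. The main advantage of network memory is that it improves interconnection efficiency and lowers interconnection cost.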

8.

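Research on the interconnection network of a multiprocessor signal processing system Total citations: 1, self-citations: 0, citations by others: 1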

王新宏陈航李志舜《计算机工程》2001,27(9):79-80

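The paper analyzes the characteristics of different forms of interconnection network. Based on the characteristics of signal processing tasks, it combines pipelined and parallel processing in a hierarchical structure to implement a multiprocessor signal processing system interconnected through multiport memory. The system makes full use of the high speed, low transfer latency, and ease of control of multiport-memory interconnection, places the emphasis of scalability on pipelined and parallel processing, and so sidesteps the poor scalability of multiport-memory interconnection itself. The system offers good scalability, reconfigurability, flexibility, and generality.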

9.

Multicast communication in multistage interconnection networks Total citations: 3, self-citations: 1, citations by others: 3

王晓东周兴铭《计算机研究与发展》1998,35(1):40-44

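Parallel communication in MPP systems is currently a focus of parallel processing research; improving parallel communication performance and raising network throughput are key to realizing MPP performance. Multicast is a one-to-many communication mode, as distinct from point-to-point communication; it is therefore more powerful and more flexible and convenient to use, and it is widely applied in parallel processing. Against the background of multistage interconnection networks, which use switching elements to interconnect nodes dynamically, this paper studies the efficiency of multicast routing algorithms.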

10.

Application of the fully connected cubic network in parallel processing systems

王洪玉董秀国《计算机研究与发展》2001,38(5):609-615

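This paper proposes a recursive, multilevel hierarchical interconnection network with constant node degree for massively parallel processing systems, called the fully connected cubic network (FCCN). FCCN has good scalability and extensibility: an m-FCCN is obtained recursively from eight (m−1)-FCCNs. The node degree of FCCN is the constant 4, independent of network size, and both the network diameter and the average node distance are proportional to the cube root of the number of nodes. A simple routing algorithm for FCCN is presented, and the FCCN structure is applied in a large-scale hybrid optoelectronic processing system; computed results show that FCCN achieves relatively high parallel processing efficiency.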
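
The scaling claims in this abstract can be worked through in a few lines. The sketch below only combines the two stated facts (eight sub-networks per level, diameter proportional to the cube root of the node count); the base size N_0 and the constant c are assumptions introduced for illustration and are not given in the abstract.

```latex
% N_m : number of nodes in an m-FCCN (N_0 is an assumed base size)
% D_m : diameter of an m-FCCN (c is an assumed proportionality constant)
\begin{align}
  N_m &= 8\,N_{m-1} = 8^{m} N_0, \\
  N_m^{1/3} &= 2^{m} N_0^{1/3}, \\
  D_m &= c\,N_m^{1/3} = \bigl(c\,N_0^{1/3}\bigr)\,2^{m}.
\end{align}
```

Under these assumptions, each additional level multiplies the node count by 8 while only doubling the diameter, and the node degree stays at 4 throughout.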

11.

Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architectures

Xiaodong Zhang Yong Yan 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(12):1316-1331

Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and by the efficiency of the memory/cache management systems. Cache Coherence Nonuniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an efficient interconnection network in hardware. This paper focuses on comparative performance modeling and evaluation of CC-NUMA and COMA on a hierarchical ring shared-memory architecture. Analytical models for the two memory systems for comparative evaluation are presented. Intensive performance measurements on data migrations have been conducted on the KSR-1, a COMA hierarchical ring shared-memory machine. Experimental results support the analytical models, and we present practical observations and comparisons of the two cache coherence memory systems. Our analytical and experimental results show that a COMA system balances the work load well. However, the overhead of frequent data movement may match the gains obtained from improving load balance. We believe our performance results could be further generalized to the two memory systems on a hierarchical network architecture. Although a CC-NUMA system may not automatically balance the load at the system level, it provides an option for a user to explicitly handle data locality for a possible performance improvement.

12.

An IP-interconnected DSP implementation of parallel processing for underwater acoustic array signals

罗杰刘千里《计算机与数字工程》2012,40(1):71-73,100

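This paper designs and implements a scalable parallel processing system for sonar array signals based on IP network interconnection. Each processing node is an on-board pipelined parallel structure built from two TMS320DM642 high-performance network multimedia processors from TI, and boards are interconnected for parallel processing over an IP network; the number of processing nodes can be increased or decreased according to the number of transducer array elements and the required processing speed. Each processing node connects to the data acquisition and conversion section over a TCP/IP network, so the system's data processing capacity can be raised by physically adding one or more processing nodes.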

13.

A brief discussion of interconnecting distributed local area networks

陈文一《广东电脑与电讯》2007,(10):31-32,42

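With the continuing development of network technology, some large enterprises use LAN technology to build interconnected corporate intranets. This paper describes how to interconnect distributed LANs using static routing: first design the network topology diagram, then plan the IP address allocation, configure the routers, and finally test network connectivity.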
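
The address-planning and static-routing steps named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the 10.1.0.0/16 block, the /24 per LAN, the next-hop addresses, and the `next_hop` helper are all hypothetical, and a real deployment would enter equivalent routes in each router's own configuration.

```python
import ipaddress

# Hypothetical address plan: one /24 per LAN carved out of a private /16.
site_block = ipaddress.ip_network("10.1.0.0/16")
lans = list(site_block.subnets(new_prefix=24))[:3]  # first three /24 subnets

# Hypothetical static routing table for one router: destination -> next hop.
static_routes = {
    lans[0]: "directly connected",
    lans[1]: "10.1.255.2",
    lans[2]: "10.1.255.3",
    ipaddress.ip_network("0.0.0.0/0"): "10.1.255.1",  # default route
}


def next_hop(destination: str) -> str:
    """Return the next hop chosen by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in static_routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return static_routes[best]


if __name__ == "__main__":
    for target in ("10.1.0.5", "10.1.1.20", "8.8.8.8"):
        print(target, "->", next_hop(target))
```

The lookup uses longest-prefix match, which is how routers normally choose among overlapping static routes; the final connectivity test in the abstract would be done on the real network, for example with ping.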

14.

Multi-DSP parallel signal processing system based on SRIO Total citations: 1, self-citations: 0, citations by others: 1

屈磊宋慰军苟冬荣柴小丽奚军《计算机工程》2008,34(Z1)

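Using SRIO as the internal system interconnect bus and the TMS320C6455, which integrates an SRIO interface, as the core processor, this paper designs a multi-DSP parallel signal processing system. The design not only effectively removes the system interconnection bottleneck but also makes the topology reconfigurable, greatly increasing the system's signal processing capability. The paper analyzes the design approach, presents the hardware structure, and verifies the system's high bandwidth and reconfigurability in a practical application.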

15.

Gemini: an optical interconnection network for parallel processing

Chamberlain R.D. Franklin M.A. Ch'ng Shi Baw 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(10):1038-1055

The Gemini interconnect is a dual technology (optical and electrical) interconnection network designed for use in tightly-coupled multicomputer systems. It consists of a circuit-switched optical data path in parallel with a packet-switched electrical control/data path. The optical path is used for transmission of long data messages and the electrical path is used for switch control and transmission of short data messages. The paper describes the architecture of the interconnection network and related communications protocols. Fairness issues associated with network operation are addressed and a discrete-event simulation model of the entire system is described. Network performance characteristics derived from the simulation model are presented. The results show significant performance benefits when using virtual output queuing and quantify the tradeoffs between throughput and fairness in the system.

16.

Dual-port RAMs simplify processor communications

David C Wyland 《Microprocessors and Microsystems》1988,12(10):585-594

Conventional memory blocks have a single address input and a single, usually bidirectional, data output. Dual-port memories have two address inputs and two data ports. These memories have been designed to facilitate the exchange of data between CPUs within a multiprocessor system. Each microprocessor can access the multiport memory and therefore read the data of another processor or leave data for another processor. There are two problems in the design of multiport memory systems. The first, and more trivial, concerns the way in which each processor supplies an address to the memory and how it accesses the memory data bus. This is not a particularly complex problem and the designer's biggest worry is how to design the interface with the least number of multiplexers and buffers. Whenever a processor wishes to access the multiport memory, it takes control of the address and data bus and then accesses the memory. A more fundamental design problem is posed when two or more processors try to access the memory nearly simultaneously. Memory contention is solved by the use of an arbitration circuit that arbitrates between the contending processors, grants access to only one processor and forces the others to wait. Fortunately, it is no longer necessary for all designers to construct their own dual-port memories from discrete components, since several manufacturers now put the memory, address and data multiplexers plus arbitration circuits on chip. IDT's application note shows how its dual-port memory operates and how it is used in multiprocessor systems.
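
The arbitration behaviour described in this abstract (grant one processor, make the others wait) can be modelled with a small Python sketch. It is a toy software model, not IDT's circuit: the `DualPortArbiter` class, its method names, and the first-come-first-served waiting policy are assumptions made for illustration.

```python
from collections import deque


class DualPortArbiter:
    """Toy arbiter: at most one processor owns the shared memory at a time."""

    def __init__(self):
        self.owner = None       # processor currently granted access
        self.waiting = deque()  # processors forced to wait, in arrival order

    def request(self, cpu: str) -> bool:
        """Grant access if the memory is free; otherwise queue the requester."""
        if self.owner in (None, cpu):
            self.owner = cpu
            return True
        if cpu not in self.waiting:
            self.waiting.append(cpu)
        return False

    def release(self, cpu: str) -> None:
        """Release the memory and hand it to the longest-waiting processor."""
        if cpu != self.owner:
            raise ValueError(f"{cpu} does not hold the memory")
        self.owner = self.waiting.popleft() if self.waiting else None


if __name__ == "__main__":
    arbiter = DualPortArbiter()
    print(arbiter.request("cpu_a"))  # granted
    print(arbiter.request("cpu_b"))  # must wait
    arbiter.release("cpu_a")
    print(arbiter.owner)             # cpu_b now holds the memory
```

A hardware arbiter resolves near-simultaneous requests in logic rather than with a queue; the queue here only makes the "others wait" rule easy to see.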

17.

On the interconnection of causal memory systems

《Journal of Parallel and Distributed Computing》2004,64(4):498-506

In this paper, we look at the interconnection of propagation-based causal distributed shared memory (DSM) systems. We present extremely simple protocols to interconnect two such systems (possibly implemented with different algorithms) that only require the existence of a bidirectional reliable FIFO channel connecting one process from each system. We show that the resulting DSM system is also causal. This result can be used to interconnect any number of propagation-based causal DSM systems by interconnecting them in pairs with a tree topology.
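
The last sentence of this abstract, connecting any number of systems in pairs along a tree, can be sketched in Python. The sketch covers only the topology, not the interconnection protocol itself; the system names and both helper functions are hypothetical, and a chain is used as one simple example of a tree.

```python
def pairwise_links(systems):
    """Connect systems in a chain, which is one simple tree topology."""
    return [(systems[i], systems[i + 1]) for i in range(len(systems) - 1)]


def is_tree(systems, links):
    """Check that the links form a tree: n - 1 links and every system reachable."""
    if len(links) != len(systems) - 1:
        return False
    neighbours = {s: set() for s in systems}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = set(), [systems[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return seen == set(systems)


if __name__ == "__main__":
    dsm_systems = ["S1", "S2", "S3", "S4"]
    links = pairwise_links(dsm_systems)
    print(links)                        # one FIFO channel per pair
    print(is_tree(dsm_systems, links))  # True
```

With n systems, a tree needs exactly n - 1 pairwise channels, and each channel corresponds to one application of the two-system protocol.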

18.

PIPORS: a parallel input parallel output register switching system

Der-Fu Tao, Liang-Teh Lee 《Computers & Electrical Engineering》2004,30(6):427-440

To make data exchange fast enough to support current communication systems and networks, a high-speed switching system with low transmission delay and low data loss is required. Many researchers have used statistical time-division multiplexing techniques to design switching systems with higher throughput. In such switching systems with n input/output ports, the internal execution speed must be n times faster than that of a system with a single input/output port. This design philosophy is not well suited to the growing demand for higher-speed systems. To overcome the drawbacks of the switching systems mentioned above, a novel architecture, the Parallel Input Parallel Output Register Switching System (PIPORS), is proposed in this paper. PIPORS is based on the interconnection of small distributed Shared Memory Modules (SMM) and the Shift Register Switch Array (SRSA). This construction accelerates the switching speed. In addition, the number of input/output ports of the system can easily be extended to provide higher capacity in response to the rapidly increasing amount of data transferred in the system. Three simple methods to extend the input/output ports and the capacity of the internal memory are presented. To evaluate the performance of the proposed system, we compared PIPORS with the Central Shared Memory Switching System (CSMS) with respect to the total amount of memory required, data loss probability, transmission delay, and switching performance. The comparison shows that PIPORS achieves better performance.