1.
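Because load on a high-speed interconnection network is unbalanced, some network nodes become hot spots, which can congest some nodes or links and severely degrade network performance. The existing reservation-based congestion-avoidance technique SRP avoids congestion proactively and greatly mitigates the negative effects of hot spots. Under hot-spot traffic, however, most router resources at the non-hot-spot nodes sit idle. To make fuller use of network resources and improve performance, this paper proposes IRP, an intermediate-node buffering technique that improves on SRP. IRP adapts to the topology, for example a fat tree, to make effective use of the router resources of the hot spot's neighboring nodes: it first uses the fat tree's multiple paths to send packets to idle routers and, once the destination node's router becomes available, forwards the buffered packets to the destination, reducing network latency.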
2.
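Parallel Fast Fourier Transform Algorithm Based on the Star Interconnection Network  Total citations: 6, self-citations: 0, cited by others: 6
The star interconnection network is a topology well suited to large-scale parallel computing. Exploiting the many ways in which a star interconnection network can be recursively decomposed, this paper proposes a method for implementing a parallel fast Fourier transform (FFT) on it. The method effectively reduces the communication overhead between processor nodes during the computation. The proposed mapping between star-graph nodes and data, and the approach used to implement the parallel FFT, can be extended to other parallel algorithms on star interconnection networks, such as solving systems of linear equations and matrix multiplication.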
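The recursive decomposability mentioned in this abstract can be illustrated with the standard definition of the star graph S_n, in which nodes are permutations of n symbols and an edge swaps the first symbol with the symbol at another position. The sketch below is not the paper's FFT mapping; the function names `star_neighbors` and `sub_star` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def star_neighbors(node):
    """Return the neighbors of `node` in the star graph S_n.

    A node is a tuple holding a permutation of n symbols. Two nodes are
    adjacent when one is obtained from the other by swapping the symbol at
    position 0 with the symbol at some position i >= 1.
    """
    neighbors = []
    for i in range(1, len(node)):
        swapped = list(node)
        swapped[0], swapped[i] = swapped[i], swapped[0]
        neighbors.append(tuple(swapped))
    return neighbors


def sub_star(nodes, position, symbol):
    """Select the nodes that hold `symbol` at `position` (position >= 1).

    Fixing one non-first position splits S_n into n groups of (n-1)! nodes,
    each connected like S_(n-1). Any of the n-1 non-first positions can be
    used, which is why the decomposition is not unique.
    """
    return [p for p in nodes if p[position] == symbol]


nodes = list(permutations(range(4)))      # S_4 has 4! = 24 nodes
print(star_neighbors((0, 1, 2, 3)))       # three neighbors, one per swap
print(len(sub_star(nodes, 3, 0)))         # 6 nodes, i.e. 3! in one sub-star
```

Choosing a different `position` gives a different decomposition of the same network, which is one way to read the "diversity" of decompositions the abstract refers to.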
3.
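Routers are a key component of high-performance interconnection networks. High-radix routers make it possible to flexibly build topologies with low network diameter, rich path diversity, and high fault tolerance. The hierarchical organization implements the whole router as multiple sub-crossbars. Each sub-crossbar is small; in a typical implementation the number of sub-crossbars equals the number of router ports, and each sub-crossbar corresponds to one input/output port. Because every sub-crossbar has buffers at its inputs and outputs, a hierarchical router contains a large number of internal buffers, which limits scalability. The network organization implements on chip the kind of network topology used to build systems: smaller switches are connected by, for example, a mesh, full interconnection, or a fat tree and integrated into a single router that appears externally as one high-radix router. The network organization is low in cost, but once a system network is built from such routers, routing inside the router itself must be considered in addition to the performance of the system topology. This paper proposes a hierarchical router based on the Clos network that combines the high performance of the traditional hierarchical organization with the low cost of the network organization, and proposes two scheduling algorithms for the Clos network. Under uniform traffic the design approaches 100% bandwidth, and RTL synthesis shows an area reduction of up to 25.9%.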
4.
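Network packet classification is a key technology for next-generation routers, firewalls, QoS guarantee mechanisms, and network information inspection equipment. The parallel region-partitioning packet classification algorithm, built on the idea of region partitioning and implemented in an FPGA, is a high-speed packet classification algorithm based on shared memory and parallel processing units. It consists of two main parts: a memory-mapping method based on region partitioning, and a two-level, multi-channel parallel processing technique.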
5.
6.
7.
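This paper proposes the concept of network memory for network computing environments and introduces its principle, its implementation, and its support for system interconnection. The main advantages of network memory are that it improves interconnection efficiency and lowers interconnection cost.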
8.
9.
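Multicast Communication in Multistage Interconnection Networks  Total citations: 3, self-citations: 1, cited by others: 3
Parallel communication in MPP systems is a current focus of parallel processing research; improving parallel communication performance and raising network throughput are key to realizing MPP performance. Multicast is a one-to-many communication mode, distinct from point-to-point communication; it is therefore more powerful and more flexible to use, and is widely applied in parallel processing. Against the background of multistage interconnection networks, which use switching elements to interconnect nodes dynamically, this paper studies the efficiency of multicast routing algorithms.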
10.
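This paper proposes a recursive, multilevel hierarchical interconnection network with constant node degree for massively parallel processing systems, called the fully connected cubic network (FCCN). FCCN offers good scalability and extensibility: an m-FCCN is obtained recursively from eight (m-1)-FCCNs; the node degree is a constant 4, independent of network size; and both the network diameter and the average node distance are proportional to the cube root of the number of nodes. A simple routing algorithm for FCCN is presented, and the FCCN structure is applied to a large-scale hybrid optoelectronic processing system. Computed results show that FCCN achieves relatively high parallel processing efficiency.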
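The scaling claims in this abstract can be written out as a short derivation. This is only a sketch: $N_0$ (the node count of the base 0-FCCN) and the constant $c$ are placeholders, since the abstract gives neither, and the cube-root proportionality is taken as exact.

```latex
\begin{align*}
N_m &= 8\,N_{m-1} = 8^{m} N_0, \\
D_m &= c\,N_m^{1/3} = c\,\bigl(8^{m} N_0\bigr)^{1/3} = 2^{m}\, c\, N_0^{1/3}.
\end{align*}
```

Under these assumptions, each additional level multiplies the node count by 8 but only doubles the diameter, while the node degree stays at 4.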
11.
Xiaodong Zhang, Yong Yan 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(12):1316-1331
Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and by the efficiency of the memory/cache management systems. Cache Coherence Nonuniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an efficient interconnection network in hardware. This paper focuses on comparative performance modeling and evaluation of CC-NUMA and COMA on a hierarchical ring shared-memory architecture. Analytical models of the two memory systems are presented for comparative evaluation. Intensive performance measurements on data migrations have been conducted on the KSR-1, a COMA hierarchical ring shared-memory machine. Experimental results support the analytical models, and we present practical observations and comparisons of the two cache coherence memory systems. Our analytical and experimental results show that a COMA system balances the work load well. However, the overhead of frequent data movement may match the gains obtained from improving load balance. We believe our performance results could be further generalized to the two memory systems on a hierarchical network architecture. Although a CC-NUMA system may not automatically balance the load at the system level, it provides an option for a user to explicitly handle data locality for a possible performance improvement.
12.
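This paper designs and implements a scalable parallel processing system for sonar array signals based on IP network interconnection. Each processing node is an on-board pipelined parallel structure built from two TMS320DM642 high-performance network multimedia processors from TI, and boards are interconnected over an IP network for parallel processing, so the number of processing nodes can be increased or decreased to match the number of transducer array elements and the required processing speed. Each processing node connects to the data acquisition and conversion section over a TCP/IP network, so the system's data processing capacity can be raised by physically adding one or more processing nodes.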
13.
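As network technology continues to develop, some large enterprises use LAN technology to interconnect their sites into an enterprise intranet. This paper describes how to interconnect distributed LANs with static routing: design the network topology diagram, plan the IP address allocation, configure the routers, and finally test network connectivity.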
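What a configured static route table does at forwarding time can be sketched with Python's standard `ipaddress` module. The prefixes, next-hop addresses, and the `next_hop` helper below are hypothetical examples, not taken from the paper; the lookup rule shown is ordinary longest-prefix matching.

```python
import ipaddress

# Hypothetical static routes on one router: (destination prefix, next hop).
STATIC_ROUTES = [
    ("192.168.1.0/24", "10.0.0.2"),   # branch LAN A
    ("192.168.2.0/24", "10.0.0.6"),   # branch LAN B
    ("0.0.0.0/0", "10.0.0.1"),        # default route
]


def next_hop(destination, routes=STATIC_ROUTES):
    """Return the next hop for `destination` using longest-prefix match.

    Returns None when no configured route covers the destination.
    """
    addr = ipaddress.ip_address(destination)
    best = None  # (network, hop) of the most specific match so far
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    return best[1] if best else None


print(next_hop("192.168.2.17"))   # matched by the /24 for LAN B
print(next_hop("8.8.8.8"))        # only the default route matches
```

A connectivity check such as the one the abstract lists as the final step would then confirm that every planned subnet resolves to the intended next hop.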
14.
15.
Chamberlain R.D., Franklin M.A., Ch'ng Shi Baw 《Parallel and Distributed Systems, IEEE Transactions on》2002,13(10):1038-1055
The Gemini interconnect is a dual technology (optical and electrical) interconnection network designed for use in tightly-coupled multicomputer systems. It consists of a circuit-switched optical data path in parallel with a packet-switched electrical control/data path. The optical path is used for transmission of long data messages and the electrical path is used for switch control and transmission of short data messages. The paper describes the architecture of the interconnection network and related communications protocols. Fairness issues associated with network operation are addressed and a discrete-event simulation model of the entire system is described. Network performance characteristics derived from the simulation model are presented. The results show significant performance benefits when using virtual output queuing and quantify the tradeoffs between throughput and fairness in the system.
16.
David C Wyland 《Microprocessors and Microsystems》1988,12(10):585-594
Conventional memory blocks have a single address input and a single, usually bidirectional, data output. Dual-port memories have two address inputs and two data ports. These memories have been designed to facilitate the exchange of data between CPUs within a multiprocessor system. Each microprocessor can access the multiport memory and therefore read the data of another processor or leave data for another processor. There are two problems in the design of multiport memory systems. The first, and more trivial, concerns the way in which each processor supplies an address to the memory and how it accesses the memory data bus. This is not a particularly complex problem and the designer's biggest worry is how to design the interface with the least number of multiplexers and buffers. Whenever a processor wishes to access the multiport memory, it takes control of the address and data bus and then accesses the memory. A more fundamental design problem is posed when two or more processors try to access the memory nearly simultaneously. Memory contention is solved by the use of an arbitration circuit that arbitrates between the contending processors, grants access to only one processor and forces the others to wait. Fortunately, it is no longer necessary for all designers to construct their own dual-port memories from discrete components, since several manufacturers now put the memory, address and data multiplexers plus arbitration circuits on chip. IDT's application note shows how its dual-port memory operates and how it is used in multiprocessor systems.
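The arbitration behavior described in this abstract (grant one processor, make the other wait) can be modeled in a few lines. This is a minimal sketch, not IDT's circuit: the `DualPortMemory` class, the `BUSY` marker, the fixed left-port priority, and the rule that only same-address accesses involving a write count as contention are all assumptions made for illustration.

```python
BUSY = "busy"  # hypothetical marker returned to the port that must wait


class DualPortMemory:
    """Toy model of a dual-port memory with a fixed-priority arbiter."""

    def __init__(self, size):
        self.cells = [0] * size

    def _access(self, op):
        """Perform one operation: (address, None) reads, (address, value) writes."""
        address, value = op
        if value is None:
            return self.cells[address]
        self.cells[address] = value
        return value

    def cycle(self, left=None, right=None):
        """Run one memory cycle and return a dict of per-port results.

        Assumed policy: when both ports hit the same address and at least one
        of them writes, the left port is granted and the right port gets BUSY.
        """
        contention = (
            left is not None
            and right is not None
            and left[0] == right[0]
            and (left[1] is not None or right[1] is not None)
        )
        results = {}
        if left is not None:
            results["left"] = self._access(left)
        if right is not None:
            results["right"] = BUSY if contention else self._access(right)
        return results


mem = DualPortMemory(8)
mem.cycle(left=(3, 42))                        # left CPU leaves data at address 3
print(mem.cycle(left=(3, 7), right=(3, None)))  # contention: right port must wait
print(mem.cycle(right=(3, None)))               # retry succeeds on the next cycle
```

A processor that receives `BUSY` simply retries on a later cycle, which corresponds to the "forces the others to wait" behavior in the abstract.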
17.
《Journal of Parallel and Distributed Computing》2004,64(4):498-506
In this paper, we look at the interconnection of propagation-based causal distributed shared memory (DSM) systems. We present extremely simple protocols to interconnect two such systems (possibly implemented with different algorithms) that only require the existence of a bidirectional reliable FIFO channel connecting one process from each system. We show that the resulting DSM system is also causal. This result can be used to interconnect any number of propagation-based causal DSM systems, by interconnecting them in pairs with a tree topology.
18.
Der-Fu Tao, Liang-Teh Lee 《Computers & Electrical Engineering》2004,30(6):427-440
To make data exchange fast enough to support current communication systems and networks, a high-speed switching system with low transmission delay and low data loss is required. Many researchers have used statistical time division multiplexing techniques to design switching systems with higher throughput. In such a switching system with n input/output ports, the internal execution speed must be n times faster than that of a system with a single input/output port. This design philosophy is not appropriate given the trend toward ever higher-speed systems. To overcome these drawbacks, this paper proposes a novel architecture, the Parallel Input Parallel Output Register Switching System (PIPORS). PIPORS is based on the interconnection of small distributed Shared Memory Modules (SMM) and a Shift Register Switch Array (SRSA). This construction accelerates switching. In addition, the number of input/output ports can easily be extended to provide higher capacity as the amount of data transferred in the system grows rapidly. Three simple methods for extending the input/output ports and the capacity of the internal memory are presented. To evaluate the proposed system, we compare PIPORS with the Central Shared Memory Switching System (CSMS) with respect to the total amount of memory required, data loss probability, transmission delay, and switching performance. The comparison shows that PIPORS achieves better performance.