Similar Articles
20 similar articles found (search time: 15 ms)
1.
Providing highly flexible connectivity is a major architectural challenge for hardware implementation of reconfigurable neural networks. We perform an analytical evaluation and comparison of different configurable interconnect architectures (mesh NoC, tree, shared bus and point-to-point) emulating variants of two neural network topologies (having full and random configurable connectivity). We derive analytical expressions and asymptotic limits for performance (in terms of bandwidth) and cost (in terms of area and power) of the interconnect architectures considering three communication methods (unicast, multicast and broadcast). It is shown that multicast mesh NoC provides the highest performance/cost ratio and is consequently the most suitable interconnect architecture for configurable neural network implementation. Routing table size requirements and their impact on scalability are analyzed. A modular hierarchical architecture based on multicast mesh NoC is proposed to allow large-scale neural network emulation. Simulation results successfully validate the analytical models and the asymptotic behavior of the network as a function of its size.
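The asymptotic trade-off this abstract refers to can be illustrated with a rough scaling sketch. The link-count and bisection-width formulas below are generic textbook approximations for each fabric, chosen only to show that a mesh gains bisection bandwidth at a link cost that grows linearly with the node count, while a shared bus saturates and a fully connected point-to-point fabric pays quadratic wiring cost; they are not the analytical expressions derived in the paper, and the node counts are arbitrary.

```python
import math

def interconnect_scaling(n_nodes: int) -> dict:
    """Rough link count and bisection width (in links) for common fabrics.

    Generic approximations for illustration only, not the paper's models:
    a shared bus is a single link, a tree has n-1 links, a 2-D mesh has
    about 2n links and a bisection of sqrt(n), and a fully connected
    point-to-point fabric has n(n-1)/2 links and a bisection of n^2/4.
    """
    return {
        "shared bus":     (1, 1),
        "tree":           (n_nodes - 1, 1),
        "2-D mesh":       (2 * n_nodes, int(math.sqrt(n_nodes))),
        "point-to-point": (n_nodes * (n_nodes - 1) // 2, n_nodes ** 2 // 4),
    }

if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(f"N = {n}")
        for fabric, (links, bisection) in interconnect_scaling(n).items():
            print(f"  {fabric:15s} links={links:10d} bisection={bisection:8d}")
```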

2.
As the number of processors integrated on a single chip keeps growing, traditional electrical interconnection networks can no longer meet the performance demands placed on the interconnect, and a new interconnection approach is needed; optical interconnection network technology has emerged in response. At present, electrically interconnected networks-on-chip face bottlenecks in power consumption, performance, bandwidth, and latency, whereas optical interconnects, applied to the network-on-chip as a new interconnection approach, offer unmatched advantages such as low loss, high throughput, and low latency. This paper mainly discusses on-chip optical networks' ...

3.
This paper introduces the concepts of wavelength-division multiplexing (WDM) and all-optical networks, proposes a WDM network architecture based on optical slot routing, discusses methods for implementing optical slots and packet transport together with the relevant network access protocols, and presents schematic structures of bridges and nodes in an all-optical network. The aim of this paper is to provide a basic description of this approach.

4.
We describe a testbed to study both the theoretical aspects and physical implementation issues associated with high-bit-rate, multihop, packet-switched OTDM networks. We have found that optical time-division multiplexing (OTDM) techniques can greatly increase the bandwidth of a single-wavelength channel. Ultrafast OTDM networks are excellent candidates for meeting the system requirements of massively parallel processor interconnects, which include low latency, high bandwidth, and immunity to electromagnetic interference. High-bit-rate transparent optical networks (TONs) for multiprocessor interconnects will be best realized with an OTDM network architecture. To fully use the bandwidth of optical fiber, we spaced the picosecond pulses closely together (about 10 ps apart) and typically applied a return-to-zero modulation format. While the total capacities of TDM and wavelength-division multiplexing (WDM) networks may be essentially the same, TDM systems have better throughput-delay performance. They also offer faster single-channel access times for high-data-rate end users such as HDTV video servers, terabyte-media data banks, and supercomputers.
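As a quick sanity check on the quoted pulse spacing, the arithmetic below relates the roughly 10 ps slot width to the resulting single-wavelength line rate; the 10 Gb/s tributary rate is an assumed value for illustration, not a figure from the testbed.

```python
# Relate the ~10 ps OTDM pulse spacing to the single-channel line rate,
# assuming one bit per RZ time slot (illustrative arithmetic only).
pulse_spacing_s = 10e-12                  # ~10 ps between adjacent pulses
line_rate_bps = 1.0 / pulse_spacing_s
print(f"Aggregate single-wavelength rate: {line_rate_bps / 1e9:.0f} Gb/s")

# If the aggregate stream is built by bit-interleaving lower-rate
# tributaries, e.g. assumed 10 Gb/s electronic sources:
tributary_bps = 10e9
print(f"Tributaries multiplexed: {line_rate_bps / tributary_bps:.0f}")
```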

5.
Hybrid optical/electrical interconnects using commercial optical circuit switches have been previously proposed as an attractive alternative to fully-connected electronically-switched networks. Among other advantages, such a design offers increased port density, bandwidth per port, cabling and energy efficiency, compared to conventional packet-switched counterparts. Recent proposals for such system designs have looked at small and/or medium scale networks employing hybrid interconnects. In our previous work, we presented a hybrid optical/electrical interconnect architecture targeting large-scale deployments in high-performance computing and datacenter environments. To reduce complexity, our architecture employs a regular shuffle network topology that allows for simple management and cabling. Thanks to using a single-stage core interconnect and multiple optical planes, our design can be both incrementally scaled up (in capacity) and scaled out (in the number of racks) without requiring major re-cabling and network re-configuration. In this paper, we extend our existing work towards quantifying and understanding the performance of these types of systems against more diverse workload communication patterns and system design parameters. In this context, we evaluate, among other characteristics, the overhead of the proposed reconfiguration (decomposition and routing) scheme and extend our simulations to highly adversarial flow generation rate/duration values that challenge the reconfiguration latency of the system.

6.
The paper presents the development and performance of a novel bus-based message-passing interconnection scheme which can be used to join a large number of INMOS transputers via their serial communication links. The main feature of this architecture is that it avoids the communication overhead that occurs in systems where processing nodes relay communications to their neighbors. It also produces a flexible and scalable machine whose attractive characteristics are its simplicity and low latency for large configurations. We show that this architecture is free from deadlock, exhibits much smaller latency than most directly connected transputer networks, and has scalable bandwidth, in contrast to other bus topologies.

7.
High-radix switches are desirable building blocks for large computer interconnection networks, because they are more suitable to convert chip I/O bandwidth into low latency and low cost than low-radix switches [J. Kim, W.J. Dally, B. Towles, A.K. Gupta, Microarchitecture of a high-radix router, in: Proc. ISCA 2005, Madison, WI, 2005]. Unfortunately, most existing switch architectures do not scale well to a large number of ports, for example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. Compromising on the buffer sizing leads to a drastic increase in latency and reduction in throughput, as long as traditional credit flow control is employed at the link level. We propose a novel link-level flow control protocol that enables high-performance scalable switches that are based on the increasingly popular buffered crossbar architecture, to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the round-trip time. The proposed scheme substantially reduces message latency and improves throughput of partially buffered crossbar switches loaded with synthetic uniform and non-uniform bursty traffic. Moreover, simulations replaying traces of several typical MPI applications demonstrate communication speedup factors of 2 to 10 times.
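The quadratic buffer growth mentioned above can be made concrete with a simple sizing rule: one round-trip time of credit-based buffering per crosspoint and per virtual channel. The link rate, RTT, and virtual-channel count below are assumed example values, not parameters from the paper.

```python
def crossbar_buffer_bytes(ports: int, link_rate_gbps: float,
                          rtt_ns: float, virtual_channels: int) -> float:
    """Total crosspoint buffering for a buffered crossbar that provisions
    one round-trip time (RTT) of credit buffering per crosspoint and per
    virtual channel.  Illustrative sizing rule with assumed parameters,
    showing the quadratic growth in the number of ports."""
    per_xpoint = link_rate_gbps * 1e9 / 8 * rtt_ns * 1e-9 * virtual_channels
    return ports * ports * per_xpoint          # N^2 crosspoints

if __name__ == "__main__":
    for radix in (16, 64, 256):
        total = crossbar_buffer_bytes(radix, link_rate_gbps=40,
                                      rtt_ns=500, virtual_channels=4)
        print(f"radix {radix:4d}: {total / 1e6:8.2f} MB of crosspoint buffers")
```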

8.
Large-scale distributed shared-memory multiprocessors (DSMs) provide a shared address space by physically distributing the memory among different processors. A fundamental DSM communication problem that significantly affects scalability is an increase in remote memory latency as the number of system nodes increases. Remote memory latency, caused by accessing a memory location in a processor other than the one originating the request, includes both communication latency and remote memory access latency over I/O and memory buses. The proposed architecture reduces remote memory access latency by increasing connectivity and maximizing channel availability for remote communication. It also provides efficient and fast unicast, multicast, and broadcast capabilities, using a combination of aggressively designed multiplexing techniques. Simulations show that this architecture provides excellent interconnect support for a highly scalable, high-bandwidth, low-latency network.

9.
Cost-effective designs of WDM optical interconnects
Optical communication, in particular the wavelength-division multiplexing (WDM) technique, has become a promising networking choice to meet ever-increasing demands on bandwidth from emerging bandwidth-intensive computing and communication applications, such as data browsing on the World Wide Web, multimedia conferencing, e-commerce, and video-on-demand services. As optics becomes a major networking medium for all communication needs, optical interconnects will inevitably play an important role in interconnecting processors in parallel and distributed computing systems. We consider cost-effective designs of WDM optical interconnects for current and future-generation parallel and distributed computing and communication systems. We first categorize WDM optical interconnects into two different connection models based on their target applications: the wavelength-based model and the fiber-link-based model. Most existing WDM optical interconnects belong to the first category. We then present a minimum-cost design for WDM optical interconnects under the wavelength-based model by using sparse crossbar switches instead of full crossbar switches in combination with wavelength converters. For applications that use the fiber-link-based model, we show that network cost can be significantly reduced, and present a minimum-cost design for WDM optical interconnects under this model. Finally, we generalize the idea used in the design for the fiber-link-based model to WDM optical interconnects under the wavelength-based model, and obtain another new design that can trade off switch cost against wavelength-converter cost in this type of WDM optical interconnect. The results in this paper are applicable to emerging optical switching technologies such as SOA-based and MEMS-based technologies.

10.
ASIP design guided by a configuration-stream-driven computing architecture
To balance flexibility and efficiency in embedded processor design, a configuration-stream-driven computing architecture is proposed. In this architecture the hardware/software boundary is moved downward so that the interconnection network between functional units is visible to the compiler, which performs transfer routing, thereby supporting more complex but more efficient interconnection networks. Guided by this architecture, a design method for application-specific instruction-set processors (ASIPs) supporting segmented reconfigurable interconnection networks is proposed. Applying the method to three classes of ASIPs in the cryptography domain shows that, compared with simple bus interconnects and without affecting performance, it saves on average 53% of interconnect power and 38.7% of the bus count, thus achieving the goals of reducing the number of buses and lowering interconnect power.

11.
The interconnection network is one of the core components of high-performance computing systems and data centers, and it is global infrastructure that determines overall system performance. With the rapid development of high-performance computing, cloud computing, and big data technology, traditional electrical interconnection networks cannot meet the large-scale, scalable communication requirements of HPC applications and data center workloads in terms of performance, energy consumption, and cost, and they face severe challenges. Accordingly, researchers have in recent years proposed a variety of reconfigurable optical interconnection network architectures for high-performance computing and data centers. This paper first explains the advantages of optical interconnection networks over electrical ones, then introduces several typical reconfigurable optical interconnection network architectures and analyzes and compares their characteristics, and finally discusses development trends of reconfigurable optical interconnection networks.

12.
In recent years, the need for high-performance network monitoring tools that can cope with rapidly increasing network bandwidth has become vital. A possible solution is to utilize the processing power of multi-core processors that are nowadays available as commercial off-the-shelf (COTS) hardware. In this paper, we introduce a software solution for wire-speed packet capturing and transmission for TCP/IP networks under the Linux operating system, called DashCap. The results of our experimental evaluations show that the proposed solution delivers a more than twofold performance boost for packet capturing in comparison to existing software solutions under Linux. We have also proposed a scalable software architecture for network monitoring tools called DashNMon, which is based on DashCap. Multi-core awareness is a distinguishing property of this architecture; compared to existing cluster-based solutions, DashNMon can be used with COTS multi-core processors. To evaluate the proposed solutions, we have developed several prototype tools. The results of the experiments carried out with these tools show the scalability and high performance of network monitoring tools based on the proposed architecture. Using the proposed architecture, it is possible to design and implement high-performance multi-threaded network intrusion detection systems (NIDSs) or application-layer firewalls entirely in user space and with better utilization of the computational resources of multi-processor/multi-core systems.

13.
This paper extends research into rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As a foundation for a shared-memory, non-uniform-access, bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture where the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching parallel application communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped to the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of processor-direct-to-cache (or memory) access over a bus. The latency of communication between parallel threads does not degrade parallel performance or limit the granularity of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture which is designed to obtain maximum advantage from the overlapping-cluster characteristic, and which also allows dynamic workload migration without moving instructions or data. Scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload allocation schemes that take advantage of the overlapping cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and justify their use as a foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bidirectional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies to rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth modeling technique and are compared with the results of simulation.

14.
Cut-through switching promises low-latency delivery and has been used in new-generation switches, especially in high-speed networks demanding low communication latency. The interconnection of cut-through switches provides an excellent network platform for high-speed local area networks (LANs). For cost and performance reasons, irregular topologies should be supported in such a switch-based network. Switched irregular networks are truly incrementally scalable and have the potential to be reconfigured to adapt to the dynamics of network traffic conditions. Due to the arbitrary topologies of such networks, it is critical to develop an efficient deadlock-free routing algorithm. A novel deadlock-free adaptive routing algorithm called adaptive-trail routing is proposed to allow irregular interconnection of cut-through switches. The algorithm is based on two unidirectional adaptive trails constructed from two opposite unidirectional Eulerian trails. Some heuristics are suggested concerning the selection of Eulerian trails, the avoidance of long routing paths, and the degree of adaptivity. Extensive simulation experiments are conducted to evaluate the performance of the proposed and two other routing algorithms under different topologies and traffic workloads.

15.
Many corporate network managers and service providers believe that more capacity can address bandwidth demands as well as delays in message transfer. As the sidebar "Optical Technology and Wavelength-Division Multiplexing" briefly describes, the solution of choice is optical technology that uses wavelength-division multiplexing (WDM). The implication of WDM is that network managers can improve their Internet application and network response time by leasing more capacity from service providers - response time being the sum of message (request and response) and application-processing delays. But although WDM transmission is promising, it presents a trade-off between bandwidth and message-transfer latency - a trade-off that is particularly significant for long-haul networks where latency begins to dominate and offset any additional bandwidth advantage for single-message transfers. To help IT network managers get the most from WDM-based technology, the authors have derived a model that establishes a boundary bandwidth for single message transfers. The boundary represents the point at which latency begins to offset any advantage from additional bandwidth. The model clearly shows that bandwidth management is as critical to the success of this technology as having access to a large supply of raw bandwidth.
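The boundary-bandwidth idea can be sketched with a simplified single-message response-time model (propagation latency plus serialization delay). This is not the authors' published model; the 1 MB message size and 25 ms one-way latency are assumed example values.

```python
def transfer_time_s(message_bytes: float, bandwidth_bps: float,
                    one_way_latency_s: float) -> float:
    """Single-message transfer time = propagation latency + serialization."""
    return one_way_latency_s + message_bytes * 8 / bandwidth_bps

def boundary_bandwidth_bps(message_bytes: float, one_way_latency_s: float) -> float:
    """Bandwidth at which serialization delay equals propagation latency;
    beyond it, more capacity can at best halve the remaining transfer time."""
    return message_bytes * 8 / one_way_latency_s

if __name__ == "__main__":
    size_bytes, latency_s = 1e6, 25e-3       # assumed example values
    boundary = boundary_bandwidth_bps(size_bytes, latency_s)
    print(f"Boundary bandwidth: {boundary / 1e6:.0f} Mb/s")
    for bw in (boundary / 10, boundary, boundary * 10):
        t = transfer_time_s(size_bytes, bw, latency_s)
        print(f"  at {bw / 1e6:7.0f} Mb/s -> {t * 1e3:6.1f} ms")
```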

16.
With the development of Multi-Processor System-on-Chip (MPSoC) in recent years, intra-chip communication is becoming the bottleneck of the whole system. Current electronic network-on-chip (NoC) designs face serious challenges, such as bandwidth, latency and power consumption. Optical interconnection networks are a promising technology to overcome these problems. In this paper, we study the routing problem in optical NoCs with arbitrary network topologies. Traditionally, a minimum hop count routing policy is employed for electronic NoCs, as it minimizes both power consumption and latency. However, due to the special architecture of current optical NoC routers, such a minimum-hop path may not be energy-wise optimal. Using a detailed model of optical routers, we reduce the energy-aware routing problem to a shortest-path problem, which can then be solved using one of many well-known techniques. By applying our approach to different popular topologies, we show that the energy consumed in data communication in an optical NoC can be significantly reduced. We also propose the use of optical burst switching (OBS) in optical NoCs to reduce control overhead, as well as an adaptive routing mechanism to reduce energy consumption without introducing extra latency. Our simulation results demonstrate the effectiveness of the proposed algorithms.
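The reduction of energy-aware routing to a shortest-path problem can be sketched with an ordinary Dijkstra search over per-hop energy costs. The graph, edge weights, and the toy example below are placeholders for a real optical-router energy model, not values from the paper.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

def min_energy_path(graph: Dict[Hashable, List[Tuple[Hashable, float]]],
                    src: Hashable, dst: Hashable) -> Tuple[float, List[Hashable]]:
    """Dijkstra over per-hop energy costs.  graph[u] lists (neighbour, cost)
    edges; in an optical NoC the cost of crossing a router depends on which
    input/output pair is used, so the weights stand in for a router energy
    model rather than a hop count."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        energy, node, path = heapq.heappop(queue)
        if node == dst:
            return energy, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (energy + cost, nxt, path + [nxt]))
    return float("inf"), []

if __name__ == "__main__":
    # Toy example: the two-hop route A->B->D costs less energy than the
    # single-hop route A->D, so the minimum-hop path is not energy-optimal.
    g = {"A": [("B", 1.0), ("D", 5.0)], "B": [("D", 1.5)], "D": []}
    print(min_energy_path(g, "A", "D"))      # (2.5, ['A', 'B', 'D'])
```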

17.
Multicomputers built around a general network are an attractive architecture for a wide class of applications. The architecture provides many benefits compared with special-purpose approaches, including heterogeneity, reuse of application and system code, and sharing of resources. The architecture also poses new challenges to both computer system implementers and users. First, traditional local-area networks do not have enough bandwidth and create a communication bottleneck, thus seriously limiting the set of applications that can be run effectively. Second, programmers have to deal with large bodies of code distributed over a variety of architectures, and work in an environment where both the network and nodes are shared with other users. Our experience in the Nectar project shows that it is possible to overcome these problems. We show how networks based on high-speed crossbar switches and efficient protocol implementations can support high-bandwidth and low-latency communication while still enjoying the flexibility of general networks, and we use three applications to demonstrate that network-based multicomputers are a practical architecture. We also show how the network traffic generated by this new class of applications poses severe requirements for networks.

18.
In this work we propose a fine-grained approach with a self-adaptive migration rate for distributed evolutionary computation. Our target is to gain some insight into the effects caused by communication when the algorithm scales. To this end, we consider a set of basic topologies in order to avoid the overlapping of algorithmic effects between communication and topological structures. We analyse the viability of the approach by comparing how solution quality and algorithm speed change when the number of processors increases, and compare it with an Island-model-based implementation. A finer-grained approach implies a better chance of achieving a larger scalable system; such a feature is crucial for large-scale parallel architectures such as peer-to-peer systems. In order to check scalability, we perform a threefold experimental evaluation of this model: first, we concentrate on the algorithmic results when the problem scales up to eight nodes, in comparison with how it does following the Island model. Second, we analyse the computing-time speedup of the approach while scaling. Finally, we analyse the network performance with the proposed self-adaptive migration rate policy, which depends on link latency and bandwidth. With this experimental setup, our approach shows better scalability than the Island model and equivalent robustness on average across the three test functions under study.
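One plausible reading of a migration-rate policy that depends on link latency and bandwidth is sketched below: nodes migrate less often over slower or thinner links. The scaling rule and all constants are assumptions for illustration, not the authors' formula.

```python
def adaptive_migration_interval(base_interval_gens: int,
                                latency_ms: float,
                                bandwidth_mbps: float,
                                ref_latency_ms: float = 1.0,
                                ref_bandwidth_mbps: float = 100.0) -> int:
    """Stretch the migration interval (in generations) as the measured link
    gets slower or thinner than a reference link.  Hypothetical rule and
    constants, shown only to illustrate a self-adaptive policy."""
    penalty = (latency_ms / ref_latency_ms) * (ref_bandwidth_mbps / bandwidth_mbps)
    return max(1, round(base_interval_gens * penalty))

if __name__ == "__main__":
    for lat_ms, bw_mbps in [(1.0, 100.0), (20.0, 100.0), (20.0, 10.0)]:
        gens = adaptive_migration_interval(10, lat_ms, bw_mbps)
        print(f"latency={lat_ms:5.1f} ms  bandwidth={bw_mbps:6.1f} Mb/s"
              f"  -> migrate every {gens} generations")
```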

19.
Innovative efforts to provide a clean-slate design of congestion control for future high-speed heterogeneous networks have recently led to the development of explicit congestion control. These methods rely on multi-byte router feedback and aim to contribute to the design of a more scalable Internet of tomorrow. However, experimental evaluation and deployment experience with these approaches is still limited to either low-bandwidth networks or simple topologies. In addition, all existing implementations are exclusively applicable to either rate- or window-based protocols and are unable to study the performance of different protocols on a common platform. This paper aims to fill this void by designing a unified Linux implementation framework for both rate- and window-based methods that does not incur any interference with the system's network stack or applications. Using this implementation, we implement and reveal several key properties of four recent explicit congestion control protocols, XCP, JetMax, RCP, and PIQI-RCP, using Emulab's gigabit testbed with a variety of simple yet representative network topologies. Our experiments not only confirm the known behavior of these methods, but also demonstrate their previously undocumented properties (e.g., RCP's transient overshoot under abrupt traffic load changes and JetMax's low utilization in the presence of mice flows).

20.
薛媛  王晟  徐世中 《计算机应用研究》2008,25(12):3761-3764
To better support bursty data traffic, a new network switching structure is proposed: cycle-based hybrid switching optical networks (CHSON). The network combines two switching techniques, optical circuit switching (OCS) and optical burst switching (OBS), which not only effectively relieves the packet-forwarding pressure on network nodes but also carries bursty data traffic well. The paper first introduces the network structure and virtual-topology design of CHSON, then describes the node design and its execution flow. Simulations show that CHSON has a lower packet-loss rate than an OCS network, and that in terms of packet-loss rate and average packet delay, CHSON ...
