
1.

Hierarchical-ring interconnection for networks-on-chip  Total citations: 1, self-citations: 0, citations by others: 1

王炜乔林杨广文汤志忠《计算机学报》2010,33(2)

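In large-scale and very-large-scale on-chip interconnection networks, the poor performance of two-dimensional interconnects makes multidimensional interconnects a viable alternative. This paper first designs a hierarchical-ring interconnect structure based on region partitioning and analyzes its static interconnection characteristics. It then designs a routing structure and routing method for the hierarchical ring based on Karnaugh-map coding. Under a uniform traffic pattern, it tests network performance for different ways of sizing the buffers on the cascade links between ring levels, and analyzes in detail the dynamic network characteristics of the hierarchical ring when those buffers are sized according to a geometric sequence. Finally, based on a performance comparison with Mesh and other two-dimensional on-chip interconnects, it identifies the scenarios in which the hierarchical ring is appropriate. Experimental results show that, although it performs worse in smaller networks, the hierarchical ring can interconnect large-scale and very-large-scale on-chip networks at lower cost and with higher performance. The single-ring hierarchical interconnect has better overall performance under lower network load, whereas the dual-ring hierarchical interconnect has greater load capacity and performs better under higher network load.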
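The abstract does not spell out its Karnaugh-map coding, so the following is only a minimal sketch of the property such a coding usually relies on: Karnaugh maps label adjacent cells in Gray-code order, so neighbouring nodes on a ring differ in exactly one address bit. The function names and the 8-node ring are assumptions made for illustration, not the authors' scheme.

```python
def gray_code(i: int) -> int:
    """Return the i-th reflected binary Gray code (the ordering used on Karnaugh-map axes)."""
    return i ^ (i >> 1)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def ring_addresses(num_nodes: int) -> list:
    """Assign Gray-coded addresses to the nodes of a ring (hypothetical addressing)."""
    return [gray_code(i) for i in range(num_nodes)]


if __name__ == "__main__":
    addrs = ring_addresses(8)
    print([format(a, "03b") for a in addrs])
    # Every pair of ring neighbours, including the wrap-around pair, differs in one bit.
    for i, a in enumerate(addrs):
        b = addrs[(i + 1) % len(addrs)]
        assert hamming_distance(a, b) == 1
```

With a power-of-two ring size the wrap-around link also connects addresses one bit apart, which is what makes this coding convenient for ring addressing.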

2.

A recursively defined scalable network-on-chip topology  Total citations: 1, self-citations: 0, citations by others: 1

朱晓静《计算机学报》2011,34(5):924-930

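Continued advances in transistor technology have steadily increased the number of on-chip processors. Inter-core communication in systems-on-chip demands high throughput, low latency, and good scalability, which traditional on-chip buses and crossbar interconnects can no longer provide. Researchers have therefore proposed a new on-chip interconnect structure, called the network-on-chip. To meet the particular communication requirements of networks-on-chip, this paper proposes a scalable topology, Rgrid, and its routing algorithm, DR. It shortens the on-chip inter-processor...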

3.

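Simulation and implementation of a novel network-on-chip interconnect structure  Total citations: 2, self-citations: 0, citations by others: 2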

陈芳露陆雯青虞志益周晓方《小型微型计算机系统》2010,31(5)

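Considering overall performance, hardware implementation, and other factors, this paper proposes a network-on-chip interconnect topology, the hierarchical routing structure MLR (Multi-Layer Router). The structure reduces the network diameter through hierarchical design and has good symmetry and scalability. Network modeling, simulation, and hardware implementation results show that, under different network loads and different numbers of IP-core nodes, MLR improves network performance parameters such as packet loss rate, communication latency, and network throughput by up to 50%-70% compared with traditional structures. In addition, by sharing routers, it reduces chip area by more than 20% and dynamic power by more than 40%, effectively lowering the hardware overhead of the interconnect.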

4.

A survey of on-chip communication architectures for SoCs

周文彪张岩毛志刚《微处理机》2007,28(3):1-5

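As the number of IP cores in a system-on-chip grows, the on-chip communication architecture has gradually become the performance bottleneck of the whole SoC. SoC communication architectures based on a shared bus have insurmountable limitations, which poses a serious challenge to traditional shared-bus on-chip communication systems. This paper comprehensively surveys recent research on on-chip communication systems, analyzes the advantages and disadvantages of five on-chip communication architectures (shared bus, crossbar switch, point-to-point, network-on-chip (NoC), and hybrid interconnect) and their impact on overall SoC performance, and finally points out directions for on-chip communication research.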

5.

On-chip optical networks: a new type of on-chip interconnection network

计永兴钱悦崔大为窦文华《计算机工程与科学》2011,33(4):56-61

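As more and more processors are integrated on a single chip, traditional electrical interconnection networks can no longer meet the performance demands placed on the interconnect, and a new interconnection method is needed; optical interconnection network technology has emerged in response. Electrically interconnected networks-on-chip currently face bottlenecks in power consumption, performance, bandwidth, and latency, whereas optical interconnect, introduced into the network-on-chip as a new interconnection method, offers unmatched advantages such as low loss, high throughput, and low latency. This paper mainly discusses the on-chip optical network's...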

6.

HHSR: a network-on-chip prototype with separate command and data transmission

王炜乔林汤志忠李清宝《计算机科学》2012,39(4):299-303

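Building on previous work, and based on the performance characteristics of large-scale and very-large-scale network-on-chip interconnect structures, this paper proposes HHSR, a network-on-chip prototype system that transmits commands and data separately, in view of the different characteristics of the information carried by the network and its different transmission requirements. The prototype transmits commands and data over two on-chip networks with different topologies: the single-ring hierarchical interconnection network, which is faster and has better overall performance, carries command packets to meet their real-time requirements, while the hexagonal Mesh grid, which is somewhat slower but cheaper, carries data packets. Experimental results show that, at the expense of some data-packet transfer time and some added cost, the prototype maintains and improves the transfer speed of command and control information, and thereby maintains and improves the performance of the whole chip multiprocessor.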

7.

Performance analysis of two-dimensional networks-on-chip under local uniform random traffic

王炜乔林杨广文汤志忠《计算机研究与发展》2010,47(3)

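As a continuation and extension of the performance analysis of two-dimensional networks-on-chip under global uniform random traffic, this paper first describes mathematical models of the global and local uniform random traffic patterns and analyzes the relationship between them. Then, using the number of links to represent communication cost and based on an on-chip network routing and communication protocol designed independently by the authors, it analyzes how the network performance of structures of different types and sizes varies with the local communication probability, and gives a simple classification of the structures according to their relative performance and structural features. The results show that the global uniform random traffic pattern is in fact a special case of the local uniform random traffic pattern, and that the network performance of every structure improves steadily as the local communication probability increases. In comparison, Mesh networks with quadrilateral or triangular meshes and their variants are better suited to applications with low local communication probability or intensive communication, whereas for high local communication probability or low communication intensity, hexagonal-mesh Mesh networks and their variants, and tangent multi-ring structures and their wraparound variants, may achieve better overall performance.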

8.

Analysis and optimization of power consumption characteristics of on-chip interconnection networks

孙晓乐钱亚龙齐新新张云放陈娟袁远董勇《计算机工程与科学》2020,42(7):1141-1150

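As the number of processor cores increases, the structure of the on-chip interconnection network (NoC) grows ever more complex, so both its share of total power consumption and the difficulty of power analysis are increasing. Task mapping on the on-chip interconnection network must ensure high-performance communication among the processor cores while consuming as little power and area as possible, that is, achieve high performance under limited power and area overhead. During task mapping, the communication distance between cores is the key to reducing task communication power. Contiguous and nearly convex regions help shorten a task's communication distance. This paper analyzes INC, a power-optimal heuristic mapping algorithm for on-chip interconnection networks, which consists of a region selection algorithm and a node mapping algorithm. Two factors of the region selection algorithm are improved so that the application's total communication overhead is minimized and subsequent applications can select regions at very small communication cost. A new mapping algorithm based on the selected region is proposed. Experimental results on the problem of mapping dynamically arriving programs show that, compared with INC, the new region selection and node mapping algorithms reduce communication power by 12.10% and improve communication latency by 11.23%.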

9.

Memory-access-aware incremental application mapping for MPSoCs

王一拙左琦计卫星王小军石峰《计算机研究与发展》2015,52(5)

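Modern multiprocessor systems-on-chip (MPSoCs) usually adopt a network-on-chip (NoC) as their basic interconnect. Application mapping, which determines how the tasks an application is partitioned into are assigned to network-on-chip nodes, is a key problem in the design of NoC-based MPSoCs. Many NoC-based MPSoC systems treat shared memory as an independent node in the network; for such systems, this paper proposes a memory-access-aware incremental dynamic mapping strategy. The strategy analyzes applications offline to obtain their memory access characteristics. At run time, when an application arrives, the strategy selects a mapping algorithm according to those characteristics: hot-spot applications are placed around the shared memory and non-hot-spot applications away from it, while contention for communication links between applications and between the tasks within an application is minimized. Simulation experiments show that, compared with a strategy of greedy region selection plus random node mapping, the proposed strategy saves 34.6% of overall system communication power on average, improves performance by up to 36.3%, and adapts to different network-on-chip sizes.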

10.

Pareto-space optimization of on-chip communication architectures under throughput and latency constraints

曹亚菲王大伟李思昆《计算机研究与发展》2009,46(Z1)

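The interconnect among the IP cores of an SoC is a key factor in determining system-on-chip performance. In recent years, configuring and optimizing on-chip interconnect communication architectures has become a focus of SoC communication synthesis research. Existing methods for optimizing SoC interconnect communication architectures, however, simulate slowly, offer poor support for design automation, and use single-objective optimization algorithms that cannot resolve conflicts among multiple performance objectives. To address these shortcomings, this paper proposes a method for automatically configuring and optimizing on-chip interconnect communication architectures under throughput and latency constraints. The method introduces a template for on-chip bus interconnect communication architectures and uses transaction-level communication simulation and a multi-objective evolutionary algorithm to explore the multi-objective Pareto space under throughput and latency constraints. Compared with the existing advanced Srinivasan method, the method raises throughput by 10% and lowers transmission latency by 17%, effectively improving the optimization quality of SoC interconnect communication architectures.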
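To illustrate what a Pareto space over throughput and latency means, here is a minimal sketch that filters a set of candidate configurations down to the non-dominated ones. It is not the evolutionary algorithm from the paper; the `Design` tuple layout and the sample numbers are assumptions made up for the example.

```python
from typing import List, Tuple

# A candidate is (throughput, latency): higher throughput and lower latency are better.
# The tuple layout is a hypothetical representation, not taken from the paper.
Design = Tuple[float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


if __name__ == "__main__":
    # Illustrative values only.
    candidates = [(10.0, 5.0), (8.0, 4.0), (6.0, 6.0)]
    print(pareto_front(candidates))
```

A single-objective optimizer would return only one of the surviving points; keeping the whole front lets the designer trade throughput against latency explicitly.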

11.

A generic FPGA prototype for on-chip systems with network-on-chip communication infrastructure

Mohammad Arjomand Amirali Boroumand Hamid Sarbazi-Azad 《Computers & Electrical Engineering》2014

As Systems-on-Chip (SoCs) grow in complexity and size, proposals of networks-on-chip (NoCs) as the on-chip communication infrastructure are justified by the reusability, scalability, and energy efficiency provided by the interconnection networks. Simulation and mathematical analysis offer flexibility for evaluations under various network configurations. However, the accuracy of such analysis methods largely depends on the approximations made. On the other hand, prototyping can be used to improve the evaluation accuracy by bringing the design closer to reality. In this paper, we propose an FPGA prototype that is general enough to model different video-processing SoCs where different cores communicate via NoC. To model the NoC, we accurately implement a fully-synthesized on-chip router supporting multiple virtual channels. For the processing nodes, on the other side, we propose a general and simple traffic generator capable of modeling different synthetic functions (i.e. Poisson and self-similar). Indeed, the application traffic is modeled using 1-D hybrid cellular automata which can effectively generate high quality pseudorandom patterns. Finally, for energy efficiency, the proposed prototype is capable of supporting multiple frequency regions. To realize the voltage–frequency island partitioned SoC, we use the utilities that the Xilinx FPGA platform offers to design Globally Synchronous Locally Asynchronous (GALS) systems via Delay-Locked Loop elements.
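The abstract does not give the rule vector, width, or boundary condition of its 1-D hybrid cellular automata. As a rough sketch only, the code below assumes a mix of rules 90 and 150 (a common choice for hybrid CA pattern generators) with null boundaries; `hca_step`, `hca_stream`, and the sample rule vector are hypothetical names and values.

```python
from typing import List


def hca_step(state: List[int], rules: List[int]) -> List[int]:
    """Advance a 1-D hybrid cellular automaton by one step.

    Each cell uses rule 90 (left XOR right) or rule 150 (left XOR self XOR right).
    Cells beyond either end are treated as 0 (null boundary, an assumption).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rules[i] == 150:
            bit ^= state[i]
        nxt.append(bit)
    return nxt


def hca_stream(seed: List[int], rules: List[int], steps: int) -> List[List[int]]:
    """Return the successive states produced from a seed."""
    states = []
    state = list(seed)
    for _ in range(steps):
        state = hca_step(state, rules)
        states.append(state)
    return states


if __name__ == "__main__":
    # Hypothetical 4-cell automaton alternating rules 90 and 150.
    for s in hca_stream([1, 0, 0, 0], [90, 150, 90, 150], 4):
        print(s)
```

In a traffic generator, each new state could be compared against a threshold to decide whether a node injects a packet in that cycle; that mapping is likewise an assumption, not something the abstract specifies.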

12.

Microring fault-resilient photonic network-on-chip for reliable high-performance many-core systems

Michael Meyer Yuichi Okuyama Abderazek Ben Abdallah 《The Journal of supercomputing》2017,73(4):1567-1599

Photonic networks-on-chip (PNoCs) have emerged as a promising alternative to the conventional metal-based networks-on-chip due to their advantages in bandwidth density, power efficiency and propagation speed. Existing works on PNoCs concentrate on architectures of photonic networks with the assumption that the underlying photonic infrastructure operates correctly and reliably. However, the key optical device in PNoC systems, the microring resonator (MR), is very sensitive to temperature fluctuation and manufacturing errors. A single MR failure can cause messages to be misdelivered or lost, which results in bandwidth loss or even complete failure of the whole system. In this paper, we present a fault-tolerant Photonic Network-on-Chip architecture, named FT-PHENIC, which uses minimal redundancy to ensure accuracy of packet transmission even after faulty MRs are detected. FT-PHENIC is based on a microring fault-resilient photonic router (FTTDOR) and an adaptive path-configuration and routing algorithm. Simulation results show that FT-PHENIC tolerates MR faults quite well until around 20% of the MRs have failed, and has minimal bandwidth degradation and power drawbacks.

13.

Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing

Junghee Lee Chrysostomos Nicopoulos Hyung Gyu Lee Jongman Kim 《Parallel Computing》2013

Packet-based networks-on-chip (NoC) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting enlargement of the on-chip network size has been accompanied by an equivalent widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing of the physical channel enhances link utilization, it incurs additional delay, because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth “sharding” (i.e., partitioning) and stealing in order to mitigate the elevation in the zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router becomes identical with that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis verifies the modest area overhead of the Sharded Router over a conventional design.
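The trade-off described above (a packet that is not a multiple of the channel width wastes bandwidth, while narrower slices raise the flit count) can be shown with simple arithmetic. The following is a small sketch; the 72-bit packet and the 128-bit and 32-bit widths are assumed values chosen for illustration, not figures from the paper.

```python
def flits_per_packet(packet_bits: int, channel_bits: int) -> int:
    """Number of flits needed to carry a packet over a channel of the given width."""
    return -(-packet_bits // channel_bits)  # ceiling division on integers


def link_utilization(packet_bits: int, channel_bits: int) -> float:
    """Fraction of the transferred bits that carry packet payload."""
    flits = flits_per_packet(packet_bits, channel_bits)
    return packet_bits / (flits * channel_bits)


if __name__ == "__main__":
    packet = 72  # hypothetical packet size in bits
    for width in (128, 32):  # one wide channel vs. one narrow slice (assumed widths)
        print(width, flits_per_packet(packet, width), link_utilization(packet, width))
```

With these numbers the wide channel carries the packet in one flit at about 56% utilization, while a 32-bit slice needs three flits at 75% utilization, which is the extra serialization delay that sharding and stealing are meant to offset.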

14.

Detailed and clock-driven simulation for HPC interconnection network

Wenhao ZHOU Juan CHEN Chen CUI Qian WANG Dezun DONG Yuhua TANG 《Frontiers of Computer Science》2016,10(5):797-811

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation platform, called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports the router’s on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem and the mean error is only 2.7%. In addition, we simulate the network behaviors with different network structures and low-power modes. The results are also consistent with the theoretical analyses.

15.

Design of an ultra-high-radix router based on Chiplet integration technology

梁崇山戴艺徐炜遐《计算机工程与科学》2022,44(2):207-213

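High-radix routers with high bandwidth and low latency play an important role in building large-scale scalable interconnection networks. However, constrained by the ever-increasing design complexity of a single router chip and by the slowing and stalling of Moore's law and Dennard scaling, adding more ports to a single router chip is becoming increasingly difficult. Chiplet technology integrates multiple dies in a particular way within an advanced package to form a large chip with specific functions, thereby addressing problems of scale, development cost, and development cycle in chip design. Following the idea of Chiplet integration and using existing router chips, this paper proposes a Chiplet-based 128-port high-radix router whose interior is a network of multiple Switch Dies arranged in a two-level fat-tree topology. RTL-level simulation tests show that, compared with a single-chip high-radix router design, the proposed router achieves good performance while providing more ports.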

16.

Evaluation of Crossbar Architectures for Deadlock Recovery Routers

《Journal of Parallel and Distributed Computing》2001,61(1):49-78

The performance of interconnection networks is significantly affected by router speed and routing adaptivity, which can be competing factors. To achieve a high-speed, true-fully-adaptive router design, this paper explores the exploitation of dynamic routing behavior identified as routing locality. When routing locality is exploited, it enables the internal crossbar of a router to be partitioned into smaller and faster units without sacrificing true-fully-adaptive routing capabilities. Extensive evaluation of partitioned crossbar designs which exploit routing locality shows that the increased adaptivity offered by deadlock recovery-based routing algorithms can be implemented in routers without sacrificing router speed. The partitioned crossbar designs reduce average message latency by up to 65% and increase maximum network throughput by up to 51%.

17.

Interconnection network and message mechanism of the Sunway exascale prototype

高剑刚卢宏生何王全任秀江陈淑平斯添浩周舟胡舒凯于康魏迪《计算机学报》2021,44(1):222-234

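This paper describes the interconnection network and message mechanism of the Sunway exascale prototype. The Sunway exascale prototype is the third generation of the Sunway family, following Sunway BlueLight and Sunway TaihuLight. As the prototype of an exascale computer, it has a peak performance of 3.13 PFlops. One of its most distinctive features is the use of 28 Gbps transmission technology, for which two chips were designed and developed: a new-generation Sunway high-radix router and a Sunway high-performance network interface. Building on the traditional fat tree, a dual-rail... was designed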

18.

pp-mess-sim: a flexible and extensible simulator for evaluating multicomputer networks

Rexford J. Wu-Chang Feng Dolter J. Shin K.G. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(1):25-40

The paper presents pp-mess-sim, an object-oriented discrete-event simulation environment for evaluating interconnection networks in message-passing systems. The simulator provides a toolbox of various network topologies, communication workloads, routing-switching algorithms, and router models. By carefully defining the boundaries between these modules, pp-mess-sim creates a flexible and extensible environment for evaluating different aspects of network design. The simulator models emerging multicomputer networks that can support multiple routing and switching schemes simultaneously; pp-mess-sim achieves this flexibility by associating routing-switching policies, traffic patterns, and performance metrics with collections of packets, instead of the underlying router model. Besides providing a general framework for evaluating router architectures, pp-mess-sim includes a cycle-level model of the PRC, a programmable router for point-to-point distributed systems. The PRC model captures low-level implementation details, while another high-level model facilitates experimentation with general router design issues. Sample simulation experiments capitalize on this flexibility to compare network architectures under various application workloads.