1.

MMNNN: A tree-based Multicast Mechanism for NoC-based deep Neural Network accelerators

《Microprocessors and Microsystems》2021

Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed to connect neural computing devices using NoCs. Such designs dramatically reduce off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrades performance with traditional unicast channels. We propose a multicast mechanism for a NoC-based DNN accelerator called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). To do so, we propose a tree-based multicast routing algorithm with excellent scalability and the ability to minimize the number of packets in the network. We also propose a router architecture for single-flit packets. Our proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking issue, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that our proposed multicast mechanism offers excellent performance in classification latency, average packet latency, and energy consumption.
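The abstract above does not give the MMNNN routing rules, so the following is only a generic sketch of the tree-based multicast idea on a 2D mesh. It assumes dimension-ordered (XY) routing as the underlying path choice, and the function names (`xy_next_port`, `multicast_links`, `unicast_links`) are hypothetical. A single flit is forked only where destinations diverge, and the link traversals are compared with sending one unicast packet per destination.

```python
from collections import defaultdict

# Coordinate change for each output port of a 2D-mesh router.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def xy_next_port(cur, dst):
    """Output port chosen by dimension-ordered (XY) routing: X first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "E" if dx > cx else "W"
    if dy != cy:
        return "N" if dy > cy else "S"
    return "LOCAL"

def multicast_links(src, dests):
    """Count link traversals when one flit is forked along an XY-based tree."""
    links = 0
    frontier = [(src, frozenset(dests))]
    while frontier:
        node, pending = frontier.pop()
        # Group the remaining destinations by the port each one needs next.
        groups = defaultdict(set)
        for d in pending:
            groups[xy_next_port(node, d)].add(d)
        for port, group in groups.items():
            if port == "LOCAL":
                continue  # this destination has been reached
            dx, dy = STEP[port]
            links += 1  # one copy of the flit crosses this link
            frontier.append(((node[0] + dx, node[1] + dy), frozenset(group)))
    return links

def unicast_links(src, dests):
    """Link traversals when every destination gets its own minimal-path packet."""
    return sum(abs(d[0] - src[0]) + abs(d[1] - src[1]) for d in dests)

if __name__ == "__main__":
    src, dests = (0, 0), [(2, 0), (2, 1), (2, 2)]
    print(multicast_links(src, dests), unicast_links(src, dests))
```

For destinations that share a long common prefix of the path, the tree needs far fewer link traversals than repeated unicast, which is the effect the abstract attributes to multicast.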

2.

A parametric-based performance evaluation and design trade-offs for interconnect architectures using FPGAs for networks-on-chip

Sani Abba Jeong-A Lee 《Microprocessors and Microsystems》2014

Network-on-Chip (NoC) interconnect fabrics are categorized according to trade-offs among latency, throughput, speed, and silicon area, and the correctness and performance of these fabrics in Field-Programmable Gate Array (FPGA) applications are assessed through experimentation and simulation. In this paper, we propose a consistent parametric method for evaluating the FPGA performance of three common on-chip interconnect architectures, namely the Mesh, Torus, and Fat-tree architectures. We also investigate how NoC architectures are affected by interconnect and routing parameters, and demonstrate their flexibility and performance through FPGA synthesis and testing of 392 different NoC configurations. In this process, we found that the Flit Data Width (FDW) and Flit Buffer Depth (FBD) parameters have the heaviest impact on FPGA resources, and that these parameters, along with the number of Virtual Channels (VCs), significantly affect reassembly buffering, routing, and logic requirements at NoC endpoints. Applying our evaluation technique to a detailed and flexible cycle-accurate simulation, we drive the three NoC architectures using benign (Nearest Neighbor and Uniform) and adversarial (Tornado and Random Permutation) traffic patterns with different numbers of VCs, producing a set of load–delay curves. The results show that, with strategic tuning of the router and interconnect parameters, the Fat-tree network makes the best use of FPGA resources in terms of silicon area, clock frequency, critical path delays, network cost, saturation throughput, and latency, whereas the Mesh and Torus networks showed comparatively high resource costs and poor performance under adversarial traffic patterns. Our findings indicate that the Fat-tree network is more efficient in terms of FPGA resource utilization and is compatible with current Xilinx FPGA devices.
This approach will assist engineers and architects in making an early choice of the right interconnect and router parameters for large and complex NoCs. We demonstrate that our approach substantially improves performance across a wide variety of experiments and simulations, which confirms its suitability for real systems.
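The four traffic patterns named in the abstract can be illustrated with a short sketch. The definitions below are the common textbook forms for a k-by-k network with wrap-around addressing, applied here along the X dimension only; they are an assumption and may differ from the exact generators used in the paper. All function names are hypothetical.

```python
import math
import random

def nearest_neighbor(src, k):
    """Benign pattern: each node sends to its +1 neighbor in X (with wrap-around)."""
    x, y = src
    return ((x + 1) % k, y)

def tornado(src, k):
    """Adversarial pattern: each node sends almost halfway around the X dimension."""
    x, y = src
    return ((x + math.ceil(k / 2) - 1) % k, y)

def uniform(src, k, rng):
    """Benign pattern: destination drawn uniformly at random for every packet."""
    return (rng.randrange(k), rng.randrange(k))

def random_permutation(k, rng):
    """Adversarial pattern: one fixed random one-to-one mapping of sources to destinations."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    return dict(zip(nodes, shuffled))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    print(nearest_neighbor((7, 2), 8), tornado((0, 0), 8))
    print(uniform((0, 0), 8, rng))
    print(len(random_permutation(8, rng)))
```

Sweeping the injection rate for each pattern and recording the average packet latency is what yields load–delay curves of the kind the abstract mentions.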

3.

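Design of a multi-FPGA-based verification platform for NoC multi-core processors (total citations: 1, self-citations: 0, citations by others: 1)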

黄晓林 潘红兵 易伟 杨虎 凌梦 黄辰 何书专 李丽 《计算机工程与设计》2012,33(1):180-185

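To flexibly verify and implement an in-house NoC-based multi-core processor and to shorten the design cycle of NoC multi-core processors, a prototype chip design/verification platform for NoC multi-core processors that integrates four Virtex-6-550T FPGAs is proposed. The scale of the NoC multi-core processor and its demand for FPGA hardware resources are analyzed and evaluated. On this basis, a detailed design for a development board integrating the four FPGAs is given, and the key design points of the main modules, such as the interconnect architecture, power supply, board-level clock distribution, interface technology, and memory resources, are described. The test procedures and results for each main module of the development board are presented and demonstrate the feasibility of the design.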

4.

An analytical performance model for the Spidergon NoC with virtual channels

Mahmoud Moadeli Alireza Shahrabi Wim Vanderbauwhede Partha Maji 《Journal of Systems Architecture》2010,56(1):16-26

The Spidergon Network-on-Chip (NoC) was proposed to address the demand for a fixed and optimized communication infrastructure for cost-effective multi-processor Systems-on-Chip (MPSoC) development. To deal with the increasing diversity in quality of service requirements of SoC applications, the performance of this architecture needs to be improved. Virtual channels have traditionally been employed to enhance the performance of interconnection networks. In this paper, we present analytical models to evaluate the message latency and network throughput in the Spidergon NoC and investigate the effect of employing virtual channels. Results obtained through simulation experiments show that the model exhibits a good degree of accuracy in predicting average message latency under various working conditions. Moreover, an FPGA implementation of the Spidergon has been developed to provide an accurate analysis of the cost of employing virtual channels in this architecture.

5.

NISHA: A fault-tolerant NoC router enabling deadlock-free Interconnection of Subnets in Hierarchical Architectures

《Journal of Systems Architecture》2013,59(7):551-569

The decrease in Integrated Circuit (IC) feature sizes increases susceptibility to transient and permanent errors. The growing rate of such errors in ICs intensifies the need for a wide range of solutions addressing reliability at various levels of abstraction. The Network on Chip (NoC) architecture has been introduced to address the increasing demand for communication bandwidth among processing cores. The structural redundancy inherent in NoC-based systems can be leveraged to improve reliability and compensate for the effects of failures. In this paper, we propose a fault-tolerant NoC router, NISHA, which stands for No-deadlock Interconnection of Subnets in Hierarchical Architectures. Armed with a new flow control mechanism, as well as an enhanced Virtual Channel (VC) regulator, the proposed router can mitigate the effects of both transient and permanent errors. NISHA supports dynamic/static virtual channel allocation with respect to local and global traffic; it thereby maintains a deadlock-free state in the presence of router or link failures in hierarchical topologies. Experimental results show enhanced operation of NoC applications as well as a decrease in average latency and energy consumption.

6.

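Research on test planning for testing SoC embedded IP cores by reusing the NoC (total citations: 1, self-citations: 0, citations by others: 1)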

赵建武 师奕兵 王志刚 《计算机工程与应用》2010,46(15):60-63

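Test planning is an important problem in SoC testing. A test planning method that reuses the network-on-chip to test embedded IP cores is used to keep SoC power consumption in test mode within the maximum chip power limit and to eliminate conflicts caused by shared test resources, with the goal of reducing test time. A congestion-free routing algorithm that supports test planning and a method for optimizing the configuration of test scan chains are proposed. A two-dimensional Mesh network-on-chip test platform, synthesizable on an FPGA, was implemented in the VHDL hardware description language and is used to analyze and evaluate network-on-chip performance parameters, routing algorithms, and NoC-based SoC test methods.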

7.

A virtual prototyping system for rapid product development (total citations: 4, self-citations: 0, citations by others: 4)

S.H. Choi A.M.M. Chan 《Computer aided design》2004,36(5):401-412

This paper describes a virtual prototyping (VP) system that integrates virtual reality with rapid prototyping (RP) to create virtual or digital prototypes to facilitate product development. The proposed VP system incorporates two new simulation methodologies, namely the dexel-based and the layer-based fabrication approaches, to simulate the powder-based and the laminated sheet-based RP processes, respectively. The dexel-based approach deposits arrays of solid strips to form a layer, while the layer-based approach directly forms a complete layer by extruding the slice contours. The layers are subsequently stacked up to fabricate a virtual prototype. The simulation approaches resemble the physical fabrication processes of most RP systems, and are therefore capable of accurately representing the geometrical characteristics of prototypes. In addition to numerical quantification of the simulation results, the system also provides stereoscopic visualisation of the product design and its prototype for detailed analyses. Indeed, the original product design may be superimposed on its virtual prototype, so that areas with dimensional errors beyond design limits may be clearly highlighted to facilitate point-to-point analysis of the surface texture and the dimensional accuracy of the prototype. Hence, the key control parameters of an RP process, such as part orientation, layer thickness and hatch space, may be effectively tuned for optimal fabrication of physical prototypes in subsequent product development. Furthermore, the virtual prototypes can be transmitted via the Internet to customers to facilitate global manufacturing. As a result, both the lead-time and the product development costs can be significantly reduced.

8.

CAP-W: Congestion-aware platform for wireless-based network-on-chip in many-core era

《Microprocessors and Microsystems》2017

To fulfill the ever-increasing demand for high-speed, high-bandwidth communication, wireless-based MCSoC is presented based on a NoC communication infrastructure. Separating communication from computation demands and providing flexible topology configurations make wireless-based NoC a promising future MCSoC architecture. However, congestion in wireless routers reduces the benefit of high-speed wireless links and significantly increases network latency. Therefore, in this paper, a congestion-aware platform, named CAP-W, is introduced for wireless-based NoC in order to reduce congestion in the network and especially over wireless routers. The triple-layer platform of CAP-W is composed of mapping, migration, and routing layers. To minimize the congestion probability, the mapping layer is responsible for selecting a suitable free core as the first candidate, finding a suitable first task to be mapped onto the selected core, and allocating other tasks with respect to contiguity. Considering the dynamic variation of application behaviors, the migration layer modifies the primary task mapping to improve the congestion situation. Furthermore, the routing layer balances utilization of the wired and wireless networks by separating short-distance and long-distance communications. Experimental results show a meaningful gain in congestion control of wireless-based NoC compared to state-of-the-art works.

9.

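Principles and architecture of rapid-prototyping virtual realistic design (total citations: 2, self-citations: 0, citations by others: 2)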

曹岩 赵汝嘉 (ZHAO Ru-jia) 《计算机辅助设计与图形学学报》2001,13(1):56-60

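Based on an analysis of the characteristics and development trends of design, rapid-prototyping virtual realistic design is proposed, and its characteristics and principles are discussed. Rapid-prototyping virtual realistic design is oriented toward parallel tools, is based on virtual prototypes and virtual-environment simulation, and emphasizes the rapid generation and evolution of virtual prototypes. Research on rapid-prototyping virtual realistic design is discussed at three levels: key technologies, prototype systems, and development environments. A model of rapid-prototyping virtual realistic design and an architecture based on that model are proposed.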

10.

Evaluation of low power consumption network on chip routing architecture

《Microprocessors and Microsystems》2021

Network on Chip (NoC) is a growing technology for forming interconnect patterns among multiprocessors, and it has been adapted to support a variety of multiprocessor requirements. Existing designs do not support the growing requirements of user applications. Because of complex routing connections, traffic congestion and power consumption problems arise and contribute to low network efficiency. Traffic congestion, power consumption, and latency are significant concerns in NoC architectures because of the various dynamic routing connections. Existing models do not consider all of these factors and struggle to achieve higher performance, and previous methods do not trigger the circuits according to the traffic condition and maximum power consumption. To address this, the proposed High-Speed Virtual Logic NoC router architecture is used to control traffic congestion and deadlock issues and to reduce latency by selecting minimal-interval paths. In this research work, an architecture containing a virtual router is introduced; it yields low power consumption and improves network performance by routing in the diagonal direction in addition to the other directions. The method also selects an optimal path according to various conditions, which avoids unnecessary triggering of chips and thus reduces power consumption. The proposed model considers dynamic congestion and route availability to perform routing with the least power consumption. In a comparison of the two architectures, the VC router achieved 15% lower power consumption for the 8-bit system, 10% lower for the 16-bit system, and 22% lower for the 32-bit system.
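The abstract does not specify how the diagonal links are used, so the sketch below only illustrates why adding diagonal moves can shorten minimal paths on a 2D grid. It assumes every router also has links to its four diagonal neighbors; the function names are hypothetical.

```python
def mesh_hops(src, dst):
    """Minimal hop count with only N/S/E/W links (Manhattan distance)."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])

def diagonal_mesh_hops(src, dst):
    """Minimal hop count when diagonal links are also available.

    Each diagonal hop reduces both coordinate offsets by one, so the
    larger of the two offsets determines the path length.
    """
    return max(abs(dst[0] - src[0]), abs(dst[1] - src[1]))

if __name__ == "__main__":
    src, dst = (0, 0), (3, 2)
    print(mesh_hops(src, dst), diagonal_mesh_hops(src, dst))
```

Fewer hops mean fewer router traversals per packet, which is one plausible source of the latency and power savings the abstract reports, although the paper's own measurements are not reproduced here.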

11.

A hardware/software platform for QoS bridging over multi-chip NoC-based systems

Ashkan Beyranvand Nejad Anca Molnos Matias Escudero Martinez Kees Goossens 《Parallel Computing》2013

Recent embedded systems integrate a growing number of intellectual property cores into increasingly large designs. Implementation, prototyping, and verification of such large systems have become very challenging. One of the reasons is that chip/FPGA resources are limited, and therefore it is not always possible to implement the whole design in a traditional system-on-a-chip solution. The state of the art is to partition such systems into smaller sub-systems and implement each on a separate chip. Consequently, separate chips/FPGAs must be interconnected. Since Networks-on-Chip (NoCs) have become common interconnection solutions in embedded designs, we propose to bridge NoC-based SoCs, enabling generic multi-chip system interconnection. In this context, the contribution of this paper is threefold: (i) we explore the NoC protocol stack to determine the best layer for implementing the off-chip bridge, (ii) we propose a generic hardware architecture for the bridge, and (iii) we develop a new software architecture enabling seamless configuration and communication of multi-chip NoC-based SoCs. Finally, we demonstrate the performance, i.e., bandwidth and latency, of the bridge in a multi-FPGA platform, while the bridge guarantees QoS of traffic. The synthesis results indicate that the implementation area cost of the bridge is only 1% of a Xilinx Virtex6 FPGA.

12.

A survey of routing algorithm for mesh Network-on-Chip

Yue WU Chao LU Yunji CHEN 《Frontiers of Computer Science》2016,10(4):591-601

With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which brings tough challenges such as bandwidth, scalability, and power to on-chip interconnection. Against this background, Network-on-Chip (NoC) was proposed and is gradually replacing traditional on-chip interconnections such as the shared bus and crossbar. For the convenience of physical layout, mesh is the most used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network, and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoC. Depending on whether network information is taken into consideration in routing decisions, routing algorithms of NoC can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but lacks adaptiveness, whereas adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing algorithms, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy, and features of routing algorithms of NoC are introduced. Then the importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light upon the future work of NoC routing algorithms.
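The oblivious/adaptive distinction in the abstract can be made concrete with a minimal sketch. It contrasts dimension-ordered XY routing, which ignores network state, with a simple minimal adaptive scheme that picks the less loaded productive neighbor at each hop. The `congestion` dictionary (node to load value) is a hypothetical stand-in for whatever network information a real router would observe, and neither function is taken from the survey itself.

```python
def xy_route(src, dst):
    """Oblivious routing: always finish the X dimension before moving in Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def minimal_adaptive_route(src, dst, congestion):
    """Adaptive routing: among hops that reduce distance, take the least loaded.

    congestion maps a node to its current load (assumed, illustrative);
    nodes missing from the mapping count as load 0. Ties favor the X hop.
    """
    x, y = src
    path = [(x, y)]
    while (x, y) != dst:
        candidates = []
        if x != dst[0]:
            candidates.append((x + (1 if dst[0] > x else -1), y))
        if y != dst[1]:
            candidates.append((x, y + (1 if dst[1] > y else -1)))
        x, y = min(candidates, key=lambda node: congestion.get(node, 0))
        path.append((x, y))
    return path

if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(minimal_adaptive_route((0, 0), (2, 1), {(1, 0): 5}))
```

Both routes have the same length because only distance-reducing hops are allowed; the adaptive one simply steers around the loaded node. Note that unrestricted adaptive choices like this can create deadlock in a real network, which is why practical schemes restrict the allowed turns.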

13.

Design of an NoC-based image acquisition system

许川佩 占来龙 任智新 《微型机与应用》2012,31(11):34-37

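To address the performance bottlenecks that bus interconnects impose on single-core processor systems, such as interconnect delay, memory bandwidth, and power limits, a real-time image acquisition and processing system based on an NoC was designed. The system uses an FPGA to implement the image acquisition module, storage, JPEG encoding and decoding, resource nodes, routing nodes, and VGA display. Experimental results show that using multi-core technology on an NoC system in place of a traditional single processor demonstrates the great advantage of NoC in improving system parallelism.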

14.

Multicast routing algorithms for networks-on-chip based on congestion control

袁景凌 刘华 谢威 蒋幸 《计算机应用》2011,31(10):2630-2633

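To meet the increasingly diverse application requirements of networks-on-chip, multicast routing mechanisms have been applied to networks-on-chip to make up for the shortcomings of traditional unicast communication. Taking Mesh- and Torus-type networks-on-chip as examples, three path-based multicast routing algorithms (XY routing, UpDown routing, and SubPartition routing) are analyzed, and the corresponding congestion control strategies are studied. Simulation experiments show that multicast communication achieves lower average transmission delay and higher network throughput than unicast, with evenly distributed load; the benefit of the SubPartition routing algorithm in particular becomes more pronounced as the network scale increases. The proposed multicast congestion control mechanism makes more effective use of multicast communication and improves network-on-chip performance.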

15.

Multi-hop communications on wireless network-on-chip using optimized phased-array antennas

Ehsan Tavakoli Mahmoud Tabandeh Sara Kaffash Bijan Raahemi 《Computers & Electrical Engineering》2013

Network-on-Chip (NoC), as a promising design approach for on-chip interconnect fabrics, could overcome the energy and synchronization challenges of conventional interconnects in gigascale Systems-on-Chip (SoCs). However, the communication performance advantages of traditional wired NoCs will not be sustained by future technology scaling. Packets that travel between distant nodes of a large-scale wired on-chip network suffer significantly from energy dissipation and latency due to the routing overhead at each hop. According to the International Technology Roadmap for Semiconductors annual report, RF CMOS characteristics will steadily improve with technology scaling. As the operating frequency of RF devices increases, the size of Si-integrated antennas will decrease, and it becomes feasible to employ them as a revolutionary interconnect for intra-chip wireless communications. In this paper, we focus on the physical requirements and design challenges of wireless NoC. It is demonstrated that employing an optimum-radiation phased-array antenna and multi-hop communications will increase the reliability of on-chip wireless links by several orders of magnitude using a limited power budget of less than 0.1 pJ/bit.

16.

A fast adaptive fault-tolerant NoC routing algorithm based on packet inspection

张士鉴 韩国栋 沈剑良 陈庆强 《计算机应用研究》2013,30(7):2168-2172

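Traditional adaptive fault-tolerant routing algorithms for networks-on-chip (NoC) determine the optimal port by performing a comparison at every hop, which fails to reduce transmission delay effectively. Based on the observation that a packet's optimal port stays fixed over its first several consecutive hops in a 2D Mesh NoC, a fast packet-inspection-based (FPIB) adaptive fault-tolerant routing algorithm is proposed. The algorithm performs comparisons only at skipped intervals to reduce packet routing time and uses a fuzzy priority strategy for fault-tolerant route computation. Experimental results show that, compared with the uLBDR fault-tolerant routing algorithm, the proposed algorithm effectively reduces average delay and has lower hardware overhead.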

17.

Bringing NoCs to 65 nm (total citations: 1, self-citations: 0, citations by others: 1)

Pullini A. Angiolini F. Murali S. Atienza D. De Micheli G. Benini L. 《Micro, IEEE》2007,27(5):75-85

Very deep submicron process technologies are ideal application fields for networks on chip (NoCs), which offer a promising solution to the scalability problem. This article sheds light on the benefits and challenges of NoC-based interconnect design in nanometer CMOS. The experimental results from fully working 65-nm NoC designs and a detailed scalability analysis are presented. NoCs build upon improvements in bus architecture, for example in terms of topology design.

18.

Virtual prototyping and testing of in-vehicle interfaces

《Ergonomics》2012,55(1-3):41-51

Electronic innovations that are slowly but surely changing the very nature of driving need to be tested before being introduced to the market. To meet this need, a system for integrated virtual prototyping and testing has been developed. Functional virtual prototypes of various traffic systems, such as driver assistance, driver information, and multimedia systems, can now be easily tested in a driving simulator by a rapid prototyping approach. The system has been applied in recent R&D projects.

19.

Practical prototyping

Laliberte T. Gosselin C.M. Cote G. 《Robotics & Automation Magazine, IEEE》2001,8(3):43-52

The design of robotic mechanisms is a complex process involving geometric, kinematic, dynamic, tolerance, and stress analyses. In the design of a real system, the construction of a physical prototype is often considered. Indeed, a physical prototype helps the designer to identify the fundamental characteristics and the potential pitfalls of the proposed architecture. However, the design and fabrication of a prototype using traditional techniques is rather long, tedious, and costly. In this context, the availability of rapid prototyping machines can be exploited in order to allow designers of robotic mechanisms to build prototypes rapidly and at a low cost. In the article, the rapid prototyping of mechanisms using a commercially available computer-aided design (CAD) package and a fused deposition modeling (FDM) rapid prototyping machine is presented. A database of lower kinematic pairs (joints) is developed using the CAD package, and parameters of fabrication are determined experimentally for each of the joints. These joints are then used in the design of the prototypes, where the links are developed and adapted to the particular geometries of the mechanisms to be built. Also, a procedure is developed to build gears and Geneva mechanisms. Examples of mechanisms are then studied and their design is presented. For each mechanism, the joints are described and the design of the links is discussed. Some of the physical prototypes built using the FDM rapid prototyping machine are shown.

20.

A generic FPGA prototype for on-chip systems with network-on-chip communication infrastructure

Mohammad Arjomand Amirali Boroumand Hamid Sarbazi-Azad 《Computers & Electrical Engineering》2014

As Systems-on-Chip (SoCs) grow in complexity and size, proposals of networks-on-chip (NoCs) as the on-chip communication infrastructure are justified by the reusability, scalability, and energy efficiency provided by interconnection networks. Simulation and mathematical analysis offer flexibility for evaluations under various network configurations. However, the accuracy of such analysis methods largely depends on the approximations made. On the other hand, prototyping can be used to improve the evaluation accuracy by bringing the design closer to reality. In this paper, we propose an FPGA prototype that is general enough to model different video-processing SoCs where different cores communicate via NoC. To model the NoC, we accurately implement a fully synthesized on-chip router supporting multiple virtual channels. For the processing nodes, on the other hand, we propose a general and simple traffic generator capable of modeling different synthetic functions (i.e., Poisson and self-similar). Indeed, the application traffic is modeled using 1-D hybrid cellular automata, which can effectively generate high-quality pseudorandom patterns. Finally, for energy efficiency, the proposed prototype can support multiple frequency regions. To realize the voltage–frequency island partitioned SoC, we use the utilities that the Xilinx FPGA platform offers to design Globally Asynchronous Locally Synchronous (GALS) systems via Delay-Locked Loop elements.
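A 1-D hybrid cellular automaton of the kind mentioned in the abstract can be sketched in a few lines. The version below assumes the common rule-90/rule-150 hybrid with null (zero) boundaries; the rule assignment, seed, and threshold are illustrative values rather than the ones used in the paper, and both function names are hypothetical.

```python
def hca_step(state, rules):
    """Advance a 1-D hybrid cellular automaton by one clock cycle.

    Rule 90:  next = left XOR right
    Rule 150: next = left XOR self XOR right
    Cells outside the array are treated as 0 (null boundary).
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right
        if rules[i] == 150:
            bit ^= state[i]
        nxt.append(bit)
    return nxt

def injection_trace(seed, rules, threshold, cycles):
    """Decide per cycle whether a node injects a packet.

    The CA state is read as an unsigned integer; a packet is injected
    whenever that value falls below threshold, so a larger threshold
    gives a higher offered load.
    """
    state = list(seed)
    trace = []
    for _ in range(cycles):
        state = hca_step(state, rules)
        value = int("".join(str(b) for b in state), 2)
        trace.append(value < threshold)
    return trace

if __name__ == "__main__":
    rules = [90, 150, 90, 150]  # illustrative rule vector
    print(hca_step([0, 1, 0, 0], rules))
    print(injection_trace([0, 1, 0, 0], rules, threshold=8, cycles=10))
```

Because each cell needs only XOR gates and a flip-flop, this kind of generator maps cheaply onto FPGA logic. An all-zero seed must be avoided, since the all-zero state never changes.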