Similar Articles
 19 similar articles found
1.
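With the continuing development of devices, process technology, and applications, chip multiprocessors (CMPs) have become mainstream. They keep growing in scale and integrate more and more processor cores, and the network-on-chip (NoC) that interconnects the on-chip cores and other components is gradually becoming one of the bottlenecks limiting CMP performance. The NoC topology defines the physical layout and interconnection of the network's nodes, and determines or influences the NoC's cost, latency, throughput, area, fault tolerance...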

2.
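A Survey of Interconnection Techniques for Chip Multiprocessors   Total citations: 3, self-citations: 0, citations by others: 3
With the continuing development of devices, process technology, and applications, the number of cores in on-chip processors is bound to increase further, and on-chip interconnection and communication have become an important factor affecting processor performance. This paper introduces several typical interconnection methods currently used in chip multiprocessors and briefly analyzes the advantages and disadvantages of each.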

3.
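With the continuing development of VLSI technology and semiconductor manufacturing processes, multicore processors have replaced single-core processors. As advances in technology and process increase the number of cores in a chip multiprocessor, the interconnection and communication among the cores become the bottleneck that limits further performance gains. To exploit the full performance of multicore processors, this paper surveys the interconnection methods used in today's mainstream multicore processors, analyzes the strengths and weaknesses of each, and proposes choosing different interconnection methods for different core counts and structures. It points out that combining new materials, new technologies, and new devices with existing, mature multicore interconnection schemes is an effective way to improve interconnection efficiency, and it discusses future research directions and development trends in multicore interconnection.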

4.
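Performance Analysis of On-Chip Two-Dimensional Network Interconnection   Total citations: 1, self-citations: 1, citations by others: 1
The on-chip interconnection network is increasingly one of the important factors affecting chip multiprocessor performance, and almost all interconnection structures have evolved from two-dimensional networks. This paper first analyzes the static properties of several common two-dimensional networks whose interior nodes all have degree 4, and proposes a new routing structure and communication protocol for two-dimensional on-chip network interconnection. Using a globally uniform random traffic model, it then analyzes the dynamic properties of the different network structures by varying the network size and the communication intensity. With the number of links as the measure of communication cost, it proposes a new comprehensive performance metric for network interconnection, the latency-load capability per unit network cost. Finally, it compares the overall performance of the two-dimensional on-chip networks and identifies the scenarios to which each is suited.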
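The static properties mentioned in this abstract can be illustrated with a small sketch. It assumes two standard degree-4 topologies, an n x n mesh and an n x n torus (the abstract does not name the networks it studies), and computes the link count, the diameter, and the average hop distance by breadth-first search. The function names are hypothetical.

```python
from collections import deque

def neighbors(x, y, n, wrap):
    """Yield the neighbors of node (x, y) in an n x n grid.

    wrap=False models a 2D mesh; wrap=True models a 2D torus.
    """
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if wrap:
            yield nx % n, ny % n
        elif 0 <= nx < n and 0 <= ny < n:
            yield nx, ny

def hop_counts(src, n, wrap):
    """Breadth-first search: hop distance from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nb in neighbors(node[0], node[1], n, wrap):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def static_metrics(n, wrap):
    """Return (link count, diameter, average hop distance over distinct pairs)."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    # Each undirected link is seen from both of its endpoints, hence // 2.
    links = sum(len(set(neighbors(x, y, n, wrap)) - {(x, y)}) for x, y in nodes) // 2
    diameter, total = 0, 0
    for src in nodes:
        dist = hop_counts(src, n, wrap)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    average = total / (len(nodes) * (len(nodes) - 1))
    return links, diameter, average

if __name__ == "__main__":
    for name, wrap in (("mesh", False), ("torus", True)):
        print(name, static_metrics(4, wrap))
```

With links standing in for cost, as the abstract suggests, the torus buys its shorter diameter with extra wrap-around links; the paper's own combined metric is not reproduced here.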

5.
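A Recursively Defined Scalable Network-on-Chip Topology   Total citations: 1, self-citations: 0, citations by others: 1
Zhu Xiaojing (朱晓静). 《计算机学报》 (Chinese Journal of Computers), 2011, 34(5): 924-930
Continued advances in transistor technology have steadily increased the number of on-chip processors, and inter-core communication in a system-on-chip demands high throughput, low latency, and good scalability. Traditional on-chip buses and crossbar interconnects can no longer meet these communication requirements, so researchers have proposed a new on-chip interconnection structure called the network-on-chip. To meet the particular communication needs of networks-on-chip, this paper proposes a scalable topology, Rgrid, and its routing algorithm, DR, which shortens the ... between on-chip processors...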

6.
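With the steady improvement of integrated-circuit process technology and the growing performance demands that applications place on processors, verification has become the main technical bottleneck for the future development of chip multicore processors. This paper analyzes in depth the prominent problems in chip multicore processor verification: a large state space, insufficient completeness, the complexity of verifying the memory hierarchy and the interconnection network, and the difficulty of post-silicon verification. It systematically summarizes research progress in simulation-based verification, hardware emulation, formal verification, and post-silicon verification of chip multicore processors, and analyzes and forecasts future directions in the field.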

7.
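Building on earlier work, and considering the performance characteristics of large-scale and very-large-scale network-on-chip interconnection structures as well as the different properties and transmission requirements of the information carried, this paper proposes HHSR, a network-on-chip prototype system that transmits commands and data separately. The prototype carries commands and data over two on-chip networks with different topologies. A single-ring hierarchical interconnection network, which is faster and has good overall performance, carries command packets to meet their real-time requirements, while a hexagonal mesh, which is somewhat slower but cheaper, carries data packets. Experimental results show that, at the expense of some data-packet transfer time and some additional cost, this separate-transmission prototype guarantees and improves the transfer speed of command and control information, and thereby guarantees and improves the performance of the whole chip multiprocessor.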

8.
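As more and more processors are integrated on a single chip, traditional electrical interconnection networks can no longer meet the performance demands placed on the interconnect, and a new interconnection approach is needed; optical interconnection network technology has emerged in response. Electrically interconnected networks-on-chip currently face bottlenecks in power consumption, performance, bandwidth, and latency, whereas optical interconnection, introduced into the network-on-chip as a new interconnection approach, offers the advantages of low loss, high throughput, and low latency. This paper mainly discusses the ... of optical networks-on-chip...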

9.
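Storus: A Two-Dimensional Network-on-Chip Topology   Total citations: 1, self-citations: 1, citations by others: 1
As CMOS integration density continues to rise, single-chip multiprocessors are becoming the trend in high-performance processor architecture, and existing on-chip bus structures no longer meet the interconnection needs of system-on-chip design. The network-on-chip has been proposed in recent years as a new interconnection structure. The problems it must solve include choosing a suitable topology, routing algorithm, and flow-control mechanism. This paper presents a new network-on-chip topology, Storus, together with a routing algorithm, L2, and compares Storus with Torus by simulation under multiple traffic patterns and multiple flow-control mechanisms. The simulation results show that the average routing latency of Storus is about 2% to 15% lower than that of Torus, and that under hotspot traffic the saturation throughput of Storus is about 1.2 to 1.5 times that of Torus.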

10.
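As the number of processors integrated on a single chip increases, the network-on-chip is becoming a very promising interconnection structure for multicore processors, and the interconnection network has become one of the major consumers of power in a chip multiprocessor. The input buffers are the largest source of leakage power in a router, and power gating is an effective way to reduce that leakage. An adaptive buffer management policy switches parts of the buffer off and on according to the traffic in the network, thereby reducing router leakage power. To limit the impact on network latency, the policy uses an early wake-up technique that hides the buffer wake-up delay. At low network injection rates, the network latency under the policy that keeps two buffer entries always on is almost unaffected by the wake-up delay. Simulation results show that in a 4×4 two-dimensional mesh, the leakage power saving still reaches 46% even at an injection rate of 0.7, and that for injection rates below 0.4 the network latency under the policy that keeps two buffer entries always on increases by at most 3.8%.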

11.
With the rapid development of the semiconductor industry, the number of cores integrated on a chip is increasing quickly, which poses tough challenges for on-chip interconnection in bandwidth, scalability, and power. Against this background, the Network-on-Chip (NoC) was proposed and is gradually replacing traditional on-chip interconnects such as the shared bus and the crossbar. Because it is convenient for physical layout, the mesh is the most widely used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on network latency and throughput, and therefore plays a vital role in a well-performing network. This study focuses mainly on routing algorithms for mesh NoCs. Depending on whether network state is taken into account in routing decisions, NoC routing algorithms can be roughly classified into oblivious routing and adaptive routing. Oblivious routing costs less but offers no adaptiveness, while adaptive routing is the opposite. To combine the advantages of the two, half-adaptive algorithms were proposed. This paper introduces the concepts, taxonomy, and features of NoC routing algorithms, highlights the importance of routing algorithms in mesh NoCs, and reviews and summarizes representative routing algorithms and their features. Finally, it tries to shed light on future work on NoC routing algorithms.
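The oblivious/adaptive distinction drawn in this abstract can be sketched in a few lines. The sketch below assumes dimension-order (XY) routing as the oblivious example, which is a standard choice for meshes but is not named in the abstract, and a simple minimal adaptive step that consults a hypothetical `load` dictionary of per-node congestion. Both function names are illustrative.

```python
def xy_route(src, dst):
    """Oblivious dimension-order routing on a 2D mesh.

    The packet first travels along X until the column matches, then along Y.
    The path depends only on src and dst, never on network state.
    Returns the list of nodes visited, endpoints included.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def adaptive_next_hop(cur, dst, load):
    """One step of a minimal adaptive scheme (hypothetical illustration).

    Among the neighbors that bring the packet closer to dst, pick the one
    with the lowest reported load. `load` maps node -> congestion estimate;
    nodes missing from it count as idle. Deadlock avoidance is not modeled.
    """
    x, y = cur
    candidates = []
    if x != dst[0]:
        candidates.append((x + (1 if dst[0] > x else -1), y))
    if y != dst[1]:
        candidates.append((x, y + (1 if dst[1] > y else -1)))
    if not candidates:
        return cur  # already at the destination
    return min(candidates, key=lambda node: load.get(node, 0))

if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(adaptive_next_hop((0, 0), (2, 2), {(1, 0): 5, (0, 1): 1}))
```

The adaptive step needs congestion information at every router, which is the extra cost the abstract attributes to adaptive routing; XY routing needs none.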

12.
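The multi-core processor has become the direction of research and development in high-performance processor architecture, and the way the cores are connected plays an important role in how well a multi-core processor performs. With the aims of lowering node degree, reducing the number of network links, and shortening the network diameter, this paper proposes a new hierarchical interconnection network for on-chip inter-core interconnection, the triplet-based hierarchical interconnection network (THIN). The network has a simple topology, low node degree, and relatively few links, and it exhibits clear hierarchy and symmetry as well as good scalability. The static metrics and contention-free latency of THIN and the 2-D Mesh are compared in depth, and the comparison shows that for small network sizes THIN is better suited than the 2-D Mesh for building the on-chip inter-core communication network.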

13.
As the number of cores integrated onto a single chip increases, power dissipation and network latency become increasingly stringent constraints. The on-chip network provides an efficient and scalable interconnection paradigm for chip multiprocessors (CMPs), on which one-to-many (multicast) communication is common. Without efficient multicast support, traditional unicast on-chip networks handle such multicast communication inefficiently. In this paper, we propose Dual Partitioning Multicasting (DPM) to reduce packet latency and balance network resource utilization. Specifically, the DPM scheme adaptively makes routing decisions based on the network load-balance level as well as the link-sharing patterns characterized by the distribution of the multicast destinations. Extensive experimental results for synthetic traffic as well as real applications show that, compared with the recently proposed RPM scheme, DPM significantly reduces the average packet latency and lowers network power consumption. More importantly, DPM is highly scalable for future on-chip networks with heavy traffic loads and varied traffic patterns.

14.
The chip multiprocessor is the most prolific processor design because its many cores enhance system performance. The network on chip (NOC) has been proposed as a promising model to solve the problem of connecting the cores. However, a new challenge is how to benefit fully from the on-chip network and the cores. In this paper, we propose a novel energy-efficient design of a microkernel-based on-chip operating system for an NOC-based manycore system. The operating system (OS) is partitioned into the microkernel and the other OS modules, which are distributed on the network to provide services to the user programs. Our experimental results show that our design can improve system performance with reduced power consumption.

15.
In the future, multicore processors with hundreds of cores will collaborate on a single chip. More advanced network-on-chip (NoC) topologies will then be needed than today's shared buses for dual-core processors. Multistage interconnection networks, which are already used in parallel computers, seem to be a promising alternative. In this paper, a new network topology is introduced that is particularly suited to multicast traffic in multicore systems and parallel computers. These multilayer multistage interconnection networks are described by defining the main parameters of such a topology. The performance and costs of the new architecture are determined and compared to other network topologies. Network traffic consisting of constant-size packets and of varying-size packets is investigated. It is shown that all kinds of multicast traffic particularly benefit from the new topology.

16.
Currently, on-chip networks and inter-chip interconnections are designed separately. However, this traditional design methodology faces a great challenge: in a multi-chip system, each many-core chip contains hundreds or thousands of processors, and this growing number of on-chip processors must share one input/output unit to interface with the inter-chip interconnection. The increased network usage at the chip interface may create an uneven traffic load in the on-chip network; that is, traffic jams could occur in the chip area around the input/output unit. New technologies, such as through-silicon vias and silicon interposers, can directly connect networks on chips. These technologies can improve communication performance and reduce power consumption by omitting the input/output unit. This paper proposes a novel routing scheme to deal with the network scalability issues related to the many-core and multi-chip system-in-package paradigm. The proposed scheme can also enhance the fault tolerance of nano-scale communication in deep-submicron designs.

17.
This paper proposes a routing algorithm for the interconnection of multiple processors based on the shortest-path and deflection-routing principles. The routing algorithm, named SPDRA (Shortest Path and Deflection Routing Algorithm), is applied to multiprocessor systems with a single-stage shuffle physical topology. SPDRA is general-purpose, as opposed to the majority of routing algorithms for multiprocessor systems, which are optimized for particular traffic patterns generated by a restricted class of parallel algorithms. The general-purpose nature of SPDRA allows performance comparisons with a wide class of routing algorithms for multiprocessor systems that, like the single-stage shuffle physical topology, have a fixed node-to-processor ratio. The paper compares SPDRA with hypercube algorithms for bidimensional mesh and torus physical topologies, routing algorithms for hierarchical tridimensional tori, and algorithms for routing permutations in shuffle networks, which constitute the most widely accepted approaches for multiprocessor interconnection. SPDRA exhibits a performance advantage for a broad range of network sizes and, in general, the performance advantage grows as the number of processors increases. However, this paper compares the SPDRA algorithm against a limited set of multiprocessor systems and does not demonstrate a general superiority of SPDRA over all systems with a fixed node-to-processor ratio and, especially, over systems with a growing node-to-processor ratio, such as multistage networks.
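The shortest-path half of the idea can be sketched on a shuffle-type graph. The abstract does not define its single-stage shuffle topology, so the sketch below substitutes the classic shuffle-exchange graph on 2^bits nodes (a shuffle link rotates the address left by one bit, an exchange link flips the lowest bit) and finds a shortest path by breadth-first search. It is not SPDRA itself, and deflection routing is not modeled; all names are hypothetical.

```python
from collections import deque

def shuffle(node, bits):
    """Perfect shuffle: rotate the bits-wide node address left by one bit."""
    msb = (node >> (bits - 1)) & 1
    return ((node << 1) & ((1 << bits) - 1)) | msb

def exchange(node):
    """Exchange link: flip the least significant address bit."""
    return node ^ 1

def shortest_path(src, dst, bits):
    """Breadth-first search over shuffle and exchange links.

    Assumes 0 <= src, dst < 2**bits. Returns the node sequence from src
    to dst, endpoints included.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in (shuffle(node, bits), exchange(node)):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Walk the parent pointers back from dst to rebuild the path.
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

if __name__ == "__main__":
    print(shortest_path(0, 7, 3))
```

In a deflection-routing scheme, a packet that loses contention for its preferred output is sent out on another link instead of being buffered, after which a fresh shortest path from the new node would apply.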

18.
A formal approach to MpSoC performance verification   Total citations: 2, self-citations: 0, citations by others: 2
Richter, K.; Jersak, M.; Ernst, R. Computer, 2003, 36(4): 60-67
Multiprocessor system-on-chip (MpSoC) designs use complex on-chip networks to integrate different programmable processor cores, specialized memories, and other components on a single chip. MpSoCs have become the architecture of choice in many industries. Their heterogeneity inevitably increases with intellectual-property integration and component specialization, and system integration is becoming a major challenge in their design. Simulation is the state of the art in MpSoC performance verification, but it has conceptual disadvantages that become disabling as complexity increases. Formal approaches offer a systematic alternative. The article presents a technology that uses event model interfaces and a novel event flow mechanism to extend formal analysis approaches from real-time system design into the multiprocessor system-on-chip domain.

19.
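This paper presents the design and implementation of a high-speed optical interconnection network adapter for cluster systems with a dual-ring network topology. The adapter is implemented with FPGA technology, uses the high-speed, high-bandwidth DDR DIMM bus as its bus interface, uses optical fiber as the transmission medium, and implements the low-level routing protocol in hardware logic inside the FPGA, which together ensure a network with high bandwidth, low latency, and high reliability.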
