期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Supporting faulty banks in NUCA by NoC assisted remapping mechanisms

Kuei-Chung Chang Chen-Yu Chen Chin-Sheng Yu Ching-Wen Chen 《The Journal of supercomputing》2014,67(2):305-323

The many-core SoC is a future trend technology, and the process yield will face many unpredictable challenges. Nonuniform cache architecture (NUCA) can improve the performance of many-core SoC for embedded systems. It embeds a NoC into the cache memory to enhance the data access by distributing traffic loads to several banks in parallel. Providing fault-tolerant mechanism in NUCA is very important because the chip can still work efficiently when some memory banks are unusable. In this paper, we design a specific router working with static and dynamic cache remapping policies to support faulty banks in NUCA. When a L2 cache bank in NUCA is unusable, static remapping policy (SRP) selects a suitable neighbor cache bank according to the collected remapping cost to assist with the cache access by considering cache status and traffic status of the router. We also propose a dynamic remapping policy (DRP) to select the suitable cache bank dynamically at runtime to fit the real loading status of neighbor nodes around the faulty bank. The experimental results show that the average improvement of the SRP is approximated to 26 %, and the average improvement of the DRP is approximated to 28 %. 相似文献

2.

重用感知的非一致缓存迁移策略研究

汪玲黄炎袁光辉《计算机工程》2014,(2):81-85

随着工艺的持续进步,多核处理器集成了越来越多的核以及片上缓存系统,因此利用非一致缓存架构(NUCA)应对片上多核处理器的缓存系统中逐渐增大的线延迟。高效的缓存块迁移策略对整个缓存系统至关重要。当前动态非一致缓存架构(D-NUCA)中的缓存块迁移策略未考虑缓存块的历史访问信息,导致缓存块在不同的bank之间抖动从而增加缓存块的访问延迟。为此,提出一种重用感知的缓存块迁移(RABM)策略,采用缓存块的历史迁移信息来预测将来的缓存块迁移,从而提升D-NUCA的性能以及降低整个缓存系统的功耗。基于PARSEC基准测试程序的全系统仿真结果显示,与D-NUCA相比,基于RABM的D-NUCA可以使每时钟周期指令数平均提高9.6%,片上缓存系统功耗降低14%。相似文献

3.

众核处理器Cache一致性研究综述

韩立敏安建峰高德远樊晓桠任向隆《计算机应用研究》2012,29(11):4011-4016

以瓦片结构众核处理器一致性协议的设计为主线,综述了国内外近年来关于众核处理器cache一致性的相关研究;介绍了不同NUCA结构对一致性协议的影响;分析和对比了几种传统目录一致性协议的特性及其存在的问题;归纳了最新几个面向众核结构一致性协议的设计思想和特性。最后为设计具备应用程序适应性和可扩展性的cache一致性协议指出了几个关键的设计方向。相似文献

4.

多核处理器非一致Cache体系结构延迟优化技术研究综述 总被引：1，自引：0，他引：1

黄安文高军张民选《计算机研究与发展》2012,(Z1):118-124

非一致Cache体系结构(non-uniform cache architecture,NUCA)为解决多核处理器(chip multi-processor)"存储墙"难题提供了新的设计思路.重点关注面向CMP的NUCA延迟优化技术,在介绍若干典型NUCA模型的基础上,分析大容量Cache环境下共享/私有机制中的延迟-容量权衡问题,讨论映射、迁移、复制和搜索等数据管理机制在多核环境下的优缺点.最后,针对基于片上网络(network-on-chip,NoC)互连结构的可扩展CMP体系结构,从NUCA模型优化、数据管理和一致性维护机制3个方面讨论和预测未来CMP NUCA延迟优化领域的发展趋势及面临的挑战性问题. 相似文献

5.

Reliability improvement in private non-uniform cache architecture using two enhanced structures for coherence protocols and replacement policies

Mohammad Maghsoudloo Hamid R. Zarandi 《Microprocessors and Microsystems》2014

In this paper, a comprehensive study is first conducted to investigate the effects of cache coherence protocols and cache replacement policies on the characteristics of NUCA in current many-core processors. The main focus of this study is to analyze the effects of coherence protocols and replacement policies on the vulnerability of caches. The outcomes of this analysis indicate two facts: (i) Differences in handling write operations play an important role to make distinction in favor of or against a cache coherence protocol; (ii) Near-optimal solutions for replacement problem, aimed at enhancing the performance, can also make positive influence on reduction of cache vulnerability factor. Based on the results of first step, two schemes are introduced to enhance the reliability of caches by applying some modification on the structures of cache coherence protocols and cache replacement policies. The first scheme tries to manage sharing of the dirty data items among different same-level caches. The second helps to give priority and more opportunity to old dirty blocks than clean blocks for replacement. The proposed schemes reveal about 18% improvement in MTTF, with negligible performance, bandwidth and energy consumption overhead compared to previous cache structures. 相似文献

6.

Modern architecture for photonic networks-on-chip

Sharma Kapil Sehgal Vivek Kumar 《The Journal of supercomputing》2020,76(12):9901-9921

Development in photonic integrated circuits (PICs) provides a promising solution for on-chip optical computation and communication. PICs provides the best alternative to traditional networks-on-chip (NoC) circuits which face serious challenges such as bandwidth, latency and power consumption. Integrated optics have substantiated the ability to accomplish low-power communication and low-power data processing at ultra-high speeds. In this work, we propose a new architecture for NoC, which might improve overall on-chip network performance by reducing its power consumption, providing large channel capacity for communication, decreasing latency among nodes and reducing hop count. Some of the key features of the proposed architecture are to reduce the waveguide network for communication among nodes, and this architecture can be used as a brick to construct other architectures. In this architecture, we use micro-ring resonator (MRR) and it is used to provide a high bandwidth connection among nodes with a lesser number of waveguide networks. Furthermore, results show that this architecture of PICs provides better performance in terms of low communication latency, low power consumption, high bandwidth. It also provides acceptable FSR value, FWHR value, finesse value and Q-factor of micro-ring resonators used for the design of MRR in this architecture.

相似文献

7.

Exploiting and evaluating the potentials of the link addition method for NoC transient error mitigation

Jiajia Jiao Yuzhuo Fu 《Microprocessors and Microsystems》2014

Transient errors in a Network on Chip (NoC) result in some problems such as network blockage, packets loss or incorrect delivery, which would decrease the network throughput and degrade the successful delivery rate. Many fault tolerant mechanisms, such as error correcting code, retransmission and redundancy for the NoC, have been proposed to mitigate transient errors and guarantee the communication quality. Different from these existing methods, the paper aims at exploiting the potentials of the link addition strategy for transient error alleviation. The regular link addition as well as the customized link addition based on Mesh is designed for alleviating NoC transient errors. The regular design is suitable for the general purpose case while the partially customized design exploits the inherent communication characteristics and the reliability requirement of applications for some specified cases. The experimental results for typical network benchmarks confirm that the proposed link addition methods are effective to improve NoC performance and reliability. (1) In the case of the regular link addition, 4 × 4 Torus brings the throughput to increase by 45.76% and 87.34% over Mesh for the transpose traffic and the uniform traffic respectively. The reliability metrics of Torus over Mesh are up to 56.65% and 12.71% for the transpose traffic and the uniform traffic respectively. (2) The novel customized reliability-aware link addition mechanism makes the throughput improvement up to 17.4%, 53.5% and the reliability metric up to 16.34%, 57.76% over standard Mesh for the transpose traffic and the hotspot traffic respectively. In addition, the area overhead and power consumption of NoCs are also evaluated by the tool—Orion in the paper. 相似文献

8.

Application-driven dynamic bandwidth allocation for two-layer network-on-chip design

《Computers & Electrical Engineering》2014,40(8):317-332

Network-on-Chip (NoC) architecture has been widely used in many multi-core system designs. To improve the communication efficiency and the bandwidth utilization of NoC for various applications, we firstly propose a table-based algorithm for identifying the dominant flows at runtime. Then a two-layer NoC architecture with an application-driven bandwidth allocation scheme is presented, which is capable of identifying heavy-load dataflows and dynamically reconfiguring point-to-point (P2P) connections to optimize the heavy-load traffic. Experimental results reveal that our design (8 × 8 mesh NoC) achieves 28.5% performance improvement and 25.9% power consumption saving compared to the baseline NoC. 相似文献

9.

On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks

Xiaohang WangAuthor Vitae Mei YangAuthor Vitae 《Microprocessors and Microsystems》2011,35(2):119-129

When a number of applications simultaneously running on a many-core chip multiprocessor (CMP) chip connected through network-on-chip (NoC), significant amount of on-chip traffic is one-to-many (multicast) in nature. As a matter of fact, when multiple applications are mapped onto an NoC architecture with applicable traffic isolation constraints, the corresponding sub-networks of these applications are mapped onto actually tend to be irregular. In the literature, multicasting for irregular topologies is supported through either multiple unicasting or broadcasting, which, unfortunately, results in overly high power consumption and/or long network latency. To address this problem, a simple, yet efficient hardware-based multicasting scheme is proposed in this paper. First, an irregular oriented multicast strategy is proposed. Literally, following this strategy, an irregular oriented multicast routing algorithm can be designed based on any regular mesh based multicast routing algorithm. One such algorithm, namely, Alternative Recursive Partitioning Multicasting (AL + RPM), is proposed based on RPM, which was designed for regular mesh topology originally. The basic idea of AL + RPM is to find the output directions following the basic RPM algorithm and then decide to replicate the packets to the original output directions or the alternative (AL) output directions based on the shape of the sub-network. The experiment results show that the proposed multicast AL + RPM algorithm can consume, on average, 14% and 20% less power than bLBDR (a broadcasting-based routing algorithm) and the multiple unicast scheme, respectively. In addition, AL + RPM has much lower network latency than the above two approaches. To incorporate AL + RPM into a baseline router to support multicasting, the area overhead is fairly modest, less than 5.5%. 相似文献

10.

MoNoC: A monitored network on chip with path adaptation mechanism

《Journal of Systems Architecture》2014,60(10):783-795

Complex systems on chip containing dozens of processing resources with critical communication requirements usually rely on the use of networks on chip (NoCs) as communication infrastructure. NoCs provide significant advantages over simpler infrastructures such as shared busses or point to point communication, including higher scalability, more efficient energy management, higher bandwidth and lower average latency. Applications running on NoCs with more than 10% of bandwidth usage attest that the most significant portion of message latencies refers to buffered packets waiting to enter the NoC, whereas the latency portion that depends on the packet traversing the NoC is sometimes negligible. This work presents an adaptive routing architecture, named Monitored NoC (MoNoC), which is based on a traffic monitoring mechanism and the exchange of high priority control packets. This method enables to adapt paths by choosing less congested routes. Practical experiments show that the proposed path adaptation is a fast process, enabling to transmit packets with smaller latencies, up to 9 times smaller, by using non-congested NoC regions. 相似文献

11.

An efficient numerical-based crosstalk avoidance codec design for NoCs

《Microprocessors and Microsystems》2017

With technology scaling, crosstalk fault has become a serious problem in reliable data transfer through Network on Chip (NoC) channels. The effects of crosstalk fault depend on transition patterns appearing on the wires of NoC channels. Among these patterns, Triplet Opposite Direction (TOD) imposes the worst crosstalk effects. Crosstalk Avoidance Codes (CACs) are the overhead-efficient mechanisms to tackle TODs. The main problem of CACs is their high imposed overheads to NoC routers. To solve this problem, this paper proposes an overhead-efficient coding mechanism called Penultimate-Subtracted Fibonacci (PS-Fibo) to alleviate crosstalk faults in NoC wires. PS-Fibo coding mechanism benefits the novel numerical system that not only completely removes TODs but also, is applicable to a wide range of NoC channel widths. The PS-Fibo coding mechanism is evaluated using BookSim-2 and VHDL-based simulations in the terms of codec efficiency on the crosstalk fault reduction, codec power consumption, codec area occupation and network performance. Evaluation results, carried out for a wide range of NoC channel widths indicate that PS-Fibo can improve power consumption and area occupations of codec and NoC performance with respect to the other state-of-the-art coding mechanisms. 相似文献

12.

Buffer planning for application-specific networks-on-chip design

ShouYi Yin LeiBo Liu ShaoJun Wei 《中国科学F辑(英文版)》2009,52(4):547-558

Networks-on-chip (NoC) is a promising communication architecture for next generation SoC. The size of buffer used in on-chip routers impacts the silicon area and power consumption of NoC dominantly. It is important to plan the total buffer-size and each router buffer-allocation carefully for an efficient NoC design. In this paper, we propose two buffer planning algorithms for application-specific NoC design. More precisely, given the traffic parameters and performance constraints of target application, the proposed algorithms automatically determine minimal buffer budget and assign the buffer depth for each input channel in different routers. The experimental results show that the proposed algorithms can significantly reduce total buffer usage and guarantee the performance requirements. Supported by the National Natural Science Foundation of China (Grant No. 60803018) 相似文献

13.

片上非一致Cache体系结构研究

下载免费PDF全文

贾小敏黄彩霞张民选孙彩霞齐树波《计算机工程与科学》2009,31(8)

随着集成电路制造工艺的发展,片上集成大容量Cache成为微处理器的发展趋势。然而,互连线延迟所占比例越来越大,成为大容量Cache的性能瓶颈,因此需要新的Cache体系结构来克服这些问题。非一致Cache体系结构通过在Cache内部支持多级延迟和数据块迁移来减少Cache的命中时间,提高性能,从而克服互连线延迟对大容量Cache的限制,已经成为微处理器片上存储结构的研究热点。本文回顾了非一致Cache体系结构模型的研究进展,特别是对片上多核处理器中的非一致Cache体系结构模型进行了详细介绍,比较了不同模型的贡献和不足。最后,对非一致Cache体系结构的发展进行了展望。相似文献

14.

A NUCA Substrate for Flexible CMP Cache Sharing 总被引：1，自引：0，他引：1

Jaehyuk Huh J. Changkyu Kim C. Shafi H. Lixin Zhang L. Burger D. Keckler S.W. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(8):1028-1040

We propose an organization for the on-chip memory system of a chip multiprocessor in which 16 processors share a 16-Mbyte pool of 64 level-2 (L2) cache banks. The L2 cache is organized as a nonuniform cache architecture (NUCA) array with a switched network embedded in it for high performance. We show that this organization can support a spectrum of degrees of sharing: unshared, in which each processor owns a private portion of the cache, thus reducing hit latency, and completely shared, in which every processor shares the entire cache, thus minimizing misses, and every point in between. We measure the optimal degree of sharing for different cache bank mapping policies and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with sharing degrees of 2 or 4 works best across a suite of commercial and scientific parallel workloads. We demonstrate that migratory dynamic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased complexity, especially as per-application cache partitioning strategies are applied. We also evaluate the energy efficiency of each design point in terms of network traffic, bank accesses, and external memory accesses. 相似文献

15.

基于NoC结构的图像中值滤波并行处理模式分析

刘佳路铭李哲英《计算机科学》2012,39(1):311-314

给出了一个多处理器NoC结构以实现指定的中值滤波算法。为了提高图像处理的速度,在NoC设计的专用SoC中使用了系统并行机制与基本计算单元指令并行机制相结合的方法。它既可以满足处理速度的要求,又能达到降低功率损耗的目的。对图像处理中的中值滤波处理结构进行了并行设计,可极大地提高处理速度。相似文献

16.

基于云自适应遗传算法的NoC映射研究

许川佩陈征南任智新《计算机工程与应用》2012,48(36):70-74,104

NoC映射是NoC设计中的重要步骤,映射结果的优劣对NoC的QoS约束和通信功耗有着很大的影响。提出一种采用云自适应遗传算法实现NoC映射的方案,该算法利用云模型对传统遗传算法加以改进,以此新方法自动调整遗传算法过程中的交叉概率和变异概率,从而达到优化遗传算法的目的。结合NoC映射中的具体问题,在功耗和延时约束的限制条件下,建立了延时约束下的NoC映射功耗数学模型。实验表明,该方法在NoC映射中取得了良好的效果,降低了通信功耗。相似文献

17.

Lightweight fault localization combined with fault context to improve fault absolute rank

Wang Yong Huang Zhiqiu Li Yong Fang Bingwu 《中国科学:信息科学(英文版)》2017,60(9):1-15

Network-on-Chip (NoC) is a promising replacement of bus architecture due to its better scalability. In state-of-the-art NoCs, each packet contains several fixed-length flits, which facilitates allocations of network resources but brings in many unused bits. In this paper, we propose a novel technique called Stealth-ACK to effectively address the above problem. Stealth-ACK leverages unused bits in head flits of non-ACK packets to carry and stealthily transmit ACK information. Such stealth transmissions of ACK information effectively reduce not only the amount of dedicated ACK packets on NoC, but also the number of unused bits in head flits of non-ACK packets, which significantly reduces wastes on NoC bandwidth. Experimental results show that Stealth-ACK averagely increases the throughput of 16 × 16 2-D mesh NoC by 11.9%, and averagely reduces the NoC latency by 34.8% on application traces of SPLASH-2. Moreover, Stealth-ACK only requires trivial hardware modification to basic router architectures, which incurs negligible power consumption and area cost.

相似文献

18.

Efficient routing techniques in heterogeneous 3D Networks-on-Chip

Michael Opoku Agyeman Ali AhmadiniaAlireza Shahrabi 《Parallel Computing》2013

Three-dimensional Networks-on-Chips (3D NoCs) have recently been proposed to address the on-chip communication demands of future highly dense 3D multi-core systems. Homogeneous 3D NoC topologies have many Through Silicon Vias (TSVs) which have a costly and complex manufacturing process. Also, 3D routers use more memory and are more power hungry than conventional 2D routers. Alternatively, heterogeneous 3D NoCs combine both the area and performance benefits of 2D and 3D static router architectures by using a limited number of TSVs. To improve the performance of heterogeneous 3D NoCs, we propose an adaptive router architecture which balances the traffic in such NoCs. Particularly, experimental results show that our proposed architecture significantly improves the performance up to 75% by replacing 2D static routers with adaptive 2D routers in heterogeneous 3D NoCs, while keeping the maximum clock frequency, power and energy consumption of the adaptive router nearly at the same level as the static router. 相似文献

19.

面向多核NUCA共享数据竞争问题的Bank一致性技术

下载免费PDF全文

吴俊杰潘晓辉《计算机工程与科学》2009,31(11)

非一致Cache体系结构(NUCA)几乎已经成为未来片上大容量cache的发展方向。多核处理器的NUCA结构中,多个处理器核对共享数据的竞争访问,可能导致数据经常处于中部的cache Bank,增加NUCA的访问延迟。本文提出支持数据副本的Bank一致性技术,通过有选择地在NUCA中为访问的处理器核创建不同的数据副本,Bank一致性技术能够缓解多核处理器对共享数据的竞争问题。本文详细地介绍了Bank一致性协议的设计方法。最后,使用全系统模拟器对8个NPB基准测试程序进行了详细评测。实验结果表明,Bank一致性技术能够有效缓解多核处理器中共享数据的竞争访问问题。相比不支持Bank一致性技术的CMP-DNUCA结构,本文的方法能将系统IPC性能平均提升5.95%。相似文献

20.

Scalable architecture for a contention-free optical network on-chip

Somayyeh Koohi Shaahin HessabiAuthor Vitae 《Journal of Parallel and Distributed Computing》2012

This paper proposes CoNoC (Contention-free optical NoC) as a new architecture for on-chip routing of optical packets. CoNoC is built upon all-optical switches (AOSs) which passively route optical data streams based on their wavelengths. The key idea of the proposed architecture is the utilization of per-receiver wavelength in the data network to prevent optical contention at the intermediate nodes. Routing optical packets according to their wavelength eliminates the need for resource reservation at the intermediate nodes and the corresponding latency, power, and area overheads. Since passive architecture of the AOS confines the optical contention to the end-points, we propose an electrical arbitration architecture for resolving optical contention at the destination nodes. By performing a series of simulations, we study the efficiency of the proposed architecture, its power and energy consumption, and the data transmission latency. Moreover, we compare the proposed architecture with electrical NoCs and alternative ONoC architectures under various synthetic traffic patterns. Averaged across different traffic patterns, the proposed architecture reduces per-packet power consumption by 19%, 28%, 29%, and 91% and achieves per-packet energy reduction of 28%, 40%, 20%, and 99% over Columbia, Phastlane, λ

λ

-router, and electrical torus, respectively. 相似文献