期刊界 (All Journals): journal and academic literature search

1.

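林红君 王长山 《计算机应用》2010,30(12):3176-3179

Network-on-chip (NoC) interconnects are an effective solution to the on-chip communication problem, but they face severe resource constraints. Standard topologies struggle to meet an application's traffic demands and also incur large power and area overheads. NoC designs aimed at general-purpose systems have difficulty providing interconnects with predictable quality of service. This paper presents an application-specific, bandwidth-aware routing technique. For a given application, it first uses a genetic-algorithm-based mapping technique to obtain the best mapping of IP cores to network nodes, then uses a bandwidth-aware routing algorithm to generate a shortest route for every data transfer in the network, and guarantees that these routes are deadlock-free through static virtual-channel allocation. To reduce the hardware overhead of the routing tables, a routing-table compression method is also applied. Simulation results show that the proposed routing technique achieves better latency than existing routing algorithms.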
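The routing step in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a 2D mesh, a greedy heaviest-flow-first order, and a "least-loaded shortest path" selection rule, and it leaves out the virtual-channel allocation and table compression. The names `minimal_paths` and `route_flows` are hypothetical.

```python
def minimal_paths(src, dst):
    """Yield every shortest (Manhattan) path from src to dst on a 2D mesh."""
    if src == dst:
        yield [src]
        return
    (x, y), (tx, ty) = src, dst
    if x != tx:  # move one step closer along x
        nxt = (x + (1 if tx > x else -1), y)
        for rest in minimal_paths(nxt, dst):
            yield [src] + rest
    if y != ty:  # move one step closer along y
        nxt = (x, y + (1 if ty > y else -1))
        for rest in minimal_paths(nxt, dst):
            yield [src] + rest


def route_flows(flows):
    """Assign each (src, dst, bandwidth) flow a shortest path.

    Assumed heuristic: handle the heaviest flows first and, among all
    shortest paths, pick the one whose most loaded link stays smallest.
    """
    load = {}    # directed link (u, v) -> accumulated bandwidth
    routes = {}  # (src, dst) -> list of nodes
    for src, dst, bw in sorted(flows, key=lambda f: -f[2]):
        best = min(
            minimal_paths(src, dst),
            key=lambda p: max((load.get(l, 0) + bw for l in zip(p, p[1:])), default=0),
        )
        for link in zip(best, best[1:]):
            load[link] = load.get(link, 0) + bw
        routes[(src, dst)] = best
    return routes, load


routes, load = route_flows([((0, 0), (1, 1), 10), ((0, 0), (1, 0), 12)])
print(routes[((0, 0), (1, 1))])  # the lighter flow avoids the link used by the heavier one
```

Every route stays minimal in hop count; bandwidth only decides which of the equally short paths is taken.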

2.

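Memory-Access-Aware Incremental Application Mapping for MPSoCs

王一拙 左琦 计卫星 王小军 石峰 《计算机研究与发展》2015,52(5)

Modern multiprocessor systems-on-chip (MPSoCs) usually adopt a network-on-chip (NoC) as their basic interconnect. Application mapping, which determines how the tasks an application is partitioned into are assigned to NoC nodes, is a key problem in the design of NoC-based MPSoCs. Many NoC-based MPSoC systems treat shared memory as a separate node in the network. For such systems, a memory-access-aware incremental dynamic mapping strategy is proposed. The strategy analyzes each application's memory access characteristics offline. At run time, when an application arrives, it selects a mapping algorithm according to those characteristics, placing hotspot applications around the shared memory and non-hotspot applications away from it, while minimizing communication link contention between applications and between the tasks within each application. Simulation experiments show that, compared with a strategy that combines greedy region selection with random node mapping, the proposed strategy reduces overall system communication power by 34.6% on average, improves performance by up to 36.3%, and adapts to different NoC sizes.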

3.

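NoC Mapping Optimization Based on a Multi-Objective Immune Algorithm

吕兴胜 李光顺 吴俊华 《计算机工程》2015,(4)

NoC mapping algorithms have a major impact on system properties such as power consumption and reliability. By introducing a new antibody initialization operator and a new antibody mutation operator, a multi-objective immune mapping algorithm is proposed that reduces system power, improves system reliability, and avoids extra resource overhead. The new initialization operator generates the initial antibodies with a greedy algorithm. The new mutation operator reduces communication distance by swapping the positions of IP cores, optimizing the solution and thereby lowering the risk of degradation caused by the randomness of mutation. Based on the dynamic characteristics of the network, a new power model is proposed that makes power calculation more accurate. Simulation results show that the algorithm effectively reduces power consumption and improves reliability.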
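A swap-based mutation of the kind the abstract describes might look like the sketch below. The cost function (traffic volume times Manhattan distance) and the "keep the swap only if it helps" acceptance rule are assumptions for illustration, not the paper's operators or power model; `comm_cost` and `swap_mutation` are hypothetical names.

```python
import random


def comm_cost(mapping, traffic):
    """Sum of volume * Manhattan distance over all communicating core pairs."""
    return sum(
        vol * (abs(mapping[a][0] - mapping[b][0]) + abs(mapping[a][1] - mapping[b][1]))
        for (a, b), vol in traffic.items()
    )


def swap_mutation(mapping, traffic, rng):
    """Swap the tiles of two random cores; keep the swap only if cost drops."""
    a, b = rng.sample(sorted(mapping), 2)
    candidate = dict(mapping)
    candidate[a], candidate[b] = mapping[b], mapping[a]
    if comm_cost(candidate, traffic) < comm_cost(mapping, traffic):
        return candidate
    return mapping


# Four cores on a 2x2 mesh; the heavy A-B flow starts on opposite corners.
mapping = {"A": (0, 0), "B": (1, 1), "C": (1, 0), "D": (0, 1)}
traffic = {("A", "B"): 10, ("C", "D"): 1}
rng = random.Random(0)
for _ in range(50):
    mapping = swap_mutation(mapping, traffic, rng)
print(comm_cost(mapping, traffic))
```

Because a swap is rejected unless it lowers the cost, the mutation can never make a solution worse, which is one simple way to limit the degradation risk the abstract mentions.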

4.

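A Network Topology Generation Algorithm for Application-Specific Networks-on-Chip

王海琪 董社勤 《计算机辅助设计与图形学学报》2011,23(9)

For application-specific NoCs, a three-stage low-power topology generation algorithm is proposed. First, partition-driven floorplanning based on inter-core communication volume and physical coordinates determines the placement of the cores and the mapping between cores and switches. Second, taking the area of switches and network interfaces into account, their simultaneous insertion is formulated as an integer linear programming model, which is solved to determine the best insertion positions and generate the interconnect. Finally, a routing allocation strategy determines the distribution of traffic over the interconnect to further optimize power. Experimental results show that the algorithm saves 35.2% of power on average and reduces the number of intermediate switches by 5.7%.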

5.

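Topology-Aware Application Mapping and Optimization for Network-on-Chip Architectures (total citations: 1; self-citations: 0; citations by others: 1)

严明 杨子煜 赵鹏 李思昆 《计算机工程与科学》2009,31(Z1)

Application mapping is one of the key problems in NoC architecture research, and the quality of the mapping greatly affects architectural performance. Most existing application mapping methods target a specific network structure, such as 2D mesh or 2D torus, and study mapping and optimization under NoC performance or power constraints. This paper proposes a topology-aware NoC application mapping and optimization method based on high-level code transformation. The method uses the polyhedral model to automatically parallelize the application's core loops and optimize their locality, abstracts the network topology as a weighted directed graph, and covers the task flow graph with that directed graph to increase task parallelism and reduce inter-task synchronization and communication overhead. Experimental results show that with the optimized mapping the parallelism among task nodes is fully exploited, communication overhead drops, and overall NoC system performance improves.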

6.

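Simulation and Implementation of a Novel Network-on-Chip Interconnect Structure (total citations: 2; self-citations: 0; citations by others: 2)

陈芳露 陆雯青 虞志益 周晓方 《小型微型计算机系统》2010,31(5)

Considering both performance and hardware implementation, an NoC interconnect topology is proposed: the Multi-Layer Router (MLR), a hierarchical routing structure. Its hierarchical design reduces the network diameter and gives the structure good symmetry and scalability. Network modeling and simulation together with hardware implementation results show that, under different network loads and different numbers of IP core nodes, MLR improves network performance metrics such as packet loss rate, communication latency, and network throughput by up to 50%-70% over traditional structures. At the same time, by sharing routers, it reduces chip area by more than 20% and dynamic power by more than 40%, effectively lowering the hardware overhead of the interconnect.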

7.

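A Parallel Fast Fourier Transform Algorithm Based on the Star Interconnection Network (total citations: 6; self-citations: 0; citations by others: 6)

史云涛 侯紫峰 宋建平 《计算机研究与发展》2002,39(5):625-630

The star interconnection network is an interconnection topology well suited to large-scale parallel computing. Exploiting the variety of ways in which a star network can be recursively decomposed, a method for implementing a parallel fast Fourier transform (FFT) on star networks is proposed. The method effectively reduces the communication overhead between processor nodes during computation. The proposed mapping between star-graph nodes and data, and the approach used to implement the parallel FFT, can be extended to implementing other parallel algorithms on star networks, such as solving systems of linear equations and matrix multiplication.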

8.

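V-Mesh: A Low-Latency, Low-Power Network-on-Chip Architecture for 3D Stacked Chips

谭海 何月顺 靳文兵 苏岩 《计算机学报》2014,37(10)

To address the large diameter, high power consumption, poor scalability, and complex physical implementation of NoCs, a 3D NoC with a small, constant diameter, V-Mesh, is proposed together with its VM routing algorithm. V-Mesh consists of one 2D mesh subnetwork layer and multiple row/column interconnect subnetwork layers connected by 3D stacking. It has low power consumption, supports an arbitrary number of nodes, and can be used for inter-node interconnection in 3D stacked chips. Compared with F-Mesh, a fully connected 3D NoC, V-Mesh uses row/column interconnection to greatly reduce the number of long wires, which lowers power and wiring complexity and makes it highly scalable. Theoretical analysis and experimental results show that V-Mesh has latency comparable to F-Mesh while reducing power by about 12.5%. Compared with a 3D mesh, when the number of nodes is large, it lowers latency by 23%, raises throughput by 12%, and lowers power by 34%. Overall, V-Mesh has clear advantages over the 3D mesh in every respect; its interconnect performance is comparable to that of F-Mesh, but its physical implementation is simpler, its wiring volume is smaller, and it scales better.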

9.

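Reliable Voltage-Frequency-Island-Aware Energy Optimization for Asynchronous Networks-on-Chip

李贞妮 李晶皎 金硕巍 《信息与控制》2016,45(6):671-676

For the energy optimization of asynchronous NoCs that use a 2D torus topology and support voltage-frequency islands (VFIs), a reliability-aware energy optimization method is proposed that covers VFI partitioning and assignment as well as NoC task mapping. The method optimizes progressively. Based on the dynamic processing energy of the IP cores, the conversion energy between different VFIs, and the energy overhead introduced by reliability, it defines a threshold function for moving an IP core between VFIs, and evaluates that function to carry out VFI partitioning and assignment. It then applies a quantum particle swarm optimization algorithm based on ternary correlation to map processing elements to resource nodes. The mapping accounts for the communication overhead needed to guarantee system reliability, thereby optimizing the reliability of the asynchronous NoC. Experimental results show that the algorithm significantly improves NoC reliability without excessive additional energy and effectively reduces the energy consumption of the NoC system.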

10.

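Simulation Study of a Mobile-Agent Routing Algorithm for Wireless Sensor Networks

梁振球 陈雅 《计算机仿真》2011,28(2)

Routing is a core problem in wireless sensor networks. Given the limited energy, computing power, and communication capability of sensor nodes, and in order to reduce energy overhead and extend network lifetime, ant colony optimization is combined with mobile agent technology to propose a new routing algorithm for wireless sensor networks. The algorithm takes into account inter-node distance, path energy consumption, and residual node energy so that energy consumption across the network is more balanced, and it improves the ant colony pheromone update rule to speed up convergence to the optimal solution. Simulation results show that, compared with other mobile-agent routing algorithms, the algorithm improves both global search capability and convergence speed, effectively reduces redundant data transmission, lowers communication cost, and extends network lifetime, providing a reference for routing design in sensor networks.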
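To make the ant colony ingredients concrete, here is a small sketch of the textbook formulation: a next-hop probability that weighs pheromone against a heuristic, and an evaporate-then-deposit pheromone update. The heuristic (residual energy divided by distance), the deposit rule (`q / path_energy`), and all parameter values are assumptions; the paper's improved update rule is not given in the abstract and is not reproduced here.

```python
def next_hop_probabilities(pheromone, distance, residual_energy, alpha=1.0, beta=2.0):
    """Probability of forwarding to each neighbour.

    Assumed heuristic: neighbours that are close and have more residual
    energy are preferred (eta = residual_energy / distance).
    """
    weight = {
        n: pheromone[n] ** alpha * (residual_energy[n] / distance[n]) ** beta
        for n in pheromone
    }
    total = sum(weight.values())
    return {n: w / total for n, w in weight.items()}


def update_pheromone(pheromone, path_links, path_energy, rho=0.1, q=1.0):
    """Evaporate all trails, then reinforce the links of the path just used.

    Cheaper paths (lower total energy) deposit more pheromone.
    """
    updated = {link: (1.0 - rho) * tau for link, tau in pheromone.items()}
    for link in path_links:
        updated[link] += q / path_energy
    return updated


probs = next_hop_probabilities(
    pheromone={"a": 1.0, "b": 1.0},
    distance={"a": 10.0, "b": 10.0},
    residual_energy={"a": 0.9, "b": 0.3},
)
print(probs)  # neighbour "a" is favoured because it has more energy left
```

Folding residual energy into the heuristic is what steers traffic away from nearly depleted nodes and evens out energy use across the network.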

11.

The 2D digraph-based NoCs: attractive alternatives to the 2D mesh NoCs

Reza Sabbaghi-Nadooshan Mehdi Modarressi Hamid Sarbazi-Azad 《The Journal of supercomputing》2012,59(1):1-21

This paper proposes two-dimensional directed graphs (or digraphs for short) as a promising alternative to the popular 2D mesh topology for networks-on-chip (NoCs). Mesh is the most popular topology for the NoCs, mainly due to its suitability for on-chip implementation and low cost. However, the fact that a digraph offers a lower diameter than its equivalent linear array of equal cost motivated us to evaluate digraphs as the underlying topology of NoCs. This paper introduces a family of NoC topologies based on three well-known digraphs, namely de Bruijn, shuffle-exchange, and Kautz. We study topological properties of the proposed topologies. We show that the proposed digraph-based topologies have several attractive features including constant node degree, low diameter and cost, and low zero load latency which result in superior performance over the mesh. We introduce a deadlock-free routing algorithm for the proposed NoC topologies and compare NoCs employing the proposed topologies and the mesh topology in terms of power consumption and performance. Simulation results also reveal that the proposed NoC topologies offer higher performance and consume lower power than the mesh NoC.
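The diameter claim can be checked on a small case. The sketch below builds a binary de Bruijn digraph and a linear array with the same number of nodes and measures both diameters by breadth-first search. It only illustrates the one-dimensional comparison mentioned in the abstract, not the paper's 2D topologies; the function names are hypothetical.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length in a strongly connected digraph."""
    worst = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


def de_bruijn(n_bits):
    """Binary de Bruijn digraph: node u points to 2u and 2u + 1 (mod 2^n_bits)."""
    n = 1 << n_bits
    return {u: [(2 * u) % n, (2 * u + 1) % n] for u in range(n)}


def linear_array(n):
    """Bidirectional chain of n nodes."""
    return {u: [v for v in (u - 1, u + 1) if 0 <= v < n] for u in range(n)}


print(diameter(de_bruijn(3)), diameter(linear_array(8)))  # 3 7
```

With eight nodes, the de Bruijn digraph reaches any node in at most 3 hops while the chain needs up to 7, and every de Bruijn node keeps a constant out-degree of 2.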

12.

Energy efficient heuristic application mapping for 2-D mesh-based network-on-chip

《Microprocessors and Microsystems》2019

Application mapping in 2-D mesh-based Network-on-Chip (NoC) architecture is an optimization problem in which each application task (e.g., processor or memory units) should be mapped one-to-one onto a network element (switch or router) to optimize performance requirements (e.g., communication energy or communication latency) under certain platform constraints (e.g., bandwidth and/or latency). Network-on-Chip is a scheme that establishes links between limited application-specific components within Multi-Processor System-on-Chip (MPSoC), but it has a vital role to ensure the maximum data transfer rate and reduce total number of physical interconnections. Most of the works on heuristic application mapping for mesh-based NoC design aim to minimize both total communication energy and run-time, however they experience the following issues: (i) relatively high CPU time due to linear search for the task and tile mapping combinations, (ii) consumption of relatively high communication energy due to random tile selection when two or more tiles are equivalent in terms of average weighted distance by their adjacent mapped tasks, and (iii) even after constructive application mapping, some of the tasks consume higher communication energy due to their inappropriate placements. In this paper we present a low time-complexity heuristic mapping algorithm of weighted application graph under permissible bandwidth constraint to minimize communication energy of 2-D mesh-based NoC architecture. The experimental results of multimedia benchmarks, as well as randomly generated samples show the low communication energy as well as time-complexity under bandwidth constraints in comparison to the recent heuristic application mapping approaches. In our approach, the communication energy is also close to the optimal solution obtained by Integer Linear Programming (ILP).
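The objective such mapping heuristics minimize can be sketched as follows. This assumes the bit-energy model commonly used in the NoC mapping literature (energy per bit = routers traversed times router energy, plus links traversed times link energy) with minimal routing on a 2D mesh; the abstract does not state which energy model the paper uses, and the energy values and the name `communication_energy` are hypothetical.

```python
def communication_energy(mapping, traffic, e_router=2.0, e_link=1.0):
    """Total communication energy of a one-to-one task-to-tile mapping.

    mapping: task -> (x, y) tile on a 2D mesh
    traffic: (src_task, dst_task) -> number of bits exchanged
    Assumed per-bit energy: routers * e_router + links * e_link, where a
    minimal path with d links passes through d + 1 routers.
    """
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one: two tasks share a tile")
    total = 0.0
    for (src, dst), bits in traffic.items():
        links = abs(mapping[src][0] - mapping[dst][0]) + abs(mapping[src][1] - mapping[dst][1])
        total += bits * ((links + 1) * e_router + links * e_link)
    return total


mapping = {"t0": (0, 0), "t1": (1, 0), "t2": (1, 1)}
traffic = {("t0", "t1"): 100, ("t0", "t2"): 50}
print(communication_energy(mapping, traffic))  # 900.0
```

Placing heavily communicating tasks on adjacent tiles shrinks the hop count in the dominant terms, which is why tile selection among equally distant candidates matters for the final energy.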

13.

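Mapping Irregular IP Modules onto a 2D NoC Structure (total citations: 1; self-citations: 0; citations by others: 1)

李光顺 吴俊华 马光胜 《计算机科学》2008,35(1):31-33

A new method for mapping irregular IP modules onto a network-on-chip (NoC) is proposed. The basic idea is to split a large IP module into several small virtual IP models, or to combine several small IP modules into one virtual IP model, so that each virtual IP model can be mapped onto a single resource node of the NoC. By computing Manhattan distances and input/output degrees, the buffer size of each communication node can be determined. The initial mapping can then be adjusted according to the computed communication cost, which avoids communication congestion and lowers system power consumption.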

14.

基于虫孔交换的NoC映射和通讯参数自动化设计方法研究

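曹亚菲 王大伟 李思昆 《计算机工程与科学》2010,32(11):111-113

NoC mapping and communication parameter design are very important parts of the NoC design process, and their results directly affect NoC performance, area, and power. This paper treats the NoC mapping problem and the communication parameter design problem jointly. It first gives a formal definition of the NoC mapping problem and then proposes a latency analysis method for wormhole-switched NoCs. Given the application's communication latency constraints, the application model is mapped onto the NoC topology and the NoC communication parameters are designed automatically. Experiments show that the proposed latency analysis method is 7% to 13% more accurate than previous methods and yields better mapping results and communication parameter designs.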
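For orientation, the simplest latency model for wormhole switching is the zero-load case: the head flit pays the per-hop router and link delay, and the remaining flits follow in a pipeline. The sketch below implements only that textbook baseline under assumed parameter names and cycle counts; it ignores contention and is not the paper's analysis method.

```python
def zero_load_latency(hops, packet_bits, flit_bits, router_cycles=2, link_cycles=1):
    """Zero-load latency (in cycles) of one packet under wormhole switching.

    Simplified model: the head flit needs (router_cycles + link_cycles)
    per hop; the other flits follow one cycle apart behind it.
    """
    flits = -(-packet_bits // flit_bits)  # ceiling division
    head_latency = hops * (router_cycles + link_cycles)
    return head_latency + (flits - 1)


def meets_constraint(hops, packet_bits, flit_bits, max_cycles):
    """Check a mapping's hop count against a latency bound."""
    return zero_load_latency(hops, packet_bits, flit_bits) <= max_cycles


print(zero_load_latency(hops=3, packet_bits=128, flit_bits=32))  # 12
```

A check like `meets_constraint` shows how mapping (which sets the hop count) and communication parameters (such as flit width) interact under a single latency bound.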

15.

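Automated Design of NoC Mapping and Communication Parameters Based on Wormhole Switching

曹亚菲 王大伟 李思昆 《计算机工程与科学》2010,32(11)

NoC mapping and communication parameter design are very important parts of the NoC design process, and their results directly affect NoC performance, area, and power. This paper treats the NoC mapping problem and the communication parameter design problem jointly. It first gives a formal definition of the NoC mapping problem and then proposes a latency analysis method for wormhole-switched NoCs. Given the application's communication latency constraints, the application model is mapped onto the NoC topology and the NoC communication parameters are designed automatically. Experiments show that the proposed latency analysis method is 7% to 13% more accurate than previous methods and yields better mapping results and communication parameter designs.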

16.

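NoC Mapping Based on a Cloud Adaptive Genetic Algorithm

许川佩 陈征南 任智新 《计算机工程与应用》2012,48(36):70-74,104

NoC mapping is an important step in NoC design, and the quality of the mapping strongly affects the NoC's QoS constraints and communication power. A scheme that performs NoC mapping with a cloud adaptive genetic algorithm is proposed. The algorithm improves the traditional genetic algorithm with a cloud model, automatically adjusting the crossover and mutation probabilities as the genetic algorithm runs. Addressing the specifics of NoC mapping, a mathematical model of NoC mapping power under latency constraints is established, subject to power and latency limits. Experiments show that the method performs well for NoC mapping and reduces communication power.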

17.

A Communication-Driven Routing Technique for Application-Specific NoCs

R. Tornero J. M. Orduña A. Mejia J. Flich J. Duato 《International journal of parallel programming》2011,39(3):357-374

Networks on Chip (NoCs) have been shown to be an efficient solution to the complex on-chip communication problems derived from the increasing number of processor cores. One of the key issues in the design of NoCs is the reduction of both area and power dissipation. As a result, two-dimensional meshes have become the preferred topology, since they offer low and constant link delay. Unfortunately, manufacturing defects or even real-time failures often make the resulting topology irregular, preventing the use of traditional routing algorithms. This scenario shows the need for topology-agnostic routing algorithms that provide a valid routing solution when applied over any topology. This paper proposes a new communication-driven routing technique that optimizes the network performance for Application-Specific NoCs. This technique combines a flexible, topology-agnostic routing algorithm with a communication-aware mapping technique that matches the traffic generated by the application with the available network bandwidth. Since the mapping technique can be pruned as needed in order to fit either quality function values or time constraints, this technique can be adapted to fit with different computational costs. The evaluation results show that it significantly improves network performance in terms of both latency and power consumption.

18.

Integrated core selection and mapping for mesh based Network-on-Chip design with irregular core sizes

《Journal of Systems Architecture》2015,61(9):410-422

Network-on-Chip (NoC) has been proposed to replace traditional bus based System-on-Chip (SoC) architecture to address the global communication challenges in nanoscale technologies. A major challenge in NoC based system design is to select Intellectual Property (IP) cores for implementing tasks and associate the selected cores to the routers to optimize cost and performance. These are commonly known as the process of core selection and application mapping respectively. In this paper, integrated core selection and mapping problem has been addressed. Mesh architecture has been considered for experimentation. The integrated core selection and mapping problem takes as input the application task graph, topology graph and a core library. It outputs the selected cores for the tasks and their mapping onto the topology graph, such that, all communication requirements of the application are satisfied. The cores present in a core library may perform more than one task and have non-uniform sizes. For this, a technique based on Particle Swarm Optimization (PSO) has been proposed to select cores from the given core library and map the resultant core graph onto mesh based architectures. An efficient heuristic for mapping has also been proposed, which maps the selected cores onto mesh based architectures, considering non-uniform core sizes. Comparisons have been carried out with step-by-step core selection and mapping approach and also with mapping algorithms that exist in the literature. Significant reductions have been observed in terms of communication cost over all the cases. Area comparisons have also been made. On average, improvement of 13.05% in communication cost and 2.07% in area have been observed. The proposed approach has also been compared in dynamic environment and significant reductions in the average network latency could be observed. On average, improvement of 5.48% in average network latency and 15.68% in network throughput has been observed. Comparison of energy consumption has also been done in both the cases.

19.

Characterizing the impact of process variation on 45 nm NoC-based CMPs

C. Hernández A. Roca J. Flich F. Silla J. Duato 《Journal of Parallel and Distributed Computing》2011,71(5):651-663

Current integration scales make it possible to design chip multiprocessors with a large number of cores interconnected by a NoC. Unfortunately, they also bring process variation, posing a new burden to processor manufacturers. Regarding the NoC, variability causes the delays of links and routers to deviate from those initially established at design time. In this paper we analyze how variability affects the NoC by applying a new variability model to 100 instances of an 8 × 8 mesh NoC synthesized using 45 nm technology. We also show that GALS-based NoCs present communication bottlenecks due to the slower components of the network, which cause congestion, thus reducing performance. This performance reduction finally affects the applications being executed in the CMP because they may be mapped to slower areas of the chip. In this paper we show that using a mapping algorithm that considers variability data may improve application execution time by up to 50%.

20.

Packet triggered prediction based task migration for network-on-chip

Tianzhou Chen Weiwei Fu Bin Xie Chao Wang 《Microprocessors and Microsystems》2014

The development of IC technology makes Network-on-Chip (NoC) an attractive architecture for future massively parallel systems. Task migration optimizes the overall communication performance of NoCs, since the changing phases of execution make static task mapping insufficient. It is well known that the communication behavior of many applications is predictable, which makes it feasible to use prediction to guide task migration. The trigger for activating a task migration is also important. In this paper, we first define and analyze the predictability of applications, and then compare different ways of triggering migration. We then modify the Genetic Algorithm (GA) based task remapping and propose two other task migration algorithms: Simple Exchange (SE) and Benefit Assess (BA). A mechanism called node lock is also used to reduce unnecessary and costly migrations. Simulation results on real applications from the PARSEC benchmark suite show that the SE, BA and GA algorithms reduce the number of hops by 21.4%, 34.0% and 34.9%, and the average latency by 17.3%, 27.2% and 26.3% respectively, compared with the system without task migration; BA and SE reduce migrations by 72.0% and 78.7% without significant performance degradation compared with GA, and the node lock mechanism can further remove 37.3% and 46.0% of migrations while achieving almost the same performance.