Similar literature
 20 similar documents found (search time: 62 ms)
1.
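Based on the memory access characteristics of multimedia processing units, this paper proposes a grouped memory access scheduling algorithm for high-performance multimedia SoCs. The algorithm groups memory requests by access ID and page address, schedules the resulting groups out of order, and guarantees correctness by preserving the order among requests with the same ID. It weighs both the access efficiency and the quality-of-service requirements of each memory access unit, and provides a minimum-bandwidth guarantee within each unit's independent scheduling period. The algorithm was applied to a memory access scheduling device; simulations of real applications show that, compared with existing bandwidth-allocation-based scheduling algorithms, it reduces memory access latency while meeting the units' bandwidth requirements, and raises average bandwidth utilization by 15%.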
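The grouping idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `build_groups` and `schedule`, the "prefer the currently open page, otherwise take the oldest group" rule, and the omission of the per-unit minimum-bandwidth guarantee are all assumptions made for illustration. The sketch only shows how groups can be issued out of order while requests with the same ID keep their original order.

```python
from collections import OrderedDict, deque
from itertools import groupby


def build_groups(requests):
    """Group requests by ID, then by runs of the same page.

    requests: list of (access_id, page) tuples in arrival order.
    Returns {access_id: deque of (page, [arrival indices])}.
    """
    per_id = OrderedDict()
    for idx, (access_id, page) in enumerate(requests):
        per_id.setdefault(access_id, []).append((page, idx))

    queues = OrderedDict()
    for access_id, items in per_id.items():
        queue = deque()
        # Consecutive same-page requests of one ID form one group.
        for page, run in groupby(items, key=lambda item: item[0]):
            queue.append((page, [idx for _, idx in run]))
        queues[access_id] = queue
    return queues


def schedule(requests):
    """Return the issue order as a list of arrival indices.

    Hypothetical policy: among the head groups of all IDs, prefer one that
    hits the currently open page; otherwise take the oldest head group.
    Only head groups are eligible, so same-ID order is never violated.
    """
    queues = build_groups(requests)
    open_page = None
    order = []
    while any(queues.values()):
        heads = [(aid, q[0]) for aid, q in queues.items() if q]
        hits = [h for h in heads if h[1][0] == open_page]
        candidates = hits or heads
        aid, (page, indices) = min(candidates, key=lambda h: h[1][1][0])
        queues[aid].popleft()
        order.extend(indices)
        open_page = page
    return order
```

For example, with `[("A", 1), ("B", 2), ("C", 1), ("B", 2)]` the request from `C` is issued before the two from `B` because it hits the page opened by `A`, while the two `B` requests stay in their original relative order.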

2.
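Obtaining Memory Access Dependences: A Survey of Fundamental Techniques for Dynamic Analysis of Concurrent Programs   Total citations: 1 (self-citations: 1, citations by others: 0)
Jiang Yanyan, Xu Chang, Ma Xiaoxing, Lü Jian. 《软件学报》 (Journal of Software), 2017, 28(4): 747-763
Concurrency bugs are hard to trigger, debug, and detect. To meet this challenge, existing dynamic program analysis techniques assure the quality of concurrent programs by observing or controlling their execution. Because the nondeterminism of concurrent programs stems mainly from shared memory, the basic problem in their dynamic analysis is obtaining the order in which threads access shared memory, that is, obtaining memory access dependences. The paper proposes a survey framework for dependence-acquisition techniques comprising four evaluation criteria (immediacy, accuracy, efficiency, simplification), two approaches (online tracing, offline synthesis), and two classes of application (trace analysis, concurrency control). By summarizing existing techniques and analyzing the gaps in the framework, it outlines possible directions for future research.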

3.
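As the number of devices with intensive memory access in multimedia SoCs grows, the devices contend frequently for memory bank resources, which severely degrades memory access performance. This paper therefore proposes a load-balanced memory bank partitioning method for multimedia SoCs. Through the operating system's management of physical memory, the data accessed by each device is mapped to separate banks, so that heavily contending devices do not share a bank and inter-device access conflicts are reduced. The partitioning analyzes the relationship between device access behavior and access conflicts in terms of data volume and latency, and uses it to balance the access load across banks while improving the memory access performance of multiple devices. The method requires neither special hardware nor changes to upper-layer applications, so it is a transparent, software-only optimization. Experiments on a real multimedia SoC show that, compared with a bandwidth-priority partitioning method, it improves bandwidth utilization while lowering memory access latency, and raises the decoding frame rate by 8.4% to 12.3%; with quality of service preserved, system power can be reduced further by lowering the memory operating frequency.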

4.
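Owing to their good price/performance ratio and excellent scalability and availability, SMP clusters have gradually become the mainstream architecture in high-performance computing. This two-level hybrid structure, with shared memory within a node and message passing between nodes, is a current focus of parallel computing research. Within a single SMP node, whether bus and memory bandwidth can meet the demands of the CPUs and I/O strongly affects the performance of memory-intensive applications. Targeting the characteristics of such applications, this paper measures and analyzes the impact of memory access conflicts on system performance in an SMP cluster. The results show that the SMP nodes tested have a performance bottleneck; this quantitative analysis offers useful guidance for designing large-scale SMP-based clusters.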

5.
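In recent years, multiprocessor system-on-chips (MPSoCs) that integrate CPUs and GPUs have been widely used in industrial control, automotive electronics, smart healthcare, and other fields, because they combine the parallel computing power of GPU cores with the general-purpose computing power of CPU cores. To exploit the full performance of CPU-GPU MPSoCs, the Open Computing Language (OpenCL) has gradually become a mainstream standard for writing applications. However, most existing work on deploying OpenCL applications to CPU-GPU MPSoCs neglects the management of chip temperature and lifetime, so processor cores exceed the peak temperature while executing applications, or permanent faults even occur prematurely, and long-term stable operation of OpenCL applications cannot be guaranteed. To remedy these shortcomings, the paper proposes a method that comprises static and dynamic application scheduling techniques. The static technique is based on an improved cross-entropy strategy and takes the characteristics of OpenCL applications fully into account, which effectively improves the efficiency of searching for OpenCL application design points. The dynamic technique is based on a feedback control strategy; it overcomes the inability of traditional schemes to handle applications that arrive at run time and can minimize the average latency of newly arrived applications. Experiments show that the proposed method reduces average application latency by 34.58% while satisfying temperature, energy consumption, lifetime...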

6.
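To address parallel task allocation on embedded multiprocessor system-on-chip (MPSoC) platforms, this paper models task scheduling theoretically and, for the inter-task dependences in the model, gives a hierarchical task graph analysis model. Combining quantified function costs with the OpenMP approach to parallelization, it proposes a parallel scheme based on a replicate-and-divide-and-conquer scheduling strategy. Task parallelization is validated using a particle filter algorithm as the example. Analysis of the experimental results shows that the proposed scheme allocates tasks sensibly, improves load balance across processors, reduces inter-processor communication overhead, achieves a high speedup, and meets the needs of task parallelization on embedded multicore platforms.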

7.
8.
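Based on a genetic algorithm, this paper proposes a machine learning approach, sensitive to spill code and memory access pressure, for tuning the weight function of register allocation. Unlike earlier work, which uses the target program's running time as the fitness value, it builds the fitness value by statically analyzing the spill code generated by register allocation and the memory access pressure in basic blocks, which shortens learning time. The analysis is restricted to hot functions, which preserves the accuracy of the fitness value while further speeding up learning. Experiments show that this fast learning only has to account for the compilation time of hot functions, and the entire CPU2000 CINT benchmark suite can be learned within 5 h. Performance improves for most CPU2000 CINT benchmarks, with perlbmk gaining up to 7.2%.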

9.
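Low-Power Memory Techniques Based on Program Memory Access Patterns   Total citations: 1 (self-citations: 0, citations by others: 1)
In step with growing computing power, the memory systems of mobile handheld devices are becoming ever more complex in structure and larger in capacity. As a result, the memory system, chiefly the on-chip cache and main memory, accounts for a steadily rising share of total system energy. Since most handheld devices are battery-powered and battery capacity is very limited, low-power design of the memory system is especially important. Existing memory devices offer some hardware support for saving energy, but the hardware's energy-saving potential can be fully realized only when it is combined with the regularities in applications' memory access behavior. The paper reviews and summarizes existing low-power memory techniques, introduces the concept of a program's memory access pattern, identifies three aspects of what such a pattern comprises, and describes in detail how access patterns are applied in low-power techniques for on-chip caches and main memory. Finally, it looks ahead to possible directions for developing low-power memory systems that exploit access patterns.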

10.
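Task allocation is an important step in implementing the functions and optimizing the performance of a multiprocessor SoC, and it strongly affects the system's processing performance and efficiency. For the problem of allocating the tasks of multimedia applications to heterogeneous multiprocessor SoCs, this paper proposes a graph-node multi-coloring model to describe task allocation and uses an evolutionary ant colony algorithm to perform it. The multimedia application is first preprocessed, which includes application characterization, parallel task partitioning, and functional model generation; the evolutionary ant colony algorithm then explores the allocation space until it finds a high-quality allocation that satisfies the constraints. Experimental results show that, compared with allocation methods based on the basic ant colony algorithm and on a genetic algorithm, the proposed method obtains high-quality allocations and considerably speeds up convergence of the allocation-space exploration.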

11.
Dynamic task mapping for Network-on-Chip based systems   Total citations: 1 (self-citations: 0, citations by others: 1)
The efficiency of Network-on-Chip (NoC) based multi-processor systems depends largely on the optimal placement of tasks onto processing elements (PEs). Although a number of task mapping heuristics have been proposed in the literature, selecting the best technique for a given environment remains a challenging problem, because the comparisons in the original study of each heuristic may have been conducted under different assumptions, environments, and models. In this study, we conduct a detailed quantitative analysis of selected dynamic task mapping heuristics under the same set of assumptions, similar environment, and system models. Comparisons are conducted with varying network load, number of tasks, and network size for constantly running applications. Moreover, we propose an extension to the communication-aware packing based nearest neighbor (CPNN) algorithm that attempts to reduce communication overhead among interdependent tasks. Furthermore, we have conducted formal verification and modeling of the proposed technique using high-level Petri nets. The experimental results indicate that the proposed mapping algorithm reduces communication cost, average hop count, and end-to-end latency compared with CPNN, especially for large mesh NoCs. Moreover, the proposed scheme achieves up to 6% energy savings for smaller mesh NoCs. Further, the results of formal modeling indicate that the proposed model is workable and operates according to specifications.
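The metrics named in this abstract (communication cost and average hop count on a mesh NoC) can be made concrete with a small sketch. This is a plain nearest-neighbor placement, not CPNN and not the extension the paper proposes; the function names, the edge-list input format, the tie-breaking rule, and the use of Manhattan distance as the hop count (which assumes minimal routing such as XY routing) are all assumptions for illustration.

```python
def hops(a, b):
    """Manhattan distance between two mesh coordinates (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_free_pe(anchor, free):
    """Pick the free PE closest to anchor; ties broken by coordinate."""
    return min(free, key=lambda pe: (hops(anchor, pe), pe))


def nearest_neighbor_map(edges, width, height, root_pe=(0, 0)):
    """Place each child task on the free PE nearest to its parent.

    edges: list of (parent, child, volume), ordered so that every parent
    is either already placed or is a root task. Assumes the mesh has at
    least as many PEs as there are tasks.
    """
    free = {(x, y) for x in range(width) for y in range(height)}
    placement = {}
    for parent, child, _volume in edges:
        if parent not in placement:
            placement[parent] = nearest_free_pe(root_pe, free)
            free.remove(placement[parent])
        if child not in placement:
            placement[child] = nearest_free_pe(placement[parent], free)
            free.remove(placement[child])
    return placement


def comm_metrics(edges, placement):
    """Return (volume-weighted hop cost, average hop count per edge)."""
    distances = [hops(placement[p], placement[c]) for p, c, _ in edges]
    cost = sum(v * d for (_, _, v), d in zip(edges, distances))
    return cost, sum(distances) / len(edges)
```

On a 2-by-2 mesh with edges `[("t0", "t1", 10), ("t0", "t2", 5), ("t1", "t3", 2)]`, every communicating pair ends up one hop apart, so `comm_metrics` reports a weighted cost of 17 and an average hop count of 1.0.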

12.
Demanding 3-D scenes on embedded systems lead developers to use all the resources of current low-power processors and to add dedicated hardware as a graphics accelerator unit for real-time realistic scene rendering. Photon mapping, one of the most powerful techniques for rendering highly realistic 3-D images, requires large numbers of floating-point operations and is very time-consuming. To exploit multiprocessor systems for rendering 3-D scenes, this paper proposes parallel photon-mapping rendering on a homogeneous multiprocessor SoC (MPSoC) platform with a mesh NoC that uses an adaptive wormhole routing method to communicate packets among cores. To use the MPSoC platform efficiently for photon-mapping rendering, the paper explores methods for improving load balancing, using memory efficiently, and reducing communication cost so that the application scales. The resulting MPSoC platform is verified and evaluated by cycle-accurate simulations for different sizes of the mesh NoC. As expected, the proposed methods achieve excellent load balancing and a maximum speedup of 44.3 times on an 8-by-8 MPSoC platform over a single-core MPSoC platform.

13.
14.
3-D Networks-on-Chip (NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips (MPSoCs). Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging, as the arrival order and task graphs of the target applications are typically not known a priori, which can be further complicated by stringent energy requirements for NoC systems. This paper thus presents an energy-aware run-time incremental mapping algorithm (ERIM) for 3-D NoC which can minimize the energy consumption due to the data communications among processor cores, while reducing the fragmentation effect on the incoming applications to be mapped, and simultaneously satisfying the thermal constraints imposed on each incoming application. Specifically, incoming applications are mapped to cuboid tile regions for lower energy consumption of communication and the minimal routing. Fragment tiles due to system fragmentation can be gleaned for better resource utilization. Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM, and the results are compared against the optimal mapping algorithm (branch-and-bound) and two heuristic algorithms (TB and TL). The experiments show that ERIM outperforms TB and TL methods with significant energy saving (more than 10%), much reduced average response time, and improved system utilization.

15.
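Latency-Optimized Low-Power Mapping for Networks-on-Chip*   Total citations: 2 (self-citations: 1, citations by others: 2)
Network-on-chip (NoC) is an effective solution to the challenges of power, latency, synchronization, and signal integrity that confront traditional bus-based systems-on-chip (SoC). Power and latency are important constraints and performance metrics in NoC design, and there is room for optimization at every design stage. Based on an ant colony optimization algorithm, the paper reduces power and latency in the NoC mapping stage by distributing concurrent communication events evenly across the communication links. Simulation experiments show that, compared with a method that balances link traffic load, this scheme further optimizes power and latency in the topology mapping stage.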

16.
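With the rapid development of integrated circuit technology, chip integration density keeps rising, and efficient interconnection among the many on-chip processing units has become a key problem; system-on-chip (SoC) and two-dimensional network-on-chip (2D NoC) emerged in turn as a result. When 2D NoC reached bottlenecks in several respects, three-dimensional network-on-chip (3D NoC) came into being. 3D NoC has attracted great attention from academia and industry, and low-power mapping for 3D NoC is one of its key problems. Earlier work proposed a 3D NoC low-power mapping algorithm based on an improved genetic algorithm and obtained good simulation results. As the problem size grows, however, the amount of computation grows with it and running efficiency drops markedly. To address this, the paper studies a task mapping mechanism for power optimization in 3D NoC based on a twice-improved genetic algorithm, proposes a new 3D NoC low-power mapping algorithm, and evaluates it by simulation. The experimental results show that, with a large population size, the algorithm not only continues to reduce power consumption but also greatly shortens the running time of the mapping algorithm.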

17.
Network-on-Chip provides a packet-based and scalable interconnect structure for spiking neural networks. However, existing neural mapping methods simply distribute all neurons of a population sequentially into one on-chip network core or into nearby cores. As there are no connections within a population, population-based mapping degrades inter-neuron communication performance between different cores. This paper presents a Cross-LAyer based neural MaPping method that maps synaptically connected neurons belonging to adjacent layers into the same on-chip network node. To adapt to various input patterns, the strategy also takes the input spike rate into consideration and remaps neurons to improve mapping efficiency. The method helps to reduce inter-core communication cost. The experimental results demonstrate that the proposed mapping strategy improves both spike transfer latency and dynamic energy cost. In handwritten digit and edge extraction applications, in which the type of interconnection among neurons differs, the neural mapping algorithm reduces average spike transfer latency by up to 42.83% and dynamic energy by up to 36.29%.

18.
This paper describes a technique for mapping and scheduling the tasks of an executable application onto a NoC-based MPSoC, starting from its UML specification. A toolchain transforms the high-level UML specification into a middle-level representation, which takes the form of an annotated task graph. This task graph is the input to an optimization engine that carries out the design space exploration. The optimization engine relies on a Population-based Incremental Learning (PBIL) algorithm for mapping and scheduling tasks onto the NoC. The PBIL algorithm is also proposed for dynamic mapping of tasks in order to deal with failure events at runtime. Simulation results are promising and show that the proposed solution performs well as the problem size increases.

19.
20.
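Hardware/software partitioning and scheduling is a key step in hardware/software co-design and a classic combinatorial optimization problem. This paper proposes an efficient heuristic algorithm for the scheduling and partitioning problem. The scheduling algorithm assigns task priorities according to out-degree and software computation time: the larger the out-degree, the higher the priority, and among tasks with equal out-degree, the longer the software computation time, the higher the priority. The partitioning algorithm first finds the critical path and then assigns the task on it with the highest benefit-to-area ratio to hardware. Each iteration updates the schedule length of the current critical path and the remaining hardware area. The loop continues until the remaining hardware area can no longer accommodate any software task on the critical path, which keeps hardware area utilization relatively high. Experiments show that the algorithm improves on existing algorithms by up to 38%.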
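The two rules described in this abstract can be sketched as follows. This is an illustrative reading, not the paper's implementation: the task fields `sw`, `hw`, and `area`, the definition of benefit as software time minus hardware time, and the simplification that the schedule length equals the critical path length (no resource contention) are assumptions. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def priority_order(tasks, succ):
    """Rank tasks: larger out-degree first, then longer software time."""
    return sorted(tasks, key=lambda t: (-len(succ.get(t, [])), -tasks[t]["sw"]))


def critical_path(tasks, succ, in_hw):
    """Longest path through the task DAG under the current assignment."""
    preds = {t: set() for t in tasks}
    for u, children in succ.items():
        for v in children:
            preds[v].add(u)

    def cost(t):
        return tasks[t]["hw"] if t in in_hw else tasks[t]["sw"]

    finish, prev = {}, {}
    # static_order() yields every task after all of its predecessors.
    for t in TopologicalSorter(preds).static_order():
        best = max(preds[t], key=lambda p: finish[p], default=None)
        finish[t] = cost(t) + (finish[best] if best is not None else 0)
        prev[t] = best

    node = max(finish, key=finish.get)
    length, path = finish[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


def partition(tasks, succ, area_budget):
    """Greedily move critical-path tasks to hardware by benefit/area ratio.

    Stops when no software task on the current critical path fits in the
    remaining area. Returns (hardware task set, final length, area left).
    """
    in_hw, remaining = set(), area_budget
    while True:
        length, path = critical_path(tasks, succ, in_hw)
        fits = [t for t in path
                if t not in in_hw and tasks[t]["area"] <= remaining]
        if not fits:
            return in_hw, length, remaining
        best = max(fits, key=lambda t:
                   (tasks[t]["sw"] - tasks[t]["hw"]) / tasks[t]["area"])
        in_hw.add(best)
        remaining -= tasks[best]["area"]
```

Recomputing the critical path after every move matters because moving one task to hardware can shift the critical path onto a different branch, as happens in the final iteration of a small diamond-shaped task graph.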
