共查询到20条相似文献,搜索用时 109 毫秒
1.
根据多媒体处理单元的访存特点,提出一种面向高性能多媒体SoC的分组访存调度算法.该算法将访存请求按照访存ID和页地址分组,以访存组为单位进行乱序调度,并通过维护相同ID访存请求之间的顺序保证访存的正确性:综合考虑访存单元的访存效率和服务质量要求,在每个访存单元独立的调度周期内提供最低带宽保障服务.将该分组访存调度算法应用于访存调度装置,实际应用仿真结果表明,与已有基于带宽分配的访存调度算法相比,文中算法在保障访存单元带宽需求的同时降低了访存延迟,并将平均带宽利用率提高了15%. 相似文献
2.
并发错误难触发、难调试、难检测.为应对这一挑战,已有动态程序分析技术通过观测或控制并发程序执行实现其质量保障.由于并发程序不确定性主要来自共享内存,实现其动态分析的基本问题即是获取线程访问共享内存的顺序,即获取访存依赖.提出访存依赖获取技术的综述框架,包含四个评价指标(即时性、准确性、高效性、简化性)、两种方法(在线追踪、离线合成)、两类应用(轨迹分析、并发控制).通过对已有技术的总结和分析框架中的空白,对未来可能的研究方向予以展望. 相似文献
3.
随着多媒体So C中具备密集访存能力的设备数量增加,设备之间频繁争抢存储体资源,严重影响访存性能.为此提出一种面向多媒体So C的存储体访存负载均衡划分方法.通过操作系统对物理内存的管理,将设备所访问的数据映射到独立的存储体中,避免争抢频繁的设备共享存储体,减少设备间的访存冲突;划分过程基于数据量、延迟分析设备访存行为与访存冲突之间的关系,并以此来均衡各存储体的访问负载,同时提升多个设备的访存性能.该方法不依赖特殊硬件也无需修改上层应用,提供了一种透明的纯软件优化手段.将文中方法应用于真实的多媒体So C的实验结果表明,与基于带宽优先的划分方法相比,该方法在提高带宽利用率的同时降低访存延迟,将解码帧率提升8.4%~12.3%;并且在保证服务质量的情况下,可以通过进一步降低内存工作频率来减少系统功耗. 相似文献
4.
SMP机群系统因其良好的性价比、卓越的可扩展性与可用性,逐渐成为当前高性能计算机领域的主流结构.这种结点内共享存储、结点间消息传递的两级混合结构是目前并行计算研究的热点,在单个SMP结点中,总线和内存带宽是否满足CPU和I/O的需求对于访存密集型应用的性能影响很大。本文针对访存密集型应用的特点测试分析了在SMP机群中访存冲突对系统性能的影响,结果表明我们的SMP结点存在性能瓶颈,这种量化分析对于设计大规模的基于SMP的机群系统有很好的指导意义. 相似文献
5.
近年来,集成CPU和GPU的多处理器片上系统(multiprocessor system-on-chips,MPSoC),凭借兼顾GPU核心的并行计算能力和CPU核心的通用计算能力,已经广泛应用于工业控制、汽车电子、智慧医疗等领域. 为了充分发挥CPU-GPU MPSoC的性能,开放计算语言(open computing language,OpenCL)逐渐成为一种主流的应用程序编写标准.然而,在将OpenCL应用部署到CPU-GPU MPSoC的过程中,现有研究工作大多忽略了对芯片温度和使用寿命的管理,导致处理器核心在执行应用时超过了峰值温度,甚至永久性故障的提前发生,无法保证OpenCL应用的长久稳定运行. 为了弥补上述缺点,提出了一种包含静态和动态应用调度技术的方法. 静态应用调度技术是基于改进交叉熵策略,将OpenCL应用的特性充分考虑在内,有效提高了OpenCL应用设计点的寻优效率. 动态应用调度技术是基于反馈控制策略,克服了传统方案中无法有效应对系统运行时新到应用的缺陷,能够最小化新到应用的平均延迟. 实验表明, 所提方法可以将应用的平均延迟降低34.58%,同时满足温度、能耗、使用寿命的约束. 相似文献
6.
针对嵌入式片上多处理器MPSoC(multiple processor system on chip)平台下任务并行化分配的问题,从理论上对任务调度进行了建模,针对模型中的任务间依赖问题,给出了层次任务图的分析模型;结合量化后函数的开销和OpenMP并行化思想,提出了基于复制分治调度策略的并行方案;以例子滤波算法为例,对任务并行化进行验证,实验结果分析表明,本文提出的并行化方案,合理的对任务进行分配,改善了多处理器的负载平衡,降低了处理器间通讯开销,具有较大的加速比,满足嵌入式多核平台下任务并行化的需求. 相似文献
7.
8.
基于遗传算法提出了溢出代码和访存压力敏感的机器学习来调试寄存器分配的权值函数。不同于以往采用目标程序的运行时间作为适应值,通过静态分析寄存器分配产生的溢出代码和基本块中的访存压力来构建适应值,以减少学习时间。这些分析被限定在热点函数中,在保证适应值精度的同时进一步加快了学习速度。实验表明,快速学习仅需要考虑热点函数的编译时间,整个CPU2000CINT测试集在5 h内即可学习完毕。大部分CPU2000CINT测试例子的性能得到了提高。其中perlbmk的性能提升最高可达到7.2%。 相似文献
9.
基于程序访存模式的低功耗存储技术 总被引:1,自引:0,他引:1
与不断提升的计算能力相适应,移动手持设备上的存储系统结构越来越复杂,容量越来越大.这种趋势导致存储系统,主要是片上缓存和主存,在系统总能耗的占比中不断攀升.在当前手持设备多由电池驱动并且电池容量十分有限的情况下,存储系统的低功耗设计就显得十分重要.虽然现有的存储器件提供了一定的硬件节能支持,但是只有与应用程序的访存行为的规律相结合,才能充分发挥硬件的节能潜力.对现有的各种低功耗存储技术进行了梳理和总结,给出程序的访存模式的概念,归纳出访存模式在3个方面的内涵,并进一步详细介绍了程序的访存模式在片上缓存和主存低功耗技术中的应用.最后,展望未来结合访存模式进行低功耗存储系统研发的可能方向. 相似文献
10.
任务分配是多处理器SoC功能实现与性能优化的重要步骤,严重影响着多处理器SoC系统的处理性能与效率.文中针对多媒体应用程序向异构多处理器SoC的任务分配问题,提出图结点多着色模型来描述任务分配问题,并使用进化蚁群算法进行任务分配.在任务分配的过程中,首先对多媒体应用进行预处理,包括应用特征分析、并行任务划分与功能模型生成;然后启动进化蚁群算法进行分配空间探索,直至找到满足条件的高质量任务分配方案.实验结果表明,相对于采用基本蚁群算法与遗传算法的任务分配方法,文中方法可以获得高质量的分配方案,并较大幅度地加快了任务分配空间探索的收敛速度. 相似文献
11.
《Journal of Systems Architecture》2015,61(7):293-306
Efficiency of Network-on-Chip (NoC) based multi-processor systems largely depends on optimal placement of tasks onto processing elements (PEs). Although number of task mapping heuristics have been proposed in literature, selecting best technique for a given environment remains a challenging problem. Keeping in view the fact that comparisons in original study of each heuristic may have been conducted using different assumptions, environment, and models. In this study, we have conducted a detailed quantitative analysis of selected dynamic task mapping heuristics under same set of assumptions, similar environment, and system models. Comparisons are conducted with varying network load, number of tasks, and network size for constantly running applications. Moreover, we propose an extension to communication-aware packing based nearest neighbor (CPNN) algorithm that attempts to reduce communication overhead among the interdependent tasks. Furthermore, we have conducted formal verification and modeling of proposed technique using high level Petri nets. The experimental results indicate that proposed mapping algorithm reduces communication cost, average hop count, and end-to-end latency as compared to CPNN especially for large mesh NoCs. Moreover, proposed scheme achieves up to 6% energy savings for smaller mesh NoCs. Further, results of formal modeling indicate that proposed model is workable and operates according to specifications. 相似文献
12.
13.
AbstractNetwork-on-Chip provides a packet-based and scalable inter-connected structure for spiking neural networks. However, existing neural mapping methods just distribute all neurons of a population into an on-chip network core or nearby cores sequentially. As there is no connection among population, the population based mapping degrades inter-neuron communicating performance between different cores. This paper presents a Cross-LAyer based neural MaPping method that maps synaptic connected neurons belonging to adjacent layers into the same on-chip network node. In order to adapt to various input patterns, the strategy also takes input spike rate into consideration and remap neurons for improving mapping efficiency. The method helps to reduce inter-core communication cost. The experimental results demonstrate the efficient results of the proposed mapping strategy in the aspect of spike transfer latency as well as dynamic energy cost improvement. In the applications of handwritten digits and edge extraction, in which the type of interconnection among neurons is different, the neural mapping algorithm reduces spike average transfer latency by maximum 42.83%, and reduces dynamic energy by maximum 36.29%. 相似文献
14.
15.
片上互连网络为多核体系结构提供了高效的通信支持。目前的片上网络通常采用单向传输链路,链路资源利用率较低。为了实现链路带宽资源高效分配、进而高效利用链路带宽资源,提出了一种新的双向链路调度算法,并设计了一种支持此算法的双向链路路由器。这种新型的路由器结构能够在不影响路由原有数据通道条件下,提供一条旁路数据通道来快速传输数据。实验结果表明,应用该双向链路路由器可使Mesh网络饱和吞吐率和链路平均利用率分别得到最大83.3%和24.53%的提升。 相似文献
16.
《Journal of Systems Architecture》2013,59(7):429-440
This paper describes a technique for performing mapping and scheduling of tasks belonging to an executable application into a NoC-based MPSoC, starting from its UML specification. A toolchain is used in order to transform the high-level UML specification into a middle-level representation, which takes the form of an annotated task graph. Such an input task graph is used by an optimization engine for the sake of carrying out the design space exploration. The optimization engine relies on a Population-based Incremental Learning (PBIL) algorithm for performing mapping and scheduling of tasks into the NoC. The PBIL algorithm is also proposed for dynamic mapping of tasks in order to deal with failure events at runtime. Simulation results are promising and exhibit a good performance of the proposed solution when problem size is increased. 相似文献
17.
This paper presents a high level power estimation methodology for a Network-on-Chip (NoC) router, that is capable of providing cycle accurate power profile to enable power exploration at system level. Our power macro model is based on the number of flits passing through a router as the unit of abstraction. Experimental results show that our power macro model incurs less than 5% average absolute cycle error compared to gate level analysis. The high level power macro model allows network power to be readily incorporated into simulation infrastructures, providing a fast and cycle accurate power profile, to enable power optimization such as power-aware compiler, core mapping, and scheduling techniques for CMP. As a case study, we demonstrate the use of our model for evaluating the effect of different core mappings using SPLASH-2 benchmark showing the utility of our power macro model. 相似文献
18.
C. HernándezAuthor Vitae A. Roca Author VitaeJ. Flich Author Vitae F. Silla Author VitaeJ. Duato Author Vitae 《Journal of Parallel and Distributed Computing》2011,71(5):651-663
Current integration scales make possible to design chip multiprocessors with a large amount of cores interconnected by a NoC. Unfortunately, they also bring process variation, posing a new burden to processor manufacturers.Regarding the NoC, variability causes that the delays of links and routers do not match those initially established at design time. In this paper we analyze how variability affects the NoC by applying a new variability model to 100 instances of an 8 × 8 mesh NoC synthesized using 45 nm technology. We also show that GALS-based NoCs present communication bottlenecks due to the slower components of the network, which cause congestion, thus reducing performance. This performance reduction finally affects the applications being executed in the CMP because they may be mapped to slower areas of the chip. In this paper we show that using a mapping algorithm that considers variability data may improve application execution time up to 50%. 相似文献
19.
《Journal of Systems Architecture》2014,60(3):293-304
This paper proposes a novel Colored Petri Net (CPN) based dynamic scheduling scheme, which aims at scheduling real-time tasks on multiprocessor system-on-chip (MPSoC) platforms. Our CPN based scheme addresses two key issues on task scheduling problems, dependence detecting and task dispatching. We model inter-task dependences using CPN, including true-dependences, output-dependences, anti-dependences and structural dependences. The dependences can be detected automatically during model execution. Additionally, the proposed model takes the checking of real-time constraints into consideration. We evaluated the scheduling scheme on the state-of-art FPGA based multiprocessor hardware system and modeled the system behavior using CPN tools. Simulations and state space analyses are conducted on the model. Experimental results demonstrate that our scheme can achieve 98.9% of the ideal speedup on a real FPGA based hardware prototype. 相似文献
20.
为了满足系统芯片对通信带宽的要求,片上网络正逐渐取代总线成为当前多核及众核系统的主流互连方案,然而由于芯片特征尺寸的不断减小,芯片内发生故障的概率显著增加.为了提供可靠的片上通信,提出一种低成本的可重构路由算法.该算法基于无共享边界的矩形故障模型,按照故障区与网络边界的相对位置对故障区进行分类;针对不同类型的故障区定义了具体的路由器状态更新策略;重构后的片上网络可以容忍任意数目、任意分布的路由器以及链路故障.与当前容错设计方案不同,文中算法不需要增加虚拟通道来保证网络的无死锁特性,因此具有低成本、高可靠的特性.仿真实验结果表明,文中算法适用于处理器与缓存,或缓存与缓存之间的片上通信. 相似文献