1.

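王桂彬 杨学军 徐新海 林一松 李鑫 《软件学报》2011,22(9):2222-2234

Taking OpenMP-like parallel programs as the object of study, this work optimizes system power consumption under a performance constraint by combining parallel loop scheduling on heterogeneous systems with dynamic voltage scaling of the processors. First, a basic model of the power-aware parallel loop scheduling problem on heterogeneous systems is established. Next, a lower bound on the energy consumption of parallel loop scheduling on heterogeneous systems is derived analytically; this bound can be used to evaluate the practical efficiency of power optimization methods. The scheduling problem is then formulated as an integer programming problem, and on that basis an intra-processor loop rescheduling method is proposed to reduce power consumption further. Finally, 10 typical kernel programs are evaluated on a CPU-GPU heterogeneous platform. The experimental results show that the method effectively reduces system power consumption and improves system energy efficiency.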
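The scheduling problem this abstract describes can be illustrated with a small brute-force search. This is only a sketch under stated assumptions, not the integer programming formulation or the rescheduling method of the paper: the function name `best_split`, the representation of each voltage/frequency level as an `(iterations_per_second, watts)` pair, and the decision to ignore idle power and data-transfer cost are all hypothetical simplifications.

```python
from itertools import product


def best_split(n_iters, deadline, cpu_levels, gpu_levels):
    """Exhaustively search how to split a parallel loop between a CPU and a GPU.

    Each level is a hypothetical (iterations_per_second, watts) pair standing
    in for one voltage/frequency setting. Returns the cheapest feasible
    (energy_joules, cpu_iterations, cpu_level, gpu_level) tuple, or None if no
    configuration meets the deadline.
    """
    best = None
    for k in range(n_iters + 1):  # k iterations on the CPU, the rest on the GPU
        for cpu, gpu in product(cpu_levels, gpu_levels):
            t_cpu = k / cpu[0]
            t_gpu = (n_iters - k) / gpu[0]
            # Both devices run concurrently, so the slower one sets the time.
            if max(t_cpu, t_gpu) > deadline:
                continue
            # Assumption: only busy energy is counted; idle power is ignored.
            energy = t_cpu * cpu[1] + t_gpu * gpu[1]
            if best is None or energy < best[0]:
                best = (energy, k, cpu, gpu)
    return best


if __name__ == "__main__":
    cpu_levels = [(100, 5), (200, 60)]        # made-up numbers
    gpu_levels = [(1000, 100), (2000, 300)]   # made-up numbers
    print(best_split(1000, 1.0, cpu_levels, gpu_levels))
```

Exhaustive search is practical only for tiny inputs; an integer programming solver, as named in the abstract, would be the natural replacement for realistic problem sizes.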

2.

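A Low-Power Optimization Method for Heterogeneous Systems Based on Communication-Aware Task Partitioning

王桂彬 《小型微型计算机系统》2011,32(12)

For heterogeneous parallel systems composed of general-purpose microprocessors and dedicated accelerators, an energy optimization method is proposed that combines communication-aware parallel task partitioning with dynamic voltage and frequency scaling. The method partitions a parallel task graph and maps it onto the heterogeneous processing units so as to minimize system energy consumption under a performance constraint. In today's typical heterogeneous parallel systems, the host processor and the accelerators are mostly connected through the system bus, which inevitably introduces non-negligible communication overhead, so communication-aware task partitioning is the key to this problem. A static optimal energy optimization method based on integer linear programming and a dynamic energy optimization method based on a genetic algorithm are proposed. The effectiveness of the method is verified with a typical scientific computing application.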

3.

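An Analytical Model for Energy Optimization of Heterogeneous Parallel Systems

王桂彬 杨学军 唐滔 徐新海 《软件学报》2012,23(6):1382-1396

As processor power consumption keeps growing, power has become a primary concern in the design and implementation of high-performance computer systems. Heterogeneous systems are now one of the development trends in high-performance computing. Compared with traditional homogeneous architectures, heterogeneous architectures offer higher theoretical peak performance and energy efficiency, but fully exploiting that energy-efficiency advantage while meeting application performance requirements remains a challenging problem. By abstracting an application as a general program model composed of serial and parallel segments, an energy optimization model for heterogeneous parallel systems is established. Analytical methods are used to derive, in turn, the relationships that hold among the processors when the energy of a parallel segment and of the whole program (multiple program segments) is optimal, and energy-optimal processor frequency selection algorithms under a time constraint are given for each case. Finally, the method is validated with 8 typical applications on a CPU-GPU heterogeneous platform.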

4.

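A Performance Optimization Method for Application Algorithms on Heterogeneous Multiprocessor SoCs (Total citations: 1; self-citations: 0; citations by others: 1)

赵鹏 严明 李思昆 《软件学报》2011,22(7):1475-1487

In embedded multimedia processing, multi-processor systems-on-chip (MPSoCs) are increasingly widely used. Multimedia MPSoCs usually adopt the mainstream architecture of "one main processor core plus multiple heterogeneous coprocessor cores". This architecture balances generality against flexibility and performance against power consumption, but it also places higher demands on MPSoC performance optimization methods. For multimedia application algorithms on heterogeneous MPSoCs, a performance optimization method for MPSoC multimedia processing is proposed. Through application characteristic analysis, affine loop partitioning, and mapping of the application onto the MPSoC processor cores, the method achieves optimized data locality and multi-level parallelism, thereby improving the performance of multimedia application algorithms on heterogeneous MPSoCs. Experimental results show that the method clearly improves the processing performance of multimedia application algorithms on heterogeneous MPSoCs.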

5.

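An Adaptive Command Interpretation System for Heterogeneous Multiprocessor Devices

刘文卿 李栋 崔莉 《软件学报》2017,28(S1):11-19

Intelligence gives the Internet of Things (IoT) deeper practical value, but finding the best trade-off between high computing capability and low power consumption is an extremely hard problem for current IoT devices. Compared with single-processor or homogeneous multiprocessor designs, a heterogeneous multiprocessor architecture can combine the strengths of different processors and satisfy the system requirements of both high computing capability and low power consumption. However, the difficulty of programming software for heterogeneous multiprocessor architectures, and the question of how to optimize the runtime performance of top-level applications on multiprocessor devices, are technical problems that urgently need solving. To address them, an adaptive command interpretation system for heterogeneous multiprocessor devices is designed and implemented. First, the system lets users install IoT applications onto the device, with applications expressed as command scripts. Second, the system includes an algorithm that automatically dispatches commands across the heterogeneous multiprocessor device; the algorithm considers multidimensional performance and power parameters and minimizes application execution energy subject to an upper bound on execution time. Finally, a solution is proposed for satisfying the application needs of different users at once: under the resource constraints of IoT devices, an adaptive scheme based on each user's usage history automatically performs adaptive deployment and runtime optimization of the command interpretation system according to that user's habits.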

6.

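Asynchronous Application-Level Checkpointing for Heterogeneous Systems

贾佳 《计算机工程与科学》2011,33(11)

Application-level checkpointing is the most commonly used and most mature fault tolerance technique on homogeneous systems, but its use on heterogeneous systems is still in its infancy, and there is not yet a rigorous, well-founded implementation scheme and configuration method tailored to the architecture and failure model of heterogeneous systems. In response, based on the architecture and programming model of CUDA heterogeneous systems, this paper analyzes how CUDA programs execute on the CPU and GPU, proposes an asynchronous execution mechanism for application-level checkpointing on heterogeneous systems, discusses optimized checkpoint configuration under this mechanism, and designs an optimization scheme. Three examples on the CUDA platform then verify the feasibility and practicality of the technique, and its performance is evaluated. The results show that the asynchronous application-level checkpointing mechanism for CPU-GPU heterogeneous systems is effective: compared with a checkpointing mechanism in which the CPU and GPU execute synchronously, it is more flexible to configure and leaves more room for optimization. The optimized checkpoint configuration method built on this mechanism also effectively reduces checkpointing overhead, yielding better fault tolerance performance.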
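The abstract does not spell out its checkpoint configuration method. As a point of reference only, the classical first-order rule for choosing a checkpoint interval (Young's approximation) can be sketched as follows; it is a swapped-in textbook technique, not the scheme of the paper, and the function and parameter names are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds.
    mtbf_s: mean time between failures, in seconds.
    Valid when the checkpoint cost is much smaller than the MTBF.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """First-order expected overhead per unit of useful work.

    Sum of the checkpointing cost (cost / interval) and the expected rework
    after a failure (on average half an interval is lost per MTBF).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 2.0, 10_000.0  # made-up numbers
    t = young_interval(cost, mtbf)
    print(t, overhead_fraction(t, cost, mtbf))
```

In this simple model, an asynchronous mechanism that hides part of the checkpoint write behind GPU execution would show up as a smaller effective `checkpoint_cost_s`, which shortens the interval and lowers the overhead.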

7.

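A Fault Tolerance Method for CPU-GPU Heterogeneous Systems

徐新海 杨学军 林宇斐 林一松 唐滔 《软件学报》2011,22(10):2538-2552

In recent years, to ease the increasingly serious power problem, heterogeneous parallel architectures have become an important trend in supercomputer development. The graphics processing unit (GPU), with its very high computing performance and performance-per-watt, has been widely adopted in high-performance computing as an efficient accelerator. However, the inherent reliability weaknesses of GPUs are bound to aggravate the reliability problems of supercomputers. Current international research on fault tolerance for CPU-GPU heterogeneous systems mainly isolates the GPU from the heterogeneous system and applies fault tolerance at the granularity of each invocation. This work designs a Lazy fault tolerance method for CPU-GPU heterogeneous systems, presents a fault tolerance framework based on compiler directives together with its constraints, and discusses the related compiler implementation and optimization methods; experiments then verify the correctness of the method. The experimental results show that, compared with existing fault tolerance methods, applying the LazyFT method to GPGPU (general purpose computation on graphics hardware) programs significantly reduces the cost of fault tolerance.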

8.

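Performance Monitoring and Assessment of PID Controllers

刘小艳 张泉灵 苏宏业 《计算机与应用化学》2010,27(1)

PID controllers are ubiquitous in industrial systems, but changes in operating conditions and similar causes make the system time-varying, which can degrade the performance of the PID controller in the affected loop. To address this problem, this paper proposes a method dedicated to assessing, optimizing, and monitoring PID controller performance: the PID cyclic assessment and optimization algorithm. The algorithm uses closed-loop input and output data and a PID minimum variance criterion based on MVC (minimum variance control) to assess the performance of the PID controller and to compute the PID parameters that are optimal in the minimum variance sense. The assessment result is compared with the output variance of the actual system and serves as the criterion for online optimization of the PID parameters; when the actual system's performance falls below a given standard, the controller is optimized. Throughout the algorithm, the input and output data are processed and judged, the system is controlled with the assessed and optimized PID parameters, and execution returns to the initial data processing and judgment step, so that system performance is monitored during control. Computer simulation experiments verify the effectiveness of the method: the figures show that after gradual and abrupt changes in the system, when the output variance exceeds the limit set in the program, the system automatically assesses and optimizes itself and reaches stability within 1300 seconds. The cyclic assessment and optimization algorithm thus monitors and assesses system performance while optimizing the PID parameters according to specified criteria, ultimately giving the system the ability to monitor, assess, and optimize itself...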
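The comparison step in this abstract, benchmark variance against actual output variance, can be sketched as a simple variance-ratio index. This is a minimal illustration, not the algorithm of the paper: the benchmark variance `min_variance` is assumed to be supplied from elsewhere (estimating it from closed-loop data requires time-series modeling that is not shown), and the function names and the `threshold` value are hypothetical.

```python
from statistics import pvariance


def performance_index(errors, min_variance):
    """Ratio of the benchmark (minimum achievable) variance to the actual one.

    errors: recent control errors (output minus setpoint) from the closed loop.
    min_variance: benchmark output variance, assumed to be given.
    Returns a value in (0, 1]; values near 1 mean the loop is close to the
    benchmark, values near 0 mean poor performance.
    """
    actual = pvariance(errors)
    if actual == 0:
        return 1.0  # no variation at all: nothing left to improve
    return min(1.0, min_variance / actual)


def needs_retuning(errors, min_variance, threshold=0.5):
    """Flag the loop for re-optimization when the index drops below threshold."""
    return performance_index(errors, min_variance) < threshold


if __name__ == "__main__":
    window = [1.0, -1.0, 1.0, -1.0]  # made-up error samples
    print(performance_index(window, 0.25), needs_retuning(window, 0.25))
```

Run periodically on a sliding window of loop data, a check like this gives the monitoring cycle its trigger: assess, compare with the threshold, retune if needed, then return to data collection.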

9.

Tracking Control Performance Optimization for Fixed-Wing UAVs Based on Incremental Q-Learning

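赵振根 程磊 《控制与决策》2024,39(2):391-400

To meet the high performance requirements of longitudinal control for fixed-wing unmanned aerial vehicles (UAVs), a control system performance optimization structure is proposed. The structure consists of a nominal controller that stabilizes the system and an incremental controller that takes part in performance optimization. The incremental implementation does not alter the original control system; it only compensates the control input of the nominal control system and optimizes its control performance. The incremental controller is designed on the basis of Q-learning theory. For systems whose state information is fully available, a state-feedback incremental Q-learning algorithm is designed. When the state information is not fully available, an output-feedback incremental Q-learning algorithm is designed using system input, output, and reference signal data. Both incremental controllers learn the incremental control law adaptively in a data-driven setting, requiring no prior knowledge of the system dynamics model or of the nominal controller's control gains. In addition, it is proved that, under excitation noise satisfying the persistent excitation condition, the incremental Q-learning method solves the Bellman equation of the Q-function without bias. Finally, the effectiveness of the method is verified by simulation on a longitudinal model of the F-16 aircraft.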

10.

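Heterogeneity-Aware Scheduling Techniques for Performance-Asymmetric Multicore Processors

赵姗 杨秋松 李明树 《软件学报》2019,30(4):1164-1190

To meet the diverse needs of applications, heterogeneous multicore processors have emerged and are gradually entering the market. Their processing cores have different microarchitectures or instruction set architectures (ISAs) and offer applications diverse capabilities, such as instruction-level parallelism (ILP) and memory-level parallelism (MLP). These cores work together to meet the optimization goals of the whole computing system, such as high performance, low power consumption, or good energy efficiency. However, current mainstream scheduling techniques are designed mainly for traditional homogeneous processor architectures and do not account for differences in heterogeneous hardware capabilities. How scheduling in a heterogeneous multicore environment can be made aware of hardware heterogeneity and give different types of applications better-matched hardware resources is a question worth exploring. This paper surveys recent results in this research area. Focusing on performance-asymmetric multicore architectures, it analyzes and describes the main issues facing heterogeneous scheduling, namely optimization goals, analytical models, scheduling decisions, and algorithm evaluation, systematically summarizes the related techniques in turn, and finally outlines future research from the perspective of hardware-software integration.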

11.

A Heterogeneous Mixed-Mode Execution Model for Massively Parallel Systems

《Journal of Parallel and Distributed Computing》1999,56(1):2-16

In this paper, we consider a massively parallel system that is composed of heterogeneous processors, that is, processors with different processing power, and that combines the advantages of the SIMD and MIMD architectures. The heterogeneous mixed-mode (HeMM) execution model is composed of two main components, which operate in the well-known SIMD and MIMD paradigms. The main computing power comes from a component that is composed of a massive number of processors and operates in a data-parallel manner. The other component is composed of a few (or even one) fast processors that operate in the MIMD paradigm. The operation of a small number of processors in an MIMD paradigm has been well demonstrated through actual systems. The processors in this component add flexibility to the execution of parallel programs, so that execution adjusts to the changing parallelism of the program to enhance performance. Based on this execution model, we analyze the performance gains obtainable with this new system. We show that substantial performance gains can be obtained by using the HeMM system.

12.

HPP controller: a system controller for high performance computing

Fei CHEN Zheng CAO Kai WANG Xuejun AN Ninhui SUN 《Frontiers of Computer Science》2010,4(4):456

This paper introduces the design of a hyper parallel processing (HPP) controller, which is a system controller used in heterogeneous high performance computing systems. It connects several heterogeneous processors via HyperTransport (HT) interfaces, a commercial InfiniBand HCA card with a PCI Express interface, and a customized global synchronization network with a self-defined high-speed interface. To accelerate intra-node communication and synchronization, a global address space is supported and some dedicated hardware is integrated in the HPP controller to enable intra-node memory and shared I/O resources. On the prototype system with the HPP controller, evaluation results show that the proposed design achieves high communication efficiency and a clear acceleration of synchronization operations.

13.

HPP controller: a system controller for high performance computing

Fei Chen Zheng Cao Kai Wang Xuejun An Ninhui Sun 《Frontiers of Computer Science in China》2010,4(4):456-465

This paper introduces the design of a hyper parallel processing (HPP) controller, which is a system controller used in heterogeneous high performance computing systems. It connects several heterogeneous processors via HyperTransport (HT) interfaces, a commercial InfiniBand HCA card with a PCI Express interface, and a customized global synchronization network with a self-defined high-speed interface. To accelerate intra-node communication and synchronization, a global address space is supported and some dedicated hardware is integrated in the HPP controller to enable intra-node memory and shared I/O resources. On the prototype system with the HPP controller, evaluation results show that the proposed design achieves high communication efficiency and a clear acceleration of synchronization operations.

14.

Adaptive coordinated control of engine speed and battery charging voltage

Jiangyan ZHANG Xiaohong JIAO 《控制理论与应用》2008,6(1):69-73

In this paper, the control problem of the auxiliary power unit (APU) for hybrid electric vehicles is investigated. An adaptive controller is provided to achieve coordinated control between the engine speed and the battery charging voltage. The proposed adaptive coordinated control laws for the throttle angle of the engine and the voltage of the power converter can guarantee not only the asymptotic tracking performance of the engine speed and the regulation of the battery charging voltage, but also the robust stability of the closed-loop system under external load changes. Simulation results are given to verify the performance of the proposed adaptive controller.

15.

Adaptive coordinated control of engine speed and battery charging voltage

Jiangyan ZHANG Xiaohong JIAO 《控制理论与应用(英文版)》2008,6(1):69-73

In this paper, the control problem of the auxiliary power unit (APU) for hybrid electric vehicles is investigated. An adaptive controller is provided to achieve coordinated control between the engine speed and the battery charging voltage. The proposed adaptive coordinated control laws for the throttle angle of the engine and the voltage of the power converter can guarantee not only the asymptotic tracking performance of the engine speed and the regulation of the battery charging voltage, but also the robust stability of the closed-loop system under external load changes. Simulation results are given to verify the performance of the proposed adaptive controller.

16.

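Research on Architecture Design of Heterogeneous Multicore Processors (Total citations: 2; self-citations: 0; citations by others: 2)

陈芳园 张冬松 王志英 《计算机工程与科学》2011,33(12):27-36

Multicore technology has become an important direction in processor development. Heterogeneous multicore processors can assign different types of computing tasks to different types of processor cores for parallel processing, providing a more flexible and efficient processing mechanism for applications with different requirements, and have therefore become a hot research topic. From an architectural perspective, this paper discusses the key points in heterogeneous multicore processor design, analyzing them in terms of core structure, interconnect, memory system, operating system support, test and verification, dynamic voltage scaling, and other aspects...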

17.

Corollaries to Amdahl's Law for Energy (Total citations: 1; self-citations: 0; citations by others: 1)

《Computer Architecture Letters》2008,7(1):25-28

This paper studies the important interaction between parallelization and energy consumption in a parallelizable application. Given the ratio of the serial and parallel portions of an application and the number of processors, we first derive the optimal frequencies allocated to the serial and parallel regions of the application to minimize the total energy consumption while the execution time is preserved (i.e., speedup = 1). We show that the dynamic energy improvement due to parallelization rises faster with the number of processors than the speed improvement given by the well-known Amdahl's Law. Furthermore, we determine the conditions under which one can obtain both energy and speed improvement, as well as the amount of improvement. The formulas we obtain capture the fundamental relationship between parallelization, speedup, and energy consumption and can be directly utilized in energy-aware processor resource management. Our results form a basis for several interesting research directions in the area of power- and energy-aware parallel processing.
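The optimization described in this abstract can be worked through under a simple power model. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's formulas: dynamic power is taken to be proportional to f^alpha (alpha = 3 by default), static and idle power are ignored, frequencies are normalized so that f = 1 is the single-processor baseline, and the names `optimal_frequencies`, `energy_ratio`, `s`, and `n` are hypothetical. Minimizing s*f_s^(alpha-1) + (1-s)*f_p^(alpha-1) subject to s/f_s + (1-s)/(n*f_p) = 1 with a Lagrange multiplier gives the closed forms used in the code.

```python
def optimal_frequencies(s, n, alpha=3.0):
    """Energy-optimal normalized frequencies with execution time held at 1.

    s: serial fraction of the work (0 <= s <= 1).
    n: number of processors used in the parallel region.
    alpha: exponent of the assumed power model, power ~ f**alpha.
    Returns (f_serial, f_parallel).
    """
    p = 1.0 - s
    f_serial = s + p / n ** ((alpha - 1.0) / alpha)
    f_parallel = f_serial / n ** (1.0 / alpha)
    return f_serial, f_parallel


def execution_time(s, n, f_serial, f_parallel):
    """Normalized run time: serial part on one core, parallel part on n cores."""
    return s / f_serial + (1.0 - s) / (n * f_parallel)


def energy_ratio(s, n, alpha=3.0):
    """Dynamic energy relative to the single-processor run at f = 1."""
    f_s, f_p = optimal_frequencies(s, n, alpha)
    # Energy = power * time; each unit of work at frequency f costs f**(alpha-1).
    return s * f_s ** (alpha - 1.0) + (1.0 - s) * f_p ** (alpha - 1.0)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, energy_ratio(0.1, n))
```

Under these assumptions the energy ratio simplifies to (s + (1 - s) / n^((alpha-1)/alpha))^alpha, so for a fully parallel program with alpha = 3 the energy falls as 1/n^2 while Amdahl speedup grows only as n, which is consistent with the abstract's claim that the energy improvement rises faster than the speed improvement.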