Similar literature
 19 similar documents found (search time: 125 ms)
1.
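To determine whether refactoring can optimize a program's worst-case execution time (WCET), this paper analyzes the basic principles of WCET estimation and derives from them the basic principles of WCET optimization. On that basis, seven methods are selected from the traditional refactoring methods for use in WCET optimization. Experimental results show that source-code refactoring can optimize WCET, but that its effect depends on several factors, including the configuration of the target processor, the control structure of the program, and the compiler's optimization level. Compared with traditional compiler-based performance optimization techniques, source-code refactoring is better suited to the early coding stage of real-time program development. Applying source-code refactoring judiciously helps repair timing defects promptly and ensure the timing safety of the software.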

2.
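A simulator-based method for collecting dynamic performance statistics is presented. With this method, an application's branch execution probabilities and critical paths can be gathered efficiently. Compared with performance-analysis tools that rely on the compiler, the method is simple to implement and has low complexity.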
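The abstract does not describe how the statistics are gathered, so the following is only a minimal sketch of the general idea, assuming a simulator that emits one `(branch_pc, taken)` record per executed branch. The function name `branch_probabilities` and the trace format are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def branch_probabilities(trace):
    """Return {branch_pc: fraction of executions in which the branch was taken}.

    `trace` is an iterable of (branch_pc, taken) pairs; this record format
    is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # pc -> [times taken, times executed]
    for pc, taken in trace:
        counts[pc][1] += 1
        if taken:
            counts[pc][0] += 1
    return {pc: taken_n / total_n for pc, (taken_n, total_n) in counts.items()}


# Hypothetical trace: the branch at 0x100 runs four times, the one at 0x140 twice.
trace = [
    (0x100, True), (0x100, True), (0x100, False), (0x100, True),
    (0x140, False), (0x140, False),
]
probs = branch_probabilities(trace)
```

Because the counting happens inside the simulator's execution loop, no compiler instrumentation pass is needed, which is the simplicity advantage the abstract points to.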

3.
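Drawing on the operation of the Anhui Electric Power Company's ERP system, this article summarizes methods for optimizing the performance of custom-developed (secondary development) programs. It introduces methods for analyzing ABAP program performance, details both common and less common ABAP performance optimization methods, and summarizes how the programs ran after optimization.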

4.
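With the deepening and spread of informatization, information systems have become an important part of society's production and daily life. An information system is composed of many types of complex software and hardware, with intricate functional logic and diverse kinds of data. Performance is the lifeblood of a system and the key to its normal operation and service, and it is receiving ever more attention. How to optimize system performance is a question that system designers and developers must consider. Performance optimization has a single goal, namely to improve system performance, but the methods and strategies for performance analysis and optimization are many: optimization of the system architecture, of program logic, of memory, I/O, network and disk, of the database, and so on. Choosing an appropriate optimization method to resolve a performance problem is the key to system performance optimization.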

5.
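To analyze and predict the kernel performance of CUDA parallel programs, and thereby guide parallel program design and performance optimization, a performance prediction framework is proposed. (1) Starting from the details of the GPU programming model and device architecture, and taking the warp as the unit of study, the framework integrates basic software and hardware characteristics closely related to GPU program run time to define high-level performance-related features such as parallel-space idleness, streaming-processor warp load, and the parallel effect factor. (2) Based on these features, the framework estimates, for GPU programs with balanced thread load, the execution time of a kernel function under different problem sizes and execution configurations. (3) Following the principles of the performance evaluation, a strategy for optimizing the kernel's execution configuration parameters is proposed. Validation experiments show that the framework's average prediction accuracy for existing programs reaches 89% and 94% in two typical scenarios, that it objectively summarizes the correlation between the high-level features and program performance, and that it can qualitatively analyze the performance level of a parallel algorithm.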

6.
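Based on the characteristics of the TMS320C64X family of chips and the algorithmic structure of the H.264/AVC encoder, a concrete optimized implementation scheme is given, covering compiler optimization, cache optimization, DMA optimization, and linear-assembly optimization of key code. The writing of efficient linear assembly code is described in detail with examples; the method raises data throughput while also increasing program parallelism.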

7.
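Research on IF-Conversion in GCC Based on IA-64 Predicated Execution   Total citations: 1, self-citations: 0, citations by others: 1
Compilers play a critical role in increasing program speed and exploiting processor performance. This is especially true of the IA-64 architecture, whose performance depends to a large degree on the accompanying compiler. IF-conversion uses IA-64's support for predicated execution to remove certain control flow from a program, converting control dependences into data dependences to facilitate optimization. The article introduces the IA-64 architecture and analyzes in detail the IF-conversion algorithm in GCC based on IA-64 predicated execution.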
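To make "converting control dependences into data dependences" concrete, here is a small illustrative sketch. It is not GCC's algorithm and not IA-64 code; it only mimics the effect of IF-conversion in Python. The function names are hypothetical, and the arithmetic select stands in for a pair of instructions guarded by complementary predicates.

```python
def clamp_branchy(x, limit):
    # Original form: which assignment runs depends on control flow.
    if x > limit:
        y = limit
    else:
        y = x
    return y


def clamp_if_converted(x, limit):
    # IF-converted form: a compare produces a predicate p (1 or 0), both
    # candidate values are available unconditionally, and the result is
    # selected by p. The branch is gone; y now depends on p as data.
    p = int(x > limit)
    return p * limit + (1 - p) * x


# Both forms compute the same values on a small hypothetical input range.
results_match = all(
    clamp_branchy(x, 10) == clamp_if_converted(x, 10) for x in range(-5, 25)
)
```

After conversion the region is straight-line code, so a scheduler is free to reorder or overlap the operations from both arms, which is the optimization opportunity the abstract refers to.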

8.
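On the basis of a functional analysis, a network performance management solution is proposed that uses APM technology for business-oriented performance monitoring and analysis. By monitoring performance data at every layer of the system and treating network business and services as important managed objects, it emphasizes monitoring network performance and isolating faults from the application's perspective. It lets users see directly how efficiently key business applications run, monitor their application performance, and locate faults quickly; it also finds and resolves performance bottlenecks, provides performance optimization plans, and guides system construction.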

9.
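II. Database optimization. Once the platform (system) on which the database runs has been optimized, the next step is to optimize the database itself. Users' most common complaint about database performance is slow response: a great deal of time is wasted waiting for the program to respond, and sometimes timeouts occur and cause the application to fail, for example during "97" subscriber telephone-number service history queries or 112 data dumps. Database performance metrics can first be monitored with software bundled with the system, such as Server Manager and OEM (Oracle Enterprise Manager). Whether you use a dedicated monitoring tool or inspect the records in the database system's data dictionary directly, you must first be clear that database optimization problems are concentrated…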

10.
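Testing and Optimization of WinPcap Performance   Total citations: 3, self-citations: 0, citations by others: 3
The factors affecting WinPcap performance are studied through testing, and its performance is optimized. First, an IXIA tester is used to build an experimental environment in which WinPcap performance is measured under different CPU clock frequencies and memory sizes. The tests show that performance is affected in every case, to a degree that varies with packet length and is most pronounced for small packets. Next, the conditions required for applying WinPcap to high-speed networks are analyzed. Finally, optimizing the memory-copy library in WinPcap markedly improves WinPcap's performance when processing short packets.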

11.
As DSP (Digital Signal Processing) applications become more complex, there is a growing need for new architectures that support efficient high-level language compilers. We try to synthesize a new DSP processor architecture by adding several DSP-specific features to a RISC core that has a compiler-friendly structure, such as many general-purpose registers and orthogonal instructions. The synthesized digital signal processor supports single-cycle MAC (Multiply-and-ACcumulate), direct memory access, automatic address generation, and hardware looping capabilities in addition to ordinary RISC instructions. The compiler for the new architecture is quickly implemented by developing a code converter that modifies the assembly code generated by the RISC compiler. The performance effects of adding each of these features, as well as all of them combined, are evaluated using seven DSP-kernel benchmarks, a QCELP vocoder, and an MPEG video decoder. The effects of the CPU clock frequency change due to the addition of these features are also considered. Finally, we compare the performance with that of several existing DSP processors, such as the TMS320C3x, TMS320C54x, and TMS320C5x.

12.
毛嵩  杨昉  阳辉  王军 《电视技术》2007,31(11):95-96
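For an existing DTMB modulator performance test system, a scheme combining hardware improvements and algorithm optimization is proposed. Comparing the DTMB modulator test accuracy before and after the two measures were applied verifies the effectiveness of the scheme.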

13.
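High-Performance Code Generation Techniques for VLIW Architectures   Total citations: 1, self-citations: 1, citations by others: 0
DSP processors achieve high performance by adopting VLIW architectures, which also makes it harder for the compiler to generate assembly code for them. The code generator, as the compiler's code-producing component, is key to realizing the performance of a VLIW architecture. A code generator based on a retargetable compilation framework is therefore proposed and implemented. It takes full advantage of the architectural characteristics of VLIW, supports SIMD instructions and predicated execution, and can generate assembly code with a high degree of instruction-level parallelism, significantly improving application execution performance.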

14.
In this paper, we study the performance impact of dynamic hardware reconfigurations for current reconfigurable technology. As a testbed, we target the Xilinx Virtex II Pro, the Molen experimental platform and the MPEG2 encoder as the application. Our experiments show that slowdowns of up to a factor of 1000 are observed when the configuration latency is not hidden by the compiler. In order to avoid the performance decrease, we propose an interprocedural optimization that minimizes the number of executed hardware configuration instructions, taking into account constraints such as the “FPGA-area placement conflicts” between the available hardware configurations. The presented algorithm allows the anticipation of hardware configuration instructions up to the application’s main procedure. The presented results show that our optimization reduces the number of executed hardware configuration instructions by 3 to 5 orders of magnitude. Moreover, the optimization makes it possible to exploit up to 97% of the maximal theoretical speedup achieved by the reconfigurable hardware execution.

15.
Embedded and portable systems running multimedia applications create a new challenge for hardware architects. A microprocessor for such applications needs to be easy to program like a general-purpose processor and have the performance and power efficiency of a digital signal processor. This paper presents the codevelopment of the instruction set, the hardware, and the compiler for the Vector IRAM media processor. A vector architecture is used to exploit the data parallelism of multimedia programs, which allows the use of highly modular hardware and enables implementations that combine high performance, low power consumption, and reduced design complexity. It also leads to a compiler model that is efficient both in terms of performance and executable code size. The memory system for the vector processor is implemented using embedded DRAM technology, which provides high bandwidth in an integrated, cost-effective manner. The hardware and the compiler for this architecture make complementary contributions to the efficiency of the overall system. This paper explores the interactions and tradeoffs between them, as well as the enhancements to a vector architecture necessary for multimedia processing. We also describe how the architecture, design, and compiler features come together in a prototype system-on-a-chip, able to execute 3.2 billion operations per second per watt.

16.
We present Avalanche, a prototyping framework that addresses the issues of power estimation and optimization for mixed hardware and software embedded systems. Avalanche is based on a generic embedded system architecture consisting of an embedded CPU, custom hardware, and a memory hierarchy. For system-level power estimation, given various system parameters such as cache sizes, cache policies, and bus width, Avalanche is able to rapidly estimate power and performance and thus facilitate comprehensive design space explorations. For system-level power optimization, Avalanche offers different modes reflecting various design scenarios: if no hardware/software partitioning or only partial partitioning has been conducted, Avalanche guides the designer in finding a power-aware hardware/software partitioning; when a system has already been partitioned, Avalanche can optimize system parameters such as cache and memory size; if system parameters and partitioning are given, Avalanche applies additional optimizations for power, including source-to-source compiler transformations. Avalanche has been deployed during the design phase of real-world applications, including an MPEG II encoder in a set-top box design. Extensive design space explorations in terms of power and performance could be conducted within several hours, and various optimization techniques led to power reductions of up to 94% without performance losses and with only a slight increase in total chip size (i.e., transistor count).

17.
High-performance, reliable, and robust products with a short development schedule are general design aims. FACE was developed to achieve these goals, including the organization of a design flow, a frequency-driven information analyzer, compiler techniques (code generator and instruction optimization), and a hierarchical object design library. This paper explores the design space of a retargetable compiler and reconfigurable hardware, which combine both software and hardware reprogrammability. The FACE environment we have developed allows us to quickly move functions between software and hardware while the design is in a state of flux. Finally, it generates the application-specific integrated processor (ASIP) and a compiler for the new ASIP architecture. A case study demonstrates the efficiency of FACE in ASIP design.

18.
The single instruction multiple data (SIMD) architecture is very efficient for executing arithmetic-intensive programs, but frequently suffers from data-alignment problems. The data-alignment problem not only induces extra time overhead but also hinders automatic vectorization by the SIMD compiler. In this paper, we compare three on-chip memory systems for the SIMD architecture, namely single-bank, multi-bank, and multi-port, as ways to resolve the data-alignment problems. The single-bank memory is the simplest, but supports only aligned accesses. The multi-bank memory is somewhat more complex, but enables unaligned accesses and stride accesses with a bank-conflict limitation. The multi-port memory is capable of both unaligned and stride accesses without any restriction, but needs considerably more expensive hardware. We also developed a vectorizing compiler that can conduct dynamic memory allocation and SIMD code generation. The performance of the three memory systems with our SIMD compiler is evaluated using several digital signal processing kernels and the MPEG2 encoder. The experimental results show that the multi-bank memory can carry out MPEG2 encoding 5.8 times faster, whereas the single-bank memory achieves only a 2.9 times speed-up, when employed in a multimedia system with a 2-issue host processor and an 8-way SIMD coprocessor. The multi-port memory obviously shows the best performance, which is, however, an impractical improvement over the multi-bank memory when the hardware cost is considered.
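The "bank-conflict limitation" of a multi-bank memory can be illustrated with a small model. This is only a sketch under assumptions that the abstract does not state: 8 banks, low-order interleaving (word address modulo the number of banks), and one access per bank per cycle. The function names are hypothetical.

```python
def banks_touched(base, stride, lanes=8, num_banks=8):
    """Bank index hit by each SIMD lane, assuming low-order interleaving."""
    return [(base + lane * stride) % num_banks for lane in range(lanes)]


def has_bank_conflict(base, stride, lanes=8, num_banks=8):
    """True if two lanes map to the same bank and so cannot be served together."""
    return len(set(banks_touched(base, stride, lanes, num_banks))) < lanes


# Unaligned unit-stride access: every lane lands in a different bank.
unaligned = banks_touched(base=3, stride=1)
# Stride 2 folds eight lanes onto four banks, so the access conflicts.
stride2_conflict = has_bank_conflict(base=0, stride=2)
# An odd stride is coprime with 8 banks and stays conflict-free.
stride3_conflict = has_bank_conflict(base=0, stride=3)
```

Under this model an unaligned access is served in one pass because the lanes still spread across all banks, while even strides collide. A multi-port memory has no such restriction, at the higher hardware cost the abstract mentions.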

19.
王向前  洪一  王昊  郑启龙 《电子学报》2015,43(8):1656-1661
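The 魂芯 DSP is a word-addressed, clustered VLIW processor with SIMD support. This paper presents the key techniques used to develop the 魂芯 compiler on the open-source compiler infrastructure open64, including optimized handling of address registers, instruction clustering that combines multiple heuristics, and register allocation and instruction scheduling under the clustered architecture. It also presents the key architecture-specific optimization techniques of the 魂芯 DSP compiler, including dependence-analysis-based vectorization, use of high-efficiency instructions, and recognition of zero-overhead loops. Finally, it summarizes development experience and offers several points to note when developing a compiler on an open-source compiler infrastructure.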
