1.

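Implementation of Four-Way Double-Precision Short Vector Registers in the GCC Back End (Total citations: 1; self-citations: 1; citations by others: 0)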

李春江 杜云飞 倪晓强 王永文 杨灿群 《计算机科学》2012,39(9):292-295

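Designing and implementing a new production-quality compiler usually takes several years, so modifying and extending an existing compiler is the main route to a compiler for a new architecture. The GNU Compiler Collection (GCC) supports many high-level languages and many target processor platforms, and its documentation and source code are open. Based on GCC's Sparc back end, this work implements the description of four-way double-precision short vector registers that support four-way double-precision SIMD instructions. In the process, a new target machine was defined, a class of vector modes was added, a new class of register constraints was defined, the four-way double-precision registers were described, and machine descriptions were written for the four-way double-precision SIMD instructions. For intrinsic functions that target these SIMD instructions, GCC correctly uses the new vector registers to generate the corresponding SIMD instructions.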

2.

A GCC-Based Compiler Back End for the Vector Processing Unit of the FeiTeng (飞腾) Processor

李春江 杜云飞 倪晓强 王永文 杨灿群 《计算机科学》2013,40(12):19-22

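A compiler back end is the part of a compiler implemented for a specific target machine, and each instruction set architecture needs its own back end. For the architecture and instruction set of the vector processing unit in the FeiTeng processor (FT-VPU), a compiler back end was implemented on top of GCC so that GCC correctly compiles intrinsic functions targeting the FT-VPU SIMD instructions. Starting from the machine description of the four-way double-precision SIMD instructions, the paper summarizes the implementation work done in the GCC back end. It is a useful reference for implementing GCC-based back ends for specific target machines.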

3.

GCC-Based Vector Instruction Set Extension for the High-Performance DSP Matrix

辛乃军 陈旭灿 孙海燕 阳柳 罗杰 淡孝强 王霁 《计算机工程与科学》2012,34(1):58-63

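Automatic vectorization is a compiler optimization that increases program parallelism. With the wide adoption of computing platforms built on SIMD-capable processors, it has become a hot topic in compiler research. GCC is an open-source, cross-platform compiler. Building on GCC's internal auto-vectorization algorithm and on the architecture and instruction set of the Matrix chip, this paper extends the GCC back end with the Matrix vector instruction set and implements basic auto-vectorization support. Tests show that the extended compiler supports the Matrix vector instruction set, performs basic auto-vectorization, and also supports developing Matrix-based parallel programs through built-in functions.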

4.

Evaluation and Analysis of Auto-Vectorization in Typical Compilers

李春江 黄娟娟 徐颖 杜云飞 陈娟 《计算机科学》2013,40(4):41-46

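SIMD (Single Instruction, Multiple Data) architectures play an important role in modern processor design, and most Chinese-developed high-performance general-purpose processors also implement SIMD. SIMD architectures provide data-parallel processing on short vectors, and compiler auto-vectorization is one of the main ways for applications to gain performance from them. Evaluating the auto-vectorization of typical compilers on mature, SIMD-capable commercial processor platforms benefits both processor architecture design and compiler analysis and design. Using the SPEC CPU2006 and SPEC OMPM2001 benchmarks, the paper evaluates auto-vectorization in typical compilers (the Intel, PGI, and GCC compilers). Taking the production-grade open-source compiler GCC as the target, it also evaluates GCC's current auto-vectorization with hand-written program fragments (mainly loops of various kinds) and analyzes in depth the capabilities and limitations of auto-vectorization in GCC. The work is a useful reference for developing more efficient compiler auto-vectorization.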

5.

Open Source

汤韬《程序员》2004,(11):13-13

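Mention Intel and most people think first of its CPUs, but it makes more than hardware: it is also active in software such as compilers, performance analysis tools, and high-performance libraries. These tools are, of course, developed and optimized for Intel CPUs. On the x86 architecture, Intel's compiler is currently known to produce the best code quality; compared with GCC, which is known for supporting as many platforms as possible, it can improve performance by 20-30%. Intel recently released a new version of its C/C++ compiler, 8.1 (ICC for short). As before, it supports both Windows and Linux. Although we cannot expect ICC to open its source code the way GCC does, ...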

6.

Rapid Porting of a Nios II Back End Based on the LLVM Architecture

任胜兵 卢念 张万利 潘震宇 《计算机应用与软件》2011,28(12)

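Porting compiler back ends is an important area of embedded systems research, and how to port a back end quickly remains a hot topic. This paper adopts the newer compiler architecture LLVM and ports it to the Nios II processor in order to analyze LLVM's framework for rapid back-end porting. TableGen, part of that framework, is used to describe the Nios II architecture (instructions, registers, and so on), while LLVM's comprehensive C++ libraries implement complex or special operations. TableGen and the C++ libraries work together to give LLVM support for a Nios II back end. Experiments show that, compared with porting a GCC back end, the LLVM-based approach reduces the workload by 64.2% to 83.9%, greatly shortening porting time.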

7.

Porting the GCC Compiler to Trimedia

王超 卢文成 《计算机工程》2003,29(Z1):132-133

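The paper analyzes the structure of the GCC source code and the roles of its main functions. On that basis, it proposes a scheme for porting GCC to the multimedia processor Trimedia and describes the key points of the implementation.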

8.

Design and Implementation of a Compiler for a VLIW DSP Architecture

王敏 王红梅 张铁军 单睿 王东辉 《微计算机应用》2009,30(7)

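A VLIW compiler is responsible for extracting instruction-level parallelism, checking dependences, and scheduling instructions, and it strongly affects the performance of a VLIW processor. For a VLIW DSP chip, this paper designs and implements a high-performance VLIW compiler using the front end and code generator templates of the retargetable compiler IMPACT. SIMD support is built into the compiler by combining pseudo data types with intrinsic functions. Experiments show that, compared with a GCC-based compiler, this compiler reduces the number of generated instructions by 42% on average and the number of parallel packets by 30%.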

9.

Peephole Optimization in the GCC Compiler and Its Application to the DCT (Total citations: 1; self-citations: 0; citations by others: 1)

雷峰成方滨李慧杰《单片机与嵌入式系统应用》2006,(6):74-76

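GCC is a retargetable compiler whose development goal is to make program development on the GNU system more efficient. GCC supports seven source languages, including C, C++, and Java, and 36 architectures, including MIPS and ARM. It has the following features: a clear front-end syntax tree structure; a highly general abstract-machine intermediate language; concise machine descriptions; and support for development in multiple source languages and porting to multiple platforms.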

10.

A Study of GCC Code Optimization Techniques

石博慧陈英《计算机技术与发展》2004,14(8)

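GCC is an open-source optimizing compiler system based on Linux that accepts many high-level source languages and broadly supports multiple platforms and operating systems. Its code optimization mechanism balances time and space efficiency to generate high-quality object code, and it is highly portable and extensible, which makes it a target of compiler optimization research. Through a detailed analysis of GCC's overall structure, optimization strategies, optimization methods and key techniques, and intermediate language, a complete GCC optimization architecture is extracted. The paper concentrates on GCC's optimization strategies and implementation techniques: starting from the overall plan of the optimization framework, it analyzes how GCC organizes its optimizations and the techniques and rationale behind the design and introduction of the intermediate code RTL, and then examines how the control-flow and data-flow analyses used in GCC are implemented.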

11.

Optimized Implementation of the Exponential Function in GCC Using a Table-Driven Algorithm

杨灿群 王锋 彭林 杨学军 《计算机工程与科学》2007,29(5):77-80

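Many areas of scientific computing need fast and accurate evaluation of transcendental functions such as exp, log, sin, and tan. Using a table-driven algorithm and exploiting features of the IA-64 architecture, this paper implements an optimized exponential function (exp) in GCC, improving GCC's floating-point performance on IA-64 systems and laying a foundation for efficient implementations of all transcendental functions on IA-64 and other platforms.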
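As a rough illustration of what a table-driven exponential looks like, the sketch below uses the textbook reduction exp(x) = 2^k · 2^(j/N) · exp(r), with 2^(j/N) read from a small table and exp(r) approximated by a short polynomial. It is not the algorithm from the paper: the table size `N = 32`, the degree-5 polynomial, and the name `exp_table` are all assumptions chosen for readability, and nothing here is specific to IA-64 or GCC.

```python
import math

N = 32                                      # table size (assumed for this sketch)
TABLE = [2.0 ** (j / N) for j in range(N)]  # precomputed 2^(j/N), j = 0..N-1
LN2_OVER_N = math.log(2.0) / N


def exp_table(x: float) -> float:
    """Approximate exp(x) for moderate x using a lookup table plus a polynomial."""
    # Range reduction: x = m * ln2/N + r with |r| <= ln2 / (2N).
    m = round(x / LN2_OVER_N)
    r = x - m * LN2_OVER_N
    # Split m = k * N + j with 0 <= j < N (Python's divmod floors, so j >= 0).
    k, j = divmod(m, N)
    # Degree-5 Taylor polynomial for exp(r); r is tiny, so few terms suffice.
    p = 1.0 + r * (1.0 + r * (0.5 + r * (1 / 6 + r * (1 / 24 + r / 120))))
    # Reassemble: exp(x) = 2^k * 2^(j/N) * exp(r).
    return math.ldexp(TABLE[j] * p, k)


if __name__ == "__main__":
    for x in (-1.5, 0.3, 5.0):
        print(x, exp_table(x), math.exp(x))
```

The table trades a few hundred bytes of memory for a much smaller reduced argument, which is what lets the polynomial stay short. A production implementation would also handle overflow, underflow, NaN, and the rounding error in the reduction step, all of which this sketch ignores.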

12.

Implementation of the SHA-2 Hash Family Standard Using FPGAs

N. Sklavos O. Koufopavlou 《The Journal of Supercomputing》2005,31(3):227-248

The continued growth of both wired and wireless communications has triggered the revolution for the generation of new cryptographic algorithms. The SHA-2 hash family is a new standard in the widely used hash functions category. An architecture and the VLSI implementation of this standard are proposed in this work. The proposed architecture supports multi-mode operation in the sense that it performs all three hash functions (256, 384 and 512) of the SHA-2 standard. The proposed system is compared with the implementation of each hash function in a separate FPGA device. Compared with previous designs, the introduced system can work at a higher operating frequency and needs fewer silicon area resources. The achieved performance in terms of throughput of the proposed system/architecture is much higher (in a range from 277 to 417%) than the other hardware implementations. The introduced architecture also performs much better than the implementations of the existing standard SHA-1, and also offers a higher security level. The proposed system could be used for the implementation of integrity units, and in many other sensitive cryptographic applications, such as digital signatures, message authentication codes and random number generators.
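The "multi-mode" idea, one unit that can be switched between the 256-, 384- and 512-bit hash functions, has a simple software analogue. The sketch below only illustrates that interface using Python's standard `hashlib`; the function name `sha2` and the `mode` parameter are assumptions for illustration and say nothing about the hardware architecture proposed in the paper.

```python
import hashlib

# Map each SHA-2 mode (digest size in bits) to its hashlib constructor.
_MODES = {256: hashlib.sha256, 384: hashlib.sha384, 512: hashlib.sha512}


def sha2(message: bytes, mode: int) -> bytes:
    """Hash `message` with the SHA-2 function selected by `mode` (256, 384 or 512)."""
    try:
        constructor = _MODES[mode]
    except KeyError:
        raise ValueError(f"unsupported SHA-2 mode: {mode}") from None
    return constructor(message).digest()


if __name__ == "__main__":
    for mode in (256, 384, 512):
        print(mode, sha2(b"abc", mode).hex())
```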

13.

Analysis of GCC Time Delay Estimation Algorithms Based on a Microphone Array

唐浩洋 陈子为 黄维 《计算机系统应用》2019,28(12):140-145

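Accurate time delay estimation (TDE) is a prerequisite for sound source localization based on the time difference of arrival (TDOA). Among the many TDE algorithms, the generalized cross-correlation (GCC) algorithm is widely used because of its low computational complexity and ease of implementation. GCC time delay estimation uses different weighting functions to suppress noise under different noise conditions. After introducing the microphone array model and the GCC time delay estimation algorithm, this paper proposes an improved algorithm that addresses the drawbacks of GCC, runs MATLAB simulations of GCC time delay estimation with several weighting functions under various signal-to-noise ratios, and analyzes the strengths and weaknesses of each weighting function by comparing time delay estimation performance and sound source localization accuracy.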
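To make the role of the weighting function concrete, here is a minimal sketch of GCC delay estimation between two equal-length signals. It is not the improved algorithm from the paper. The abstract does not say which weighting functions were compared, so the PHAT (phase transform) weighting used here is an assumption, as are the names `gcc_delay` and `_dft`. A naive O(n²) DFT keeps the example dependency-free; real code would use an FFT.

```python
import cmath


def _dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for short demo signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_delay(a, b, weighting="phat"):
    """Estimate the circular delay d (in samples) such that b[t] ~ a[t - d]."""
    n = len(a)
    spec_a, spec_b = _dft(a), _dft(b)
    # Cross-power spectrum of the two channels.
    cross = [spec_b[k] * spec_a[k].conjugate() for k in range(n)]
    if weighting == "phat":
        # PHAT weighting: keep only the phase, discarding magnitude.
        cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Back to the lag domain; the peak location is the delay estimate.
    corr = [v.real for v in _dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Map lags above n/2 to negative delays.
    return peak if peak <= n // 2 else peak - n


if __name__ == "__main__":
    sig = [((41 * t) % 101) / 101.0 - 0.5 for t in range(64)]
    delayed = [sig[(t - 5) % 64] for t in range(64)]
    print(gcc_delay(sig, delayed))
```

Passing any other value for `weighting` (for example `"none"`) leaves the cross-power spectrum unweighted, which reduces the method to ordinary cross-correlation. Comparing the two on noisy inputs is the kind of experiment the abstract describes.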

14.

Alleviating convergence problems in Group Support Systems

Mark Pendergast Stephen C. Hayne 《Computer Supported Cooperative Work (CSCW)》1994,3(1):1-28

Not all Group Support Systems are identical, as is demonstrated by their software implementations. We discuss two existing implementations of group support tools and the process models underlying them. We demonstrate that fundamental to both processes is the merging or integration of individual data. Based on this and other empirical research, the Shared Context Model (SCM) of cooperative work is adopted and we show that it supports existing processes and others. We expect that groups will find merging their work easier with the SCM. This model is presented and embedded in the architecture and implementation of four group tools. Because these tools are destined to be used by dispersed groups, synchronously or asynchronously, an object-based communication and control mechanism is incorporated. Finally, as graphics and multi-tasking have been shown to be increasingly important, the tools are implemented in Microsoft Windows for personal computers attached to local area networks.

15.

Refinement-based verification of implementations of Stateflow charts

Alvaro Miyazawa Ana Cavalcanti 《Formal Aspects of Computing》2014,26(2):367-405

Simulink’s Stateflow is a graphical notation widely adopted in industry. Since it is frequently used to model safety-critical systems, correctness of implementations of Stateflow charts is a major concern. In previous work, we have shown how we can generate formal models for refinement of Stateflow charts automatically. Here, we define a refinement strategy that supports the automated verification of implementations with respect to these models. We consider the verification of implementations that follow architectural patterns used in the Stateflow code generator. We present a detailed procedure for application of refinement laws. If the implementation is correct, the procedure succeeds. If a law application fails, the implementation is either incorrect or does not use the expected architectural pattern. The very low proof burden associated with the refinement verification makes a high level of automation possible.

16.

On the hardware implementation of RIPEMD processor: Networking high speed hashing, up to 2 Gbps

N. Sklavos O. Koufopavlou 《Computers & Electrical Engineering》2005,31(6):361-379

The continued growth of both wired and wireless communications has triggered the revolution for high speed security implementations. RIPEMD hash functions are widely used in many applications of cryptography. A reconfigurable processor architecture and the VLSI implementation of these functions are proposed in this work. The introduced processor is reconfigurable in the sense that it can alternatively perform all RIPEMD hash functions. In order to indicate the advantages of the proposed design, each one of these hash functions has also been implemented in a separate hardware device (FPGA). The proposed processor FPGA implementation achieves high speed hashing up to 2 Gbps. Compared with previously published hardware designs, the proposed processor has higher performance in the range from 22 to 30 times. It also performs much better than the assembly language implementations of RIPEMD-128 and RIPEMD-160. The proposed processor could be used for the implementation of data integrity units, and in many other sensitive cryptographic applications, such as digital signatures, message authentication codes and random number generators.

17.

Automatic program debugging for intelligent tutoring systems

William R. Murray 《Computational Intelligence》1987,3(1):1-16

Program debugging is an important part of the domain expertise required for intelligent tutoring systems that teach programming languages. This article explores the process by which student programs can be automatically debugged in order to increase the instructional capabilities of these systems. The research presented provides a methodology and implementation for the diagnosis and correction of nontrivial recursive programs. In this approach, recursive programs are debugged by repairing induction proofs in the Boyer-Moore logic. The induction proofs constructed and debugged assert the computational equivalence of student programs to correct exemplar solutions. Exemplar solutions not only specify correct implementations but also provide correct code to replace buggy student code. Bugs in student code are repaired with heuristics that attempt to minimize the scope of repair. The automated debugging of student code is greatly complicated by the tremendous variability that arises in student solutions to nontrivial tasks. This variability can be coped with, and debugging performance improved, by explicit reasoning about computational semantics during the debugging process. This article supports these claims by discussing the design, implementation, and evaluation of Talus, an automatic debugger for LISP programs, and by examining related work in automated program debugging. Talus relies on its abilities to reason about computational semantics to perform algorithm recognition, infer code teleology, and to automatically detect and correct nonsyntactic errors in student programs written in a restricted, but nontrivial, subset of LISP. Solutions can vary significantly in algorithm, functional decomposition, role of variables, data flow, control flow, values returned by functions, LISP primitives used, and identifiers used. Solutions can consist of multiple functions, each containing multiple bugs. Empirical evaluation demonstrates that Talus achieves high performance in debugging widely varying student solutions to challenging tasks.

18.

GCC Auto-Vectorization Optimization Guided by Data Alignment Attributes

李春江 黄娟娟 徐颖 董钰山 《计算机工程与科学》2014,36(6):1011-1017

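Mainstream general-purpose processors all implement multicore parallelism together with SIMD parallelism inside each core. Although GCC implements auto-vectorization for SIMD parallelism, its auto-vectorization of OpenMP parallel programs is still far from satisfactory. For multithreaded OpenMP programs, this work builds on GCC's OpenMP implementation and adds directives that specify data alignment attributes, so that the compiler can decide more accurately whether data are aligned during auto-vectorization, which improves GCC's auto-vectorization.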

19.

High speed filtering using reconfigurable hardware

Carlos Perez-Vidal Luis Gracia 《Journal of Parallel and Distributed Computing》2009
