1.

Tuning Compiler Optimizations for Simultaneous Multithreading

Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen 《International journal of parallel programming》1999,27(6):477-503

Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SMT processor is capable of issuing multiple instructions from multiple threads to a processor's functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding long-latency operations. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine, particularly for parallel processors. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost inter-processor communication. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT, which can benefit from fine-grained resource sharing within the processor. This paper reexamines several compiler optimizations in the context of simultaneous multithreading. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
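
The difference between the blocked and cyclic loop-iteration schedules named in this abstract can be illustrated with a small sketch. The function names `blocked_schedule` and `cyclic_schedule` are hypothetical and do not come from the paper; the code only shows which iterations each thread would receive, and is not the paper's compiler implementation.

```python
def blocked_schedule(n_iters, n_threads):
    """Give each thread one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return [list(range(t * chunk, min((t + 1) * chunk, n_iters)))
            for t in range(n_threads)]


def cyclic_schedule(n_iters, n_threads):
    """Deal iterations out round-robin: thread t gets t, t + n_threads, ..."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]


if __name__ == "__main__":
    # Hypothetical loop of 8 iterations split across 2 threads.
    print("blocked:", blocked_schedule(8, 2))
    print("cyclic: ", cyclic_schedule(8, 2))
```

Under the cyclic schedule, threads that run at the same time work on neighboring iterations, which is one way to obtain the fine-grained sharing that the abstract says SMT benefits from.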

2.

Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures

Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt 《International journal of parallel programming》1998,26(4):449-478

To exploit larger amounts of instruction-level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA improves the performance of an aggressive wide-issue, dynamically scheduled processor by 15% while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling.

3.

Principles and Design of Retargetable Compilers (total citations: 1; self-citations: 0; citations by others: 1)

谢丹夏韩果凌程旭《计算机工程与应用》2001,37(7):61-63,72

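Developing compilers quickly and efficiently is important for computer architecture research. A retargetable compiler cleanly isolates the architecture-dependent parts of the compiler, so a new compiler can be generated quickly by modifying only the parts that depend on the target machine. This paper discusses the concept, principles, design, and implementation methods of retargetable compilers.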

4.

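A Discussion of Multi-Target Compilation Techniques in Compiler Infrastructures (total citations: 3; self-citations: 0; citations by others: 3)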

戴桂兰张素琴田金兰蒋维杜《计算机研究与发展》2003,40(2):312-317

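Starting from the basic concepts of compiler infrastructure, this paper focuses on the key techniques involved in constructing compiler back ends. It gives a fairly comprehensive summary and review of representative public compiler infrastructures and of the intermediate representation techniques, back-end construction techniques, and related tools they adopt. It also explores some open problems in research on compiler back-end construction and the corresponding solutions.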

5.

Research on Shared-Memory Multi-SIMD Architectures and Compilation Techniques

张为华臧斌宇《计算机科学与探索》2009,3(1):18-25

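Multimedia applications have become the dominant application type on all kinds of computing platforms. As these applications grow more diverse and complex, the shared-memory multi-SIMD architecture has gradually become the first choice for multimedia acceleration units in master-slave multi-core architectures. This paper summarizes the characteristics of current shared-memory multi-SIMD architectures and analyzes in depth the main problems of compiler optimization for shared-memory multi-SIMD, together with the related compilation techniques.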

6.

Practical Compiler Techniques on Efficient Multithreaded Code Generation for OpenMP Programs (total citations: 1; self-citations: 0; citations by others: 1)

Tian Xinmin; Girkar Milind; Bik Aart; Saito Hideki 《Computer Journal》2005,48(5):588-601

7.

Compiler Techniques for Efficient Communications in Circuit Switched Networks for Multiprocessor Systems

Shao Shuyi; Jones Alex K.; Melhem Rami 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(3):331-345

In this paper we explore compiler techniques for achieving efficient communications on circuit-switched interconnection networks. We propose a compilation framework for identifying communication patterns and compiling these patterns as network configuration directives. This has the potential of providing significant performance benefits when connections can be established in the network prior to the actual communications. The framework includes a flexible and powerful communication pattern representation scheme that captures the properties of communication patterns and allows manipulation of these patterns. In this way, communication phases can be identified within the application. Additionally, we extend the classification of static and dynamic communications to include persistent communications. Persistent communications are a subclass of dynamic communications that remain unchanged for large segments of the application execution. An experimental compiler has been developed to implement the framework. This compiler is capable of detecting both static and persistent communications within an application. We show that for the NAS Parallel Benchmarks, 100% of the point-to-point communications can be classified as either static or persistent and 100% of the collectives are either static or persistent with the exception of IS. Simulation-based performance analysis demonstrates the benefit of using our compiler techniques for achieving efficient communications in multiprocessor systems.

8.

Hlpsl2Cpp: A Security Protocol Compiler

周天凌黄连生《计算机应用研究》2007,24(6):123-126

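Implementing security protocols by hand is inefficient and error-prone. The security protocol compiler Hlpsl2Cpp automatically generates C++ implementation code from security protocols described in the HLPSL language. Hlpsl2Cpp saves much of the repetitive work of manual protocol implementation and avoids the errors and implementation-related vulnerabilities that manual implementation introduces.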

9.

Run-Time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures

Cong Fu, Tao Yang 《Journal of Parallel and Distributed Computing》1997,42(2):486

Automatic scheduling for directed acyclic graphs (DAG) and its applications for coarse-grained irregular problems such as large n-body simulation have been studied in the literature. However, solving irregular problems with mixed granularities such as sparse matrix factorization is challenging since it requires efficient run-time support to execute a DAG schedule. In this paper, we investigate run-time optimization techniques for executing general asynchronous DAG schedules on distributed memory machines and discuss an approach for exploiting parallelism from commuting operations in the DAG model. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying. We present a consistency model incorporating the above optimizations, and take advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse matrix factorizations and triangular equation solving for which actual speedups are difficult to obtain. We provide a detailed experimental study on Meiko CS-2 to show that the automatically scheduled code has achieved good performance for these difficult problems, and the run-time overhead is small compared to total execution times.
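
As a rough illustration of what executing a DAG of tasks involves, the sketch below orders tasks so that every task runs after the tasks it depends on, using Kahn's topological sort. This is a generic single-process technique substituted for illustration, not the paper's distributed run-time scheme; the name `dag_order` and the example task graph are hypothetical.

```python
from collections import deque


def dag_order(deps):
    """Return an execution order for a task graph.

    `deps` maps each task to the list of tasks it depends on.
    Every prerequisite must itself appear as a key of `deps`.
    """
    # Number of unmet dependences per task, plus reverse edges.
    remaining = {task: len(pre) for task, pre in deps.items()}
    successors = {task: [] for task in deps}
    for task, pre in deps.items():
        for p in pre:
            successors[p].append(task)

    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing a task may release the tasks that were waiting on it.
        for s in successors[task]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)

    if len(order) != len(deps):
        raise ValueError("dependence graph contains a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical task graph loosely shaped like one factorization step.
    example = {
        "factor": [],
        "update_a": ["factor"],
        "update_b": ["factor"],
        "solve": ["update_a", "update_b"],
    }
    print(dag_order(example))
```

In this example `update_a` and `update_b` become ready at the same time and do not depend on each other, so a parallel run-time system would be free to execute them concurrently.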

10.

A Memory Leak Detection Mechanism Based on an Open Compiler

孙青岩陈平《计算机工程》2004,30(20):42-44

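Memory leaks are a common problem in programming; they degrade system performance and can even exhaust memory and crash the system. Using reflection and open compilation techniques, this paper extends and improves the open compiler OpenC++, and designs and implements a dynamic memory leak detection tool for C/C++ to help developers and testers locate memory leaks.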

11.

Special Issue on Special-purpose Architectures for Real-Time Imaging, Part 2

《Real》1997,3(5):305

12.

Design of the YH-2 C Language Optimizing Compiler

李松树姚益平《计算机工程与科学》1997,19(4):68-72

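The C language optimizing compiler was designed and implemented by cross-compilation, using the Intel 80386 C compiler on a microcomputer as the development platform. It is the first supercomputer C compiler designed in China and developed from the ground up. This paper first presents the design principles of the YH-2 C language optimizing compiler, then describes its main system components and technical features in detail, and finally outlines the work that remains to be done.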

13.

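Several Key Techniques for Building an Interactive Environment for a Parallelizing System (total citations: 5; self-citations: 0; citations by others: 5)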

杨博王鼎兴郑纬民《软件学报》2001,12(5):698-705

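An interactive parallelizing system assists program parallelization by providing friendly interactive facilities and drawing on the user's knowledge; it is an effective way to compensate for the limited capability of automatic parallelization. This paper describes TIPSIE (interactive environment of Tsinghua interactive parallelizing system), an interactive environment for a parallelizing system, and discusses the key techniques used to build it: performance prediction, incremental compilation, and data dependence queries. Experimental results show that these techniques effectively improve the system's parallelization capability and efficiency.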

14.

Research on the Architecture and Key Technologies of a Clustered E-Government Collaborative Business Platform

张学旺李舟军沈伟《计算机科学》2010,37(4):158

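The clustered e-government collaboration platform comprises six major systems. This paper discusses the platform's application architecture and technical architecture and describes its main key technologies: large-scale integration and collaborative use of multiple government applications, a general-purpose SOA development platform driven by business models, unified resource management, and enhanced Web service security. Operational experience shows that the platform makes maximal use of the hardware, software, and government business resources of the provincial (municipal) government platform. Rural districts and counties can build their own government platforms on top of the provincial (municipal) platform, which allows government platforms across the province (municipality) to be built and maintained in a coordinated way for urban and rural areas and improves the efficiency of government administration and collaborative office work.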

15.

Implementation of a Java Compiler Based on Design by Contract

张嘉铭张思博赵建军《微型电脑应用》2007,23(3):14-16

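This paper implements a compiler that checks and verifies VeriJava syntax and semantics. By verifying the preconditions and postconditions of methods, classes, and similar constructs at compile time, it attempts to guarantee the logical correctness of methods, to help developers promptly find design errors or coordination and communication problems during development, to promote communication and understanding, and to make the development process more complete.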

16.

NaraView: An Interactive 3D Visualization System for Parallelization of Programs

Mariko Sasakura, Kazuki Joe, Yoshitoshi Kunieda, Keijiro Araki 《International journal of parallel programming》1999,27(2):111-129

For effective use of parallelizing compilers, an interactive environment which allows users to find more parallelism is needed. As the first step towards building such an environment, we have developed a program visualization system named NaraView. In this paper, we describe two visualization methods in NaraView. One is the Program Structure View, which illustrates the hierarchical loop structure of a given program and suggests which parts of the program can be parallelized. The other is the Data Dependence View, which visualizes each data dependence on every variable or array element that is accessed in a specific loop. By using these views, users can easily understand which part of the program can be parallelized further. We also show several examples to demonstrate the efficiency of these methods.

17.

Architectures of Graphic Processors for Interactive 2D Graphics

Guy Fontenier, Pascal Gros 《Computer Graphics Forum》1988,7(2):79-89

Interactive 2-D systems have benefited greatly from the improvements in IC technology. Today, the trend is to relieve the host computer from low-level tasks by increasing the graphic system's computational power. The introduction of video RAMs has solved the problem of contention for memory cycles between the display generator and the video refresh controller. The improvements in graphic controllers have led from the first fixed-instruction controllers to today's third generation of programmable graphic processors, able to support computer graphic interface standards. This article will present this evolution, and focus on a 2-D graphic processor designed at the Imagery, Instrumentation and Systems Laboratory, based on the separation of graphic generation and memory management functions.

18.

Compiler support for floating-point computation

Charles Farnum 《Software》1988,18(7):701-709

Predictability is a basic requirement for compilers of floating-point code: it must be possible to determine the exact floating-point operations that will be executed for a particular source-level construction. Experience shows that many compilers fail to provide predictability, either because of an inadequate understanding of its importance or from an attempt to produce locally better code. Predictability can be attained through careful attention to code generation and a knowledge of the common pitfalls. Most language standards do not completely define the precision of floating-point operations, and so a good compiler must also make a good choice in assigning precisions of subexpression computation. Choosing the widest precision that will be used in the expression usually gives the best trade-off between efficiency and accuracy. Finally, certain optimizations are particularly useful for floating-point and should be included in a compiler aimed at scientific computation. But predictability is more important than efficiency; obtaining incorrect answers fast helps no one.
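
A small example of why the exact sequence of floating-point operations matters: addition is not associative in IEEE 754 double precision, so a compiler that reassociates a sum in search of locally better code can change the result. The values below are chosen purely for illustration and are not taken from the paper; the sketch assumes Python floats are IEEE 754 doubles, as they are on mainstream platforms.

```python
big = 1e16
small = 1.0

# Evaluated as written: small is lost when it is added to big,
# because neighboring doubles near 1e16 are 2 apart.
as_written = (big + small) - big

# Mathematically equivalent reordering: big cancels first, small survives.
reassociated = (big - big) + small

print(as_written, reassociated)  # prints: 0.0 1.0
```

Both expressions compute the same real-number value, yet they return different doubles, which is why a programmer needs to know which one the compiler will actually emit.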

19.

Compiler Analysis of the Value Ranges for Variables

《IEEE transactions on pattern analysis and machine intelligence》1977,(3):243-250

Programs can be analyzed to determine bounds on the ranges of values assumed by variables at various points in the program. This range information can then be used to eliminate redundant tests, verify correct operation, choose data representations, select code to be generated, and provide diagnostic information. Sophisticated analyses involving the proofs of complex assertions are sometimes required to derive accurate range information for the purpose of proving programs correct. The performance of such algorithms may be unacceptable for the routine analysis required during the compilation process. This paper presents a discussion of mechanical range analysis employing techniques practical for use in a compiler. This analysis can also serve as a useful adjunct to the more sophisticated techniques required for program proving.
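
To make the idea concrete, here is a minimal sketch of interval-based range analysis used to show that a bounds test is redundant. The `Interval` class, the `always_less_than` helper, and the example loop are hypothetical illustrations; the paper's own algorithms are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Inclusive integer range [lo, hi] that a variable may take."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Signs may flip the extremes, so check all four corner products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def join(self, other):
        """Merge the ranges arriving from two control-flow paths."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


def always_less_than(x, bound):
    """True when the test `x < bound` can never fail for any value in x."""
    return x.hi < bound


# Hypothetical loop: for i in 0..9, access element 2*i + 1 of a 20-element array.
i = Interval(0, 9)
index = i * Interval(2, 2) + Interval(1, 1)

if __name__ == "__main__":
    print(index)
    print("upper bounds check redundant:", always_less_than(index, 20))
```

Because the computed range of `index` lies entirely below 20, a compiler using this information could drop the upper bounds check for that access, which is one instance of the redundant-test elimination the abstract mentions.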

20.

A Case Study on Compiler Optimizations for the Intel® Core™ 2 Duo Processor

Aart J. C. Bik, David L. Kreitzer, Xinmin Tian 《International journal of parallel programming》2008,36(6):571-591

The complexity of modern processors poses increasingly difficult challenges to software optimization. Modern optimizing compilers have become essential tools for leveraging the power of recent processors by means of high-level optimizations to exploit multi-core platforms and single-instruction-multiple-data (SIMD) instructions, as well as advanced code generation to deal with microarchitectural performance aspects. Using the Intel® Core™ 2 Duo processor and Intel Fortran/C++ compiler as a case study, this paper gives a detailed account of the sort of optimizations required to obtain high performance on modern processors.