1.
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen 《International journal of parallel programming》1999, 27(6): 477-503
Simultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SMT processor is capable of issuing multiple instructions from multiple threads to a processor's functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding long-latency operations. Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine, particularly for parallel processors. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost inter-processor communication. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT, which can benefit from fine-grained resource sharing within the processor. This paper reexamines several compiler optimizations in the context of simultaneous multithreading. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
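The difference between the blocked and cyclic loop-iteration schedules named in this abstract can be illustrated with a minimal Python sketch. The function names `blocked_partition` and `cyclic_partition` are hypothetical and do not come from the paper; the sketch shows only how iterations are assigned to threads, not the paper's compiler implementation or its performance results.

```python
def blocked_partition(n_iters, n_threads):
    """Blocked schedule: each thread gets one contiguous chunk of iterations."""
    chunk = -(-n_iters // n_threads)  # ceiling division
    return [
        list(range(t * chunk, min((t + 1) * chunk, n_iters)))
        for t in range(n_threads)
    ]


def cyclic_partition(n_iters, n_threads):
    """Cyclic schedule: iteration i goes to thread i % n_threads."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]


if __name__ == "__main__":
    print(blocked_partition(8, 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(cyclic_partition(8, 4))   # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Under the cyclic schedule, threads running at the same time work on neighbouring iterations, so they tend to touch neighbouring data. That is the kind of fine-grained sharing the abstract says SMT can benefit from.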
2.
Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt 《International journal of parallel programming》1998, 26(4): 449-478
To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, called block enlargement, that can be applied to a block-structured ISA to increase the instruction fetch rate of a processor that implements that ISA. We have constructed a compiler that generates block-structured ISA code, and a simulator that models the execution of that code on a block-structured ISA processor. We show that for the SPECint95 benchmarks, the block-structured ISA improves the performance of an aggressive wide issue, dynamically scheduled processor by 15% while using simpler microarchitectural mechanisms to support wide issue and dynamic scheduling.
3.
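Principles and Design of Retargetable Compilers (total citations: 1; self-citations: 0; citations by others: 1)
Developing compilers quickly and efficiently is important for computer architecture research. A retargetable compiler cleanly isolates the architecture-dependent parts of the compiler, so a new compiler can be generated quickly by modifying only the target-machine-dependent parts. This paper discusses the concept, principles, design, and implementation methods of retargetable compilers.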
4.
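A Discussion of Multi-Target Compilation Techniques in Compiler Infrastructures (total citations: 3; self-citations: 0; citations by others: 3)
Starting from the basic concepts of compiler infrastructure, this paper focuses on the key techniques involved in constructing compiler back ends. It gives a fairly comprehensive summary and review of representative public compiler infrastructures and of the intermediate representation techniques, back-end construction techniques, and related tools they adopt. It also explores some open problems in research on compiler back-end construction, along with corresponding solutions.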
5.
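Multimedia applications have become the dominant application type on all kinds of computing platforms. As multimedia applications grow more diverse and complex, the shared-main-memory multi-SIMD architecture has gradually become the first choice for multimedia accelerators in master-slave multi-core architectures. This paper summarizes the characteristics of current shared-main-memory multi-SIMD architectures and analyzes in depth the main problems of compiler optimization for such architectures, together with the related compilation techniques.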
6.
7.
Shao Shuyi, Jones Alex K., Melhem Rami 《Parallel and Distributed Systems, IEEE Transactions on》2009, 20(3): 331-345
In this paper we explore compiler techniques for achieving efficient communications on circuit switching interconnection networks. We propose a compilation framework for identifying communication patterns and compiling these patterns as network configuration directives. This has the potential of providing significant performance benefits when connections can be established in the network prior to the actual communications. The framework includes a flexible and powerful communication pattern representation scheme that captures the property of communication patterns and allows manipulation of these patterns. In this way, communication phases can be identified within the application. Additionally, we extend the classification of static and dynamic communications to include persistent communications. Persistent communications are a subclass of dynamic communications that remain unchanged for large segments of the application execution. An experimental compiler has been developed to implement the framework. This compiler is capable of detecting both static and persistent communications within an application. We show that for the NAS Parallel Benchmarks, 100% of the point-to-point communications can be classified as either static or persistent and 100% of the collectives are either static or persistent with the exception of IS. Simulation-based performance analysis demonstrates the benefit of using our compiler techniques for achieving efficient communications in multiprocessor systems.
8.
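Implementing security protocols by hand is an inefficient and error-prone process. The security protocol compiler Hlpsl2Cpp automatically generates C++ protocol implementation code from security protocols described in the HLPSL language. Hlpsl2Cpp saves much of the repetitive work of implementing protocols by hand and avoids the errors and implementation-related vulnerabilities that manual implementation introduces.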
9.
Automatic scheduling for directed acyclic graphs (DAG) and its applications for coarse-grained irregular problems such as large n-body simulation have been studied in the literature. However, solving irregular problems with mixed granularities such as sparse matrix factorization is challenging since it requires efficient run-time support to execute a DAG schedule. In this paper, we investigate run-time optimization techniques for executing general asynchronous DAG schedules on distributed memory machines and discuss an approach for exploiting parallelism from commuting operations in the DAG model. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying. We present a consistency model incorporating the above optimizations, and take advantage of task dependence properties to ensure the correctness of execution. We demonstrate the applications of this scheme in sparse matrix factorizations and triangular equation solving for which actual speedups are difficult to obtain. We provide a detailed experimental study on Meiko CS-2 to show that the automatically scheduled code has achieved good performance for these difficult problems, and the run-time overhead is small compared to total execution times.
10.
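Memory leaks are a common problem in program design; they degrade system performance and can even exhaust memory and crash the system. Using reflection and open compilation techniques, this paper extends and improves the open compiler OpenC++ and designs and implements a dynamic memory leak detection tool for C/C++ to help developers and testers locate memory leaks.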
11.
12.
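The C optimizing compiler was designed and implemented by cross-compilation, using the Intel 80386 C compiler on a microcomputer as the development platform. It is the first supercomputer C compiler designed in China and developed from the ground up. This paper first presents the design principles of the YH-2 C optimizing compiler, then describes its main system components and technical features in detail, and finally outlines the work that remains to be done.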
13.
14.
15.
16.
Mariko Sasakura, Kazuki Joe, Yoshitoshi Kunieda, Keijiro Araki 《International journal of parallel programming》1999, 27(2): 111-129
For effective use of parallelizing compilers, an interactive environment which allows users to find more parallelism is needed. As the first step towards building such an environment, we have developed a program visualization system named NaraView. In this paper, we describe two visualization methods in NaraView. One is the Program Structure View, which illustrates the hierarchical loop structure of a given program and suggests which parts of the program can be parallelized. The other is the Data Dependence View, which visualizes each data dependence on every variable or array element that is accessed in a specific loop. By using these views, users can easily understand which part of the program can be parallelized further. We also show several examples to demonstrate the efficiency of these methods.
17.
Interactive 2-D systems have benefited greatly from the improvements in IC technology. Today, the trend is to relieve the host computer from low level tasks through increasing the graphic system's computational power. The introduction of video RAMs has solved the problem of contention for memory cycles between the display generator and the video refresh controller. The improvements in graphic controllers have led from the first fixed instructions controllers to today's third generation of programmable graphic processors, able to support computer graphic interface standards. This article will present this evolution, and focus on a 2-D graphic processor designed at the Imagery, Instrumentation and Systems Laboratory, based on the separation of graphic generation and memory management functions.
18.
Charles Farnum 《Software》1988,18(7):701-709
Predictability is a basic requirement for compilers of floating-point code: it must be possible to determine the exact floating-point operations that will be executed for a particular source-level construction. Experience shows that many compilers fail to provide predictability, either because of an inadequate understanding of its importance or from an attempt to produce locally better code. Predictability can be attained through careful attention to code generation and a knowledge of the common pitfalls. Most language standards do not completely define the precision of floating-point operations, and so a good compiler must also make a good choice in assigning precisions of subexpression computation. Choosing the widest precision that will be used in the expression usually gives the best trade-off between efficiency and accuracy. Finally, certain optimizations are particularly useful for floating-point and should be included in a compiler aimed at scientific computation. But predictability is more important than efficiency; obtaining incorrect answers fast helps no one.
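A small Python sketch can show why the precision assigned to subexpressions matters. It simulates single-precision rounding with the standard `struct` module. The helper names `to_f32`, `sum3_single`, and `sum3_wide` and the input values are assumptions chosen for illustration; they do not come from the paper.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum3_single(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c, rounding every intermediate result to single precision."""
    return to_f32(to_f32(to_f32(a) + to_f32(b)) + to_f32(c))


def sum3_wide(a: float, b: float, c: float) -> float:
    """Evaluate (a + b) + c in double precision, rounding only the final result."""
    return to_f32((a + b) + c)


if __name__ == "__main__":
    a, b, c = 16777216.0, 1.0, -16777216.0  # 2**24, 1, -2**24
    print(sum3_single(a, b, c))  # 0.0: 2**24 + 1 is not representable in single precision
    print(sum3_wide(a, b, c))    # 1.0: the wider intermediate keeps the low-order bit
```

The same source expression yields 0.0 or 1.0 depending only on the intermediate precision. A programmer therefore needs to know which precision the compiler will use for each subexpression, which is the predictability the abstract asks for.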
19.
《IEEE transactions on pattern analysis and machine intelligence》1977,(3):243-250
Programs can be analyzed to determine bounds on the ranges of values assumed by variables at various points in the program. This range information can then be used to eliminate redundant tests, verify correct operation, choose data representations, select code to be generated, and provide diagnostic information. Sophisticated analyses involving the proofs of complex assertions are sometimes required to derive accurate range information for the purpose of proving programs correct. The performance of such algorithms may be unacceptable for the routine analysis required during the compilation process. This paper presents a discussion of mechanical range analysis employing techniques practical for use in a compiler. This analysis can also serve as a useful adjunct to the more sophisticated techniques required for program proving.
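The use of range information to eliminate redundant tests can be sketched with simple interval arithmetic. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the `Range` class and the `decide_less_than` helper are hypothetical names, and only integer addition and a single comparison are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Closed integer interval [lo, hi] bounding a variable's possible values."""
    lo: int
    hi: int

    def __add__(self, other: "Range") -> "Range":
        # The sum of two bounded values is bounded by the sums of the bounds.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def join(self, other: "Range") -> "Range":
        # Merge ranges from two control-flow paths (smallest enclosing interval).
        return Range(min(self.lo, other.lo), max(self.hi, other.hi))


def decide_less_than(r: Range, bound: int) -> Optional[bool]:
    """Return True or False if `value < bound` is decided for all of r, else None."""
    if r.hi < bound:
        return True   # test always succeeds: the check is redundant
    if r.lo >= bound:
        return False  # test always fails: the guarded code is dead
    return None       # outcome depends on the run-time value


if __name__ == "__main__":
    i = Range(0, 9)                  # e.g. a loop index known to lie in 0..9
    j = i + Range(1, 1)              # j = i + 1, so j lies in 1..10
    print(decide_less_than(j, 11))   # True: a bounds check `j < 11` can be removed
    print(decide_less_than(j, 5))    # None: this test must stay in the code
```

A compiler pass built on this idea would propagate such ranges through the program and drop any comparison whose outcome is already decided.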
20.
Aart J. C. Bik, David L. Kreitzer, Xinmin Tian 《International journal of parallel programming》2008, 36(6): 571-591
The complexity of modern processors poses increasingly difficult challenges to software optimization. Modern optimizing compilers have become essential tools for leveraging the power of recent processors by means of high-level optimizations to exploit multi-core platforms and single-instruction-multiple-data (SIMD) instructions, as well as advanced code generation to deal with microarchitectural performance aspects. Using the Intel® Core™ 2 Duo processor and Intel Fortran/C++ compiler as a case study, this paper gives a detailed account of the sort of optimizations required to obtain high performance on modern processors.