1.

A Data-Parallel Formulation for Divide and Conquer Algorithms

Amor M.; Arguello F.; Lopez J.; Plata O.; Zapata E. L. 《Computer Journal》2001,44(4):303-320


2.

Pierre-Etienne Moreau Christophe Ringeissen Marian Vittek 《Electronic Notes in Theoretical Computer Science》2001,44(2)

Implementation of a rule-based transformation engine consists of several tasks at various abstraction levels. We present a new tool called mtom for the efficient implementation of rule-based transformations. This engine should help to bridge the gap between rewriting implementations and practical applications. It aims at implementing well-identified parts of complex applications where the use of rewriting is natural or crucial. These parts are specified using rewrite rules and integrated with the rest of the application, which is kept in a classical imperative language such as C, C++ or Java. Our tool, which can be viewed as a Yacc-like pre-processor, does not depend on a given term representation; rather, it accepts the term implementations (or term-like data types) of existing applications and allows rewrite rules to be defined and executed on those types. In our experience, this system is well suited to industrial use as well as to implementations of rule-based languages. The paper introduces several features supported by mtom.

3.

UC: A Set-Based Language for Data-Parallel Programming

Bagrodia R. Chandy M. Dhagat M. 《Journal of Parallel and Distributed Computing》1995,28(2)

UC is a data-parallel extension of C designed for scientific computations on synchronous and asynchronous parallel architectures. The primary constructs of the language include sets, reductions, and parallel and asynchronous composition. Its communication model is that of a globally addressable memory, with no syntactic distinction between local and remote data references. Unlike most existing data-parallel languages, UC programs may be synchronized at multiple levels of granularity, from a strict expression-level synchronization to a coarser statement or function-level synchronization. This paper describes the language and its implementation on the Connection Machine CM-2. Experimental measurements that compare the performance of the UC compiler with that of programs written in commercial parallel languages such as CM Fortran, C*, and *Lisp are also presented.

4.

Reasoning About Data-Parallel Array Assignment

《Journal of Parallel and Distributed Computing》1995,27(1):79-85

Three representations of data-parallel array assignment (generalized array assignment, FORTRAN 90 array assignment, and HPF array assignment) are compared by deriving their axiomatic inference rules. The goals are (i) to identify shortcomings of representations of data-parallel array assignment in existing programming languages and suggest improvements and (ii) to clarify the semantics of particular formulations of array assignment.

5.

A Comparison of Implementation Strategies for Nonuniform Data-Parallel Computations

Salvatore Orlando Raffaele Perego 《Journal of Parallel and Distributed Computing》1998,52(2):489

Data-parallel languages allow programmers to easily express parallel computations by means of high-level constructs. To reduce overheads, the compiler partitions the computations among the processors at compile-time, on the basis of the static data distribution suggested by the programmer. When execution costs are nonuniform and unpredictable, some processors may be assigned more work than others. Workload imbalance can be mitigated by cyclically distributing data and associated computations or by employing adaptive strategies which build a more balanced schedule at run-time, on the basis of the actual execution costs. This paper discusses static and hybrid (static+dynamic) scheduling strategies which can be used to balance the workloads derived from the execution of nonuniform parallel loops. A multidimensional flame simulation kernel has been used to evaluate different implementation strategies on a Cray T3E. We fed the benchmark code with synthetic input data sets built on the basis of a load imbalance model and we report and compare the results obtained.
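As a rough illustration of the load-balancing point made in this abstract, the Python sketch below compares a block and a cyclic assignment of loop iterations to processors when iteration costs grow with the index. The cost model and the names `block_owner`, `cyclic_owner`, and `loads` are assumptions made for illustration; they are not taken from the paper.

```python
def block_owner(i, n, p):
    """Owner of iteration i when n iterations are split into p contiguous blocks."""
    size = -(-n // p)  # ceiling division: iterations per block
    return i // size


def cyclic_owner(i, n, p):
    """Owner of iteration i under a round-robin (cyclic) distribution."""
    return i % p


def loads(costs, p, owner):
    """Total cost assigned to each of the p processors under `owner`."""
    totals = [0] * p
    n = len(costs)
    for i, cost in enumerate(costs):
        totals[owner(i, n, p)] += cost
    return totals


# Hypothetical nonuniform workload: iteration i costs i + 1 units.
costs = [i + 1 for i in range(16)]
block = loads(costs, 4, block_owner)
cyclic = loads(costs, 4, cyclic_owner)
print("block :", block, "max =", max(block))
print("cyclic:", cyclic, "max =", max(cyclic))
```

With this particular cost model the most heavily loaded processor carries less work under the cyclic assignment than under the block assignment, although both distribute the same total work. Other cost patterns may favor a different static scheme or a run-time strategy.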

6.

A Data-Parallel Approach for Real-Time MPEG-2 Video Encoding

Akramullah S. M. Ahmad I. Liou M. L. 《Journal of Parallel and Distributed Computing》1995,30(2)

In this paper, we present a fine-grained parallel implementation of the MPEG-2 video encoder on the Intel Paragon XP/S parallel computer. We use a data-parallel approach and exploit parallelism within each frame, unlike some of the previous approaches that employ multiple processing of several disjoint video sequences. This makes our encoder suitable for real-time applications where the complete video sequence may not be present on the disk and may become available on a frame-by-frame basis with time. The Express parallel programming environment is employed as the underlying message-passing system, making our encoder portable across a wide range of parallel and distributed architectures. The encoder also provides control over various parameters such as the number of processors in each dimension, the size of the motion search window, buffer management, and bitrate. Moreover, it has the flexibility to allow the inclusion of fast and new algorithms for different stages of the codec into the program, replacing current algorithms. Comparisons of execution times, speedups, and frame encoding rates using different numbers of processors are provided. An analysis of frame data distribution among multiple processors is also presented. In addition, our study reveals the degrees of parallelism and bottlenecks in the various computational modules of the MPEG-2 algorithm. We have used two motion estimation techniques and five different video sequences for our experiments. Using maximum parallelism by dividing one block per processor, an encoding rate higher than 30 frames/s has been achieved.

7.

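A Discussion of Multi-Target Compilation Techniques in Compiler Infrastructures (total citations: 3; self-citations: 0; citations by others: 3)

戴桂兰 张素琴 田金兰 蒋维杜 《计算机研究与发展》 (Journal of Computer Research and Development) 2003,40(2):312-317

Starting from the basic concepts of compiler infrastructure, this paper focuses on the key techniques involved in constructing compiler back-ends. It gives a fairly comprehensive summary and review of representative common compiler infrastructures and of the intermediate representation techniques, back-end construction techniques, and related tools they adopt. It also discusses some open problems in research on compiler back-end construction, together with corresponding solutions.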

8.

A Library-Based Approach to Task Parallelism in a Data-Parallel Language

Ian Foster David R. Kohr Jr. Rakesh Krishnaiyer Alok Choudhary 《Journal of Parallel and Distributed Computing》1997,45(2):284

Pure data-parallel languages such as High Performance Fortran version 1 (HPF) do not allow efficient expression of mixed task/data-parallel computations or the coupling of separately compiled data-parallel modules. In this paper, we show how these common parallel program structures can be represented, with only minor extensions to the HPF model, by using a coordination library based on the Message Passing Interface (MPI). This library allows data-parallel tasks to exchange distributed data structures using calls to simple communication functions. We present microbenchmark results that characterize the performance of this library and that quantify the impact of optimizations that allow reuse of communication schedules in common situations. In addition, results from two-dimensional FFT, convolution, and multiblock programs demonstrate that the HPF/MPI library can provide performance superior to that of pure HPF. We conclude that this synergistic combination of two parallel programming standards represents a useful approach to task parallelism in a data-parallel framework, increasing the range of problems addressable in HPF without requiring complex compiler technology.

9.

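A Temporal Optimization Model for Computation Partitioning in Data-Parallel Languages (total citations: 2; self-citations: 0; citations by others: 2)

余华山 胡长军 黄其军 丁文魁 许卓群 《软件学报》 (Journal of Software) 2001,12(10):1434-1446

The computation partitioning (CP) of the data-parallel statements in a program has a decisive effect on the program's run-time performance. Although this problem has been studied extensively, that work has concentrated on improving the spatial locality of the selected computation partitioning. This paper proposes a temporal optimization model for the computation partitioning of parallel loop constructs. In the model, a computation partitioning is represented as a directed graph that, while mapping the operations of a parallel statement onto the processors, also gives the dependences among operations assigned to different processors. For a data-parallel statement, the model optimizes each candidate computation partitioning with several effective optimization strategies. It then estimates the execution efficiency of each candidate by jointly considering four factors: load balance, inter-processor operation dependences, and the spatial and temporal locality of data accesses. Finally, it selects the candidate with the best estimated efficiency as the statement's computation partitioning. The authors have used the model to implement support for the FORALL construct in the HPF compiler p-HPF. Experimental results show that the model is highly general and achieves good speedups on a variety of data-parallel problems from different domains. With only minor changes, the model can also be applied to the computation partitioning of other kinds of data-parallel statements.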

10.

Compiler Verification and Compiler Architecture

Gerhard Goos 《Electronic Notes in Theoretical Computer Science》2002,65(2)

We study issues in verifying compilers for modern imperative and object-oriented languages. We take the view that it is not the compiler but the code generated by it which must be correct. It is this subtle difference that allows for reusing standard compiler architecture, construction methods and tools also in a verifying compiler. Program checking is the main technique for avoiding the cumbersome task of verifying most parts of a compiler and the tools by which they are generated. Program checking remaps the result of a compiler phase to its origin, the input of this phase, in a provably correct manner. We then only have to compare the actual input to its regenerated form, a basically syntactic process. The correctness proof of the generation of the result is replaced by the correctness proof of the remapping process. The latter turns out to be far easier than proving the generating process correct. The only part of a compiler where program checking does not seem to work is the transformation step which replaces source language constructs and their semantics, given, e.g., by an attributed syntax tree, by an intermediate representation, e.g., in SSA-form, which is expressing the same program but in terms of the target machine. This transformation phase must be directly proven using Hoare logic and/or theorem-provers. However, we can show that given the features of today's programming languages and hardware architectures this transformation is to a large extent universal: it can be reused for any pair of source and target language. To achieve this goal we investigate annotating the syntax tree as well as the intermediate representation with constraints for exhibiting specific properties of the source language. Such annotations are necessary during code optimization anyway.
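The program-checking idea described in this abstract can be sketched on a toy phase. In the Python example below, an unverified phase translates an expression tree into postfix stack code, and a small remapping function rebuilds the tree from that code so that it can be compared syntactically with the original input. The expression encoding, the instruction format, and the names `compile_expr`, `decompile`, and `checked_compile` are all hypothetical; this is not the author's system.

```python
def compile_expr(expr):
    """Unverified phase: expression tree -> postfix stack code.

    An expression is an int or a tuple (op, left, right).
    """
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    return compile_expr(left) + compile_expr(right) + [(op,)]


def decompile(code):
    """Remapping: rebuild the expression tree from postfix stack code."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append((instr[0], left, right))
    if len(stack) != 1:
        raise ValueError("malformed stack code")
    return stack[0]


def checked_compile(expr):
    """Run the phase, then accept its result only if it remaps to the input."""
    code = compile_expr(expr)
    if decompile(code) != expr:
        raise ValueError("program check failed: result does not match input")
    return code


expr = ("+", 1, ("*", 2, 3))
code = checked_compile(expr)
print(code)
```

Only `decompile` and the equality comparison would need to be trusted in this sketch; `compile_expr` could be arbitrarily complex, because a wrong result is rejected at run time.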

11.

A Formally Verified Compiler Back-end

Xavier Leroy 《Journal of Automated Reasoning》2009,43(4):363-446

This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its soundness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.

12.

Synthesizing MPI Implementations from Functional Data-Parallel Programs

Tristan Aubrey-Jones Bernd Fischer 《International journal of parallel programming》2016,44(3):552-573

Distributed memory architectures such as Linux clusters have become increasingly common but remain difficult to program. We target this problem and present a novel technique to automatically generate data distribution plans, and subsequently MPI implementations in C++, from programs written in a functional core language. The main novelty of our approach is that we support distributed arrays, maps, and lists in the same framework, rather than just arrays. We formalize distributed data layouts as types, which are then used both to search (via type inference) for optimal data distribution plans and to generate the MPI implementations. We introduce the core language and explain our formalization of distributed data layouts. We describe how we search for data distribution plans using an adaptation of the Damas–Milner type inference algorithm, and how we generate MPI implementations in C++ from such plans.

13.

Implementation of a Portable Nested Data-Parallel Language

《Journal of Parallel and Distributed Computing》1994,21(1):4-14


14.

A Vectorizing Compiler for Multimedia Extensions (total citations: 6; self-citations: 0; citations by others: 6)

N. Sreraman R. Govindarajan 《International journal of parallel programming》2000,28(4):363-400

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). This compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To enhance the scope for application of the subword semantics, our compiler performs several code transformations. These include strip mining, scalar expansion, grouping and reduction, and distribution. Thereafter, inline assembly instructions corresponding to the data-parallel sections are generated. We have used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler for a number of benchmarks. Initial performance results reveal that our compiler-generated code produces a reasonable performance improvement (speedup of 2 to 6.5) over the code generated without the vectorizing transformations/inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of the hand-tuned code for MMX architecture.
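Of the transformations listed in this abstract, strip mining is the easiest to show in a few lines. The Python sketch below splits an element-wise loop into fixed-width strips plus a scalar remainder loop, with each strip standing in for one packed operation. The width of 4 (for example, four 16-bit lanes in a 64-bit register) and the function names are assumptions for illustration; the paper's compiler works on C and emits inline assembly, which this sketch does not attempt to reproduce.

```python
def add_scalar(a, b):
    """Reference version: one element per loop iteration."""
    return [x + y for x, y in zip(a, b)]


def add_strip_mined(a, b, width=4):
    """Element-wise add with the loop split into strips of `width` elements."""
    n = len(a)
    out = [0] * n
    main = n - n % width  # largest multiple of `width` not exceeding n
    for start in range(0, main, width):
        # Each strip models a single packed (subword) add on `width` lanes.
        out[start:start + width] = [a[start + k] + b[start + k] for k in range(width)]
    for i in range(main, n):
        # Remainder loop for the elements that do not fill a whole strip.
        out[i] = a[i] + b[i]
    return out


a = list(range(10))
b = [10] * 10
print(add_strip_mined(a, b))
```

The remainder loop is what keeps the transformation correct when the trip count is not a multiple of the strip width.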

15.

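A Java Implementation of an IDL Compiler

姜淑娟 殷兆麟 闫大顺 《计算机工程与应用》 (Computer Engineering and Applications) 2000,36(5):110-111

The article begins by describing the role of the Interface Definition Language (IDL) compiler in the CORBA development model, proposes a development model for the IDL compiler, and then discusses technical issues that arise in implementing the compiler.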

16.

BenQ FP71E /FP785

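火星 《个人电脑》 (Personal Computer) 2005,11(2)

Over the past while, LCD monitors have gradually begun to displace traditional CRT monitors. Because of the inherent limitations of liquid crystal, however, they have lower color saturation and a narrower color range than CRT monitors, which has drawn complaints from many users. To change this situation, LCD monitor manufacturers have kept improving their panels, while also adding to the monitor circuitry new digital signal ...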

17.

Visualizing the Performance of SPMD and Data-Parallel Programs

《Journal of Parallel and Distributed Computing》1993,18(2):129-146

Observing the activities of a complex parallel computer system is no small feat, and relating these observations to program behavior is even harder. In this paper, we present a general measurement approach that is applicable to a large class of scalable programs and machines, specifically SPMD and data-parallel programs executing on distributed memory computer systems. The combined instrumentation and visualization paradigm, called VISTA, is based on our experiences in programming and monitoring applications running on an nCUBE 2 computer and a MasPar MP-1 computer. The key is that performance data are treated similarly to any distributed data in the context of the programming models and presented via a hierarchy of multiple views. Because of the data-parallel mapping of program onto machine, we can view the performance as it relates to each processor, processor cluster, or the processor ensemble and as it relates to the data structures of the program. We illustrate the utility of VISTA by example.

18.

Compiler Infrastructure

Rudi Eigenmann Sam Midkiff 《International journal of parallel programming》2013,41(6):751-752


19.

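BenQ FP231W / BenQ FP2091

承健 《个人电脑》 (Personal Computer) 2004,10(7):40-40,42

For ordinary desktop users, a 17-inch LCD monitor provides enough resolution and display area, but many professional applications need a larger display area and a higher resolution. Were it not for the high prices, we believe PC enthusiasts would not turn down a larger monitor either. BenQ recently launched several high-end LCD monitors ranging from 19 to 23 inches to meet the needs of these high-end users, and the Personal Computer (《个人电脑》) lab tested two of them, the FP231W and the FP2091.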

20.

A Compiler Intermediate Representation for Reconfigurable Fabrics

Zhi Guo Betul Buyukkurt John Cortes Abhishek Mitra Walid Najjar 《International journal of parallel programming》2008,36(5):493-520

Configurable computing relies on the expression of a computation as a circuit. Its main purpose is the hardware based acceleration of programs. Configurable computing has received renewed interest with the recent rapid increase in both size and speed of FPGAs. One of the major obstacles in the way of wider adoption of (re)configurable computing is the lack of high-level tools that support the efficient mapping of programs expressed in high-level languages (HLL) to reconfigurable fabrics. The major difficulty in such a mapping is the translation from a temporal execution model to a spatial execution model. An intermediate representation (IR) is the central structure around which tools such as compilers and synthesis tools are built. In this paper we propose an IR specifically designed for reconfigurable fabrics: CIRRF (Compiler Intermediate Representation for Reconfigurable Fabrics). We describe the design of CIRRF and its initial implementation as part of the ROCCC compiler for translating C code to VHDL. CIRRF is designed to support the creation of a datapath and the scheduling of operations on it. It provides support for buffers, look-up tables, predication and pipelining in the datapath. One of the important features of CIRRF, and ROCCC, is its support for the import of pre-designed IP cores into the original C source code allowing the user to leverage the huge wealth of existing IP cores while programming the configurable platform using a HLL. Using experiments and examples we show that CIRRF is a solid foundation to generate high-performance hardware.