首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We investigate the claim that functional languages offer low-cost parallelism in the context of symbolic programs on modest parallel architectures. In our investigation we present the first comparative study of the construction of large applications in a parallel functional language, in our case in Glasgow Parallel Haskell (GP H). The applications cover a range of application areas, use several parallel programming paradigms, and are measured on two very different parallel architectures. On the applications level the most significant result is that we are able to achieve modest wall-clock speedups (between factors of 2 and 10) over the optimised sequential versions for all but one of the programs. Speedups are obtained even for programs that were not written with the intention of being parallelised. These gains are achieved with a relatively small programmer-effort. One reason for the relative ease of parallelisation is the use of evaluation strategies, a new parallel programming technique that separates the algorithm from the co-ordination of parallel behaviour. On the language level we show that the combination of lazy and parallel evaluation is useful for achieving a high level of abstraction. In particular we can describe top-level parallelism, and also preserve module abstraction by describing parallelism over the data structures provided at the module interface (‘data-oriented parallelism’). Furthermore, we find that the determinism of the language is helpful, as is the largely implicit nature of parallelism in GP H. Copyright © 1999 John Wiley & Sons, Ltd.  相似文献   

2.
Eden is a parallel extension of the functional language Haskell. On behalf of parallelism Eden overrides Haskell's pure lazy approach, combining a non-strict functional application with eager process creation and eager communication. We desire to investigate alternative semantics for Eden in order to analyze the consequences of some of the decisions adopted during the language design. In this paper we show how to implement in Maude the operational semantics of Eden in such a way that semantic rules can be modified easily. Moreover, other semantic features can be implemented by means of parameterized modules that allow to instantiate in different ways several parameters of the semantics but without modifying the semantic rules.  相似文献   

3.
An architecture is presented for the parallel execution of sequential Prolog. The architecture is based on a pipeline of unification processors and designed to work as a co-processor to a more conventional, UNIX based workstation. The unification processors execute highly optimized compiled Prolog code; however, the basic concept of the architecture could also increase the performance of interpreter based systems. It will be shown that even programs that do not exhibit any of the ‘classical’ forms of parallelism (i.e. AND-, OR-parallelism, etc.) can be effectively mapped onto the proposed architecture. The presented architecture should also prove very effective as a multi-user Prolog machine executing several independent Prolog programs in parallel. In contrast to other attempts to execute sequential Prolog in parallel we do not restrict the use of any of the standard Prolog language features such as dynamic assert/retract, CUT, etc. Simulation results show that peak execution rates of over 1000 KLIPS can be obtained.  相似文献   

4.
Although the problem of increasing the speed of rule-based programs has been studied for a long while, so far the level of parallelism achieved under various parallel processing schemes fails to meet expectations of high performance gains. Most of the work has focused on manipulating existing rule programs to accommodate parallel architectures. However, without changing the inherently sequential algorithms encoded in rule form and without using intrinsic parallel rule languages to exploit parallelism, speedup is severely limited. In this paper we demonstrate that to maximize parallelism, manipulations must take place at the algorithmic level in addition to the program level. However, traditional rule languages are not equipped to easily express parallel algorithms. Thus, an inherently parallel rule language must be devised to enable maximum parallelism. We present our initial specification of such a language, named PARULEL. Preliminary performance results are detailed for one test case program. PARULEL is one part of a larger research effort that considers four key approaches to parallel rule processing: inherently parallel execution semantics; the use of redaction meta-rules to allow programmer control of the set of parallel instantiations; an incremental evaluation algorithm that incorporates, in an efficient manner, updates to the initial set of assumptions; and an algorithm that maps a program to a parallel architecture.  相似文献   

5.
There are many paradigms being promoted and explored for programming parallel computers, including modified sequential languages, new imperative languages and applicative languages. SISAL is an applicative language which has been designed by a consortium of industrial and research organizations for the specification and execution of parallel programs. It allows programs to be written with little concern for the structure of the underlying machine, thus the programmer is free to explore different ways of expressing the parallelism. A major problem with applicative languages has been their poor efficiency at handling large data structures. To counter this problem SISAL includes some advanced memory management techniques for reducing the amount of data copying that occurs. In this paper we discuss the implementation of some image processing benchmarks in SISAL and C to evaluate the effectiveness of the memory management code. In general, the SISAL program was easier to code than the C (augmented with the PARMACS macros) because we were not concerned with the parallel implementation details. We found that the SISAL performance was in general comparable to C, and that it could be brought in line with an efficient parallel C implementation by some programmer-specified code transformations.  相似文献   

6.
The data parallel meta language (DPML) and its associated Fortran source code rewriter (DP77) support architecture independent, high performance climate and weather prediction models. The language allows the data domains over which a program operates, the communication patterns required between elements of those data domains, and some or all of the calculations of a program to be expressed at a very high level. DPML uses explicit data parallelism to express the inherent parallelism of the models, with the result that programs are easily compilable into target machine code. DP77 uses information from the DPML program to translate Fortran routines into the host specific Fortran form required for their parallel execution within the model. This paper describes the general strategy behind the development of DPML, discusses its language features using examples drawn from climate modelling, and provides details of the mechanism it uses for incorporating Fortran into data parallel programs. Encouraging results are reported for DPML versions of the standard weather benchmark models executing on vector, SIMD, and MIMD (shared memory) machines. While the paper is set within the framework of climate modelling, the technique has obvious wider implications.  相似文献   

7.
A new technique for estimating and understanding the speed improvement that can result from executing a program on a parallel computer is described. The technique requires no additional programming and minimal effort by a program's author. The analysis begins by tracing a sequential program. A parallelism analyzer uses information from the trace to simulate parallel execution of the program. In addition to predicting parallel performance, the parallelism analyzer measures many aspects of a program's dynamic behavior. Measurements of six substantial programs are presented. These results indicate that the three symbolic programs differ substantially from the numeric programs and, as a consequence, cannot be automatically parallelized with the same compilation techniques  相似文献   

8.
The inadequacies of conventional parallel languages for programming multicomputers are identified. The C* language is briefly reviewed, and a compiler that translates C* programs into C programs suitable for compilation and execution on a hypercube multicomputer is presented. Results illustrating the efficiency of executing data-parallel programs on a hypercube multicomputer are reported. They show the speedup achieved by three hand-compiled C* programs executing on an N-Cube 3200 multicomputer. The first two programs, Mandelbrot set calculation and matrix multiplication, have a high degree of parallelism and a simple control structure. The C* compiler can generate relatively straightforward code with performance comparable to hand-written C code. Results for a C* program that performs Gaussian elimination with partial pivoting are also presented and discussed  相似文献   

9.
Because multicore CPUs have become the standard with all major hardware manufacturers, it becomes increasingly important for programming languages to provide programming abstractions that can be mapped effectively onto parallel architectures. Stream processing is a programming paradigm where computations are expressed as independent actors that communicate via FIFO data-channels. The coarse-grained parallelism exposed in stream programs facilitates such an efficient mapping of actors onto the underlying multicore hardware. We propose a stream-parallel programming abstraction that extends object-oriented languages with stream-programming facilities. StreamPI consists of a class hierarchy for actor-specification together with a language-independent runtime system that supports the execution of stream programs on multicore architectures. We show that the language-specific part of StreamPI, i.e., the class hierarchy, can be implemented as a library-level programming language extension. A library-level extension has the advantage that an existing programming language implementation need not be touched. Legacy-code can be mixed with a stream-parallel application, and the use of sequential legacy code with actors is supported. Unlike previous approaches, StreamPI allows dynamic creation and subsequent execution of stream programs. StreamPI actors are typed. Type-safety is achieved through type-checks at stream graph creation time. We have implemented StreamPI??s language-independent runtime system and language interfaces for Ada?2005 and C++ for Intel multicore architectures. We have evaluated StreamPI for up to 16 cores on a two?CPU 8-core Intel Xeon X7560 server, and we provide a performance comparison with StreamIt?(Gordon et al. in International Conference on Architectural Support for Programming Languages and Operating Systems, 2006), which is the de facto standard for stream-parallel programming. Although our approach provides greater programming flexibility than StreamIt, the performance of StreamPI compares favorably to the static compilation model of StreamIt.  相似文献   

10.
Hood allows debugging Haskell programs by introducing observation marks. In this paper, we show how to use pHood (a parallel version of Hood) to analyze properties of programs written using parallel dialects of Haskell. In particular, we have tested the tool with programs written in the languages GpH and Eden (although it also works with other parallel dialects), and we have implemented methods to perform analyses of unneeded work as well as an analysis of duplicated work. By doing so, we show the usefulness and versatility of the tool.  相似文献   

11.
Motivated by the advent of powerful hardware such as SMP machines and execution environments such as Grids, research in parallel programming has gained much attention within the distributed computing community. There is a substantial body of efforts in the form of parallel libraries and frameworks that supply developers with programming tools to exploit parallelism in their applications. Still, many of these efforts prioritize performance over other important characteristics such as code invasiveness, ease of use and independence of the underlying executing hardware/environment. In this paper, we present EasyFJP, a new approach for semi-automatically injecting parallelism into sequential Java applications that offers a convenient balance to these four aspects. EasyFJP is based upon the popular fork/join parallel pattern, and combines implicit, application-level parallelism with explicit, non-invasive application tuning. Experiments performed with several classic CPU-intensive benchmarks and a real-world application confirm that EasyFJP effectively addresses these problems while delivers very competitive performance.  相似文献   

12.
Programming multiprocessor parallel architectures is a complex task. This paper describes a block-structured scientific programming language, BLAZE, designed to simplify this task. BLAZE contains array arithmetic, ‘forall’ loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow.

A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture, while neglecting the remainder. This paper describes the features of BLAZE, and show how this language would be used in typical scientific programming.  相似文献   


13.
The widespread use of parallel machines has been hampered by the difficulty of mapping applications onto them effectively. The difficulty arises because current programming languages require the programmer to specify a problem to be solved at a low level of abstraction in an imperative form. Thus the programmer must immediately encode an architecture-specific algorithm detailing every communication and calculation. This process is prone to error and complicates the reuse of software.

An alternative approach is to specify the problem to be solved at a high-level in a functional language. Meaning-preserving program transformations can then be used to derive a parallel algorithm. Such algorithms can be run on parallel graph-reduction or dataflow machines which automatically exploit the implicit parallelism in a functional language program. Such automatic decomposition techniques, however, are not yet capable of fully yielding the extra performance offered by the parallel hardware.

We show how, by including an architecture specification with the problem specification, and extending the amount of transformation performed, it is possible to produce functional language code that explicity expresses the calculations and communications to be performed by the processors. This simplifies compilation, yields faster programs and enables parallel software to be developed for a wide variety of parallel computer architectures.

A goal-seeking transformation methodology has been developed which enables a high-level functional specification of the problem and a high-level functional abstraction of the target computer architecture to be systematically manipulated to produce an efficient parallel algorithm tailored to the target architecture. As the transformations start from very high-level specifications, the discovery of new algorithms is facilitated.

A case study is used to demonstrate the effectiveness of the technique. We show how a high level specification for sort can be transformed with a pipeline architecture specification to give a mergesort and how the same specification with a dynamic-message-passing architecture specification can be transformed to a novel parallel quicksort.  相似文献   


14.
论文致力于对图像处理算法的串行C程序进行子字并行分析,并重定向到带有多媒体扩展的通用处理器和多媒体专用嵌入式微处理器。图像处理算法的特点决定其是内在可并行的,这种并行粒度介于数据并行(DLP)和指令级并行(ILP)之间,称之为子字并行。但是,当前的编译技术很难充分挖掘和定位程序基本块内的子字并行,对此设计了一种基于流图程序表示的编译方法,能够从串行程序中显式地定位子字并行。扩展了编译器的功能,增加了特定的模式库,基于模式识别的控制流和数据流分析后,产生特定的子字并行流图(SWFG,Sub-WordFlowGraph),并将该图作为中间表示,提供给子字并行指令选择,进而实现有效的子字并行代码产生。  相似文献   

15.
This paper begins by describing BSL, a new logic programming language fundamentally different from Prolog. BSL is a nondeterministic Algol-class language whose programs have a natural translation to first order logic; executing a BSL program without free variables amounts to proving the corresponding first order sentence. A new approach is proposed for parallel execution of logic programs coded in BSL, that relies on advanced compilation techniques for extracting fine grain parallelism from sequential code. We describe a new “Very Long Instruction Word” (VLIW) architecture for parallel execution of BSL programs. The architecture, now being designed at the IBM Thomas J. Watson Research Center, avoids the synchronization and communication delays (normally associated with parallel execution of logic programs on multiprocessors), by determining data dependences between operations at compile time, and by coupling the processing elements very tightly, via a single central shared register file. A simulator for the architecture has been implemented and some simulation results are reported in the paper, which are encouraging.  相似文献   

16.
In this paper we show how implicit parallelism in Java programs can be made explicit by a restructuring compiler using the multi-threading mechanism of the language. In particular, we focus on automatically exploiting implicit parallelism in loops and multi-way recursive methods. Expressing parallelism in Java itself clearly has the advantage that the transformed program remains portable. After compilation of the transformed Java program into byte-code, speedup can be obtained on any platform on which the Java byte-code interpreter supports the true parallel execution of threads. Moreover, we will see that the transformations presented in this paper only induce a slight overhead on uni-processors. © 1997 John Wiley & Sons, Ltd.  相似文献   

17.
Now that multicore chips are common, providing an approach to parallel programming that is usable by regular programmers has become even more important. This cloud has one silver lining: providing useful speedup on a program is useful in and of itself, even if the resulting performance is lower than the best possible parallel performance on the same program. To help achieve this goal, Yada is an explicitly parallel programming language with sequential semantics. Explicitly parallel, because we believe that programmers need to identify how and where to exploit potential parallelism, but sequential semantics so that programmers can understand and debug their parallel programs in the way that they already know, i.e. as if they were sequential.The key new idea in Yada is the provision of a set of types that support parallel operations while still preserving sequential semantics. Beyond the natural read-sharing found in most previous sequential-like languages, Yada supports three other kinds of sharing. Writeonce locations support a single write and multiple reads, and two kinds of sharing for locations updated with an associative operator generalise the reduction and parallel-prefix operations found in many data-parallel languages. We expect to support other kinds of sharing in the future.We have evaluated our Yada prototype on eight algorithms and four applications, and found that programs require only a few changes to get useful speedups ranging from 2.2 to 6.3 on an 8-core machine. Yada performance is mostly comparable to parallel implementations of the same programs using OpenMP or explicit threads.  相似文献   

18.
陈俊朴 《计算机工程》2009,35(10):33-36
网络处理器具有并行体系结构,而其高级语言往往具有串行语义。对串行程序进行并行化编译要求引入同步,而同步的优劣又影响生成代码的执行效率。针对网络处理器上的程序,提出一个对同步进行优化的程序划分算法以增加程序的并行性。实验数据表明,在一些有代表性的网络应用上,该算法可提高程序的并行性,并提升性能。  相似文献   

19.
Decomposition abstraction is the process of organizing and specifying decomposition strategies for the exploitation of parallelism available in an application. In this paper we develop and evaluate declarative primitives for rule-based programs that expand opportunities for parallel execution. These primitives make explicit, implicit relations among the data and similarly among the rules. The semantics of the primitives are presented in a general object-based framework such that they may be applied to most rule-based programming languages. We show how the additional information provided by the decomposition primitives can be incorporated into a semantic-based dependency analysis technique. The resulting analysis reveals parallelism at compile time that is very difficult, if not impossible, to discover by traditional syntactic analysis techniques. Simulation results demonstrate scalable and broadly available parallelism  相似文献   

20.
This paper describes Cicero, a set of language constructs to allow constructive protocol specifications. Unlike other protocol specification languages, Cicero gives programmers explicit control over protocol execution, and facilitates both sequential and parallel implementations, especially for protocols above the transport-layer. It is intended to be used in conjunction with domain-specific libraries, and is quite different in philosophy and mode of use from existing protocol specification languages. A feature of Cicero is the use of event patterns to control synchrony, asynchrony, and concurrency in protocol execution, which helps programmers build robust protocol implementations. Event-pattern driven execution also enables implementers to exploit parallelism of varying grains in protocol execution. Event patterns can also be translated into other formal models, so that existing verification techniques may be used  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号