首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We propose a new paradigm for programming multiprocessor systems, 2DT (two-dimensional transformations). 2DT programs are composed of local computations on linear data (columns) and global transformations on 2-dimensional combinations of columns (2D-arrays). Local computations can be expressed in a functional or imperative base language; a typed variant of Backus' FP is chosen for 2DT-FP. The level of abstraction of 2DT makes it suitable for programming a relevant set of algorithms, in general any algorithms, whose data can be easily mapped to 2-dimensional representations. A set of examples tries to prove this claim. An interleaving semantics for 2DT-FP is given, exposing the potential for parallel execution of 2DT-FP programs. The claim is proved that any sequential and thus any parallel execution will deliver the same result. The implementation realized on the Intel Hypercube is described. Supported by the Deutsche Forschungsgemeinschaft, DFG  相似文献   

2.
The Muse (multiple sequential Prolog engines) approach has been used to make a simple and efficient OR-parallel implementation of the full Prolog language. The performance results of the Muse system on bus-based multiprocessor machines have been presented in previous papers. This paper discusses the implementation and performance results of the Muse system on switch-based multiprocessors (the BBN Butterfly GP1000 and TC2000). The results of Muse execution show that high real speedups can be achieved for Prolog programs that exhibit coarse-grained parallelism. The scheduling overhead is equivalent to around 8–26 Prolog procedure calls per task on the TC2000. The paper also compares the Muse results with corresponding results for the Aurora OR-parallel Prolog system. For a large set of benchmarks, the results are in favor of the Muse system.  相似文献   

3.
4.
The JR concurrent programming language extends Java with additional concurrency mechanisms, which are built upon JR's operations and capabilities. JR operations generalize methods in how they can be invoked and serviced. JR capabilities act as reference to operations. Recent changes to the Java language and implementation, especially generics, necessitated corresponding changes to the JR language and implementation. This paper describes the new JR language features (known as JR2) of generic operations and generic capabilities. These new features posed some interesting implementation challenges. The paper describes our initial implementation (JR21) of generic operations and capabilities, which works in many, but not all, cases. It then describes the approach our improved implementation (JR24) uses to fully implement generic operations and capabilities. The paper also describes the benchmarks used to assess the compilation and execution time performances of JR21 and JR24. The JR24 implementation reduces compilation times, mainly due to reducing the number of files generated during JR program translation, without noticeably impacting execution times.  相似文献   

5.
The optimization of the execution time of a parallel algorithm can be achieved through the use of an analytical cost model function representing the running time. Typically the cost function includes a set of parameters that model the behavior of the system and the algorithm. In order to reach an optimal execution, some of these parameters must be fitted according to the input problem and to the target architecture. An optimization problem can be stated where the modeled execution time for the algorithm is used to estimate the parameters. Due to the large number of variable parameters in the model, analytical minimization techniques are discarded. Exhaustive search techniques can be used to solve the optimization problem, but when the number of parameters or the size of the computational system increases, the method is impracticable due to time restrictions. The use of approximation methods to guide the search is also an alternative. However, the dependence on the algorithm modeled and the bad quality of the solutions as a result of the presence of many local optima values in the objective functions are also drawbacks to these techniques. The problem becomes particularly difficult in complex systems hosting a large number of heterogeneous processors solving non-trivial scientific applications. The use of metaheuristics allows for the development of valid approaches to solve general problems with a large number of parameters. A well-known advantage of metaheuristic methods is the ability to obtain high-quality solutions at low running times while maintaining generality. We propose combining the parameterized analytical cost model function and metaheuristic minimization methods, which contributes to a novel real alternative to minimize the parallel execution time in complex systems. The success of the proposed approach is shown with two different algorithmic schemes on parallel heterogeneous systems. Furthermore, the development of a general framework allows us to easily develop and experiment with different metaheuristics to adjust them to particular problems.  相似文献   

6.
This paper describes MATISSE, a compiler able to translate a MATLAB subset to C targeting embedded systems. MATISSE uses LARA, an aspect‐oriented programming language, to specify additional information and transformations to the input MATLAB code, for example, insertion of code for initialization of variables, and specification of types and shapes of variables. The compiler is being developed bearing in mind flexibility, multitarget and multitoolchain support, allowing for the generation of several implementations in C from the same reference code in MATLAB. In this paper, we also present a number of techniques being employed in MATLAB to C compilation, such as element‐wise mapping operations, matrix views, weak types, and intrinsics. We validate these techniques using MATISSE and a set of representative benchmarks. More specifically, we evaluate the compiler with a set of 31 benchmarks using an embedded system board and a desktop computer. The results show speedups up to 1.8× by employing information provided by LARA aspects, when compared with C code generated without additional user information. When compared with the execution time of the original code running on MATLAB, the execution time of the generated C code achieved a geometric mean speedup of 13×. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

7.
8.
This paper will introduce the function systems approach to study the mathematicalproperties of Backus functional programming language FP.Here,a function system isdefined as a set of functions o FP.In the paper,we define the notions of complete,orthogonaland orthonormal systems,and prove their principal properties.These properties are used in thediscussion of the completeness of FP program algebra and properties of the expansion ofprogram in orthogonal systems.  相似文献   

9.
This work proposes an execution model for massively parallel systems aiming at ensuring the communications overlap by the computations. This model is named SCAC : Synchronous Communication Asynchronous Computation. This weakly-coupled model separates the execution of communication phases from those of computation in order to facilitate their overlapping, thus covering the data transfer time. To allow the simultaneous execution of these phases, we propose an approach based on three levels : two globally-centralized/locally-distributed hierarchical control levels and a parallel computation level. A generic and parametric implementation of the SCAC model was performed to fit different applications. This implementation allows the designer to choose the system components (from pre-designed ones) and to set its parameters in order to build the adequate SCAC configuration for the target application. An analytical estimation is proposed to predict the execution time of an application running in SCAC mode, in order to facilitate the parallel program design and the SCAC architecture configuration. The SCAC model was validated by simulation, synthesis and implementation on an FPGA platform, with different examples of parallel computing applications. The comparison of the results obtained by the SCAC model with other models has shown its effectiveness in terms of flexibility and speed-up.  相似文献   

10.
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed system. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, resulting in 31% average improvement on our benchmarks.  相似文献   

11.
Concurrent C, a superset of C providing parallel programming facilities, is considered. A uniprocessor version of Concurrent C was first implemented. After experience with this version, the Concurrent C implementation was extended to run on two types of multiple processor systems: a set of computers connected by a local area network (the distributed version) and a shared-memory multiprocessor (the multiprocessor version). Experience with implementing and using these versions of Concurrent C is described. Specifically, the language changes triggered by the multiple processor implementations, some sample programs, a comparison of the execution times on various systems, and the suitability of these multiple processor architectures are discussed  相似文献   

12.
Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids   总被引:1,自引:0,他引:1  
Long running multi-physics coupled parallel applications have gained prominence in recent years. The high computational requirements and long durations of simulations of these applications necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for execution of long running multi-physics coupled applications across multiple batch systems of a Grid. Our framework, apart from coordinating the executions of the component jobs of an application on different batch systems, also automatically resubmits the jobs multiple times to the batch queues to continue and sustain long running executions. As the set of active batch systems available for execution changes, our framework performs migration and rescheduling of components using a robust rescheduling decision algorithm. We have used our framework for improving the application throughput of a foremost long running multi-component application for climate modeling, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.  相似文献   

13.
This paper presents a new language that integrates the real-time and distributed paradigms within the framework of a concurrent logic language. Concurrent logic languages (CLLs) are capable of expressing concurrence, communication and nondeterminism in a natural way. That is, the intrinsic parallel semantics of the concurrent logic languages makes them well-suited for distributed programming. The proposed language is particularly suitable for loosely coupled systems and it contains mechanisms for distributed and real-time process control. A new execution model for concurrent logic languages is presented, which enables efficient distributed execution and real-time control. The model is introduced by giving an operational semantics for the language and the new model's implementation is discussed, including the definition of a new abstract machine and its implementation on a network of Unix workstations. Although the sequential core is not optimized, some previous results are discussed, showing the feasibility of the language's execution model for distributed real-time systems. The language is currently being used as the kernel language for a distributed simulation and validation tool for communication protocols.  相似文献   

14.
This paper presents a new language that integrates the real-time and distributed paradigms within the framework of a concurrent logic language. Concurrent logic languages (CLLs) are capable of expressing concurrence, communication and nondeterminism in a natural way. That is, the intrinsic parallel semantics of the concurrent logic languages makes them well-suited for distributed programming. The proposed language is particularly suitable for loosely coupled systems and it contains mechanisms for distributed and real-time process control. A new execution model for concurrent logic languages is presented, which enables efficient distributed execution and real-time control. The model is introduced by giving an operational semantics for the language and the new model's implementation is discussed, including the definition of a new abstract machine and its implementation on a network of Unix workstations. Although the sequential core is not optimized, some previous results are discussed, showing the feasibility of the language's execution model for distributed real-time systems. The language is currently being used as the kernel language for a distributed simulation and validation tool for communication protocols.  相似文献   

15.
Sidle系统是运行在SUN工作站网络上的一组实用程序,利用空闲的处理机资源进行大粒度的并行计算.同其它远程执行设备相比,它能支持程序内部并行和嵌套的远程执行,允许一个服务员机接受多个远程执行任务.本文介绍了这些特点和透明性的实现方法.  相似文献   

16.
17.
McGovern  Amy  Moss  Eliot  Barto  Andrew G. 《Machine Learning》2002,49(2-3):141-160
The execution order of a block of computer instructions on a pipelined machine can make a difference in running time by a factor of two or more. Compilers use heuristic schedulers appropriate to each specific architecture implementation to achieve the best possible program speed. However, these heuristic schedulers are time-consuming and expensive to build. We present empirical results using both rollouts and reinforcement learning to construct heuristics for scheduling basic blocks. In simulation, the rollout scheduler outperformed a commercial scheduler on all benchmarks tested, and the reinforcement learning scheduler outperformed the commercial scheduler on several benchmarks and performed well on the others. The combined reinforcement learning and rollout approach was also very successful. We present results of running the schedules on Compaq Alpha machines and show that the results from the simulator correspond well to the actual run-time results.  相似文献   

18.
In data-parallel programming, operations are performed simultaneously on all elements of large data structures. Backus′s FP functional language promotes this view. FP provides a large set of data rearrangement primitives, and a useful set of functional combining forms that are applied to entire data structures. We describe an FP compiler that generates programs capable of exploiting data-parallelism. The FP compiler deduces the type and shape of objects through type inference, and generates efficient parallel implementations of combining forms. In addition, the compiler determines the effects of data rearrangement functions at compile-time, thereby avoiding creation of large intermediate data structures, and reducing interprocessor communication overhead. FP and its compiler are formally specified, reducing ambiguity concerning constructs of the language and results of the compiler. Performance and speed-ups achieved from our compilation and optimization techniques are demonstrated with timings from a prototype implementation on the Connection Machine CM-2.  相似文献   

19.

Centrality measures or indicators of centrality identify most relevant nodes of graphs. Although optimized algorithms exist for computing of most of them, they are still time consuming and are even infeasible to apply to big enough graphs like the ones representing social networks or extensive enough computer networks. In this paper, we present a parallel implementation in C language of some optimal algorithms for computing of some indicators of centrality. Our parallel version greatly reduces the execution time of their sequential (non-parallel) counterpart. The proposed solution relies on threading, allowing for a theoretical improvement in performance close to the number of logical processors (cores) of the single computer in which it is running. Our software has been tested in several platforms, including the supercomputer Calendula, in which we achieved execution times close to 18 times faster when running our parallel implementation instead of our sequential one. Our solution is multi-platform and portable, working on any machine with several logical processor which is capable of compiling and running C language code.

  相似文献   

20.
One interpretive approach for handling concurrency is to provide an interpreter instance for each executing language‐level process. Such an approach has mainly been applied to concurrent implementations of logic and functional languages. This paper describes the use of this approach in constructing an interpreter for an imperative, distributed programming language from an existing compiler and run‐time support system (RTS). Primary design goals were to exploit the existing compiler to the extent possible as well as to have minimal impact on the RTS used to support concurrency. We have been successful in meeting these goals. Additionally, performance results show our interpreter's execution times compare favorably to the times required for compilation, linkage, and execution of small programs or programs with a significant number of calls to the RTS; on such programs, our interpreter's performance also compares favorably to that of the standard Java implementation. However, for larger programs and programs with fewer calls to the underlying RTS, the conventional compiler‐based implementation outperforms the interpreter implementation. For many distributed programs in which network costs dominate, the performances of the two implementations differ little. Copyright © 2001 John Wiley & Sons, Ltd.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号