首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
The paper presents a dataflow execution model, DIALOG, for logic programs which operates on an intermediate virtual machine. The virtual machine is granulated at clause argument level to exploit argument parallelism through unification. The model utilises a new variable binding scheme that eliminates dereference operations for accessing variables, and therefore supports OR-parallelism in the highly distributed dataflow environment. The model has been implemented in Occam. A conventional dataflow architecture in support of the model has been simulated as a testbed for the evaluation. The simulation indicates some encouraging results and suggests future improvements.  相似文献   

2.
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods. Recommended by: Patrick Valduriez  相似文献   

3.
This paper begins by describing BSL, a new logic programming language fundamentally different from Prolog. BSL is a nondeterministic Algol-class language whose programs have a natural translation to first order logic; executing a BSL program without free variables amounts to proving the corresponding first order sentence. A new approach is proposed for parallel execution of logic programs coded in BSL, that relies on advanced compilation techniques for extracting fine grain parallelism from sequential code. We describe a new “Very Long Instruction Word” (VLIW) architecture for parallel execution of BSL programs. The architecture, now being designed at the IBM Thomas J. Watson Research Center, avoids the synchronization and communication delays (normally associated with parallel execution of logic programs on multiprocessors), by determining data dependences between operations at compile time, and by coupling the processing elements very tightly, via a single central shared register file. A simulator for the architecture has been implemented and some simulation results are reported in the paper, which are encouraging.  相似文献   

4.
This paper presents a parallel execution model for exploiting AND-parallelism in Horn Clause logic programs. The model is based upon the generator-consumer approach, and can be implemented efficiently with small run-time overhead. Other related models that have been proposed to minimize the run-time overhead are unable to exploit the full parallelism inherent in the generator-consumer approach. Furthermore, our model performs backtracking more intelligently than these models. We also present two implementation schemes to realize our model: one has a coordinator to control the activities of processes solving different literals in the same clause; and the other achieves synchronization by letting processes pass messages to each other in a distributed fashion. Trade-offs between these two schemes are then discussed. This work was supported by Army Research Office grant #DAAG29-84-K-0060 to the Artificial Intelligence Laboratory at the University of Texas at Austin.  相似文献   

5.
A predicate/transition net model for a subset of Horn clause logic programs is presented. The syntax, transformation procedure, semantics, and deduction process for the net model are discussed. A possible parallel implementation for the net model is described, which is based on the concepts of communicating processes and relations. The proposed net model offers a syntactical variant of Horn clause logic and has two distinctions from other existing schemes for the logic programs: representation formalism and the deduction method. The net model provides an approach towards the solutions of the separation of logic from control and the improvement of the execution efficiency through parallel processing for the logic programs. The abstract nature of the net model also lends itself to different implementation strategies.<>  相似文献   

6.
Parallel execution of a programR (intuitively regarded as a partial order) is usually modeled by sequentially executing one of the total orders (interleavings) into which it can be embedded. Our work deviates from this serialization principle by usingtrue concurrency to model parallel execution. True concurrency is represented via completions ofR tosemi total orders, called time diagrams. These orders are characterized via a set of conditions (denoted byCt), yielding orders or time diagrams which preserve some degree of the intended parallelism inR. Another way to express semi total orders is to use re-writing or derivation rules (denoted byCx) which for any programR generates a set of semi-total orders. This paper includes a classification of parallel execution into three classes according to three different types ofCt conditions. For each class a suitableCx is found and a proof of equivalence between the set of all time diagrams satisfyingCt and the set of all terminalCx derivations ofR is devised. This equivalence between time diagram conditions and derivation rules is used to define a novel notion of correctness for parallel programs. This notion is demonstrated by showing that a specific asynchronous program enforces synchronous execution, which always halts, showing that true concurrency can be useful in the context of parallel program verification.  相似文献   

7.
Curare, the program restructurer described in this paper automatically transforms a sequential Lisp program into an equivalent concurrent program that runs on a multiprocessor.Data dependences constrain the program's concurrent execution because, in general, two conflicting statements cannot execute in a different order without affecting the program's result. Not all dependences are essential to produce the program's result.Curare attempts to transform the program so it computes its result with fewer conflicts. An optimized program will execute with less synchronization and more concurrency. Curare then examines loops in a program to find those that are unconstrained or lightly constrained by dependences. By necessity,Curare treats recursive functions as loops and does not limit itself to explicit program loops. Recursive functions offer several advantages over explicit loops since they provide a convenient framework for inserting locks and handling the dynamic behavior of symbolic programs. Loops that are suitable for concurrent execution are changed to execute on a set of concurrent server processes. These servers execute single loop iterations and therefore need to be extremely inexpensive to invoke.Restructured programs execute significantly faster than the original sequential programs. This improvement is large enough to attract programmers to a multiprocessor, particularly since it requires little effort on their part.This research was funded by DARPA contract numbers N00039-85-C-0269 (SPUR) and N00039-84-C-0089 (XCS) and by an NSF Presidential Young Investigator award to Paul N. Hilfinger. Additional funding came from the California MICRO program (in conjunction with Texas Instruments, Xerox, Honeywell, and Phillips/Signetics).  相似文献   

8.
We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule “space efficient” if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism  相似文献   

9.
Programs whose parallelism stems from multiple recursion form an interesting subclass of parallel programs with many practical applications. The highly irregular shape of many recursion trees makes it difficult to obtain good load balancing with small overhead. We present a system, called REAPAR, that executes recursive C programs in parallel on SMP machines. Based on data from a single profiling run of the program, REAPAR selects a load-balancing strategy that is both effective and efficient and it generates parallel code implementing that strategy. The performance obtained by REAPAR on a diverse set of benchmarks matches that published for much more complex systems requiring high-level problem-oriented explicitly parallel constructs. A case study even found REAPAR to be competitive to handwritten (low-level, machine-oriented) thread-parallel code  相似文献   

10.
11.
We present the first backtracking algorithm for stream AND-parallel logic programs. It relies on compile-time knowledge of the dataflow graph of each clause to let it figure out efficiently which goals to kill or restart when a goal fails. This crucial information, which we derive from mode declarations, was not available at compile-time in any previous stream AND-parallel system. We show that modes can increase the precision of the backtracking algorithm, though our algorithm allows this precision to be traded off against overhead on a procedure-by-procedure and call-by-call basis. The modes also allow our algorithm to handle efficiently programs that manipulate partially instantiated data structures and an important class of programs with circular dependency graphs. On code that does not need backtracking, the efficiency of our algorithm approaches that of the committed-choice languages; on code that does need backtracking its overhead is comparable to that of the independent AND-parallel backtracking algorithms.  相似文献   

12.
13.
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedbacks, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our technique using a set of standard numerical applications and running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.  相似文献   

14.
Stochastic bounds are obtained on execution times of parallel programs when the number of processors is unlimited. A parallel program is considered to consist of interdependent tasks with synchronization constraints. These constraints are described by an acyclic directed graph called a task graph. The execution times of tasks are considered to be independently identically distributed (i.i.d.) random variables. The performance measure of interest is the overall execution of the considered parallel program (task graph). Stochastic bound methods are applied to obtain lower and upper bounds on this measure. Another upper bound is obtained for parallel programs having `new better than used in expectation' (NBUE) random variables as task execution times. NBUE random variables are replaced with exponential random variables of the same mean to derive this upper bound  相似文献   

15.
This paper presents the implementation and performance results of anand-parallel execution model of logic programs on a shared-memory multiprocessor. The execution model is meant for logic programs with “don't-know nondeterminism”, and handles binding conflicts by dynamically detecting dependencies among literals. The model also incorporates intelligent backtracking at the clause level. Our implementation of this model is based upon the Warren Abstract Machine (WAM); hence it retains most of the efficiency of the WAM for sequential segments of logic programs. Performance results on Sequent Balance 21000 show that on suitable programs, our parallel implementation can achieve linear speedup on dozens of processors. We also present an analysis of different overheads encountered in the implementation of the execution model.  相似文献   

16.
New compact, low-power implementation technologies for processors and imaging arrays can enable a new generation of portable video products. However, software compatibility with large bodies of existing applications written in C prevents more efficient, higher performance data parallel architectures from being used in these embedded products. If this software could be automatically retargeted explicitly for data parallel execution, product designers could incorporate these architectures into embedded products. The key challenge is exposing the parallelism that is inherent in these applications but that is obscured by artifacts imposed by sequential programming languages. This paper presents a recognition-based approach for automatically extracting a data parallel program model from sequential image processing code and retargeting it to data parallel execution mechanisms. The explicitly parallel model presented, called multidimensional data flow (MDDF), captures a model of how operations on data regions (e.g., rows, columns, and tiled blocks) are composed and interact. To extract an MDDF model, a partial recognition technique is used that focuses on identifying array access patterns in loops, transforming only those program elements that hinder parallelization, while leaving the core algorithmic computations intact. The paper presents results of retargeting a set of production programs to a representative data parallel processor array to demonstrate the capacity to extract parallelism using this technique. The retargeted applications yield a potential execution throughput limited only by the number of processing elements, exceeding thousands of instructions per cycle in massively parallel implementations.  相似文献   

17.
With the growing availability of multiprocessors, a great deal of attention has been given to executing Prolog in parallel. A question that naturally arises is how to execute standard sequential Prolog programs with side effects in parallel. The problem of performing side effects in AND parallel systems has been considered elsewhere. This paper presents a method that generates sequential semantics of side effect predicates in an OR parallel system. First, a general method is given for performing data side effects such as read and write. This method is then extended to control side effects such as asserta, assertz, and retract. Finally, a constant-time algorithm for performing cut is presented.The work of L. V. Kale was supported by the National Science Foundation under Grant NSF-CCR-8700988. The work of D. A. Padua and D. C. Sehr was supported in part by the National Science Foundation under Grant NSF-MIP-8410110, the Department of Energy under Grant DOE DE-FG02-85ER25001, and a donation from the IBM Corporation to the Center for Supercomputing Research and Development. D. C. Sehr holds a fellowship from the Office of Naval Research.  相似文献   

18.
The restricted and-parallelism (RAP) execution model of logic programs is described. This model uses a compile-time data-dependence analysis to generate execution graph expressions for the clauses in a Prolog program. These expressions use simple run-time tests to determine the possibilities of parallelism and produce multiple parallel execution graphs from a single execution graph expression. A simple algorithm is then presented which automatically produces these execution graphs for Prolog clauses. Several ways in which the algorithm can be significantly improved by using the results of program-level data-dependence analysis are discussed.  相似文献   

19.
20.
In this paper,we focus on the compiling implementation of parlalel logic language PARLOG and functional language ML on distributed memory multiprocessors.Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model(HPGREM)is presented firstly.Then based on HPGREM,a parallel abstact machine PAM/TGR is described.Furthermore,several optimizing compilation schemes for executing declarative programs on transputer array are proposed.The performance statistics on transputer array demonstrate the effectiveness of our model,parallel abstract machine,optimizing compilation strategies and compiler.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号