首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
This paper presents a system for parallel execution of Prolog supporting both independent conjunctive and disjunctive parallelism. The system is intended for distributed memory architecture and is composed of a set of workers with a hierarchical structure scheduler. The execution model has been designed in such a way that each worker's environment does not contain references to terms in other environments, thus reducing communication overhead. In order to guarantee the improvement of the performance by the parallelism exploitation, a granularity control has been introduced for each kind of parallelism. For conjunctive parallelism PDP applies a control based on the estimation provided by CASLOG. The features of the system allow to introduce this control without adding overhead. For disjunctive parallelism PDP controls granularity by applying a heuristic-based method, which can be adapted to other parallel Prolog systems. Different scheduling policies have also been tested. The system has been implemented on a transputer network and performance results show that it provides a high speedup for coarse grain parallel programs.  相似文献   

2.
When two or more literals in the body of a Prolog clause are solved in (AND) parallel, their solutions need to bejoined to compute solutions for the clause. This is often a difficult problem in parallel Prolog systems that exploit OR and independent AND parallelism in Prolog programs. In several AND/OR parallel systems proposed recently, this problem is side-stepped at the cost of unexploited OR parallelism in the program, in part due to the complexity of the backtracking algorithm beneath AND parallel branches. In some cases, the data dependency graphs used by these systems cannot represent all the exploitable indenpendent AND parallelism known at compile time.In this paper, we describe the compile time analysis for an optimizedjoin algorithm for supporting independent AND parallelism in logic programs efficiently without leaving any OR parallelism unexploited. We then discuss how this analysis can be used to yield very efficient runtime behavior. We also discuss problems associated with a tree representation of the search space when arbitrarily complex data dependency graphs are permitted. We describe how these problems can be resolved by mapping the search space onto the data dependency graphs themselves. The algorithm has been implemented in a compiler for parallel Prolog based on the Reduce-OR process model. The algorithm is suitable for the implementation of AND/OR systems on both shared and nonshared memory machines. Performance on benchmark programs exhibiting AND and OR parallelism on one shared memory machine and one message passing machine is presented.This work was supported in part by NSF Grants CCR-87-00988 and CCR-89-02496.A shorter version of this paper appears in theProceedings of NACLP 1990.  相似文献   

3.
This paper concerns the exploitation of user transparent inherent parallelism of pure Prolog programs using program transformation. We describe a novel paradigmenumerate-and-filter for transforming generate-and-test programs for execution under the committed-choice model extended to incorporate multiple solutions based on set enumeration. The paradigm simulates OR-parallelism by stream AND-parallelism integrating OR-parallelism, AND-parallelism, and stream parallelism. Generate-and-test programs are classified into three categories:simple generate-and-test, recursively embedded generate-and-test, and deeply intertwined generate-and-test. The intermediate programs are further transformed to reduce structure copying and metacalls. Algorithms are presented and demonstrated by transforming the representative examples of different classes of generate-and-test programs to Flat Concurrent Prolog equivalents. Statistics show that the techniques are efficient.Funded in part by Cleveland Advanced Manufacturing Program through the State of Ohio as a part of its core research program grant to Center of Automation and Intelligent Systems Research, Case Western Reserve University and NSF equipment grant CDA-8820390 to Kent State University.  相似文献   

4.
We argue that in order to exploit both Independent And-and Or-parallelism in Prolog programs there is advantage in recomputing some of the independent goals, as opposed to all their solutions being reused. We present an abstract model, called the Composition-tree, for representing and-or parallelism in Prolog programs. The Composition-tree closely mirrors sequential Prolog execution by recomputing some independent goals rather than fully re-using them. We also outline two environment representation techniques for And-Or parallel execution offull Prolog based on the Composition-tree model abstraction. We argue that these techniques have advantages over earlier proposals for exploiting and-or parallelism in Prolog.  相似文献   

5.
We present an overview of the ACE system, a sound and complete parallel implementation of Prolog that exploits parallelism transparently (i.e., without any user intervention) from AI programs and symbolic applications coded in Prolog. ACE simultaneously exploits all the major forms of parallelism – Or-parallelism, Independent And-parallelism, and Dependent And-parallelism – found in Prolog programs. These three varieties of parallelism are discussed in detail, along with the problems encountered in their practical exploitation. Our solutions to these problems, incorporated in the ACE system, are presented. The ACE system has been implemented on Sequent Symmetry and Sun Sparc Multiprocessors; performance results from this implementation for several AI programs are presented, which confirm the effectiveness of the choices made. This revised version was published online in June 2006 with corrections to the Cover Date.  相似文献   

6.
A specification of the OR-parallel execution of Prolog programs, using CHOCS (calculus of higher order communicating systems) [24], is presented in the paper. A translation is defined from Prolog programs and goals to CHOCS processes: the execution of the CHOCS process corresponding to a goal mimics the OR-parallel execution of the original Prolog goal. In the translation, clauses and predicate definitions of a Prolog program correspond to processes. To model OR-parallelism, the processes , corresponding to clauses (having the same head predicate ) start their execution concurrently, but, in order to respect the depth-first search rule, each is guarded by the termination of the executions of processes 's, . The computational model is proved correct with respect to the semantics of Prolog, as given in [4, 5]. Our model, because of its algebraic specification, can be easily used to prove properties of the parallel execution of Prolog programs. Moreover, the model exploits the maximum degree of parallelism, by giving the Prolog solutions in parallel, without any order among them. However, this model, being close to the Prolog semantics definition, contains sources of inefficiency which make it unpractical as a guide for the implementation. To overcome these problems, a new computational model is defined. This model is obtained by modifications of the basic one and thus its correctness can be easily proved. Finally, we show how to obtain models of different real implementations of OR-parallel Prolog by slight modification of the new model. The relations among all these models, in terms of parallelism degree, are studied by using the concepts of bisimulation and simulation, developed for concurrent calculi. Received: 5 May 1995 / 28 May 1996  相似文献   

7.
Although studies of a number of parallel implementations of logic programming languages are now available, their results are difficult to interpret due to the multiplicity of factors involved, the effect of each of which is difficult to separate. In this paper we present the results of a high-level simulation study of or- and independent and-parallelism with a wide selection of Prolog programs that aims to determine the intrinsic amount of parallelism, independently of implementation factors, thus facilitating this separation. We expect this study will be instrumental in better understanding and comparing results from actual implementations, as shown by some examples provided in the paper. In addition, the paper examines some of the issues and tradeoffs associated with the combination of and- and or-parallelism and proposes reasonable solutions based on the simulation data obtained.  相似文献   

8.
李士刚  胡长军  王珏  李建江 《软件学报》2013,24(12):2782-2796
低功耗及廉价性使得异构多核在超级计算机计算资源中占有重要比例.然而,异构多核具有高带宽及松耦合一致性等特点,获得理想的存储及计算性能需要更多地考虑底层硬件细节.实现了一种针对典型的异构多核Cell BE 处理器的多级并行模型CellMLP,通过C 语言扩展编译指导语句,实现了对数据并行、任务并行以及流水并行编程模型的支持,提高了并行程序生产率.运行支持优化方面,数据并行采用SPE 并行数据传输、双缓冲等优化手段来提高数据传输带宽;任务并行使用一种新式混合任务队列以支持异步任务窃取,降低SPE 线程间竞争,提高了任务并行的可扩展性;流水并行首次使用阻塞信号传输机制实现SPE 线程间的低开销同步操作.实验对Stream,NASBenchmark 及BOTS 等应用进行了测试,结果表明,CellMLP 可对多种典型并行应用进行高效支持.与目前同类编程模型SARC 及CellSs 进行性能对比,其结果表明,CellMLP 实际数据传输带宽以及非规则应用的支持方面具有明显优势.  相似文献   

9.
10.
为研究并行图形绘制技术,介绍图形绘制的流水线过程,对其内在的可并行性进行分析,研究并行绘制的实现方式,包括流水线并行、数据并行和作业并行,以及前分布拼接合成、中分布拼接合成和后分布拼接合成,讨论并行绘制面临的主要问题及其发展趋势。  相似文献   

11.
The restricted and-parallelism (RAP) execution model of logic programs is described. This model uses a compile-time data-dependence analysis to generate execution graph expressions for the clauses in a Prolog program. These expressions use simple run-time tests to determine the possibilities of parallelism and produce multiple parallel execution graphs from a single execution graph expression. A simple algorithm is then presented which automatically produces these execution graphs for Prolog clauses. Several ways in which the algorithm can be significantly improved by using the results of program-level data-dependence analysis are discussed.  相似文献   

12.
Prolog/Rex represents a powerful amalgamation of the latest techniques for knowledge representation and processing, rich in semantic features that ease the difficult task of encoding heterogeneous knowledge of real-world applications. The Prolog/Rex concept mechanism lets a user represent domain entities in terms of their structural and behavioral properties, including multiple inheritance, arbitrary user-defined relations among entities, annotated values (demons), incomplete knowledge, etc. A flexible rule language helps the knowledge engineer capture human expertise and provide flexible control of the reasoning process. Additional Prolog/Rex strength that cannot be found in any other hybrid language made on top of Prolog is language level support for keeping many potentially contradictory solutions to a problem, allowing possible solutions and their implications to be automatically generated and completely explored before they are committed. The same mechanism is used to model time-states, which are useful in planning and scheduling applications of Prolog/Rex  相似文献   

13.
This paper presents a new approach for parallel tabu search based on adaptive parallelism. Adaptive parallelism was used to dynamically adjust the parallelism degree of the application with respect to the system load. Adaptive parallelism demonstrates that high-performance computing using a hundred of heterogeneous workstations combined with massively parallel machines is feasible to solve large optimization problems. The parallel tabu search algorithm includes different tabu list sizes and new intensification/diversification mechanisms. Encouraging results have been obtained in solving the quadratic assignment problem. We have improved the best known solutions for some large real-world problems.  相似文献   

14.
This paper presents the performance analysis results for the RAP-WAM AND-Parallel Prolog architecture on shared-memory multiprocessor organizations. The goal of this parallel model is to provide inference speeds beyond those attainable in sequential systems, while supporting conventional logic programming semantics. Special emphasis is placed on sequential performance, storage efficiency, and low control overhead. First, the concepts and techniques used in the parallel execution model are described, along with the general methodology, benchmarks, and simulation tools used for its evaluation. Results are given both at the memory reference level and at the memory organization level. A two-level shared-memory architecture model is presented together with an analysis of various solutions to the cache coherency problem. Finally, RAP-WAM shared-memory simulation results are presented. It is argued that the RAP-WAM model can exploit coherent caches and attain speeds in excess of 2 MLIPS with current shared-memory multiprocessing technology for real applications that exhibit medium degrees of parallelism. MCC  相似文献   

15.
计算机自动写作是人工智能领域的一个重要研究方向,现有方法大多都是基于一定的模板,生成行文较为单一的文章,没有对写作内容进行主题方面的提示和推荐,对文章修辞色彩的渲染就更少。为了使自动写作的文章更吸引人,可以将我们现实写作中使用的一些排比句根据主题和相似度的计算加入自动写作作品中,使得作品更加生动。文章主要研究规范文献资料的排比句自动抽取算法,以便抽取到的排比句作为语言素材有效应用于计算机自动写作。文章采用基于段内排比特征和段间排比特征的方法进行排比句的自动抽取,实验结果表明,本文方法抽取的准确率达到93%以上。  相似文献   

16.
As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe how the heterogeneous cores allow to achieve this performance by parallelizing and offloading computation intensive application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model with SPEs. We also give an example of scheduling code to be memory latency tolerant using software pipelining techniques in the SPE. This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.  相似文献   

17.
Flat Concurrent Prolog (FCP) is a general purpose logic programming language designed for concurrent programming and parallel execution. Staring with a concise introduction of the language and its underlying computational model we describe how to implement a distributed FCP interpreter on a transputer environment using OCCAM. Basic techniques we used for exploiting and controlling parallelism are explained in terms of an abstract architecture. The result of mapping this abstract model on transputers is presented as concrete architecture. Substantial design issues are considered in detail.  相似文献   

18.
For pt.I. see ibid., p. 170-80. In pt.I, we presented a binding environment for the AND and OR parallel execution of logic programs. This environment was instrumental in rendering a compiler for the AND and OR parallel execution of logic programs machine independent. In this paper, we describe a compiler based on the Reduce-OR process model (ROPM) for the parallel execution of Prolog programs, and provide performance of the compiler on five parallel machines: the Encore Multimax, the Sequent Symmetry, the NCUBE 2, the Intel i860 hypercube and a network of Sun workstations. The compiler is part of a machine independent parallel Prolog development system built on top of a run time environment for parallel programming called the Chare kernel, and runs unchanged on these multiprocessors. In keeping with the objectives behind the ROPM, the compiler supports both on and independent AND parallelism in Prolog programs and is suitable for execution on both shared and nonshared memory machines. We discuss the performance of the Prolog compiler in some detail and describe how grain size can be used to deliver performance that is within 10% of the underlying sequential Prolog compiler on one processor, and scale linearly with increasing number of processors on problems exhibiting sufficient parallelism. The loose coupling between parallel and sequential components makes it possible to use the best available sequential compiler as the sequential component of our compiler  相似文献   

19.
A number of high‐level parallel programming platforms for networks of workstations (NOWs) have been developed in recent times. Most of these platforms target the exploitation of data parallelism in applications. They do not allow expressibility of applications as a collection of tasks along with their precedence relationships. As a result, the control or task parallelism in an application cannot be expressed or exploited. The current work aims at integrating the notion of task parallelism and precedence relationships among constituting tasks to such high‐level data parallel platforms for NOWs. Our model of integration provides for arbitrary nesting of data and task parallel modules. Also, the precedence relationships are clearly reflected from the program structure. The model relieves the programmer from the need to design applications for non‐determinism in the order of completion of constituting tasks. The design of the runtime support as well as system‐level book keeping is discussed. The model is general enough to be applied to a wide range of data parallel platforms. A specific case of integrating the model into anonymous remote computing (ARC), a data parallel programming platform, is presented. The performance related aspects are also discussed. Copyright © 2000 John Wiley & Sons, Ltd.  相似文献   

20.
Ming Hsiang Huang  Wuu Yang 《Software》2020,50(10):1877-1904
OpenACC is a directive-based programming model which allows programmers to write graphic processing unit (GPU) programs by simply annotating parallel loops. However, OpenACC has poor support for irregular nested parallel loops, which are natural choices to express nested parallelism. We propose PFACC, a programming model similar to OpenACC. PFACC directives can be used to annotate parallel loops and to guide data movement between different levels of memory hierarchy. Parallel loops can be arbitrarily nested or be placed inside functions that would be (possibly recursively) called in other parallel loops. The PFACC translator translates C programs with PFACC directives into CUDA programs by inserting runtime iteration-sharing and memory allocation routines. The PFACC runtime iteration-sharing routine is a two-level mechanism. Thread blocks dynamically organize loop iterations into batches and execute the batches in a depth-first order. Different thread blocks share iterations among one another with an iteration-stealing mechanism. PFACC generates CUDA programs with reasonable memory usage because of the depth-first execution order. The two-level iteration-sharing mechanism is implemented purely in software and fits well with the CUDA thread hierarchy. Experiments show that PFACC outperforms CUDA dynamic parallelism in terms of performance and code size on most benchmarks.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号