20 similar documents found.
1.
For pt.I see ibid., p. 170-80. In pt.I, we presented a binding environment for the AND and OR parallel execution of logic programs. This environment was instrumental in rendering a compiler for the AND and OR parallel execution of logic programs machine independent. In this paper, we describe a compiler based on the Reduce-OR process model (ROPM) for the parallel execution of Prolog programs, and report the performance of the compiler on five parallel machines: the Encore Multimax, the Sequent Symmetry, the NCUBE 2, the Intel i860 hypercube, and a network of Sun workstations. The compiler is part of a machine-independent parallel Prolog development system built on top of a run-time environment for parallel programming called the Chare kernel, and runs unchanged on these multiprocessors. In keeping with the objectives behind the ROPM, the compiler supports both OR parallelism and independent AND parallelism in Prolog programs and is suitable for execution on both shared and nonshared memory machines. We discuss the performance of the Prolog compiler in some detail and describe how grain size can be used to deliver performance that is within 10% of the underlying sequential Prolog compiler on one processor and scales linearly with increasing numbers of processors on problems exhibiting sufficient parallelism. The loose coupling between the parallel and sequential components makes it possible to use the best available sequential compiler as the sequential component of our compiler.
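As a rough illustration of the grain-size idea (a minimal Python sketch under our own toy workload, not the ROPM compiler; GRAIN, the worker count, and the kernel are invented): work is cut into grain-sized chunks so that each task amortises its scheduling overhead, which is what keeps the one-processor case close to the sequential compiler while still leaving enough tasks for many processors.

from concurrent.futures import ThreadPoolExecutor

GRAIN = 50_000   # grain size: each chunk must be large enough to amortise task overhead

def count_multiples(lo, hi):
    # sequential leaf work: count multiples of 7 in [lo, hi)
    return sum(1 for n in range(lo, hi) if n % 7 == 0)

def count_multiples_parallel(n, workers=4):
    # a coarser GRAIN approaches sequential cost on one worker;
    # a finer GRAIN exposes more parallelism but pays more overhead
    chunks = [(lo, min(lo + GRAIN, n)) for lo in range(0, n, GRAIN)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_multiples(*c), chunks))

print(count_multiples_parallel(1_000_000))

For CPU-bound Python code a process pool would be needed to see real speedups; the sketch is only meant to show how the grain-size parameter trades overhead against available parallelism.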
2.
With the growing availability of multiprocessors, a great deal of attention has been given to executing Prolog in parallel. A question that naturally arises is how to execute standard sequential Prolog programs with side effects in parallel. The problem of performing side effects in AND parallel systems has been considered elsewhere. This paper presents a method that preserves the sequential semantics of side-effect predicates in an OR parallel system. First, a general method is given for performing data side effects such as read and write. This method is then extended to control side effects such as asserta, assertz, and retract. Finally, a constant-time algorithm for performing cut is presented. The work of L. V. Kale was supported by the National Science Foundation under Grant NSF-CCR-8700988. The work of D. A. Padua and D. C. Sehr was supported in part by the National Science Foundation under Grant NSF-MIP-8410110, the Department of Energy under Grant DOE DE-FG02-85ER25001, and a donation from the IBM Corporation to the Center for Supercomputing Research and Development. D. C. Sehr holds a fellowship from the Office of Naval Research.
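By way of illustration only (this is not the paper's algorithm), one simple way to make a data side effect such as write appear sequential under OR parallel search is to buffer the output of each branch and flush the buffers in the left-to-right order a sequential engine would have tried the branches; the toy branches below just yield answers:

from concurrent.futures import ThreadPoolExecutor

def explore(branch):
    # run one OR branch; return its buffered 'write' side effects instead of printing
    return [f"answer: {answer}" for answer in branch()]

def or_parallel(branches):
    with ThreadPoolExecutor() as pool:
        buffers = list(pool.map(explore, branches))   # map preserves branch order
    for buf in buffers:                               # flush in sequential (leftmost-first) order
        for line in buf:
            print(line)

# hypothetical clauses of a predicate, tried as OR branches
or_parallel([lambda: [1, 2], lambda: [3], lambda: []])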
3.
Design and Implementation of a Trusted Execution Environment on a Domestic Autonomous Platform
To address the lack of effective security control over application execution on current domestic autonomous platforms, this paper proposes a method for building a trusted execution environment on such platforms based on the basic principles of trusted computing. A trusted execution environment for the domestic autonomous platform is designed and implemented: before an application starts, its launch is subjected to trusted verification, proactively protecting the security of application execution. Experiments show that the method is practical and has the advantage of high execution efficiency, providing an effective security control measure for the secure execution of applications on domestic autonomous platforms.
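A generic sketch of the launch-time verification step described above (an allowlist hash check in Python; the path, the reference digest, and the helper names are placeholders, and this is not the paper's implementation):

import hashlib
import subprocess

ALLOWLIST = {"/usr/local/bin/app": "expected-sha256-hex-digest"}   # recorded reference values

def measure(path):
    # measure the executable before it is allowed to run
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def trusted_launch(path, *args):
    expected = ALLOWLIST.get(path)
    if expected is None or measure(path) != expected:
        raise PermissionError(f"untrusted executable blocked: {path}")
    return subprocess.run([path, *args])    # only verified applications are started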
4.
5.
IEEE Transactions on Parallel and Distributed Systems, 2002, 13(2): 167-178
Programs whose parallelism stems from multiple recursion form an interesting subclass of parallel programs with many practical applications. The highly irregular shape of many recursion trees makes it difficult to obtain good load balancing with small overhead. We present a system, called REAPAR, that executes recursive C programs in parallel on SMP machines. Based on data from a single profiling run of the program, REAPAR selects a load-balancing strategy that is both effective and efficient, and it generates parallel code implementing that strategy. The performance obtained by REAPAR on a diverse set of benchmarks matches that published for much more complex systems requiring high-level, problem-oriented, explicitly parallel constructs. A case study even found REAPAR to be competitive with handwritten (low-level, machine-oriented) thread-parallel code.
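One load-balancing strategy of this kind can be sketched as follows (in the spirit of profile-guided parallelisation of recursion, not REAPAR's actual code generator; fib and the default cut-off depth are toy placeholders): unfold the recursion tree to a fixed depth that a profiling run could choose, then farm the resulting subproblems out to a pool.

from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # the sequential recursive kernel
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def frontier(n, depth):
    # subproblem arguments obtained by unfolding the recursion `depth` levels
    if depth == 0 or n < 2:
        return [n]
    return frontier(n - 1, depth - 1) + frontier(n - 2, depth - 1)

def fib_parallel(n, depth=4, workers=4):
    subproblems = frontier(n, depth)                # irregular sizes, but many of them
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fib, subproblems))      # fib(n) equals the sum over the frontier

print(fib_parallel(30))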
6.
Optimization of parallel execution for multi-join queries
Ming-Syan Chen, P. S. Yu, Kun-Lung Wu. IEEE Transactions on Knowledge and Data Engineering, 1996, 8(3): 416-428
We study the subject of exploiting interoperator parallelism to optimize the execution of multi-join queries. Specifically, we focus on two major issues: (1) scheduling the execution sequence of multiple joins within a query, and (2) determining the number of processors to be allocated for the execution of each join operation obtained in (1). For the first issue, we propose and evaluate by simulation several methods to determine the general join sequences, or bushy trees. Despite their simplicity, the proposed heuristics can lead to general join sequences that significantly outperform the optimal sequential join sequence. The quality of the join sequences obtained by the proposed heuristics is shown to be fairly close to that of the optimal one. For the second issue, it is shown that processor allocation for exploiting interoperator parallelism is subject to more constraints, such as execution dependency and system fragmentation, than those in the study of intraoperator parallelism for a single join. The concept of synchronous execution time is proposed to alleviate these constraints. Several heuristics to deal with processor allocation, categorized by bottom-up and top-down approaches, are derived and evaluated by simulation. The relationship between issues (1) and (2) is explored. Among all the schemes evaluated, the proposed two-step approach, which first applies the join sequence heuristic to build a bushy tree as if under a single-processor system and then, in light of the concept of synchronous execution time, allocates processors to execute each join in the bushy tree in a top-down manner, emerges as the best solution to minimize the query execution time.
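To make the flavour of the first step concrete, here is a toy greedy heuristic in the same spirit (not the paper's heuristics; the relation names, cardinalities, and the size estimate are invented): repeatedly combine the two cheapest operands, so that independent subtrees of the resulting bushy tree can later be handed to disjoint processor groups.

def build_bushy(relations):
    # relations: name -> estimated cardinality; returns (join tree, size estimate)
    nodes = [(card, name) for name, card in relations.items()]
    while len(nodes) > 1:
        nodes.sort(key=lambda x: x[0])           # cheapest operands first
        (c1, t1), (c2, t2) = nodes[0], nodes[1]
        est = max(c1, c2)                        # crude estimate of the join result size
        nodes = nodes[2:] + [(est, (t1, t2))]    # the intermediate result re-enters the pool
    return nodes[0][1], nodes[0][0]

tree, size = build_bushy({"A": 100, "B": 100, "C": 100, "D": 100})
print(tree)   # (('A', 'B'), ('C', 'D')): the two subtrees could be executed concurrently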
7.
D. J. Simpson, F. W. Burton. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 25(6): 870-882
We model a deterministic parallel program by a directed acyclic graph of tasks, where a task can execute as soon as all tasks preceding it have been executed. Each task can allocate or release an arbitrary amount of memory (i.e., heap memory allocation can be modeled). We call a parallel schedule “space efficient” if the amount of memory required is at most equal to the number of processors times the amount of memory required for some depth-first execution of the program by a single processor. We describe a simple, locally depth-first scheduling algorithm and show that it is always space efficient. Since the scheduling algorithm is greedy, it will be within a factor of two of being optimal with respect to time. For the special case of a program having a series-parallel structure, we show how to efficiently compute the worst case memory requirements over all possible depth-first executions of a program. Finally, we show how scheduling can be decentralized, making the approach scalable to a large number of processors when there is sufficient parallelism.
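The series-parallel computation can be pictured with a hedged sketch in a simplified model where every task or subtree is summarised by a pair (peak memory, net allocation); the recurrence below is our illustration of such a worst-case calculation, not necessarily the paper's exact formulation.

def series(children):
    # children run one after another in the given order; returns (peak, net)
    peak, net = 0, 0
    for c_peak, c_net in children:
        peak = max(peak, net + c_peak)   # a child's peak on top of what is still allocated
        net += c_net
    return peak, net

def parallel_worst(children):
    # worst case over every order in which a depth-first run could visit the children
    pos = sum(max(n, 0) for _, n in children)             # allocations that could come first
    peak = max(p + pos - max(n, 0) for p, n in children)  # one child at its peak, growth of the others before it
    net = sum(n for _, n in children)
    return peak, net

# leaf task summaries: (peak, net), e.g. allocates 8 then frees 3 -> (8, 5)
a, b, c = (8, 5), (4, -2), (6, 6)
print(series([a, parallel_worst([b, c])]))   # (15, 9) with these toy numbers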
8.
Georgia Penido Safe, Claudionor Coelho Jr., Luiz Filipe M. Vieira, Celina Gomes Do Val, Jose Augusto Nacif, Antonio Otavio Fernandes. International Journal on Software Tools for Technology Transfer (STTT), 2012, 14(1): 95-108
Functional verification is “the” major design-phase bottleneck for silicon productivity. Since functional verification is an NP-complete problem, it relies on a large number of heuristics with associated parameters (engines). With the advent of parallel processing, formal verification can be optimized by selecting the best n engines to run in parallel, increasing the chance of reaching successful termination of the verification task. In this paper, we present a methodology to build engine estimators based on structural metrics and to select n engines to run in parallel. The methodology considers both the engines’ estimated performance and the engines’ correlation. Results confirmed that the methodology can be a very quick selection mechanism for parallelization of engines in order to increase the chance of running the best engines to solve the problem.
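One plausible shape for the selection step is sketched below (a toy greedy rule; the engine names, scores, and correlations are invented, and the paper's estimators are built from structural metrics rather than given as constants): add, at each step, the engine whose estimated performance is most complementary to the engines already chosen.

def select_engines(scores, correlation, n):
    # scores: engine -> estimated chance of finishing the verification task quickly
    # correlation: frozenset({e1, e2}) -> correlation between two engines, in [0, 1]
    chosen = [max(scores, key=scores.get)]           # start with the best single engine
    while len(chosen) < n:
        def gain(e):
            overlap = max(correlation.get(frozenset({e, c}), 0.0) for c in chosen)
            return scores[e] * (1.0 - overlap)       # penalise engines that duplicate work
        remaining = [e for e in scores if e not in chosen]
        chosen.append(max(remaining, key=gain))
    return chosen

scores = {"bdd": 0.6, "sat_a": 0.55, "sat_b": 0.5, "interp": 0.3}
correlation = {frozenset({"sat_a", "sat_b"}): 0.9,
               frozenset({"bdd", "sat_a"}): 0.2}
print(select_engines(scores, correlation, n=3))      # ['bdd', 'sat_b', 'interp'] with these toy numbers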
9.
N. Yazia-Pekergin, J.-M. Vincent. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991, 17(10): 1005-1012
Stochastic bounds are obtained on the execution times of parallel programs when the number of processors is unlimited. A parallel program is considered to consist of interdependent tasks with synchronization constraints. These constraints are described by an acyclic directed graph called a task graph. The execution times of tasks are considered to be independent and identically distributed (i.i.d.) random variables. The performance measure of interest is the overall execution time of the considered parallel program (task graph). Stochastic bound methods are applied to obtain lower and upper bounds on this measure. Another upper bound is obtained for parallel programs having 'new better than used in expectation' (NBUE) random variables as task execution times. NBUE random variables are replaced with exponential random variables of the same mean to derive this upper bound.
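For orientation, the two quantities at the heart of the abstract can be written as follows (the notation is ours, introduced only for illustration):

% Overall execution time with unlimited processors: the longest path of the
% task graph G, summing the i.i.d. task execution times X_i.
\[
  T \;=\; \max_{\pi \in \mathrm{Paths}(G)} \sum_{i \in \pi} X_i .
\]
% A nonnegative random variable X is NBUE ("new better than used in
% expectation") when its expected residual life never exceeds its mean:
\[
  \mathbb{E}\,[\,X - t \mid X > t\,] \;\le\; \mathbb{E}[X]
  \qquad \text{for all } t \ge 0 .
\]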
10.
Jaejin Lee, Jung-Ho Park, Honggyu Kim, Changhee Jung, Daeseob Lim, SangYong Han. Journal of Parallel and Distributed Computing, 2010
In simultaneous multithreading (SMT) multiprocessors, using all the available threads (logical processors) to run a parallel loop is not always beneficial due to the interference between threads and parallel execution overhead. To maximize the performance of a parallel loop on an SMT multiprocessor, it is important to find an appropriate number of threads for executing the parallel loop. This article presents adaptive execution techniques that find a proper execution mode for each parallel loop in a conventional loop-level parallel program on SMT multiprocessors. A compiler preprocessor generates code that, based on dynamic feedback, automatically determines at run time the optimal number of threads for each parallel loop in the parallel application. We evaluate our technique using a set of standard numerical applications, running them on a real SMT multiprocessor machine with 8 hardware contexts. Our approach is general enough to work well with other SMT multiprocessor or multicore systems.
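A minimal sketch of the adaptive idea (an illustration, not the paper's compiler preprocessor; the candidate thread counts, the chunking scheme, and the toy loop are placeholders): time the first few invocations of a parallel loop under different thread counts, then keep the fastest configuration for the remaining invocations.

import time
from concurrent.futures import ThreadPoolExecutor

def run_loop(body, n_iters, threads):
    # execute body(i) for i in range(n_iters) using `threads` workers
    chunk = (n_iters + threads - 1) // threads
    ranges = [range(lo, min(lo + chunk, n_iters)) for lo in range(0, n_iters, chunk)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda r: [body(i) for i in r], ranges))

class AdaptiveLoop:
    # try each candidate thread count once, then stick with the fastest
    def __init__(self, candidates=(1, 2, 4, 8)):
        self.timings, self.candidates, self.best = {}, list(candidates), None

    def __call__(self, body, n_iters):
        if self.candidates:                      # exploration phase
            t = self.candidates.pop(0)
            start = time.perf_counter()
            run_loop(body, n_iters, t)
            self.timings[t] = time.perf_counter() - start
            if not self.candidates:
                self.best = min(self.timings, key=self.timings.get)
        else:                                    # exploitation phase
            run_loop(body, n_iters, self.best)

loop = AdaptiveLoop()
for _ in range(10):
    loop(lambda i: i * i, 100_000)   # first four calls explore, the rest use the best count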
11.
Optimization of parallel query execution plans in XPRS
In this paper, we describe our approach to the optimization of query execution plans in XPRS, a multiuser parallel database system based on a shared-memory multiprocessor and a disk array. The main difficulties in this optimization problem are the compile-time unknown parameters, such as available buffer size and number of free processors, and the enormous search space of possible parallel plans. We deal with these problems with a novel two-phase optimization strategy which dramatically reduces the search space and allows run-time parameters without significantly compromising plan optimality. In this paper we present our two-phase optimization strategy and give experimental evidence from XPRS benchmarks indicating that it almost always produces optimal or close-to-optimal plans.
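The split can be pictured with a toy sketch (the plan representation, the cost function, and the allocation rule are invented stand-ins, not XPRS's optimizer): phase one picks the best sequential plan without knowing the run-time parameters, and phase two parallelises that plan once the free processors and buffer size are known.

def phase1_best_sequential(plans, cost):
    # compile time: choose the cheapest sequential plan, ignoring run-time parameters
    return min(plans, key=cost)

def phase2_parallelize(plan, free_processors, buffer_pages):
    # run time: a hypothetical rule -- one worker per operator, capped by what is
    # actually free, with an equal share of the buffer pool per worker
    workers = min(len(plan), free_processors)
    return {"plan": plan, "workers": workers,
            "buffer_per_worker": buffer_pages // max(workers, 1)}

plans = [("scan_A", "scan_B", "hash_join", "sort"),
         ("index_scan_A", "scan_B", "nl_join")]
best = phase1_best_sequential(plans, cost=len)        # toy cost: number of operators
print(phase2_parallelize(best, free_processors=2, buffer_pages=1024))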
12.
13.
David Sánchez, David Isern, Ángel Rodríguez-Rozas, Antonio Moreno. Expert Systems with Applications, 2011, 38(6): 6644-6656
Parallel computing has been radically evolving in recent years from the centralised supercomputer multi-processor point of view to modern distributed approaches such as grid computing. The availability of relatively obsolete or underused hardware and the increasing LAN and WAN interconnection speeds have motivated the success of those new paradigms. In this paper, we propose the use of agent technology to improve the management, flexibility and reusability of grid-like parallel computing architectures. We present a general-purpose agent-based architecture which is able to manage and execute independent parallel tasks through one or several heterogeneous computer networks – or even Internet nodes – exploiting state-of-the-art agent mobility capabilities. A particular application of the proposed architecture to support the execution of a complex knowledge acquisition task is also introduced, showing high scalability and very low overhead.
14.
E. J. Cameron, D. M. Cohen, B. Gopinath, W. M. Keese II, L. Ness, P. Uppaluru, J. R. Vollaro. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988, 14(3): 317-326
The IC* project is an effort to create an environment for the design, specification, and development of complex systems such as communication protocols, parallel machines, and distributed systems. The basis of the project is the IC* model of parallel computation, in which a system is specified by a set of invariant expressions which describe its behavior in time. The features of this model include temporal and structural constraints, inherent parallelism, explicit modeling of time, nondeterministic evolution, and dynamic activation. The project also includes the construction of a parallel computer specifically designed to support the model of computation. The authors discuss the IC* model and the current user language, and describe the architecture and hardware of the prototype supercomputer built to execute IC* programs.
15.
16.
The large number of processing elements in current parallel systems necessitates the development of more comprehensive and realistic tools for the scalability analysis of algorithms on those architectures. This paper presents a simple analytical tool with which to study the scalability of parallel algorithm-architecture combinations. Our practical method studies separately execution time, efficiency, and memory usage in the accuracy-critical scaling model, where the problem size (the input data set size) increases with the number of processors, which is the relevant model in many situations. The paper defines quantitative and qualitative measurements of scalability and derives important relationships between execution time and efficiency. For example, the results show that the best way to scale the system (i.e., to degrade the properties of the system as little as possible) is to maintain constant execution time. These analytical results are verified with one candidate application for massively parallel computers: the full multigrid method. We study the scalability of a general d-dimensional full multigrid method on an r-dimensional mesh of processors. The analytical expressions are verified through experimental results obtained by implementing the full multigrid method on a Transputer-based machine and on the CRAY T3D.
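In the usual notation (ours, not taken from the paper), with T(N, p) the execution time for problem size N on p processors, the efficiency and the constant-execution-time scaling rule discussed above can be written as:

\[
  E(N, p) \;=\; \frac{T(N, 1)}{p \, T(N, p)},
  \qquad
  T\bigl(N(p), p\bigr) \;=\; T\bigl(N(1), 1\bigr),
\]
% where N(p) is the problem size chosen for p processors so that the run
% time stays constant as the machine grows.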
17.
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods.
Recommended by: Patrick Valduriez
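To make the common core of the two subareas concrete, the sketch below shows semi-naive transitive closure with the edge relation fragmented by a hash of the source node; the per-fragment joins are written sequentially here for clarity, and all names are ours rather than taken from any surveyed system.

def transitive_closure(edges, fragments=4):
    frag = lambda x: hash(x) % fragments
    # each fragment owns the edges whose source node hashes to it
    owned = {k: {(a, b) for (a, b) in edges if frag(a) == k} for k in range(fragments)}
    closure, delta = set(edges), set(edges)
    while delta:                                   # semi-naive iteration
        new = set()
        for k in range(fragments):                 # conceptually one processor per fragment
            new |= {(a, c) for (a, b) in delta for (b2, c) in owned[k] if b == b2}
        delta = new - closure                      # only genuinely new pairs drive the next round
        closure |= delta
    return closure

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))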
18.
Parallel structural skeleton theory provides a general model for describing parallel design patterns. By abstracting design patterns at a higher level, it effectively addresses the limitations of design-pattern-based parallel programming methods and lowers the difficulty of parallel program development. PASBPE, a parallel programming environment based on parallel structural skeletons, builds on this theory to quickly generate the parallel program framework a user needs through parameterized settings, and it simplifies the parallel program development process and improves development efficiency through a visual, interactive programming environment.
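As a flavour of what a parameterised skeleton might expand to (a generic task-farm sketch in Python, not PASBPE's generated framework; the worker function and the degree of parallelism stand for the user-supplied parameters):

from concurrent.futures import ThreadPoolExecutor

def farm(worker, inputs, degree=4):
    # the "farm" skeleton: apply `worker` to every input with `degree` parallel workers
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return list(pool.map(worker, inputs))

print(farm(lambda x: x * x, range(10), degree=2))   # [0, 1, 4, 9, ...]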
19.
Anna Feriani, Alberto Franchi, Francesco Genna. Computer Methods in Applied Mechanics and Engineering, 1996, 130(3-4): 289-298
We discuss solution schemes for the incremental elastic-plastic structural problem, discretized by means of the Finite Element method. Attention is focused on their formulation and implementation in a parallel computing environment defined by a cluster of workstations connected by means of a network. The availability of parallel computers allows one to consider formulations and solution strategies so far not considered competitive with the classical Newton-like schemes, which imply the definition of an elastic-plastic tangent stiffness matrix. The solution strategies considered herein are based on the explicit integration of the actual elastic-plastic rate problem. This, in turn, is phrased in terms of two different formulations, whose relative advantages, particularly with respect to their integration in parallel, are discussed. A displacement-plastic multiplier formulation of the structural rate theory of plasticity [1], integrated by means of an explicit, element-by-element scheme, seems to be the most promising one.
20.
Edward J. Krall, Patrick F. McGehearty. International Journal of Parallel Programming, 1986, 15(1): 5-32
We report on a case study of the potential for parallel execution of the inference engine of EMYCIN, a rule-based expert system. Multilisp, which supports parallel execution of tasks by means of the future construct, is used to implement the parallel version of the backward-chaining inference engine. The study uses explicit specification of parallel execution and synchronization to attain parallel execution. It suggests some general techniques for obtaining parallel execution in expert systems and other applications.
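A minimal sketch of the idea, with Python futures standing in for Multilisp's future construct (the rule base, the goal names, and the pool size are invented for illustration):

from concurrent.futures import ThreadPoolExecutor

RULES = {                       # goal -> alternative lists of premises
    "diagnosis": [["symptom_a", "symptom_b"], ["symptom_c"]],
    "symptom_a": [[]],          # facts are rules with no premises
    "symptom_b": [[]],
}

def prove(goal, pool):
    # backward chaining: a goal holds if, for some rule, every premise is provable;
    # the premises of a rule are spawned as futures and then demanded, so sibling
    # subgoals are explored in parallel
    for premises in RULES.get(goal, []):
        futures = [pool.submit(prove, p, pool) for p in premises]
        if all(f.result() for f in futures):
            return True
    return False

# note: deeply nested rule bases would need care with a bounded worker pool
with ThreadPoolExecutor(max_workers=8) as pool:
    print(prove("diagnosis", pool))   # True, via symptom_a and symptom_b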