共查询到20条相似文献,搜索用时 15 毫秒
1.
To better understand the relationships between different models of parallel computation, we introduce a new computation system formulation and develop general notions of homomorphisms and isomorphisms between computation systems. This allows us to study relations between vector addition systems, vector replacement systems, Petri nets, and generalized Petri nets. Results in this paper that may be of particular interest include a long list of properties preserved under homomorphism, and constructions that show that vector replacement systems can be simulated by vector addition systems, and that generalized Petri nets can be emulated by Petri nets. 相似文献
2.
Domagoj Jakobović Marin Golub Marko Čupić 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(6):1225-1236
This paper presents the design and the application of asynchronous models of parallel evolutionary algorithms. An overview of the existing parallel evolutionary algorithm (PEA) models and available implementations is given. We present new PEA models in the form of asynchronous algorithms and implicit parallelization, as well as experimental data on their efficiency. The paper also discusses the definition of speedup in PEAs and proposes an appropriate speedup measurement procedure. The described parallel EA algorithms are tested on problems with varying degrees of computational complexity. The results show good efficiency of asynchronous and implicit models compared to existing parallel algorithms. 相似文献
3.
The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instruction multiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled-system is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989), is introduced and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed 相似文献
4.
A. G. Malyshkin 《Problems of Information Transmission》2006,42(3):234-250
We study limit dynamics of a system of interacting particles, which is one of possible models for the parallel and distributed computation process. For a rather wide class of multi-particle interactions, we prove that the stochastic process describing the configuration of a particle system weakly converges in the fluid-dynamic limit to a deterministic process, which is a solution of a certain partial differential equation. 相似文献
5.
D. B. Skillicorn 《International journal of parallel programming》1991,20(2):133-158
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.This work was supported by the Natural Sciences and Engineering Research Council of Canada. 相似文献
6.
7.
8.
We present a new method for computing solutions of conservation laws based on the use of cellular automata with the method of characteristics. The method exploits the high degree of parallelism available with cellular automata and retains important features of the method of characteristics. It yields high numerical accuracy and extends naturally to adaptive meshes and domain decomposition methods for perturbed conservation laws. We describe the method and its implementation for a Dirichlet problem with a single conservation law for the one-dimensional case.
Numerical results for the one-dimensional law with the classical Burgers nonlinearity or the Buckley-Leverett equation show good numerical accuracy outside the neighborhood of the shocks. The error in the area of the shocks is of the order of the mesh size. The algorithm is well suited for execution on both massively parallel computers and vector machines. We present timing results for an Alliant FX/8, Connection Machine Model 2, and CRAY X-MP. 相似文献
9.
10.
Charles Koelbel Piyush Mehrotra John Van Rosendale 《International journal of parallel programming》1987,16(5):365-382
Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines one must carefully distribute the work and the data so as to keep the workload balanced while optimizing the access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture processing example written in BLAZE, which is transformed by the compiler into a task system maximizing locality of memory reference.Research supported by an IBM Graduate Fellowship.Research supported under NASA Contract No. 520-1398-0356.Research supported by NASA Contract No. NAS1-18107 while the last two authors were in residence at ICASE, NASA, Langley Research Center. 相似文献
11.
《Journal of Parallel and Distributed Computing》1987,4(3):309-318
A parallel algorithm to generate optimum test-and-treatment decision trees is presented. Constructing such trees is NP-hard. The algorithm is designed for a machine whose number of connections is 3p/2, where p is the number of processing elements (PEs), and where the PEs are simple enough such that a machine with 220 PEs is currently implementable and a 230 PE machine is feasible. A speedup of O(p/(s log p)) over a sequential dynamic programming algorithm for this task is achieved, by paying careful attention to the communication problem, where s bits are used to represent costs. This algorithm is realized on the Boolean Vector Machine, a cube-connected-cycle system with 220 PEs which is currently running in prototype form, with 512 PEs. The algorithm is concrete, in that all processor allocation and other control issues have been solved. The particular NP-hard problem is of independent interest; it generalizes the binary testing problem by introducing treatments on an equal basis with tests. Applications of this test-and-treatment problem are found in medical diagnosis, systematic biology, machine fault location, laboratory analysis, and many other fields. 相似文献
12.
An efficient implementation of parallel eigenvalue computation for massively parallel processing 总被引:4,自引:0,他引:4
This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method, which has higher parallelism and performance than conventional methods when problem size is relatively small, e.g., the order of 10,000. This is very important for relevant practical applications, where many diagonalizations for such matrices are required so often. The routine was evaluated on the 1024 processors HITACHI SR2201, and giving speedup ratios of about 2–5 times as compared to the ScaLAPACK library on 1024 processors of the HITACHI SR2201. 相似文献
13.
提出一种面向网格的基于消息通信方式的二级计算模型以求解问题。将二级模型与思维进化机器学习以及空间分解技术相结合(思维进化与空间分解并行演化计算-PMEBML-SP),采用多种通信模型实现处理器间负载均衡、支持网格动态资源分配等功能,最后在上海高校网格E网格计算应用平台上实例验证。 相似文献
14.
近程作用分子动力学模拟的两级并行 总被引:1,自引:1,他引:0
分子动力学作为一种重要的计算手段在许多领域有着广泛的应用,由于它的计算量比较庞大,因此并行计算方法被越来越多地引入到分子动力学的模拟中。本文在目前常见的SMP集群系统上,根据系统的结构特点,针对分子动力学的三种并行算法:区域分解法、原子分解法和力分解法,利用MPI Pthread的混合编程模型,采用节点间消息传递模式以及节点内部共享存储的编程模式,实现了近程作用分子动力学的两级并行计算。计算结果表明,不同的算法采用了两级并行的方式和原来只有消息传递的并行方式相比,具有不同的计算效率,但是从总体来说采用两级并行的计算方式可以利用更多的计算资源,从而有助于提高计算能力。 相似文献
15.
16.
17.
Tetsuro Nishino 《New Generation Computing》2002,20(4):317-337
In this paper, we introduce two mathematical models of realistic quantum computation. First, we develop a theory of bulk quantum
computation such as NMR (Nuclear Magnetic Resonance) quantum computation. For this purpose, we define bulk quantum Turing
machine (BQTM for short) as a model of bulk quantum computation. Then, we define complexity classes EBQP, BBQP and ZBQP as
counterparts of the quantum complexity classes EQP, BQP and ZQP, respectively, and show that EBQP=EQP, BBQP=BQP and ZBQP=ZQP.
This implies that BQTMs are polynomially related to ordinary QTMs as long as they are used to solve decision problems. We
also show that these two types of QTMs are also polynomially related when they solve a function problem which has a unique
solution. Furthermore, we show that BQTMs can solve certain instances of NP-complete problems efficiently.
On the other hand, in the theory of quantum computation, only feed-forward quantum circuits are investigated, because a quantum
circuit represents a sequence of applications of time evolution operators. But, if a quantum computer is a physical device
where the gates are interactions controlled by a current computer such as laser pulses on trapped ions, NMR and most implementation
proposals, it is natural to describe quantum circuits as ones that have feedback loops if we want to visualize the total amount
of the necessary hardware. For this purpose, we introduce a quantum recurrent circuit model, which is a quantum circuit with
feedback loops. LetC be a quantum recurrent circuit which solves the satisfiability problem for a blackbox Boolean function includingn variables with probability at least 1/2. And lets be the size ofC (i.e. the number of the gates inC) andt be the number of iterations that is needed forC to solve the satisfiability problem. Then, we show that, for those quantum recurrent circuits, the minimum value ofmax(s, t) isO(n
22
n/3).
Tetsuro Nishino, D.Sc.: He is presently an Associate Professor in the Department of Information and Communication Engineering, The University of
Electro-Communications. He received the B.S., M.S. and D.Sc degrees in mathematics from Waseda University, in 1982, 1984 and
1991 respectively. From 1984 to 1987, he joined Tokyo Research Laboratory, IBM Japan. From 1987 to 1992, he was a Research
Associate of Tokyo Denki University, and from 1992 to 1994, he was an Associate Professor of Japan Advanced Institute of Science
and Technology, Hokuriku. His main interests are circuit complexity theory, computational learning theory and quantum complexity
theory. 相似文献
18.
This paper presents the design and implementation of an efficient reconfigurable parallel prefix computation hardware on field-programmable
gate arrays (FPGAs). The design is based on a pipelined dataflow algorithm, and control logic is added to reconfigure the
system for arbitrary parallelism degree. The system receives multiple input streams of elements in parallel and produces output
streams in parallel. It has an advantage of controlling the degree of parallelism explicitly at run time. The time complexity
of the design is O(d+(N−d)/d), where d and N are parallelism degree and stream size, respectively. When the stream size is sufficiently larger than the initial trigger
time of the pipeline (d), the time complexity becomes O(N/d). Unlike the prefix computation circuits found in the literature, the design is scalable for different problem sizes including
unknown sized data. The design is modular based on a finite state machine, and implemented and tested for target FPGA devices
Xilinx Spartan2S XC2S300EFT256-6Q and XC2S600EFG676-6.
相似文献
H. K. DaiEmail: |
19.
This research defines and analyzes a methodology for deriving a performance model for SPMD hybrid parallel applications. Hybrid parallelism combines shared memory and message passing computing models. This work extends the current practice of application performance modelling by development of a methodology for hybrid applications with these procedures.
- Creation of a model based on complexity analysis of an application code and its data structures.
- Enhancement of a static complexity model by dynamic factors to capture execution time phenomena, such as memory hierarchy effects.
- Quantitative analysis of model characteristics and the effects of perturbations in measured parameters.
20.
The crossed cube architecture for parallel computation 总被引:4,自引:0,他引:4
The construction of a crossed cube which has many of the properties of the hypercube, but has diameter only about half as large, is discussed. This network is self-routing, in the sense that there is a simple distributed routing algorithm which guarantees optimal paths between any pair of vertices. This fact, together with other properties such as regularity, symmetry, high connectivity, and a simple recursive structure, suggests that the crossed cube may be an attractive alternative to the ordinary hypercube for massively parallel architectures, SIMD algorithms, which utilize the architecture are developed, and it is shown that the CQ n architecture can profitably emulate the ordinary hypercube. It is also shown that addition of simple switches can improve the capabilities of the system significantly. For instance, the dynamic reconfiguration capability allows hypercube algorithms to be executed on the proposed architecture. The use of these switches also improves the embedding properties of the system 相似文献