首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Models of parallel computation :a survey and classification   总被引:5,自引:1,他引:5  
In this paper, the state-of-the-art parallel computational model research is reviewed. We will introduce various models that were developed during the past decades. According to their targeting architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on three generations classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.  相似文献   

2.
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.This work was supported by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

3.
A fundamental understanding of the interplay between computation and I/O activities in parallel applications that manipulate huge amounts of data is critical to achieving good application performance, as well as correctly characterizing the workloads of large-scale high-performance parallel systems. We present a formal model of the behavior of CPU and I/O interactions in scientific applications, from which we derive various formulas that characterize application performance. Our model captures the I/O and CPU activity at different levels of granularity, where results from the model are shown to be in excellent agreement with measurement data from a set of I/O-intensive applications. Using the formulas from our model, which explicitly take I/O activity into account, we also present examples of possible applications of the model  相似文献   

4.
Spatiotemporal aggregate computation: a survey   总被引:3,自引:0,他引:3  
Spatiotemporal databases are becoming increasingly more common. Typically, applications modeling spatiotemporal objects need to process vast amounts of data. In such cases, generating aggregate information from the data set is more useful than individually analyzing every entry. In this paper, we study the most relevant techniques for the evaluation of aggregate queries on spatial, temporal, and spatiotemporal data. We also present a model that reduces the evaluation of aggregate queries to the problem of selecting qualifying tuples and the grouping of these tuples into collections on which an aggregate function is to be applied. This model gives us a framework that allows us to analyze and compare the different existing techniques for the evaluation of aggregate queries. At the same time, it allows us to identify opportunities for research on types of aggregate queries that have not been studied.  相似文献   

5.
Skillicorn  D.B. 《Computer》1990,23(12):38-50
The major parallel architecture classes are considered: single-instruction multiple-data (SIMD) computers, tightly coupled multiple-instruction multiple-data (MIMD) computers, hypercuboid computers and constant-valence MIMD computers. An argument that the PRAM model is universal over tightly coupled and hypercube systems, but not over constant-valence-topology, loosely coupled-system is reviewed, showing precisely how the PRAM model is too powerful to permit broad universality. Ways in which a model of computation can be restricted to become universal over less powerful architectures are discussed. The Bird-Meertens formalism (R.S. Bird, 1989), is introduced and it is shown how it is used to express computations in a compact way. It is also shown that the Bird-Meertens formalism is universal over all four architecture classes and that nontrivial restrictions of functional programming languages exist that can be efficiently executed on disparate architectures. The use of the Bird-Meertens formalism as the basis for a programming language is discussed, and it is shown that it is expressive enough to be used for general programming. Other models and programming languages with architecture-independent properties are reviewed  相似文献   

6.
Over the past decade, frog biodiversity has rapidly declined due to many problems including habitat loss and degradation, introduced invasive species, and environmental pollution. Frogs are greatly important to improve the global ecosystem and it is ever more necessary to monitor frog biodiversity. One way to monitor frog biodiversity is to record audio of frog calls. Various methods have been developed to classify these calls. However, to the best of our knowledge, there is still no paper that reviews and summarizes currently developed methods. This survey gives a quantitative and detailed analysis of frog call classification. To be specific, a frog call classification system consists of signal pre-processing, feature extraction, and classification. Signal pre-processing is made up of signal processing, noise reduction, and syllable segmentation. Following signal preprocessing, the next step is feature extraction, which is the most crucial step for improving classification performance. Features used for frog call classification are categorized into four types: (1) time domain and frequency domain features (we classify time domain and frequency domain features into one type because they are often combined together to achieve higher classification accuracy), (2) time-frequency features, (3) cepstral features, and (4) other features. For the classification step, different classifiers and evaluation criteria used for frog call classification are investigated. In conclusion, we discuss future work for frog call classification.  相似文献   

7.
8.
Models for concurrency: towards a classification   总被引:4,自引:0,他引:4  
Models for concurrency can be classified with respect to three relevant parameters: behaviour/ system, interleaving/noninterleaving, linear/branching time. When modelling a process, a choice concerning such parameters corresponds to choosing the level of abstraction of the resulting semantics.

In this paper, we move a step towards a classification of models for concurrency based on the parameters above. Formally, we choose a representative of any of the eight classes of models obtained by varying the three parameters, and we study the formal relationships between them using the language of category theory.  相似文献   


9.
The survey focuses on algebraic-grammatical models of parallel processes, representation of knowledge about classes of algorithms in terms of a variety of production systems (structured design grammars), and program generation tools.Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 5–13, September–October, 1991.  相似文献   

10.
11.
12.
The purpose of this survey paper is to present a characterization of the techniques proposed up to now to model diffuse interreflections. Basic approaches are presented as well as the evolution of implementation strategies. In the conclusions, open problems are enumerated.  相似文献   

13.
To better understand the relationships between different models of parallel computation, we introduce a new computation system formulation and develop general notions of homomorphisms and isomorphisms between computation systems. This allows us to study relations between vector addition systems, vector replacement systems, Petri nets, and generalized Petri nets. Results in this paper that may be of particular interest include a long list of properties preserved under homomorphism, and constructions that show that vector replacement systems can be simulated by vector addition systems, and that generalized Petri nets can be emulated by Petri nets.  相似文献   

14.
We present a new method for computing solutions of conservation laws based on the use of cellular automata with the method of characteristics. The method exploits the high degree of parallelism available with cellular automata and retains important features of the method of characteristics. It yields high numerical accuracy and extends naturally to adaptive meshes and domain decomposition methods for perturbed conservation laws. We describe the method and its implementation for a Dirichlet problem with a single conservation law for the one-dimensional case.

Numerical results for the one-dimensional law with the classical Burgers nonlinearity or the Buckley-Leverett equation show good numerical accuracy outside the neighborhood of the shocks. The error in the area of the shocks is of the order of the mesh size. The algorithm is well suited for execution on both massively parallel computers and vector machines. We present timing results for an Alliant FX/8, Connection Machine Model 2, and CRAY X-MP.  相似文献   


15.
I. Margaria  A. R. Meo  M. Zacchi 《Calcolo》1979,16(3):305-333
The main goal of this paper is the introduction, from a theoretical point of view, of a model specifically studied for the optimization of horizontal microprograms but well suited also to the analysis of microprocessor systems. Moreover, an approach to the design of well-structured and correct parallel programs is given by studying a subset of our schemes (complex structures) and applying to it existing techniques in order to give an algorithm for the maximization of parallelism.  相似文献   

16.
This paper presents the design and the application of asynchronous models of parallel evolutionary algorithms. An overview of the existing parallel evolutionary algorithm (PEA) models and available implementations is given. We present new PEA models in the form of asynchronous algorithms and implicit parallelization, as well as experimental data on their efficiency. The paper also discusses the definition of speedup in PEAs and proposes an appropriate speedup measurement procedure. The described parallel EA algorithms are tested on problems with varying degrees of computational complexity. The results show good efficiency of asynchronous and implicit models compared to existing parallel algorithms.  相似文献   

17.
In this paper we discuss design principles to implement a set of linear algebra subroutines in a portable libraty for parallel computers. Our design supports reuse of code and easy adaption to new parallel programming paradigms or network configurations. The routines are supposed to be used in self-verifying algorithms. They therefore have to deliver a validated result of high accuracy.  相似文献   

18.
Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines one must carefully distribute the work and the data so as to keep the workload balanced while optimizing the access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture processing example written in BLAZE, which is transformed by the compiler into a task system maximizing locality of memory reference.Research supported by an IBM Graduate Fellowship.Research supported under NASA Contract No. 520-1398-0356.Research supported by NASA Contract No. NAS1-18107 while the last two authors were in residence at ICASE, NASA, Langley Research Center.  相似文献   

19.
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm isO(nlog2 n), comparable to that for specialized sorting networks and an improvement on theO(n 1.5) complexity of conventional mesh-connected array processors.  相似文献   

20.
Quantum control plays a key role in quantum technology, in particular for steering quantum systems. As problem size grows exponentially with the system size, it is necessary to deal with fast numerical algorithms and implementations. We improved an existing code for quantum control concerning two linear algebra tasks: the computation of the matrix exponential and efficient parallelisation of prefix matrix multiplication.For the matrix exponential we compare three methods: the eigendecomposition method, the Padé method and a polynomial expansion based on Chebyshev polynomials. We show that the Chebyshev method outperforms the other methods both in terms of computation time and accuracy. For the prefix problem we compare the tree-based parallel prefix scheme, which is based on a recursive approach, with a sequential multiplication scheme where only the individual matrix multiplications are parallelised. We show that this fine-grain approach outperforms the parallel prefix scheme by a factor of 2–3, depending on parallel hardware and problem size, and also leads to lesser memory requirements.Overall, the improved linear algebra implementations not only led to a considerable runtime reduction, but also allowed us to tackle problems of larger size on the same parallel compute cluster.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号