Similar Literature
20 similar results found
1.
In this paper a computer memory system intended for storing an arbitrary sequence of multidimensional arrays is described. The memory system permits parallel access both to the cuts obtained from a given array by fixing one of its coordinates, and to a large set of parallelepipeds, i.e. subarrays of the same dimension as the given array.
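The two access patterns described above can be illustrated with a small sketch, assuming a plain 3D array; the names `make_array`, `cut` and `subarray` are illustrative, not from the paper:

```python
def make_array(nx, ny, nz):
    """Build a 3D array whose entry encodes its own coordinates."""
    return [[[100 * i + 10 * j + k for k in range(nz)]
             for j in range(ny)] for i in range(nx)]

def cut(a, axis, index):
    """Return the 2D cut obtained by fixing one coordinate."""
    if axis == 0:
        return a[index]
    if axis == 1:
        return [plane[index] for plane in a]
    return [[row[index] for row in plane] for plane in a]

def subarray(a, lo, hi):
    """Return the parallelepiped a[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]."""
    return [[[a[i][j][k] for k in range(lo[2], hi[2])]
             for j in range(lo[1], hi[1])] for i in range(lo[0], hi[0])]
```

A parallel memory of the kind described would serve all elements of such a cut or parallelepiped in one access cycle; here they are merely extracted sequentially.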

2.
The possibilities of a programming environment that integrates the specifics of different types of parallel computers are presented in the framework of computational structural mechanics. An extension of the development environment of the Finite Element code CASTEM 2000 has been realized to offer the user a global view of all objects of the parallel application. To ease the implementation of parallel applications, this system hides data transfers between processors and allows direct reuse of modules from the original sequential code. It is an object-based shared virtual memory system that supports parallelism by data distribution (for non-structured data) or by control distribution; it is therefore well suited to “mechanical” parallelism. To validate this programming environment, domain decomposition techniques well suited to parallel computation have been used.
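The domain decomposition idea mentioned at the end can be sketched in miniature: a 1D Laplace problem split into two subdomains that exchange one halo value per Jacobi sweep. All names are illustrative; CASTEM 2000's environment hides this kind of exchange behind its shared-virtual-memory objects:

```python
def jacobi_step(sub, left_halo, right_halo):
    """One Jacobi sweep on a subdomain, using halo values at each end."""
    padded = [left_halo] + sub + [right_halo]
    return [0.5 * (padded[i - 1] + padded[i + 1])
            for i in range(1, len(sub) + 1)]

def solve(n=8, iters=500):
    """u'' = 0 on n interior points, u=0 at the left end, u=1 at the right,
    computed on two subdomains that exchange halo values each iteration."""
    half = n // 2
    a = [0.0] * half          # left subdomain interior values
    b = [0.0] * (n - half)    # right subdomain interior values
    for _ in range(iters):
        # halo exchange: each subdomain sees its neighbour's edge value
        a_new = jacobi_step(a, 0.0, b[0])
        b_new = jacobi_step(b, a[-1], 1.0)
        a, b = a_new, b_new
    return a + b
```

The exact solution is the straight line u_i = i/(n+1); each subdomain could run on its own processor, with only the two halo values crossing the boundary.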

3.
A common time reference (i.e. a global clock) is needed for observing the behavior of a distributed algorithm on a distributed computing system. The paper presents a pragmatic algorithm to build a global clock on any distributed system, which is optimal for homogeneous distributed memory parallel computers (DMPCs). In order to observe and sort concurrent events on common DMPCs, we need a global clock with a resolution finer than the variance of the message transfer time, which is better than what deterministic and fault-tolerant algorithms can achieve. A statistical method is therefore chosen as a building block to derive an original algorithm valid for any topology. Its main originality over related approaches is that it copes with the problem of clock granularity when computing frequency offsets between local clocks, achieving a resolution comparable with that of the physical clocks. The algorithm is particularly well suited for debugging distributed algorithms by means of trace recordings because, after its acquisition step, it induces no message overhead: the perturbation of the execution remains as small as possible. It has been implemented on various DMPCs (Intel iPSC/2 hypercube and Paragon XP/S, Transputer-based networks, and Sun networks), so we can provide data about its behavior and performance on these machines.
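The statistical building block can be sketched as a least-squares fit of one clock against another; the paper's algorithm additionally bounds clock-granularity error, which this illustrative sketch (synthetic data, assumed names) omits:

```python
def fit_clock(samples):
    """Least-squares fit remote = drift * local + offset over
    (local_time, remote_time) timestamp pairs."""
    n = len(samples)
    sx = sum(t for t, _ in samples)
    sy = sum(r for _, r in samples)
    sxx = sum(t * t for t, _ in samples)
    sxy = sum(t * r for t, r in samples)
    drift = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    offset = (sy - drift * sx) / n
    return drift, offset

def to_global(local_time, drift, offset):
    """Map a local timestamp onto the reference clock."""
    return drift * local_time + offset
```

Once drift and offset are acquired, trace events are re-timestamped offline with `to_global`, which is why no messages are needed during the observed execution.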

4.
5.
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized, as well as efforts at the NASA Lewis Research Center. The latter efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.

6.
7.
Parallel Computing, 1999, 25(13-14): 2015-2037
Parallel computers demonstrated their suitability in principle for numerical simulation during the eighties and early nineties. In particular, they were able to provide a cost-effective means of achieving high performance computing (HPC) power. Even so, this technology had only a limited impact on industrial computing. In order to foster its take-up by industrial users, the European Commission launched a number of projects as part of the Esprit programme to parallelize commercial application programs; to demonstrate, document and disseminate the benefits of parallel architectures; and to explore the potential of parallel simulation in new application areas. Large-scale technology transfer initiatives such as Europort, Europort-D and the Preparatory Support and Transfer Programme (PST) aimed at helping European industry to exploit the benefits of HPC based on parallel computing, thus increasing its competitiveness. This paper reviews the major activities and highlights their impact on industry by means of some selected examples.

8.
9.
We show that a multivariate homogeneous polynomial can be represented on a hypercube in such a way that sums, products and partial derivatives can be performed by massively parallel computers. This representation is derived from the theoretical results of Beauzamy-Bombieri-Enflo-Montgomery [1]. The norm associated with it, denoted by [·], is itself a very efficient tool: when products of polynomials are performed, the best constant C in inequalities of the form [P·Q] ≥ C[P][Q] is provided, and the extremal pairs (that is, the pairs of polynomials for which the product is as small as possible) can be identified. Supported by the C.N.R.S. (France) and the N.S.F. (USA), by contracts E.T.C.A./C.R.E.A. No. 20351/90 and 20357/91 (Ministry of Defense, France), and by Research Contract EERP-FR 22, DIGITAL Eq. Corp.
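The norm [·] in question is the Bombieri norm: for a homogeneous polynomial of degree d with coefficients a_alpha, [P]² = Σ |a_alpha|² · alpha!/d!. A minimal sketch of its computation, storing the polynomial as a dictionary of exponent tuples (an illustrative layout, not the paper's hypercube representation):

```python
from math import factorial, sqrt

def bombieri_norm(coeffs, degree):
    """[P] for a homogeneous polynomial of the given degree; coeffs maps
    exponent tuples (summing to degree) to coefficients."""
    total = 0.0
    for alpha, a in coeffs.items():
        assert sum(alpha) == degree, "polynomial must be homogeneous"
        multi = factorial(degree)           # multinomial coefficient d!/alpha!
        for e in alpha:
            multi //= factorial(e)
        total += abs(a) ** 2 / multi        # |a_alpha|^2 * alpha!/d!
    return sqrt(total)
```

For example, [x² + y²]² = 2 (both monomials have multinomial weight 1), while [xy]² = 1/2.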

10.
We estimate parallel complexity of several matrix computations under both Boolean and arithmetic machine models using deterministic and probabilistic approaches. Those computations include the evaluation of the inverse, the determinant, and the characteristic polynomial of a matrix. Recently, processor efficiency of the previous parallel algorithms for numerical matrix inversion has been substantially improved in (Pan and Reif, 1987), reaching optimum estimates up to within a logarithmic factor; that work, however, applies neither to the evaluation of the determinant and the characteristic polynomial nor to exact matrix inversion nor to the numerical inversion of ill-conditioned matrices. We present four new approaches to the solution of those latter problems (having several applications to combinatorial computations) in order to extend the suboptimum time and processor bounds of (Pan and Reif, 1987) to the case of computing the inverse, determinant, and characteristic polynomial of an arbitrary integer input matrix. In addition, processor efficient algorithms using polylogarithmic parallel time are devised for some other matrix computations, such as triangular and QR-factorizations of a matrix and its reduction to Hessenberg form.
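For context, the classical sequential Faddeev-LeVerrier recurrence computes the characteristic polynomial, determinant and inverse together, and is a common starting point for parallel variants of these problems; this is an illustrative sketch, not the authors' algorithm:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def faddeev_leverrier(A):
    """Return characteristic-polynomial coefficients [1, c1, ..., cn],
    det(A), and A^{-1} (None if A is singular)."""
    n = len(A)
    c = [1.0]
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = M
    for k in range(1, n + 1):
        B = M                              # save N_{k-1} + c_{k-1} I
        N = matmul(A, M)                   # N_k = A (N_{k-1} + c_{k-1} I)
        ck = -trace(N) / k
        c.append(ck)
        M = [[N[i][j] + (ck if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    det = (-1) ** n * c[n]
    inv = None
    if c[n] != 0:
        inv = [[-B[i][j] / c[n] for j in range(n)] for i in range(n)]
    return c, det, inv
```

The recurrence is dominated by n matrix products, which is what makes it attractive for parallelization even though it is not numerically robust for ill-conditioned inputs.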

11.
In a previous work we studied the concurrent implementation of a numerical model, CONDIFP, developed for the analysis of depth-averaged convection–diffusion problems. Initial experiments were conducted on the Intel Touchstone Delta System, using up to 512 processors and different problem sizes. As for other computation-intensive applications, the results demonstrated an asymptotic trend toward unit efficiency when the computational load dominates the communication load. This paper reports further numerical experiments, in both one and two space dimensions with various choices of initial and boundary conditions, carried out on the Intel Paragon XP/S Model L38 with the aim of illustrating the versatility and reliability of the parallel solver.

12.
Parallel Computing, 1997, 23(14): 2135-2142
This special issue on ‘regional weather models’ complements the October 1995 special issue on ‘climate and weather modeling’, which focused on global models. In this introduction we review the similarities and differences between regional and global atmospheric models. Next, the structure of regional models is described and we consider how the basic algorithms applied in these models influence the parallelization strategy. Finally, we give a brief overview of the eight articles in this issue and discuss some remaining challenges in the area of adapting regional weather models to parallel computers.

13.
An accurate and computationally efficient method for predicting the performance of a class of parallel computations running on concurrent systems is described. A parallel computation is modeled as a task system with precedence relationships expressed as a series-parallel directed acyclic graph. Resources in a concurrent system are modeled as service centers in a queuing network model. Using these two models as inputs, the method outputs predictions of expected execution time of the parallel computation and the concurrent system utilization. The method is validated against both detailed simulation and actual execution on a commercial multiprocessor. Using 100 test cases, the average error of the prediction when compared to simulation statistics is 1.7%, with a standard deviation of 1.5%; the maximum error is about 10%.
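The precedence-graph half of the model can be sketched as a longest-path computation: with unlimited processors, the expected makespan of the task DAG is the maximum, over all paths, of the summed expected task times. Names are illustrative; the paper couples such a task model with a queuing-network model of the hardware:

```python
def makespan(tasks, edges):
    """Expected makespan of a task DAG with unlimited processors.
    tasks: {name: expected_time}; edges: list of (before, after) pairs."""
    preds = {t: [] for t in tasks}
    for a, b in edges:
        preds[b].append(a)
    finish = {}
    def done(t):
        # finish time = own time plus latest predecessor finish
        if t not in finish:
            finish[t] = tasks[t] + max((done(p) for p in preds[t]),
                                       default=0.0)
        return finish[t]
    return max(done(t) for t in tasks)
```

In the full method, task times are not constants but delays obtained from the queuing-network model, which is what captures resource contention.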

14.
In parallel adaptive mesh refinement (AMR) computations the problem size can vary significantly during a simulation. The goal here is to explore the performance implications of dynamically varying the number of processors in proportion to the problem size during simulation. An emulator has been developed to assess the effects of this approach on parallel communication, parallel runtime and resource consumption. The computation and communication models used in the emulator are described in detail. Results using the emulator with different AMR strategies are described for a test case. For the test case, varying the number of processors reduces the total parallel communication overhead by 16% to 19% on average and improves parallel runtime by 4% to 8%. The results also show that, on average, resource utilization improves by more than 37%.
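The policy being emulated can be sketched with an assumed cost model (ideal n/p compute plus a logarithmic collective term; the grain-size constant and the model itself are assumptions, not the emulator's), comparing processor-time consumed by fixed versus size-proportional processor counts:

```python
import math

def choose_procs(n_cells, grain=1000, p_max=1024):
    """Processors proportional to problem size, capped at p_max."""
    return max(1, min(p_max, n_cells // grain))

def step_cost(n_cells, p, alpha=5.0):
    """Assumed cost model: ideal compute n/p plus a log2(p) collective."""
    return n_cells / p + alpha * math.log2(max(p, 2))

def resources(sizes, procs):
    """Processor-time product accumulated over the AMR steps."""
    return sum(p * step_cost(n, p) for n, p in zip(sizes, procs))
```

Under this model, scaling p with the instantaneous mesh size avoids paying for idle processors during the small phases of the run, which is the resource-consumption effect the abstract quantifies.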

15.
Various proposals for networks of large numbers of processors are reviewed. Bottleneck problems arise in these networks with the flow of data between processors. Communication problems which can arise in practical situations are discussed and techniques for reducing bottlenecks are developed. Some simulation results are given for the binary n-cube.
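One standard deterministic scheme for the binary n-cube mentioned above is e-cube (dimension-order) routing, which corrects the differing address bits from lowest to highest, one hop per bit; a sketch (the paper's simulation details are not reproduced here):

```python
def ecube_route(src, dst, n):
    """Hop sequence of node ids from src to dst in an n-dimensional
    binary cube, correcting address bits in dimension order."""
    path = [src]
    node = src
    for d in range(n):
        if (node ^ dst) >> d & 1:
            node ^= 1 << d        # flip the d-th address bit: one hop
            path.append(node)
    return path
```

The path length equals the Hamming distance between the two addresses; the fixed dimension order is what makes contention analysis (and bottleneck prediction) tractable.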

16.
Parallel algorithms are presented for the Fourier pseudospectral method and parts of the Chebyshev pseudospectral method. Performance of these schemes is reported as implemented on the NCUBE hypercube. The problem to which these methods are applied is the time integration of the incompressible Navier-Stokes equations. Despite serious communication requirements, the efficiencies are high; e.g., 92% for a 128³ mesh on 1024 processors. Benchmark timings rival those of optimized codes on supercomputers.
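The Fourier half of the method rests on differentiating in wavenumber space: transform, multiply by ik, transform back. A single-node sketch, with a naive O(n²) DFT standing in for the parallel FFT:

```python
import cmath, math

def dft(u, sign):
    """Naive discrete Fourier transform (O(n^2)), standing in for the FFT."""
    n = len(u)
    return [sum(u[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for j in range(n)) for k in range(n)]

def spectral_derivative(u):
    """d/dx of a real periodic sample on [0, 2*pi): forward transform,
    multiply by i*k in wavenumber space, inverse transform."""
    n = len(u)
    U = dft(u, -1)
    wavenum = [j if j < n // 2 else j - n for j in range(n)]
    V = [1j * wavenum[j] * U[j] for j in range(n)]
    return [v.real / n for v in dft(V, +1)]
```

The distributed transform is where the "serious communication requirements" of the abstract come from: each FFT stage exchanges data across hypercube dimensions.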

17.
One of the essential problems in parallel computing is: can SIMD machines handle asynchronous problems? This is a difficult, unsolved problem because of the mismatch between asynchronous problems and SIMD architectures. We propose a solution that lets SIMD machines handle general asynchronous problems. Our approach is to implement a runtime support system which can run MIMD-like software on SIMD hardware. The runtime support system, named P kernel, is thread-based. There are two major advantages of the thread-based model. First, for application problems with irregular and/or unpredictable features, automatic scheduling can move threads from overloaded processors to underloaded processors. Second, and more importantly, the granularity of threads can be controlled to reduce system overhead. The P kernel also handles bookkeeping and message management, making these low-level tasks transparent to users. Substantial performance has been obtained on the MasPar MP-1.
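The automatic-scheduling advantage can be sketched as greedy thread migration from the most to the least loaded processor; the data structures and policy below are assumptions for illustration, not the P kernel's actual scheduler:

```python
def balance(procs, max_moves=100):
    """procs: list of lists of thread costs, one list per processor.
    Greedily migrate the cheapest thread off the most loaded processor
    while doing so narrows the load gap."""
    loads = [sum(p) for p in procs]
    for _ in range(max_moves):
        hi = max(range(len(procs)), key=lambda i: loads[i])
        lo = min(range(len(procs)), key=lambda i: loads[i])
        gap = loads[hi] - loads[lo]
        if not procs[hi]:
            break
        t = min(procs[hi])            # cheapest thread to migrate
        if 2 * t >= gap:              # moving it would not reduce the gap
            break
        procs[hi].remove(t)
        procs[lo].append(t)
        loads[hi] -= t
        loads[lo] += t
    return procs
```

Controlling thread granularity, the abstract's second point, shows up here as the trade-off between many small migratable units and the bookkeeping cost each one incurs.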

18.
Assembly of large genomes from tens of millions of short genomic fragments is computationally demanding, requiring hundreds of gigabytes of memory and tens of thousands of CPU hours. The advent of high-throughput sequencing technologies, new gene-enrichment sequencing strategies, and collective sequencing of environmental samples further exacerbates this situation. In this paper, we present the first massively parallel genome assembly framework. The unique features of our approach include space-efficient, on-demand algorithms that consume only linear space, and strategies to reduce the number of expensive pairwise sequence alignments while maintaining assembly quality. Developed as part of the ongoing efforts in maize genome sequencing, we applied our assembly framework to genomic data containing a mixture of gene-enriched and random shotgun sequences. We report the partitioning of more than 1.6 million fragments, over 1.25 billion nucleotides in total, into genomic islands in under 2 h on 1024 processors of an IBM BlueGene/L supercomputer. We also demonstrate the effectiveness of the proposed approach for traditional whole-genome shotgun sequencing and assembly of environmental sequences.
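One generic way to reduce the number of expensive pairwise alignments, in the spirit of the strategies mentioned above, is k-mer filtering: only fragment pairs sharing at least one k-mer become alignment candidates. An illustrative sketch, not the authors' on-demand data structure:

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(frags, k=4):
    """Return the set of fragment-index pairs that share at least one
    k-mer and are therefore worth aligning."""
    index = defaultdict(set)
    for i, f in enumerate(frags):
        for j in range(len(f) - k + 1):
            index[f[j:j + k]].add(i)      # inverted index: k-mer -> fragments
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

With millions of fragments, the quadratic all-pairs alignment is replaced by alignments over this much smaller candidate set; the index itself partitions naturally across processors.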

19.
This paper discusses the implementation of methods for the approximate calculation of multiple integrals on a SIMD parallel computer. Adaptive methods using polynomial integrating basic rules, as well as Monte Carlo and number-theoretic basic rules, are considered with particular reference to implementation on the ICL DAP computer. Test results are given which compare serial and parallel versions of the same method.
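The Monte Carlo basic rule parallelizes naturally: each processor draws an independent substream of points and the partial sums are combined. A serial sketch that mimics the split with per-processor seeds (illustrative, not the DAP implementation):

```python
import random

def mc_integral(f, dim, n_per_proc, n_procs, seed=0):
    """Estimate the integral of f over the unit cube [0, 1]^dim by
    combining n_procs independent Monte Carlo substreams."""
    total = 0.0
    for p in range(n_procs):
        rng = random.Random(seed + p)     # one independent stream per "processor"
        total += sum(f([rng.random() for _ in range(dim)])
                     for _ in range(n_per_proc))
    return total / (n_per_proc * n_procs)
```

On real hardware, only the per-processor partial sums need to be reduced, so communication is a single combine at the end regardless of the sample count.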

20.
Three commercial systems are considered from a programmer's point of view: the Intel iPSC, a network of Inmos transputers, and the Sequent Balance. The differences in overhead are examined by implementing a solution to the traveling-salesman problem on all three. The evaluation focuses on three major issues in parallel programming: (1) how execution is divided among processing elements and how it is controlled; (2) how data are shared; and (3) how events are synchronized. The experiences of the authors are presented and some specific as well as general conclusions are drawn.
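How execution is divided, issue (1), can be sketched for a traveling-salesman program: each worker owns the tours that visit a particular city first and returns its local best, which a reduction combines. A serial stand-in with illustrative names, not the authors' code:

```python
from itertools import permutations

def tour_len(tour, dist):
    """Length of a closed tour under the distance matrix dist."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def best_tour(dist):
    """Exhaustive search, partitioned by the city visited after city 0;
    each partition is the work unit one processor would own."""
    n = len(dist)
    results = []
    for second in range(1, n):            # one partition per worker
        rest = [c for c in range(1, n) if c != second]
        local = min(((0, second) + p for p in permutations(rest)),
                    key=lambda t: tour_len(t, dist))
        results.append(local)
    return min(results, key=lambda t: tour_len(t, dist))
```

The three systems differ in how these partitions would be dispatched (messages on the iPSC and transputers, shared memory on the Balance), which is exactly the overhead the comparison measures.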
