Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
The integration of vector computers into multiprocessor configurations allows multiple high-speed processors to be used in parallel for one program. Two aspects are considered important for the efficient use of multiprocessor configurations: first, the flexibility, speed, and user-friendliness of the available synchronization and communication primitives; and second, the problems users face in detecting data dependencies and in translating programs correctly into the parallel form required by the system. This paper gives an overview of our experiences in multitasking using up to four CPUs of a CRAY X-MP/48. The results gained by macrotasking and microtasking are compared for program kernels and real-life application programs. Special attention is paid to the difficulties of using more than two CPUs in parallel.

2.
A hybrid granularity model is proposed for general concurrent solution. It is applied to the triangular factorization of a dense matrix ranging in size from 4 to 1024. Concurrency is achieved at two levels: (1) with small (micro) task granularity and (2) with large (blocked) task granularity. Relevance to a many-processor CRAY X-MP is demonstrated by simulation.

3.
On the multiprocessor vector supercomputer CRAY X-MP, parallelism beyond vectorization can be exploited at the programming-language level by two multitasking strategies: macrotasking and, more recently, microtasking. This paper presents multitasking results and experiences gained by applying these two modes to linear-algebra and non-numerical algorithms as well as to a large fluid-flow simulation code. In comparing the concepts and realizations of macrotasking and microtasking, the paper discusses the features, tools, and problems of multitasking programming and the potential user benefit of these parallel processing techniques.

4.
The availability of a multiprocessor vector machine, such as the CRAY X-MP, along with large, fast secondary memory such as the CRAY SSD, opens new frontiers to numerical algorithm design for 3-D simulations. The 3-D seismic migration, which is of crucial importance in exploration seismology, is studied as a model problem. The numerical model discussed in this paper employs an alternating direction implicit (ADI) Crank-Nicolson scheme which takes full advantage of the parallel architecture of the underlying machine. It is demonstrated that careful algorithm design can lead to a significant speedup of the calculation when more than one processor is used. The throughput times obtained in this study are an order of magnitude shorter than those of some conventional approaches.

5.
We present a parallel Monte Carlo photon transport algorithm that ensures the reproducibility of results. The important feature of this parallel implementation is the introduction of a pair of pseudo-random number generators. This pair of generators is structured so as to ensure minimal correlation between the two sequences of pseudo-random numbers produced. We term this structure a 'pseudo-random tree'. Using this structure, we are able to reproduce results exactly in an asynchronous parallel processing environment. The algorithm tracks the history of photons as they interact with two carbon cylinders joined end to end. The algorithm was implemented on both a Denelcor HEP and a CRAY X-MP/48. We describe the algorithm and the pseudo-random tree structure and present speedup results of our implementation.
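
The reproducibility idea can be illustrated with a minimal Python sketch. It assumes both generators are multiplicative congruential generators sharing one modulus; the constants `M`, `A_LEFT`, `A_RIGHT` and every function name are hypothetical choices for illustration, since the abstract does not give the paper's actual parameters. One generator (the "left" branch) fixes a seed per photon history in advance, and the other (the "right" branch) draws numbers inside a history, so the result of a history does not depend on when or where it is executed.

```python
# Illustrative constants only; the paper's actual generator parameters are
# not stated in the abstract.
M = 2**31 - 1
A_LEFT = 16807    # multiplier of the "left" generator (spawns history seeds)
A_RIGHT = 48271   # multiplier of the "right" generator (draws within a history)


def left(seed):
    """Step along the 'left' branch: produce the seed of the next history."""
    return (A_LEFT * seed) % M


def right_stream(seed, count):
    """Draw `count` uniform numbers in (0, 1) along the 'right' branch."""
    values = []
    state = seed
    for _ in range(count):
        state = (A_RIGHT * state) % M
        values.append(state / M)
    return values


def history_seeds(root_seed, n_histories):
    """Seeds of all histories, fixed before any history is simulated."""
    seeds = []
    state = root_seed
    for _ in range(n_histories):
        seeds.append(state)
        state = left(state)
    return seeds


def simulate(root_seed, n_histories, order):
    """Process histories in the given order; results are keyed by history index.

    The per-history "result" is just the sum of five draws, a stand-in for a
    real photon-tracking calculation.
    """
    seeds = history_seeds(root_seed, n_histories)
    return {i: sum(right_stream(seeds[i], 5)) for i in order}
```

Because each history owns its seed, calling `simulate` with the same root seed but a different processing order yields identical per-history results, which is the property an asynchronous parallel run needs.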

6.
Multitasking the conjugate gradient method on the CRAY X-MP/48 (total citations: 1; self-citations: 0; citations by others: 1)
We show how to efficiently implement the preconditioned conjugate gradient method on the four-processor CRAY X-MP/48. We solve block tridiagonal systems using block preconditioners well suited to parallel computation. Numerical results are presented that exhibit nearly optimal speedup and high Mflops rates.
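
For readers unfamiliar with the method, the following is a minimal serial sketch of preconditioned conjugate gradients in Python. It is not the paper's implementation: it swaps in a simple diagonal (Jacobi) preconditioner in place of the block preconditioners the abstract mentions, stores the matrix as a dense list of lists, and uses hypothetical names (`pcg`, `matvec`, `dot`) throughout.

```python
def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A.

    Assumption: Jacobi preconditioning (division by the diagonal) stands in
    for the block preconditioners used in the paper.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

The matrix-vector product, the dot products, and the vector updates are the natural candidates for distribution across processors; the sketch keeps them as separate steps for that reason.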

7.
The real benefit of structural optimization techniques lies in their application to large structures such as full vehicles or full aircraft. For these structures, however, the time and memory requirements on a sequential computer prohibit solution. With the recent emergence and rapid development of multiprocessor computers, parallel processing of large-scale structural optimization problems is achievable. In this paper we discuss the parallel processing of structural optimization problems with parallel structural analysis on the Cray X-MP. Two different types of interface between the optimization and analysis routines are developed and tested.

8.
Requirements for tools analyzing the performance of parallel programs with respect to parallel and sequential parts, overhead, and load balance, as well as available tools for programs parallelized with Cray Microtasking or Autotasking are described.

9.
Memory interleaving and multiple access ports are the key to a high memory bandwidth in vector processing systems. Each of the active ports supports an independent access stream to memory among which access conflicts may arise. Such conflicts lead to a decrease in memory bandwidth and consequently to longer execution times.

We present some analytical results regarding the access in vector mode to an interleaved memory system. In order to demonstrate the practical effects of our analytical results we have done time measurements of some simple vector loops on a 2-CPU, 16-bank CRAY X-MP. By corresponding simulations we obtained the number and type of memory conflicts that were encountered.
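
A small Python sketch can make the effect of stride on an interleaved memory concrete. It is a deliberately simplified model and not the paper's analysis: it follows a single access stream (the paper considers several ports), maps word address `a` to bank `a mod n_banks`, and treats the bank busy time `busy` as a hypothetical parameter chosen for illustration.

```python
from math import gcd


def banks_visited(stride, n_banks=16):
    """Number of distinct banks touched by a constant-stride access stream."""
    return n_banks // gcd(n_banks, stride)


def stall_cycles(stride, n_accesses, n_banks=16, busy=4):
    """Count stall cycles for one stream that tries to issue one access per cycle.

    `busy` is the assumed number of cycles a bank stays occupied after an
    access (an illustrative value, not a measured one).
    """
    free_at = [0] * n_banks   # first cycle at which each bank is free again
    cycle = 0
    stalls = 0
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        if free_at[bank] > cycle:          # bank still busy: wait for it
            stalls += free_at[bank] - cycle
            cycle = free_at[bank]
        free_at[bank] = cycle + busy
        cycle += 1
    return stalls
```

Under these assumptions a unit stride spreads accesses over all 16 banks and never stalls, whereas a stride of 16 hits the same bank every time and waits on every access after the first.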


10.
The ECMWF weather model runs daily as a time-critical application. Acceptable elapsed times are achieved by multitasking the code on a CRAY X-MP/48. This is done at a high level, giving rise to large tasks. Investigations have been carried out to tackle inefficiencies by microtasking at a low level so that the code can take advantage of any idle processors which may become available.

11.
Optimization of vector-intensive applications for the CRAY X-MP/Y-MP often requires arranging the operations to take full advantage of such architectural features as the memory system, independent memory ports, chaining, and independent functional units. Estimation of performance is not straightforward since many operations can occur concurrently. As a tool for making trades between vector algorithms, a method has been developed and used successfully at E-Systems Inc. to predict the execution time of a sequence of vector operations without resorting to actual code development. This method reduced our software development time, produced significantly more efficient code, and provided for a systematic approach to optimization. The performance estimation is generally accurate to within 10% and accounts for memory conflicts that result from fixed stride references.

12.
One of the prime considerations for high scalar performance in supercomputers is a low memory latency. With the increasing disparity between main memory and CPU clock speeds, the use of an intermediate memory in the hierarchy becomes necessary. In this paper, we present an intermediate memory structure called a programmable cache. A programmable cache exploits structural locality to decrease the average memory access time. We evaluate the concept of a programmable cache by using the vector registers in the CRAY X-MP and Y-MP supercomputers as a programmable cache. Our results indicate that a programmable cache can be used profitably to reduce the memory latency if the pattern of references to a data structure can be determined at compile time. The work of the first author was supported in part by NSF Grant CCR-8706722.

13.
The efficient use of MIMD computers calls for a careful choice of adequate algorithms as well as for an implementation that takes the particular architecture into account. To demonstrate this, a parallel algorithm to find an approximate solution to the Euclidean Traveling Salesman Problem (ETSP) is presented. The algorithm is a parallelization of Karp's partitioning algorithm, a divide-and-conquer method for solving the ETSP approximately. Since the successor vertex to any vertex in the tour is usually a nearby vertex, the problem can be 'geographically' partitioned into subproblems which can then be solved independently. The resulting subtours can be combined into a single tour which is an approximate solution to the ETSP. The algorithm is implemented on a CRAY X-MP with two and four processors, and results using macrotasking and microtasking are presented.
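
The partition, solve, and combine structure described above can be sketched in a few lines of Python. This is only a rough illustration under simplifying assumptions, not Karp's algorithm as published: cells are formed by recursive median splits, each cell is toured with a nearest-neighbor heuristic, and subtours are combined by plain concatenation rather than by the patching step of the original method. All function names and the `max_cell` parameter are hypothetical.

```python
def nearest_neighbor_path(points):
    """Visit the points of one cell greedily, always moving to the closest one."""
    path = [points[0]]
    rest = list(points[1:])
    while rest:
        last = path[-1]
        nxt = min(rest, key=lambda p: (p[0] - last[0]) ** 2 + (p[1] - last[1]) ** 2)
        rest.remove(nxt)
        path.append(nxt)
    return path


def partition(points, max_cell, depth=0):
    """Split the point set 'geographically' until each cell has <= max_cell points."""
    if len(points) <= max_cell:
        return [points]
    axis = depth % 2                      # alternate between x and y cuts
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (partition(pts[:mid], max_cell, depth + 1)
            + partition(pts[mid:], max_cell, depth + 1))


def approximate_tour(points, max_cell=8):
    """Solve each cell independently, then join the subtours in cell order."""
    tour = []
    for cell in partition(points, max_cell):   # cells are independent subproblems
        tour.extend(nearest_neighbor_path(cell))
    return tour
```

The loop over cells in `approximate_tour` is where a multitasked version would assign cells to different processors, since no cell depends on another.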

14.
We consider the iterative solution of large sparse linear systems of equations arising from elliptic and parabolic partial differential equations in two or three space dimensions. Specifically, we focus our attention on nonsymmetric systems of equations whose eigenvalues lie on both sides of the imaginary axis, or whose symmetric part is not positive definite. This system of equations is solved using a block Kaczmarz projection method with conjugate gradient acceleration. The algorithm has been designed with special emphasis on its suitability for multiprocessors. In the first part of the paper, we study the numerical properties of the algorithm and compare its performance with other algorithms such as the conjugate gradient method on the normal equations, and conjugate gradient-like schemes such as ORTHOMIN(k), GCR(k) and GMRES(k). We also study the effect of using various preconditioners with these methods. In the second part of the paper, we describe the implementation of our algorithm on the CRAY X-MP/48 multiprocessor, and study its behavior as the number of processors is increased.

15.
Special tridiagonal systems often arise in engineering and science. The systems considered here are diagonally dominant and circulant near-Toeplitz. This paper presents two fast vectorized algorithms for solving them. Both algorithms employ the matrix perturbation technique and have many computational advantages on vector supercomputers. The related error analysis is also given. Experimental results obtained on the vector uniprocessor of the CRAY X-MP EA/116se are presented.

16.
17.
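Performance analysis of data parallelism (total citations: 3; self-citations: 1; citations by others: 3)
The computer is a tool, and as with any tool, whether its application succeeds is of great significance. From the application point of view, this paper analyzes the performance of parallel processing in the data-parallel mode. It first establishes a performance analysis model and then, based on this model, analyzes in detail the factors that affect parallel processing performance. The results offer guidance to designers of both parallel algorithms and parallel computer systems.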

18.
We present a fast vector algorithm which solves tridiagonal linear equations by an optimum synthesis of the inherently recursive Gaussian elimination and the parallel though complex cyclic reduction. The idea is to perform an incomplete cyclic reduction to bring the dimension of the tridiagonal system efficiently below a characteristic size n* and then to solve the remaining system by Gaussian elimination. Extensive numerical experiments on the CYBER 205 and the CRAY X-MP computers reveal a maximum vector speedup of 13 and prove n* to reflect the architecture of the vector computer. The performance is further enhanced when a few right-hand sides are treated simultaneously.
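
The combination described above can be sketched in plain Python. The sketch assumes the system is written as a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = c[n-1] = 0 and that no pivoting is needed (for example, a diagonally dominant matrix). The names `thomas`, `solve_tridiagonal`, and `n_star` are hypothetical, and the default threshold is arbitrary rather than the machine-dependent n* of the paper.

```python
def thomas(a, b, c, d):
    """Gaussian elimination without pivoting for a tridiagonal system."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                 # forward elimination (recursive in i)
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_tridiagonal(a, b, c, d, n_star=8):
    """Apply cyclic reduction until the system has at most n_star unknowns."""
    n = len(b)
    if n <= n_star:
        return thomas(a, b, c, d)
    ra, rb, rc, rd = [], [], [], []
    for i in range(0, n, 2):              # each even equation is independent
        ai, bi, ci, di = 0.0, b[i], 0.0, d[i]
        if i > 0:                         # eliminate x[i-1] using equation i-1
            alpha = -a[i] / b[i - 1]
            ai = alpha * a[i - 1]
            bi += alpha * c[i - 1]
            di += alpha * d[i - 1]
        if i + 1 < n:                     # eliminate x[i+1] using equation i+1
            gamma = -c[i] / b[i + 1]
            ci = gamma * c[i + 1]
            bi += gamma * a[i + 1]
            di += gamma * d[i + 1]
        ra.append(ai)
        rb.append(bi)
        rc.append(ci)
        rd.append(di)
    x = [0.0] * n
    x[0::2] = solve_tridiagonal(ra, rb, rc, rd, n_star)   # even unknowns
    for i in range(1, n, 2):              # recover odd unknowns independently
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - a[i] * x[i - 1] - right) / b[i]
    return x
```

The two loops in `solve_tridiagonal` have no dependence between iterations, which is what makes the reduction phase vectorizable, while the loops in `thomas` carry a dependence from one iteration to the next.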

19.
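On irregular computation (total citations: 1; self-citations: 1; citations by others: 1)
This paper first introduces irregular computation and describes its applications, characteristics, and significance. It then summarizes the basic inspector/executor method for handling irregular computation and surveys the state of research on irregular computation in China and abroad. Finally, it points out problems in current irregular computation systems and ways to address them.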
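
The inspector/executor pattern mentioned above can be illustrated with a small single-process Python sketch. It assumes an array distributed in equal blocks over several processes and a loop that reads it through an indirection array; the block distribution, the simulated "gather", and all names (`owner`, `inspector`, `executor`) are illustrative assumptions, not taken from the paper.

```python
def owner(global_index, block):
    """Rank that owns a global index under a block distribution."""
    return global_index // block


def inspector(indices, rank, block):
    """Scan the indirection array once and build a communication schedule."""
    lo = rank * block
    schedule = []    # distinct off-process global indices to fetch, in order
    slot = {}        # global index -> position in the ghost buffer
    local_ref = []   # per reference: ("own", offset) or ("ghost", slot)
    for g in indices:
        if owner(g, block) == rank:
            local_ref.append(("own", g - lo))
        else:
            if g not in slot:             # fetch each remote element only once
                slot[g] = len(schedule)
                schedule.append(g)
            local_ref.append(("ghost", slot[g]))
    return schedule, local_ref


def executor(x_blocks, rank, block, schedule, local_ref):
    """Gather the scheduled remote elements, then run the loop on local data."""
    ghost = [x_blocks[owner(g, block)][g % block] for g in schedule]
    own = x_blocks[rank]
    return [own[p] if kind == "own" else ghost[p] for kind, p in local_ref]
```

The inspector runs once per access pattern, and its schedule can be reused by every executor call as long as the indirection array does not change.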

20.
A serious limitation of the theory of P-completeness is that it fails to distinguish between those P-complete problems that do have polynomial speedup on parallel machines and those that don't. We introduce the notion of strict P-completeness and develop tools to prove precise limits on the possible speedups obtainable for a number of P-complete problems.
