1.

F. Hossfeld 《Parallel Computing》1988,7(3):373-385

Today, the field of high-speed computers and supercomputing applications is dominated by the vector-processor architecture. This paper gives a survey of the architectural principles of vector computers, such as segmentation, pipelining, and chaining, as well as of the spectrum of real systems available on the market. It illuminates the potential and the limitations of vectorization strategies. Recent developments towards multi-vector-computer systems give impetus to new supercomputing concepts that balance vectorization against parallel computation by exploiting multitasking principles. Covering a wide spectrum of applications, vector supercomputers are making relevant contributions to the progress of scientific research and technology.

2.

A projection method for solving nonsymmetric linear systems on multiprocessors

Chandrika Kamath Ahmed Sameh 《Parallel Computing》1989,9(3):291-312

We consider the iterative solution of large sparse linear systems of equations arising from elliptic and parabolic partial differential equations in two or three space dimensions. Specifically, we focus our attention on nonsymmetric systems of equations whose eigenvalues lie on both sides of the imaginary axis, or whose symmetric part is not positive definite. These systems of equations are solved using a block Kaczmarz projection method with conjugate gradient acceleration. The algorithm has been designed with special emphasis on its suitability for multiprocessors. In the first part of the paper, we study the numerical properties of the algorithm and compare its performance with other algorithms such as the conjugate gradient method on the normal equations, and conjugate gradient-like schemes such as ORTHOMIN(k), GCR(k) and GMRES(k). We also study the effect of using various preconditioners with these methods. In the second part of the paper, we describe the implementation of our algorithm on the CRAY X-MP/48 multiprocessor, and study its behavior as the number of processors is increased.
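The projection idea behind the Kaczmarz method named in this abstract can be illustrated with a minimal sketch. This is only the basic row-by-row projection, not the block variant with conjugate gradient acceleration that the paper describes; the function name `kaczmarz`, the `sweeps` parameter, and the small test system are assumptions made for illustration.

```python
def kaczmarz(A, b, sweeps=200):
    """Basic (row-action) Kaczmarz iteration for a consistent system A x = b.

    Illustrative sketch only: the paper uses a block variant with
    conjugate gradient acceleration, which is not reproduced here.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(sweeps):
        for row, rhs in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # a zero row carries no constraint to project onto
            # Project x onto the hyperplane {y : row . y = rhs}.
            residual = rhs - sum(a * xi for a, xi in zip(row, x))
            step = residual / norm_sq
            x = [xi + step * a for xi, a in zip(x, row)]
    return x


# Hypothetical nonsymmetric 2x2 system whose exact solution is (1, 2).
A = [[2.0, 1.0], [-1.0, 3.0]]
b = [4.0, 5.0]
solution = kaczmarz(A, b)
```

Each inner step touches only one row of the matrix. A block variant would project onto several rows at once, which is presumably what makes the approach attractive on multiprocessors.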

3.

Inner/outer iterative methods and numerical Schwarz algorithms

Garry Rodrigue 《Parallel Computing》1985,2(3):205-218

Variants of the numerical Schwarz algorithms for solving elliptic partial differential equations on multiprocessing systems are described and analyzed. The methods are described in terms of domain decomposition techniques and mathematically cast into an inner/outer iterative form. It is shown that, under certain matrix nonnegativity conditions, the convergence rate of the global iteration is invariant to the amount of overlap of the subdomains.

4.

Task granularity studies on a many-processor CRAY X-MP

D.A. Calahan 《Parallel Computing》1985,2(2):109-118

A hybrid granularity model is proposed for general concurrent solution. It is applied to the triangular factorization of a dense matrix ranging in size from 4 to 1024. Concurrency is achieved at two levels: (1) with small (micro) task granularity and (2) with large (blocked) task granularity. Relevance to a many-processor CRAY X-MP is demonstrated by simulation.

5.

Multitasking the calculation of angular integrals on the CRAY-2 and CRAY X-MP

C. Froese Fischer N. S. Scott J. Yoo 《Parallel Computing》1988,8(1-3):385-390

At the first VAPP conference attention was drawn to the difficulty of calculating angular integrals on the CRAY-1. In this paper we describe how multitasking on the CRAY-2 and CRAY X-MP can be exploited to improve the efficiency of the calculation of angular integrals. Timings for the CRAY-2 and CRAY X-MP are presented. One surprising result is that for this application the CRAY X-MP is faster than the CRAY-2 in both unitasking and multitasking modes.

6.

The approximate solution of the Euclidean traveling salesman problem on a CRAY X-MP

Renate Gurke 《Parallel Computing》1988,8(1-3):177-183

The efficient use of MIMD computers calls for a careful choice of adequate algorithms as well as for an implementation that takes the particular architecture into account. To demonstrate these facts, a parallel algorithm to find an approximate solution to the Euclidean Traveling Salesman Problem (ETSP) is presented. The algorithm is a parallelization of Karp's partitioning algorithm. It is a divide-and-conquer method for solving the ETSP approximately. Since the successor vertex to any vertex in the tour is usually a nearby vertex, the problem can be ‘geographically’ partitioned into subproblems which can then be solved independently. The resulting subtours can be combined into a single tour which is an approximate solution to the ETSP. The algorithm is implemented on a CRAY X-MP with two and four processors, and results using macrotasking and microtasking are presented.
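The geographic divide-and-conquer idea in this abstract can be sketched roughly as follows. This is a simplified stand-in, not Karp's algorithm as published and not the paper's parallel implementation: points are split at the median along the wider axis, each small group is solved by exhaustive search for its shortest open path, and the resulting paths are simply concatenated rather than joined and shortcut. All names (`partition_tour`, `solve_leaf`, `leaf_size`) and the sample points are hypothetical.

```python
import itertools
import math


def path_length(points):
    """Total Euclidean length of an open path through points, in order."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def solve_leaf(points):
    """Shortest open path through a small point set, by exhaustive search."""
    return list(min(itertools.permutations(points), key=path_length))


def partition_tour(points, leaf_size=6):
    """Approximate visiting order built by geographic partitioning.

    Simplification (an assumption of this sketch): leaf paths are
    concatenated in split order instead of being merged as in Karp's scheme.
    """
    if len(points) <= leaf_size:
        return solve_leaf(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis with the larger spread, at the median point.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # The two halves are independent, so they could be solved concurrently.
    return (partition_tour(ordered[:mid], leaf_size)
            + partition_tour(ordered[mid:], leaf_size))


# Hypothetical input: 20 distinct points on a small integer grid.
cities = [(i * 7 % 13, i * 5 % 11) for i in range(20)]
tour = partition_tour(cities)
```

The returned list is a visiting order; closing it back to the first point gives a tour. Because the recursive calls share no data, each subproblem could be handed to a separate task, which is the property the abstract exploits.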

7.

Modelling, measurement, and simulation of memory interference in the CRAY X-MP

W. Oed O. Lange 《Parallel Computing》1986,3(4):343-358

Memory interleaving and multiple access ports are the key to a high memory bandwidth in vector processing systems. Each of the active ports supports an independent access stream to memory among which access conflicts may arise. Such conflicts lead to a decrease in memory bandwidth and consequently to longer execution times.

We present some analytical results regarding the access in vector mode to an interleaved memory system. In order to demonstrate the practical effects of our analytical results we have done time measurements of some simple vector loops on a 2-CPU, 16-bank CRAY X-MP. By corresponding simulations we obtained the number and type of memory conflicts that were encountered.
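A toy model can show how access conflicts arise in an interleaved memory. The sketch below is not the authors' model or simulator: it follows a single access stream with a constant stride, assumes one request per clock cycle, and assumes a bank stays busy for `bank_busy` cycles after each access. The 16 banks come from the abstract; the busy time of 4 cycles and all names are assumptions made for illustration.

```python
def count_stall_cycles(stride, n_accesses, n_banks=16, bank_busy=4):
    """Count stall cycles for one strided access stream to interleaved memory.

    Assumptions of this sketch: word address i * stride maps to bank
    (i * stride) % n_banks, at most one request issues per cycle, and a
    bank cannot accept a new request for bank_busy cycles after an access.
    """
    bank_free_at = [0] * n_banks  # first cycle at which each bank is free
    cycle = 0                     # earliest cycle the next request may issue
    stalls = 0
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        issue = max(cycle, bank_free_at[bank])
        stalls += issue - cycle   # cycles spent waiting on a busy bank
        bank_free_at[bank] = issue + bank_busy
        cycle = issue + 1
    return stalls


unit_stride_stalls = count_stall_cycles(stride=1, n_accesses=64)
same_bank_stalls = count_stall_cycles(stride=16, n_accesses=8)
```

Under these assumptions a unit stride cycles through all 16 banks and never stalls, while a stride of 16 sends every request to the same bank and stalls on each access after the first. Modelling several ports, as the paper does, would additionally require arbitration between competing streams.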

8.

Lattice quantum hadrodynamics on a CRAY Y-MP

Joachim Frank Siegfried Knecht 《The Journal of supercomputing》1992,6(3-4):195-209

Quantum corrections to the mean-field equation of state for nuclear matter are estimated in a lattice simulation of quantum hadrodynamics on a CRAY Y-MP. In contrast with lattice quantum chromodynamics, where coordinate space methods are the standard, the calculations are carried out in momentum space and on nonhypercubic (irregular) lattices. The quantum corrections to the known, mean-field equation of state were found to be considerable. The time frame of the project and the large computational needs of the program required the use of powerful supercomputers, like the CRAY Y-MP, which are capable of performing at a very high computing speed by using both vector and parallel hardware, the latter being exploited by means of autotasking. The paper describes the applied analytical and the numerical methods as well as the changes needed for the program to be executed in parallel. After some code modifications a very efficient version could be obtained on a CRAY Y-MP8/832, leading to an overall performance of 2.13 gigaflops.

9.

Parallel vector processing of multidimensional orthogonal transforms for digital signal processing applications

Mohamed El-Sharkawy Wenlong Tsang Maurice Aburdene 《Multidimensional Systems and Signal Processing》1990,1(2):199-216

This paper presents vector and parallel algorithms and implementations of one- and two-dimensional orthogonal transforms. The speed performances are evaluated on a Cray X-MP/48 vector computer. The sinusoidal orthogonal transforms are computed using a fast real Fourier transform (FFT) kernel. The non-sinusoidal orthogonal transform algorithms are derived by using direct factorizations of transform matrices. Concurrent processing is achieved by using the multitasking capability of the Cray X-MP/48 to transform long data vectors and two-dimensional data vectors. The discrete orthogonal transforms discussed in this paper include: Fourier transform (DFT), cosine transform (DCT), sine transform (DST), Hartley transform (DHT), Walsh transform (DWHT) and Hadamard transform (DHDT). The factors affecting the speedup of vector and parallel processing of these transforms are considered. The vectorization techniques are illustrated by an FFT example. This work is supported in part by the National Science Foundation, Pittsburgh Supercomputing Center (grant number ECS-880012P) and by the PEW Science Education Program.
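As a small illustration of how a sinusoidal transform can be obtained from a Fourier kernel, the sketch below computes the discrete Hartley transform of a real sequence from its DFT, using the identity H[k] = Re(F[k]) - Im(F[k]). It uses direct O(N²) sums rather than a fast algorithm and is not the paper's vectorized implementation; the function names and the sample signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (stand-in for an FFT kernel)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dht(x):
    """Discrete Hartley transform from its definition (cas = cos + sin)."""
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n)
                        + math.sin(2 * math.pi * k * t / n))
                for t in range(n))
            for k in range(n)]


def dht_via_dft(x):
    """DHT of a real sequence from its DFT: H[k] = Re(F[k]) - Im(F[k])."""
    return [f.real - f.imag for f in dft(x)]


# Hypothetical real-valued test signal.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0]
direct = dht(signal)
from_fourier = dht_via_dft(signal)
```

Both routes give the same coefficients up to rounding, and applying `dht` twice returns the input scaled by its length, since the transform is its own inverse up to that factor.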

10.

The use of intermediate memories for low-latency memory access in supercomputer scalar units

Gurindar S. Sohi Wei-Chung Hsu 《The Journal of supercomputing》1990,4(1):5-21

One of the prime considerations for high scalar performance in supercomputers is a low memory latency. With the increasing disparity between main memory and CPU clock speeds, the use of an intermediate memory in the hierarchy becomes necessary. In this paper, we present an intermediate memory structure called a programmable cache. A programmable cache exploits structural locality to decrease the average memory access time. We evaluate the concept of a programmable cache by using the vector registers in the CRAY X-MP and Y-MP supercomputers as a programmable cache. Our results indicate that a programmable cache can be used profitably to reduce the memory latency if the pattern of references to a data structure can be determined at compile time. The work of the first author was supported in part by NSF Grant CCR-8706722.