
1.

An efficient implementation of parallel eigenvalue computation for massively parallel processing (total citations: 4; self-citations: 0; citations by others: 4)

Takahiro Katagiri, Yasumasa Kanada. Parallel Computing, 2001, 27(14): 1831-1845

This paper describes an efficient implementation and evaluation of a parallel eigensolver for computing all eigenvalues of dense symmetric matrices. Our eigensolver uses a Householder tridiagonalization method that offers higher parallelism and performance than conventional methods when the problem size is relatively small, e.g., of order 10,000. This matters in practical applications that require many diagonalizations of such matrices. The routine was evaluated on 1024 processors of a HITACHI SR2201, where it achieved speedups of about 2–5 times over the ScaLAPACK library on the same 1024 processors.
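The abstract does not give the parallel algorithm itself. As a sequential reference point, here is a minimal pure-Python sketch of Householder tridiagonalization using the standard symmetric rank-2 update; the function name and the (diag, offdiag) return convention are our own. In a parallel version, the matrix-vector product p = βAv and the rank-2 update are where most of the work lies, so they are the natural candidates for distribution.

```python
import math


def householder_tridiagonalize(a):
    """Reduce a dense symmetric matrix to tridiagonal form T = Q^T A Q.

    Sequential textbook version using one Householder reflection per column.
    Returns (diag, offdiag) of T.
    """
    n = len(a)
    a = [[float(v) for v in row] for row in a]  # work on a float copy
    for k in range(n - 2):
        # Entries below the subdiagonal in column k are to be annihilated.
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already in tridiagonal form
        # Sign choice avoids cancellation when forming v = x - alpha * e1.
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = [0.0] * n
        v[k + 1] = x[0] - alpha
        for i in range(k + 2, n):
            v[i] = a[i][k]
        beta = 2.0 / sum(t * t for t in v)
        # Symmetric rank-2 update A <- H A H with H = I - beta * v v^T:
        # p = beta*A v, w = p - (beta/2)(p.v) v, A <- A - v w^T - w v^T.
        p = [beta * sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        half_k = 0.5 * beta * sum(p[i] * v[i] for i in range(n))
        w = [p[i] - half_k * v[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                a[i][j] -= v[i] * w[j] + w[i] * v[j]
    diag = [a[i][i] for i in range(n)]
    offdiag = [a[i + 1][i] for i in range(n - 1)]
    return diag, offdiag
```

Because T is obtained by an orthogonal similarity, its trace and Frobenius norm equal those of A, which gives a cheap sanity check.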

2.

A parallel sorting algorithm for a novel model of computation

Amitabha Das, Louise E. Moser, P. M. Melliar-Smith. International Journal of Parallel Programming, 1991, 20(5): 403-419

The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Applying a rule to a pair of objects creates a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm is O(n log² n), comparable to that of specialized sorting networks and an improvement on the O(n^1.5) complexity of conventional mesh-connected array processors.
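The abstract does not list the actual rules, so the following is a hypothetical simulation of mergesort in a rule-based style: the object space is a Python list of (level, sorted run) pairs, the guard accepts two runs of equal level, and firing the rule consumes both objects and creates their merge. Rule firings are picked at random from a seeded generator to mimic asynchronous processors; when no pair satisfies the guard (for input lengths that are not powers of two) the sketch relaxes it. All names here are ours, not the paper's.

```python
import random


def merge(a, b):
    """Standard two-way merge of two sorted tuples."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return tuple(out)


def rule_based_sort(items, seed=0):
    rng = random.Random(seed)
    # Global object space: each object is (level, sorted_run); start with singletons.
    space = [(0, (x,)) for x in items]
    while len(space) > 1:
        # Group object indices by level; the guard accepts two objects of equal level.
        by_level = {}
        for idx, (lvl, _) in enumerate(space):
            by_level.setdefault(lvl, []).append(idx)
        candidates = [idxs for idxs in by_level.values() if len(idxs) >= 2]
        if candidates:
            # An arbitrary "processor" fires the rule on an arbitrary eligible pair.
            i, j = rng.sample(rng.choice(candidates), 2)
        else:
            # No pair satisfies the guard: relax it and merge the two lowest levels.
            order = sorted(range(len(space)), key=lambda k: space[k][0])
            i, j = order[0], order[1]
        (li, ri), (lj, rj) = space[i], space[j]
        new_obj = (max(li, lj) + 1, merge(ri, rj))
        space = [obj for k, obj in enumerate(space) if k not in (i, j)] + [new_obj]
    return list(space[0][1]) if space else []
```

This simulation fires one rule at a time; in the model described above, many processors would fire rules on disjoint pairs concurrently.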

3.

A symbolic-numerical algorithm for the computation of matrix elements in the parametric eigenvalue problem

S. I. Vinitsky, V. P. Gerdt, A. A. Gusev, M. S. Kaschiev, V. A. Rostovtsev, V. N. Samoilov, T. V. Tyupikova, O. Chuluunbaatar. Programming and Computer Software, 2007, 33(2): 105-116

A symbolic-numerical algorithm for computing the matrix elements in the parametric eigenvalue problem to a prescribed accuracy is presented. A procedure for calculating the oblate angular spheroidal functions that depend on a parameter is discussed. This procedure also yields the corresponding eigenvalues and the matrix elements (integrals of the eigenfunctions multiplied by their derivatives with respect to the parameter). The efficiency of the algorithm is confirmed by computing the eigenvalues, eigenfunctions, and matrix elements and by comparison with known data and with the asymptotic expansions for small and large values of the parameter. The algorithm is implemented as a package of programs in Maple-Fortran and is used, via the Kantorovich method, to reduce a singular two-dimensional boundary value problem for an elliptic second-order partial differential equation to a regular boundary value problem for a system of second-order ordinary differential equations.

4.

An algorithm for the on-line computation of Fourier spectra

International Journal of Computer Mathematics, 2012, 89(1-4): 361-370


5.

A parallel algorithm for 2-D DFT computation with no interprocessor communication

I. Gertner, M. Rofheart. IEEE Transactions on Parallel and Distributed Systems, 1990, 1(3): 377-382

A parallel algorithm is proposed for computing the two-dimensional discrete Fourier transform (2-D DFT) that eliminates interprocessor communication and uses only O(N) processors. The mapping of the algorithm onto architectures with broadcast and report capabilities is discussed. Expressions are obtained for estimating the speed performance on these machines as a function of the size N×N of the 2-D DFT, the bandwidth of the communication channel, the time for an addition, the time T(F_N) for a single processing element to perform an N-point DFT, and the degree of parallelism. For single-I/O-channel machines that can exploit the full degree of parallelism of the algorithm, attainable execution times are as low as T(F_N) plus the I/O time for uploading and downloading the data. An implementation on a binary tree computer is also discussed.
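The abstract does not state the underlying mathematics. One standard identity that permits communication-free 2-D DFTs is the discrete projection-slice relation: for prime N, summing the input along the lines a·n1 + b·n2 ≡ s (mod N) and taking an N-point DFT of that projection yields the 2-D DFT on the frequency line (a·t, b·t), and N + 1 such directions cover every frequency. This is consistent with the cost terms listed above (additions plus one N-point DFT per processor), but we do not claim it is the paper's exact algorithm. The sketch below, with names of our own choosing, checks the identity against a direct 2-D DFT; each loop iteration over directions is independent and could run on its own processor.

```python
import cmath


def dft(seq):
    """Naive N-point DFT (stands in for one processor's T(F_N) work)."""
    n = len(seq)
    return [sum(seq[s] * cmath.exp(-2j * cmath.pi * s * t / n) for s in range(n))
            for t in range(n)]


def dft2_direct(x):
    """Reference 2-D DFT, returned as {(k1, k2): value}."""
    n = len(x)
    return {(k1, k2): sum(x[n1][n2] * cmath.exp(-2j * cmath.pi * (n1 * k1 + n2 * k2) / n)
                          for n1 in range(n) for n2 in range(n))
            for k1 in range(n) for k2 in range(n)}


def dft2_by_projections(x):
    """2-D DFT of an N x N array (N prime) from N + 1 independent tasks."""
    n = len(x)
    if n < 2 or any(n % d == 0 for d in range(2, int(n ** 0.5) + 1)):
        raise ValueError("this sketch assumes a prime N")
    directions = [(1, b) for b in range(n)] + [(0, 1)]
    result = {}
    for a, b in directions:  # no data is exchanged between iterations
        # Projection: sum samples on lines a*n1 + b*n2 = s (mod N); additions only.
        proj = [0j] * n
        for n1 in range(n):
            for n2 in range(n):
                proj[(a * n1 + b * n2) % n] += x[n1][n2]
        # The N-point DFT of the projection gives X on the line (a*t, b*t) mod N.
        for t, value in enumerate(dft(proj)):
            result[(a * t % n, b * t % n)] = value
    return result
```

The zero frequency is produced by every task (it is the sum of all samples); every other frequency lies on exactly one of the N + 1 lines.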

6.

A blocked QR-decomposition for the parallel symmetric eigenvalue problem

T. Auckenthaler, T. Huckle, R. Wittmann. Parallel Computing, 2014

In this paper we present a new stable algorithm for the parallel QR decomposition of “tall and skinny” matrices. The algorithm was developed for the dense symmetric eigensolver ELPA, in which the QR decomposition of tall and skinny matrices is an important substep. Our approach is based on the fast but unstable CholeskyQR algorithm (Stathopoulos and Wu, 2002) [1]. We show that the new algorithm is stable and present promising results from our MPI-based implementation on a BlueGene/P and a Power6 system.
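For orientation, here is a minimal pure-Python sketch of plain CholeskyQR, the starting point named above; it is not the paper's new stabilized variant, which the abstract does not describe. With A = QR, the Gram matrix is G = AᵀA = RᵀR, so R comes from a Cholesky factorization of G and Q = AR⁻¹. In a row-distributed setting, only forming G needs a global reduction, which is what makes the method fast; its weakness is that loss of orthogonality in Q grows roughly with the square of the condition number of A.

```python
import math


def cholesky_qr(a):
    """Plain CholeskyQR of a tall m x n matrix (list of rows): returns (Q, R)."""
    m, n = len(a), len(a[0])
    # Gram matrix G = A^T A (the only step that needs a global sum in parallel).
    g = [[sum(a[k][i] * a[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    # Cholesky factorization G = R^T R with R upper triangular, row by row.
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = g[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Gram matrix is not numerically positive definite")
                r[i][i] = math.sqrt(s)
            else:
                r[i][j] = s / r[i][i]
    # Q = A R^{-1}: for each row of A, solve q R = row by forward substitution.
    q = []
    for row in a:
        x = [0.0] * n
        for j in range(n):
            x[j] = (row[j] - sum(x[k] * r[k][j] for k in range(j))) / r[j][j]
        q.append(x)
    return q, r
```

Each row of Q depends only on the matching row of A and on R, so once R is known the last step is purely local.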

7.

A fully parallel method for the singular eigenvalue problem

Computers & Mathematics with Applications, 2005, 49(7-8): 1279-1284

In this paper, a fully parallel method for finding some or all finite eigenvalues of a real symmetric matrix pencil (A, B) is presented, where A is a symmetric tridiagonal matrix and B is a diagonal matrix with b_1 > 0 and b_i ≥ 0 for i = 2, 3, …, n. The method is based on homotopy continuation with a rank-2 perturbation. It is shown that there are exactly m disjoint, smooth homotopy paths connecting the trivial eigenvalues to the desired eigenvalues, where m is the number of finite eigenvalues of (A, B). It is also shown that the homotopy curves are monotonic and easy to follow.

8.

Towards a single model of efficient computation in real parallel machines

Pilar de la Torre, Clyde P. Kruskal. Future Generation Computer Systems, 1992, 8(4): 395-408

We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters: latency and bandwidth.

We design YPRAM algorithms for several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts reasonably accurately the known performance of several basic parallel models (PRAM, hypercube, mesh, and tree) on these problems.
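Parallel prefix is the first of the benchmark problems listed. The YPRAM algorithm itself is not given in the abstract, so as a baseline here is the classic Hillis-Steele inclusive scan on an idealized PRAM, simulated in synchronous rounds: every position reads the previous array and writes at once. It needs ⌈log₂ n⌉ rounds and O(n log n) operations; a model with latency and bandwidth parameters would presumably charge each round differently depending on how far apart x[i − step] and x[i] live. The function name and the round counter are our own.

```python
def hillis_steele_scan(values, op=lambda a, b: a + b):
    """Inclusive prefix combination with an associative op, PRAM style.

    Returns (prefixes, rounds) where rounds == ceil(log2(n)) for n >= 1.
    """
    x = list(values)
    n = len(x)
    step, rounds = 1, 0
    while step < n:
        # The comprehension reads the old x completely before x is rebound,
        # which models all processors reading, then writing simultaneously.
        x = [x[i] if i < step else op(x[i - step], x[i]) for i in range(n)]
        step *= 2
        rounds += 1
    return x, rounds
```

For example, `hillis_steele_scan([3, 1, 7, 0, 4, 1, 6, 3])` returns `([3, 4, 11, 11, 15, 16, 22, 25], 3)`. Because operands are always combined as `op(left, right)`, any associative operator works, including non-commutative ones such as string concatenation.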

9.

A parallel Lanczos method for symmetric generalized eigenvalue problems

Kesheng Wu, Horst Simon. Computing and Visualization in Science, 1999, 2(1): 37-46

The Lanczos algorithm is a very effective method for finding extreme eigenvalues of symmetric matrices. In this paper, we present PLANSO, our parallel version of the Lanczos method for the symmetric generalized eigenvalue problem. PLANSO is based on a sequential package called LANSO, which implements the Lanczos algorithm with partial reorthogonalization. It is portable to all parallel machines that support MPI and is easy to interface with most parallel computing packages. Through numerical experiments, we demonstrate that it achieves parallel efficiency similar to that of PARPACK while using considerably less time. Received: 21 January 1998 / Accepted: 10 June 1999.
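As a minimal illustration of the underlying recurrence, here is a pure-Python Lanczos iteration for the standard symmetric problem Ax = λx (that is, B = I). Two substitutions relative to PLANSO should be noted plainly: it handles only the standard problem, and it uses full reorthogonalization instead of LANSO's partial reorthogonalization, which is simpler but costs O(nk) extra work per step. The names and return convention are ours.

```python
import math


def lanczos(a, q0, k):
    """k steps of symmetric Lanczos with full reorthogonalization.

    Returns (alphas, betas, basis): T = tridiag(betas, alphas, betas) and
    basis[j] is the j-th Lanczos vector, so basis^T A basis ~= T.
    """
    n = len(a)
    norm = math.sqrt(sum(t * t for t in q0))
    q = [t / norm for t in q0]
    basis, alphas, betas = [q], [], []
    for j in range(k):
        w = [sum(a[i][l] * q[l] for l in range(n)) for i in range(n)]  # w = A q_j
        alphas.append(sum(w[i] * q[i] for i in range(n)))
        if j == k - 1:
            break
        # Subtracting projections onto all stored vectors (modified Gram-Schmidt)
        # subsumes the three-term recurrence and keeps the basis orthogonal.
        for v in basis:
            c = sum(w[i] * v[i] for i in range(n))
            w = [w[i] - c * v[i] for i in range(n)]
        beta = math.sqrt(sum(t * t for t in w))
        if beta < 1e-12:
            break  # invariant subspace found; T is complete
        betas.append(beta)
        q = [t / beta for t in w]
        basis.append(q)
    return alphas, betas, basis
```

The extreme eigenvalues of the small tridiagonal T approximate those of A well long before k reaches n, which is why the method suits the extreme-eigenvalue problems mentioned above.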

10.

Semantics for data parallel computation

Michael D. Rice. International Journal of Parallel Programming, 1990, 19(6): 477-509


11.

Models for practical parallel computation

D. B. Skillicorn. International Journal of Parallel Programming, 1991, 20(2): 133-158

A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming. This work was supported by the Natural Sciences and Engineering Research Council of Canada.
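A central result of the Bird-Meertens formalism is the homomorphism lemma: every list homomorphism can be written as a reduction with an associative operator ⊕ composed with a map of some f. Associativity is what makes such programs data parallel, since contiguous pieces can be reduced independently and the partial results combined. The small Python illustration below uses names of our own; `split_eval` simulates independent workers, one per chunk.

```python
import operator
from functools import reduce


def hom(f, op, unit):
    """Build the list homomorphism h = reduce(op) . map(f) with identity `unit`."""
    return lambda xs: reduce(op, map(f, xs), unit)


def split_eval(f, op, unit, xs, pieces):
    """Evaluate h on contiguous chunks independently, then combine with op."""
    size = max(1, -(-len(xs) // pieces))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    h = hom(f, op, unit)
    partials = [h(c) for c in chunks]  # each call could run on its own processor
    return reduce(op, partials, unit)
```

For instance, `split_eval(lambda x: x * x, operator.add, 0, list(range(10)), 3)` gives 285, the same as the sequential `hom(lambda x: x * x, operator.add, 0)(list(range(10)))`. Chunks must stay contiguous and in order, because ⊕ need only be associative, not commutative (string concatenation is an example).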

12.

A parallel LEGION algorithm and cell-based architecture for real time split and merge video segmentation

Pradipta Roy, Prabir Kumar Biswas. Journal of Real-Time Image Processing, 2018, 15(2): 363-387

Split-and-merge segmentation is a popular region-based segmentation scheme because of its robustness and computational efficiency. However, it is hard to run in real time on large images or video frames because of its iterative, sequential data flow. A quadtree data structure is commonly used in software implementations of the algorithm, but the inherent data dependency between processes makes local parallelism difficult to exploit. In this paper, we propose a parallel split-and-merge algorithm that depends only on local operations. The algorithm is mapped onto a hierarchical cell network, which is a parallel version of the Locally Excitatory Globally Inhibitory Oscillator Network (LEGION). Simulation results show that the proposed design is faster than any of the standard split-and-merge implementations, without compromising segmentation quality. The timing improvement is demonstrated by a finite-state-machine-based VLSI implementation on VIRTEX-series FPGA platforms. We also show that although the split-and-merge algorithm lags slightly behind state-of-the-art algorithms in segmentation quality, it outperforms those more sophisticated and complex algorithms in computational speed. Good segmentation performance at minimal computational cost enables the proposed design to handle real-time segmentation of live video streams. We demonstrate live PAL video segmentation on a VIRTEX 5 series FPGA. Moreover, we extend the design to HD resolution, where the processing time per frame is less than 5 ms, giving a throughput of 200 frames per second.
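To make the baseline concrete, here is a minimal sketch of the split phase of the conventional sequential quadtree approach mentioned above, not the paper's LEGION cell network. Assumptions, all ours: the image is a square list of lists whose side is a power of two, and a block counts as homogeneous when its max − min intensity is at most `threshold`. The merge phase, which joins adjacent homogeneous leaves, is omitted for brevity.

```python
def quadtree_split(img, threshold, r0=0, c0=0, size=None):
    """Recursively split a 2^k x 2^k image into homogeneous square blocks.

    Returns a list of (row, col, size) leaves in depth-first quadrant order.
    Homogeneity test (an assumption of this sketch): max - min <= threshold.
    """
    if size is None:
        size = len(img)
    block = [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    if size == 1 or max(block) - min(block) <= threshold:
        return [(r0, c0, size)]
    half = size // 2
    leaves = []
    # Visit quadrants in order: top-left, top-right, bottom-left, bottom-right.
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree_split(img, threshold, r0 + dr, c0 + dc, half)
    return leaves
```

The recursion makes the data dependency visible: a quadrant is examined only after its parent block has been tested, which is the sequential pattern the abstract says is hard to parallelize.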

13.

The real two-zero algorithm: a parallel algorithm to reduce a real matrix to a real Schur form

M. Mantharam, P. J. Eberlein. IEEE Transactions on Parallel and Distributed Systems, 1995, 6(1): 48-62

We introduce a new method to reduce a real matrix to a real Schur form by a sequence of similarity transformations that are 3D orthogonal transformations. Two significant features of this method are that all the transformed matrices and all the computations stay in the real field, and that it can be easily parallelized. We call the algorithm that uses this method the real two-zero (RTZ) algorithm. We describe both serial and parallel implementations of the RTZ algorithm. Our tests indicate that the rate of convergence to a real Schur form is quadratic for real near-normal matrices with real distinct eigenvalues. Suppose n is the order of a real matrix A. In order to choose a sequence of 3D orthogonal transformations on A, we need to determine some ordering on triples in T={(k,l,m)|1⩽k…

14.

A CUDA-based parallel inversion algorithm for large real symmetric matrices

Computer Engineering and Design, 2015, (8)


15.

A parallel approach to the modified Numerov-like eigenvalue determination for Sturm-Liouville problems

G. Vanden Berghe, H. De Meyer, M. Van Daele. Computers & Mathematics with Applications, 1992, 23(12): 69-74

A previously introduced modified Numerov-like eigenvalue algorithm is parallelized. An implementation of this algorithm on a Helios-based parallel transputer system is discussed, and the time savings relative to a sequential approach are commented on.
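The modified Numerov-like scheme itself is not described in the abstract, so the sketch below uses the standard Numerov method in a shooting-plus-bisection eigenvalue search for −y″ + q(x)y = λy with y(a) = y(b) = 0. The potential q, the bracket values, and all function names are assumptions for illustration. One natural source of parallelism, not necessarily the one the authors used, is that separate eigenvalue brackets can be searched independently on separate processors.

```python
import math


def numerov_endpoint(lam, q, a, b, steps):
    """Integrate -y'' + q(x) y = lam * y from a to b with Numerov; return y(b).

    Starts from y(a) = 0; any nonzero y(a + h) only fixes the scale.
    """
    h = (b - a) / steps
    c = h * h / 12.0

    def g(x):
        return q(x) - lam  # rewrite the ODE as y'' = g(x) y

    y_prev, y = 0.0, h
    for n in range(1, steps):
        x_prev, x, x_next = a + (n - 1) * h, a + n * h, a + (n + 1) * h
        # Numerov: (1 - c g_{n+1}) y_{n+1} = 2 (1 + 5c g_n) y_n - (1 - c g_{n-1}) y_{n-1}
        y_next = (2.0 * (1.0 + 5.0 * c * g(x)) * y
                  - (1.0 - c * g(x_prev)) * y_prev) / (1.0 - c * g(x_next))
        y_prev, y = y, y_next
    return y


def eigenvalue_bisect(lo, hi, q, a, b, steps=400, tol=1e-10):
    """Bisect on lam so that y(b) = 0, assuming one sign change in [lo, hi]."""
    f_lo = numerov_endpoint(lo, q, a, b, steps)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = numerov_endpoint(mid, q, a, b, steps)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With q(x) = 0 on [0, π], the exact eigenvalues are 1, 4, 9, …; bracketing [0.5, 1.5] and [3, 5] recovers the first two to well within 1e-6.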

16.

A parallel algorithm for generating combinations

C.-J. Lin. Computers & Mathematics with Applications, 1989, 17(12): 1523-1533

A parallel algorithm for generating all combinations of m items out of n given items in lexicographic order is presented. The computational model is a linear systolic array of m identical processing elements. It takes C(n, m) time steps to generate all C(n, m) combinations. Since all processing elements are identical and execute the same procedure, the design is suitable for VLSI implementation. The algorithm is proved correct by mathematical induction.
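The systolic design is not spelled out in the abstract. As a sequential reference, here is a minimal successor-function generator producing one combination per step, matching the C(n, m) step count above. The comment mapping position i to processing element i is our illustrative assumption, not the paper's construction.

```python
def lexicographic_combinations(n, m):
    """Yield all m-subsets of {1..n} as tuples in lexicographic order.

    Position i of the current combination can be thought of as the value held
    by processing element i in a linear array (an illustrative analogy only).
    """
    if m < 0 or m > n:
        return
    c = list(range(1, m + 1))
    while True:
        yield tuple(c)
        # Find the rightmost position that has not reached its maximum n - m + 1 + i.
        i = m - 1
        while i >= 0 and c[i] == n - m + 1 + i:
            i -= 1
        if i < 0:
            return  # (n-m+1, ..., n) was the last combination
        c[i] += 1
        # Every position to the right restarts just above its left neighbour.
        for j in range(i + 1, m):
            c[j] = c[j - 1] + 1
```

For n = 5 and m = 3 this yields the 10 combinations from (1, 2, 3) to (3, 4, 5), one per step.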

17.

A parallel algorithm for approximate regularity

Laurence Boxer, Russ Miller. Information Processing Letters, 2001, 80(6): 311-316

Spatial regularity amid a seemingly chaotic image is often meaningful. Many papers in computational geometry are concerned with detecting some type of regularity via exact solutions to problems in geometric pattern recognition. However, real-world applications often involve approximate data and may rely on approximate calculations. Thus, it is useful to develop solutions that have an error tolerance.

A solution has recently been presented by Robins et al. [Inform. Process. Lett. 69 (1999) 189–195] to the problem of finding all maximal subsets of an input set in the Euclidean plane that are approximately equally spaced and approximately collinear. This problem arises in computer vision, military applications, and other areas. The algorithm of Robins et al. differs in several important respects from the optimal algorithm given by Kahng and Robins [Pattern Recognition Lett. 12 (1991) 757–764] for the exact version of the problem. The algorithm of Robins et al. seems inherently sequential and runs in O(n^(5/2)) time, where n is the size of the input set. In this paper, we give parallel solutions to this problem.

18.

A parallel algorithm for tiling problems (total citations: 2; self-citations: 0; citations by others: 2)

Y. Takefuji, Y.-C. Lee. IEEE Transactions on Neural Networks, 1990, 1(1): 143-145

A parallel algorithm for tiling with polyominoes is presented. The tiling problem is to pack polyominoes into a finite checkerboard. Using l×m×n processing elements, the algorithm requires O(1) time, where l is the number of different kinds of polyominoes on an m×n checkerboard. The algorithm can be used for placing components or cells on a very-large-scale integrated (VLSI) circuit chip, designing and compacting printed circuit boards, and solving a variety of two- or three-dimensional packing problems.

19.

Semi-automatic process partitioning for parallel computation

Charles Koelbel, Piyush Mehrotra, John Van Rosendale. International Journal of Parallel Programming, 1987, 16(5): 365-382

Automatic process partitioning is the operation of automatically rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. Hybrid shared memory systems provide a hierarchy of globally accessible memories. To achieve high performance on such machines, one must carefully distribute the work and the data so as to keep the workload balanced while optimizing access to nonlocal data. In this paper we consider a semi-automatic approach to process partitioning in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting set of tasks. This approach is illustrated with a picture processing example written in BLAZE, which the compiler transforms into a task system that maximizes locality of memory reference. Research supported by an IBM Graduate Fellowship. Research supported under NASA Contract No. 520-1398-0356. Research supported by NASA Contract No. NAS1-18107 while the last two authors were in residence at ICASE, NASA Langley Research Center.

20.

A structure-preserving Jacobi algorithm for quaternion Hermitian eigenvalue problems

Ru-Ru Ma, Zhi-Gang Jia, Zheng-Jian Bai. Computers & Mathematics with Applications, 2018, 75(3): 809-820

A new real structure-preserving Jacobi algorithm is proposed for solving the eigenvalue problem of quaternion Hermitian matrices. By employing generalized JRS-symplectic Jacobi rotations, the new quaternion Jacobi algorithm preserves the symmetry and JRS-symmetry of the real counterpart of a quaternion Hermitian matrix. Moreover, the proposed algorithm involves only real operations, without dimension expansion, and is generally superior to the state-of-the-art algorithm. Numerical experiments are reported to demonstrate its efficiency and accuracy.
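For orientation, here is a minimal sketch of the classical cyclic Jacobi method for real symmetric matrices, the special case that a quaternion Hermitian matrix with zero imaginary parts reduces to. It does not implement the JRS-symplectic rotations of the paper. Each rotation in the (p, q) plane zeroes a_pq; the angle is taken from the smaller root of t² + 2τt − 1 = 0 (t = tan θ, τ = (a_qq − a_pp)/(2a_pq)), which keeps |θ| ≤ π/4. The tolerances and function name are our own choices.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [[float(v) for v in row] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal mass is negligible
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Smaller root t = tan(theta) keeps the rotation angle <= pi/4.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if tau >= 0 else -1.0) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                # A <- J^T A J: first update columns p and q (A J) ...
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # ... then rows p and q (J^T (A J)), which zeroes a[p][q].
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

Each rotation removes 2a_pq² from the off-diagonal sum of squares, so sweeps drive the matrix toward diagonal form; the quaternion algorithm applies the same idea while preserving the JRS-structure of the real counterpart.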