共查询到20条相似文献,搜索用时 15 毫秒
1.
A new SIMD-model of parallel computations is proposed, in which the processors are interconnected by a controlled tree-like communication circuit. For the problems considered, an architecture is proposed that is more effective than that for well-known models. Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 175–180, September–October, 1999. 相似文献
2.
Parallel accelerators are playing an increasingly important role in scientific computing. However, it is perceived that their weakness nowadays is their reduced “programmability” in comparison with traditional general-purpose CPUs. For the domain of dense linear algebra, we demonstrate that this is not necessarily the case. We show how the libflame library carefully layers routines and abstracts details related to storage and computation, so that extending it to take advantage of multiple accelerators is achievable without introducing platform specific complexity into the library code base. We focus on the experience of the library developer as he develops a library routine for a new operation, reduction of a generalized Hermitian positive definite eigenvalue problem to a standard Hermitian form, and configures the library to target a multi-GPU platform. It becomes obvious that the library developer does not need to know about the parallelization or the details of the multi-accelerator platform. Excellent performance on a system with four NVIDIA Tesla C2050 GPUs is reported. This makes libflame the first library to be released that incorporates multi-GPU functionality for dense matrix computations, setting a new standard for performance. 相似文献
3.
In the paper three distinct approaches to the implementation of an application in linear algebra on the AMT DAP 510 using the Fortran-Plus Enhanced language and compiler are investigated. In the first the implementation is tailored to fit a virtual array processor whose edge size corresponds to that of the maximum dimension of matrix used in the application. The second approach is a special case of the first, and corresponds to a situation in which the virtual array processor is constrained to have the same dimensions as the AMT DAP 510. The third approach may be considered to be a hybrid approach. In this case an implementation of the application is constructed which incorporates the most efficient features of the first two. Finally, comparisons of the performances of the three implementations, all of which were compiled using the Fortran-Plus Enhanced compiler, are presented. 相似文献
4.
A comparison is presented in regular algebra of the Gaussian and Gauss-Jordon elimination techniques for solving sparse systems of simultaneous equations. Specifically, the elimination form and product form of the star A * of a matrix A are defined and it is then shown that the product form is never more sparse than the elimination form. This result generalises an earlier one due to Brayton, Gustavson and Willoughby in which it is shown that the product form of the inverse A -1 of a matrix A is never more sparse than the elimination form of the inverse. Our result applies both in linear algebra and, more generally, to path-finding problems. 相似文献
5.
The implementation and evaluation of the performances on the ICL DAP of two algorithms for the parallel computation of eigenvalues and eigenvectors of moderately large real symmetric matrices of order N, where 64 < N 256, is reported. The first of the algorithms is a modified form of a Parallel Orthogonal Transformation algorithm proposed by Clint et al., which has already been implemented on the DAP for matrices of order N, where N < 65. The second, which has also been implemented on the DAP for matrices of order N, where N < 65, is Jacobi's algorithm, in the modified form proposed by Modi and Pryce. A comparison of the efficiency of the two algorithms for the solution of a variety of large matrices is given. 相似文献
6.
We analyse an alternative decomposition for a tridiagonal matrix which has the property that the decomposition as well as the subsequent solution process can be done in two parallel parts. This decomposition is equivalent to the two-sided Gaussian elimination algorithm that has been discussed by Babuska. In the context of parallel computing a similar approach has been suggested by Joubert and Cloete. The computational complexity of this alternative decomposition is the same as for the standard decomposition and a remarkable aspect is that it often leads to slightly more accurate solutions than the standard process does. The algorithm can be combined with recursive doubling or cyclic reduction in order to increase the degree of parallelism and vectorizability. 相似文献
7.
This paper describes the implementation and performance results for a few standard linear algebra routines on the Denelcor HEP computer. The algorithms used here are based on high-level modules that facilitate portability and perform efficiently in a wide range of environments. The modules are chosen to be of a large enough computational granularity so that reasonably optimum performance may be insured. The design of algorithms with such fundamental modules in mind will also facilitate their replacement by others more suited to gain the desired performance on a particular computer architecture. 相似文献
8.
Extreme scale supercomputers available before the end of this decade are expected to have 100 million to 1 billion computing cores. The power and energy efficiency issue has become one of the primary concerns of extreme scale high performance scientific computing. This paper surveys the research on saving power and energy for numerical linear algebra algorithms in high performance scientific computing on supercomputers around the world. We first stress the significance of numerical linear algebra algorithms in high performance scientific computing nowadays, followed by a background introduction on widely used numerical linear algebra algorithms and software libraries and benchmarks. We summarize commonly deployed power management techniques for reducing power and energy consumption in high performance computing systems by presenting power and energy models and two fundamental types of power management techniques: static and dynamic. Further, we review the research on saving power and energy for high performance numerical linear algebra algorithms from four aspects: profiling, trading off performance, static saving, and dynamic saving, and summarize state-of-the-art techniques for achieving power and energy efficiency in each category individually. Finally, we discuss potential directions of future work and summarize the paper. 相似文献
9.
分析研究了美国Sandia国家实验室Trilinos项目的设计思想、组织结构,该项目受到ASC计划等的资助.Trilinos框架中定义了一个线性代数对象模型,作为各种软件包的构建基础和沟通载体,此模型当前较成熟的实现为Epetra.详细地阐述了Epetra的组织设计和主要类的层次结构,通过两个数值实验考察该软件当前的性能.可以看到,以Trilinos为代表的众多数值计算项目,为各种数值软件的有效协作、集成和扩展所进行的努力和一些有价值的经验. 相似文献
10.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations. The classical Jacobi method is known to take O(n4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n3 log2n) time. One of our new methods also takes O(n3 log2n) time, but the other one takes only O(n3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled. Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems. 相似文献
11.
Gaussian elimination over GF(2) is used in a number of applications including the factorisation of large integers. The boolean nature of arithmetic in GF(2) makes the task well suited to highly parallel bit-organised computers. A program to work with up to 4096 × 4096 matrices has been developed for the ICL-DAP. A method has been developed that needs no extra storage to store the history of the elimination. The algorithm is presented and its correctness proved. 相似文献
12.
In this paper we discuss design principles to implement a set of linear algebra subroutines in a portable libraty for parallel computers. Our design supports reuse of code and easy adaption to new parallel programming paradigms or network configurations. The routines are supposed to be used in self-verifying algorithms. They therefore have to deliver a validated result of high accuracy. 相似文献
13.
In this paper we show that finding solutions of a system of multivariate polynomial equalities and inequalities in the max algebra is equivalent to solving an Extended Linear Complementarity Problem. This allows us to find all solutions of such a system of multivariate polynomial equalities and inequalities and provides a geometrical insight in the structure of the solution set. We also demonstrate that this enables us to solve many important problems in the max algebra and the max-min-plus algebra such as matrix decompositions, construction of matrices with a given characteristic polynomial, state space transformations and the (minimal) state space realization problem.Research assistant with the N.F.W.O. (Belgian National Fund for Scientific Research).Senior research associate with the N.F.W.O. 相似文献
14.
Methods are presented for performing various error analyses of numerical algorithms. These analyses include forward, backward, and B-analysis (a combination of forward and backward). These analyses additionally provide alternative criteria by which different algorithms that solve the same problem may be compared. The conclusions of various comparison criteria are related to the correlation of errors in each algorithm. Finally, the analysis of a composite algorithm, which is made up of concatenated sub-algorithms, is given in terms of analyses done on its parts.
Zusammenfassung In dieser Arbeit werden Methoden vorgestellt, die es gestatten, verschiedene Fehleranalysen numerischer Algorithmen zu vollziehen. Darunter befinden sich Vorwärts- und Rückwärtsanalyse (forward and backward analysis) sowie beidseitige Analyse (B-analysis, eine Kombination von forward and backward) ein. Diese Analysen liefern zusätzlich weitere Kriterien, durch welche verschiedene Algorithmen, die dasselbe Problem lösen, verglichen werden können. Die Aussagen der verschiedenen Vergleichskriterien beziehen sich auf die Fehlerkorelation in jedem Algorithmus. Schließlich wird die Analyse zusammengesetzter Algorithmen, welche aus verketteten Subalgorithmen bestehen, mit Hilfe der Analysen, die an den Teilen vollzogen wurden, dargestellt.
This work was supported in part by the National Science Foundation under NSF Grant MCS 75-21758. 相似文献
15.
A new method of classification for numerical stability of parallel algorithms is proposed based on the theoretical foundation of forward error analysis. It partitions the algorithms according to their asymptotic stability—a measure introduced to relate the limiting behavior of the stability to the size of the problem. Using this method, the stability aspect of the pipelined solution technique for first-order and second-order linear recurrences—the core of a tridiagonal linear equation solver—is studied. In particular, it shows that the pipelined solution method of the first-order linear recurrences has the same degree of stability as the commonly used sequential evaluation algorithms. The stability problems of sequential and pipelined solution methods of the second-order linear recurrences are also studied. 相似文献
16.
In this paper, an algorithm based on linear algebra tools is proposed to compute a weighted sum of squares decomposition of a given form whose length is lower than the number of variables. Such an objective is pursued by using linear algebra techniques to perform tasks that are usually carried out through computational algebraic geometry tools. Several examples are reported to show that the use of linear algebra rather than algebraic geometry leads to a reduction of the execution times, without affecting the effectiveness of the algorithm. Applications of the given procedure to system analysis and to control design problems are reported as well as a detailed complexity analysis. 相似文献
17.
This paper furnishes a solution to the problem of designing robust controllers for linear servomechanisms. The results complete the work begun by Pearson et al. in that a general class of exogenous signals is included in the problem formulation. 相似文献
18.
To enhance linear structures in a gray level image, local operations with an additive score are normally used. Here a multiplicative score is used instead which gives better results than the additive one. The problem of segmenting the image of the multiplicative score is then dealt with where the threshold value can be automatically selected. The experimental results on some satellite images are reported. 相似文献
19.
A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O( n/ m). 相似文献
20.
A new notation for a NAND operator is proposed, based on the prefix Polish notation following the style of Lukasiewicz. There is a direct one-to-one correspondence between the operators in the resulting system and gates in a circuit realization if identical subfunctions are recognized. An axiom set for the Boolean algebra based on this operator is given, and the axiomatic derivation of the algebra is demonstrated. A number of results concerning circuit manipulation, redundant sets of gate inputs, and necessary and sufficient conditions for the relocation of certain gate inputs are presented, and their use demonstrated by a number of examples. 相似文献
|