Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Skander Belhaj 《Computing》2010,87(3-4):169-186
In this paper, we present an algorithm for finding an approximate block diagonalization of complex Hankel matrices. Our method is based on inverting an upper triangular Toeplitz matrix by simple forward substitution. We also consider an approximate block diagonalization of complex Hankel matrices via Schur complementation. As an application, the algorithm is used to compute the approximate polynomial quotient and remainder that appear in the Euclidean algorithm. The algorithms are implemented in Matlab, and numerical examples show the effectiveness of our strategy.
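The substitution step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's Matlab code: it assumes the upper triangular Toeplitz matrix is given by its first row with a nonzero leading entry, and both function names are hypothetical. Because the inverse is again upper triangular Toeplitz, only its first row has to be computed.

```python
def invert_upper_triangular_toeplitz(first_row):
    """Return the first row of the inverse of an upper triangular Toeplitz matrix.

    The matrix is assumed (for this sketch) to be described by its first row
    t[0], t[1], ..., t[n-1], with t[0] != 0.
    """
    t = list(first_row)
    if t[0] == 0:
        raise ZeroDivisionError("matrix is singular: leading entry is zero")
    inv = [1 / t[0]]
    for k in range(1, len(t)):
        # Substitution: entry k depends only on the entries already computed.
        acc = sum(t[j] * inv[k - j] for j in range(1, k + 1))
        inv.append(-acc / t[0])
    return inv


def first_row_of_product(a, b):
    """First row of the product of two upper triangular Toeplitz matrices.

    The product is again upper triangular Toeplitz, and its first row is the
    truncated convolution of the two first rows.
    """
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]


row = [2, 1j, 1]
inverse_row = invert_upper_triangular_toeplitz(row)
check = first_row_of_product(row, inverse_row)  # approximately [1, 0, 0]
```

The recursion costs O(n^2) operations for an n-by-n matrix and works unchanged for complex entries.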

2.
In this work, efficient algorithms for sparse computations (reordering algorithms, storage schemes, symbolic factorization, master degree-of-freedom, L1D1U numerical factorization, forward and backward solutions) are developed and integrated into the proposed procedures. In order to exploit the fast saxpy operations offered by many vector computers, take advantage of the cache available in many workstations, and minimize data movement into fast memory, special storage schemes are designed to store the (unsymmetrical) coefficient matrix. The upper triangular portion of the coefficient matrix is stored in a compressed sparse row format, while the lower triangular portion of the same matrix is stored in a compressed sparse column format. A reordering algorithm is applied to one portion of the matrix to minimize fill-in. Sparse unsymmetrical matrix–vector multiplication is also implemented for error-norm checking. All of the sparse procedures are coded in standard Fortran.
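The split storage scheme described in this abstract can be illustrated with a small matrix-vector product. This is a sketch under stated assumptions, not the paper's Fortran code: the function name and tuple layout are hypothetical, and the choice to store the diagonal with the upper (compressed sparse row) part is an assumption made only for this example.

```python
def split_matvec(n, upper_csr, lower_csc, x):
    """Compute y = A x for an unsymmetrical sparse matrix stored in two parts.

    upper_csr = (row_ptr, col_idx, values): upper triangle including the
                diagonal, in compressed sparse row format (assumed layout).
    lower_csc = (col_ptr, row_idx, values): strictly lower triangle, in
                compressed sparse column format (assumed layout).
    """
    y = [0.0] * n
    row_ptr, col_idx, upper_vals = upper_csr
    for i in range(n):
        # Row-oriented pass: a dot product of row i with x.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += upper_vals[k] * x[col_idx[k]]
    col_ptr, row_idx, lower_vals = lower_csc
    for j in range(n):
        # Column-oriented pass: scatter x[j] times column j into y.
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += lower_vals[k] * x[j]
    return y


# A = [[4, 1, 0],
#      [2, 5, 3],
#      [0, 6, 7]]
upper = ([0, 2, 4, 5], [0, 1, 1, 2, 2], [4.0, 1.0, 5.0, 3.0, 7.0])
lower = ([0, 1, 2, 2], [1, 2], [2.0, 6.0])
y = split_matvec(3, upper, lower, [1.0, 2.0, 3.0])
```

The column-oriented pass over the lower part is a sequence of saxpy-style updates, which is the kind of operation the abstract says the storage scheme is meant to exploit.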

3.
4.
S. Chandrasekaran  M. Gu  W. Lyons 《Calcolo》2005,42(3-4):171-185
We present a fast solver for matrices in hierarchically semi-separable form with highly non-uniform partition trees. The algorithm employed is based on an implicit ULV factorization. Numerical experiments are provided and demonstrate that the solution time scales linearly with problem size.

5.
We present a parallel algorithm for distributed memory multiprocessors, which is based on generalized marching (GM), one of the fastest methods in the class of fast Poisson solvers. The GM algorithm is not suited for any but very coarse-grain parallel processing. The main difficulty with parallelization is that the number of independent processes and the amount of work in each process change exponentially and in inverse proportion to each other. To improve parallelism, the matrices involved in GM are diagonalized by performing multiple FFTs. In this way, independent processes spanning the entire algorithm are obtained. The parallel GM has been tested on an Ncube/10 and a Symult S2010, running the Express communication system. A performance evaluation has been carried out using a scaled efficiency model and some classical parameters.

6.
7.
8.
This paper deals with the problem of computing the singular values and vectors of a Hankel operator with symbol m*W, where m ∈ H∞ is an arbitrary inner function and W ∈ H∞ is rational. A simplified version of the formula given in [6] is obtained for computing the singular values of the Hankel operator. This result is applied to the (one-block) H∞ optimal control of SISO stable infinite dimensional plants and rational weights. Using this new formula, a simple expression is derived for the H∞ optimal controller whose structure was observed in [9].

9.
An O(h^6) accurate difference approximation to solutions of the Helmholtz equation is derived. The discrete equations are solved using a reduction procedure and Fourier analysis. Its computational performance is compared with that of a similar fourth-order method over a set of linear and mildly nonlinear elliptic boundary value problems.

10.
In this note, a method is presented for inverting an n × n matrix M with complex elements. The approach consists of inverting a 2n × 2n matrix with only real elements.
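One common way to realize this idea is sketched below. The abstract does not say which 2n × 2n real matrix the note uses, so the block embedding [[A, -B], [B, A]] for M = A + iB is an assumption, as are the function names. The real inverse is computed with a plain Gauss-Jordan elimination so that the example needs no external library.

```python
def invert_real(matrix):
    """Invert a real square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def invert_complex_via_real(m):
    """Invert a complex n x n matrix M = A + iB using only real arithmetic.

    Assumed embedding: R = [[A, -B], [B, A]]. Its inverse has the same block
    form [[C, -D], [D, C]], and then M^-1 = C + iD.
    """
    n = len(m)
    real_block = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            z = complex(m[i][j])
            real_block[i][j] = z.real
            real_block[i][j + n] = -z.imag
            real_block[i + n][j] = z.imag
            real_block[i + n][j + n] = z.real
    inv = invert_real(real_block)
    # C is the top-left block, D is the bottom-left block.
    return [[complex(inv[i][j], inv[i + n][j]) for j in range(n)] for i in range(n)]


m = [[1 + 2j, 2], [3j, 4 - 1j]]
m_inv = invert_complex_via_real(m)
```

The real matrix has four times as many entries as M, so the approach trades extra storage for the ability to reuse a real-only inversion routine.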

11.
This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirement are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a large number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measured values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p + 6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.


12.
Jacket matrices, motivated by center weighted Hadamard matrices, have played an important role in signal processing, communications, image compression, cryptography, etc. In this paper, we suggest a design approach for the Pauli block jacket matrix, obtained by substituting some Pauli matrices for all elements of common matrices. Since the well-known Pauli matrices have been widely used in quantum information processing, the large-order Pauli block jacket matrix, which contains commutative row operations, is investigated in detail. Some special Abelian groups are then generated from any independent rows of the resulting Pauli block jacket matrix. Finally, we show how the Pauli block jacket matrix can simplify the coding theory of quantum error correction. The quantum codes we provide do not require the dual-containing constraint necessary for standard quantum error-correction codes, which allows us to construct quantum codes with large codeword length. The proposed codes can be constructed structurally by using the stabilizer formalism of Abelian groups whose generators are selected from the row operations of the Pauli block jacket matrix, and hence have the advantages of fast construction and asymptotically good behavior.

13.
FABRIK: A fast, iterative solver for the Inverse Kinematics problem   Cited by: 3 (self-citations: 0, citations by others: 3)
Inverse Kinematics is defined as the problem of determining a set of appropriate joint configurations for which the end effectors move to desired positions as smoothly, rapidly, and as accurately as possible. However, many of the currently available methods suffer from high computational cost and production of unrealistic poses. In this paper, a novel heuristic method, called Forward And Backward Reaching Inverse Kinematics (FABRIK), is described and compared with some of the most popular existing methods regarding reliability, computational cost and conversion criteria. FABRIK avoids the use of rotational angles or matrices, and instead finds each joint position via locating a point on a line. Thus, it converges in few iterations, has low computational cost and produces visually realistic poses. Constraints can easily be incorporated within FABRIK and multiple chains with multiple end effectors are also supported.
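The "point on a line" idea in this abstract can be sketched for a single unconstrained chain. This is a minimal illustration of the commonly described forward and backward reaching passes, not the authors' implementation: the function name, the tolerance, and the iteration cap are assumptions, joint constraints and multiple end effectors are left out, and coincident joints (zero distance between two points) are not handled.

```python
import math


def fabrik(joints, target, tol=1e-4, max_iter=200):
    """Move the last joint of a chain to `target` while keeping segment lengths.

    `joints` is a list of points (tuples of equal dimension); the first joint
    is treated as the fixed base. Returns the new list of joint positions.
    """
    points = [tuple(float(c) for c in p) for p in joints]
    target = tuple(float(c) for c in target)
    lengths = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    base = points[0]

    def place(origin, toward, length):
        # Point at distance `length` from `origin` on the line through `toward`.
        lam = length / math.dist(origin, toward)
        return tuple((1 - lam) * o + lam * t for o, t in zip(origin, toward))

    if math.dist(base, target) > sum(lengths):
        # Target out of reach: stretch the chain straight toward it.
        for i, length in enumerate(lengths):
            points[i + 1] = place(points[i], target, length)
        return points

    for _ in range(max_iter):
        if math.dist(points[-1], target) <= tol:
            break
        # Backward reaching: pin the end effector to the target, work to the base.
        points[-1] = target
        for i in range(len(points) - 2, -1, -1):
            points[i] = place(points[i + 1], points[i], lengths[i])
        # Forward reaching: pin the base again, work out to the end effector.
        points[0] = base
        for i in range(len(points) - 1):
            points[i + 1] = place(points[i], points[i + 1], lengths[i])
    return points


start = [(0, 0), (1, 0), (2, 0), (3, 0)]
reached = fabrik(start, (1, 2))
stretched = fabrik(start, (10, 0))
```

Each pass only rescales vectors between neighbouring joints, which is why no rotation angles or matrices appear anywhere in the sketch.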

14.
We present an implementation of a stiff ODE solver on the AP120B, and discuss design considerations which should be generally applicable to ODE software on attached processors of this type.

15.
16.
An optimized implementation of a block tridiagonal solver based on the block cyclic reduction (BCR) algorithm is introduced and its portability to graphics processing units (GPUs) is explored. The computations are performed on the NVIDIA GTX480 GPU. The results are compared with those obtained on a single core of an Intel Core i7-920 (2.67 GHz) in terms of calculation runtime. The BCR linear solver achieves a maximum speedup of 5.84x with a block size of 32 over the CPU Thomas algorithm in double precision. The proposed BCR solver is applied to discontinuous Galerkin (DG) simulations on structured grids via an alternating direction implicit (ADI) scheme. The GPU performance of the entire computational fluid dynamics (CFD) code is studied for different compressible inviscid flow test cases. For a general mesh with quadrilateral elements, the ADI-DG solver achieves a maximum total speedup of 7.45x for the piecewise quadratic solution over the CPU platform in double precision.
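For reference, the CPU baseline named in this abstract, the Thomas algorithm, can be sketched for the scalar tridiagonal case. This is an illustrative sketch only: the paper's solver works on block tridiagonal systems with block cyclic reduction on a GPU, which is not reproduced here. The function name and argument layout are assumptions, and no pivoting is done, so the sketch assumes a system (for example, a diagonally dominant one) in which the elimination never meets a zero pivot.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a scalar tridiagonal system with the Thomas algorithm.

    Assumed layout: row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1]
    = rhs[i]; lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the sub-diagonal entry of each row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# The matrix with stencil (-1, 2, -1) applied to x = [1, 2, 3, 4] gives rhs = [0, 0, 0, 5].
solution = thomas_solve([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])
```

The elimination is inherently sequential, since each row depends on the previous one. Cyclic reduction reorganizes the elimination so that many rows can be processed independently, which is what makes it a candidate for GPUs.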

17.
A modification of the algorithm shown by Q.G. Wang and C.H. Zhou (see ibid., vol.AC-31, p.165-6, 1968) is presented. The performance of their algorithm is improved by the use of convolutions and therefore of FFT techniques. The present method is based on the fast inversion of block triangular Toeplitz matrices, and it is amenable to parallel implementation.

18.
In this paper, we propose an efficient image inpainting algorithm that improves both the filling order of the pixels in the target region and texture synthesis within a dynamic search range. The algorithm consists of two parts. The first part decides the filling order of the pixels in the target regions based on the high accuracy fast marching method. The second part of the algorithm implicitly assumes a Markov random field model for textured image regions and computes blocks of texture using an efficient search process and the SSD (Sum of Squared Differences) measure. The algorithm is straightforward to implement and restores the target regions with visually plausible quality that is on par with or better than several existing methods, with a lower execution cost.

19.
In this paper a parallelization of a Shallow Water numerical scheme suitable for Graphics Processor Unit (GPU) architectures under the NVIDIA™'s Compute Unified Device Architecture (CUDA) framework is presented. In order to provide robust and accurate simulations of real flood events, the system features a state-of-the-art Finite Volume explicit discretization technique which is well balanced, second order accurate and based on positive depth reconstruction. The model is based on a Cartesian grid and boundary conditions are implemented by means of the implicit local ghost cell approach, which enables the discretization of a broad spectrum of boundary conditions including inflow/outflow conditions. A novel and efficient Block Deactivation Optimization procedure has also been adopted, in order to increase the efficiency of the numerical scheme in the presence of wetting-drying fronts. This led to speedups of two orders of magnitude with respect to a single-core CPU. The code has been validated against several severe benchmark test cases, and its capability of producing accurate fast simulations (with high ratios between physical and computing times) for different real world cases has been shown.

20.
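The satisfiability problem is a core problem in the theory and application of computing. This paper proposes pprobSAT+, a parallel solver based on an incomplete algorithm and implemented on an FPGA. A multithreading strategy reduces the waiting time of the relevant components and improves the efficiency of the solver. In addition, the threads share a data storage structure for address and clause information, which reduces the use of on-chip memory. The pprobSAT+ solver reaches its best performance when all data are stored in the on-chip memory of the FPGA. Experimental results show that, compared with the single-threaded solver, the proposed pprobSAT+ solver achieves a speedup of more than 2x.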
