
1.

Computing the block factorization of complex Hankel matrices

Skander Belhaj 《Computing》2010,87(3-4):169-186

In this paper, we present an algorithm for finding an approximate block diagonalization of complex Hankel matrices. Our method is based on inverting an upper triangular Toeplitz matrix by simple forward substitution. We also consider an approximate block diagonalization of complex Hankel matrices via Schur complementation. As an application, we use the algorithm to compute the approximate polynomial quotient and remainder appearing in the Euclidean algorithm. We have implemented our algorithms in Matlab. Numerical examples are included and show the effectiveness of our strategy.
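The inversion step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's Matlab code. It assumes the matrix is given by its first row t_0, ..., t_{n-1} and uses the fact that the inverse of an upper triangular Toeplitz matrix is again upper triangular Toeplitz, so only its first row has to be computed by substitution. The function names are hypothetical.

```python
def invert_upper_triangular_toeplitz(first_row):
    """Return the first row of the inverse of an upper triangular Toeplitz matrix.

    The matrix has entries T[i][j] = first_row[j - i] for j >= i and 0 otherwise.
    """
    t0 = first_row[0]
    if t0 == 0:
        raise ZeroDivisionError("matrix is singular: diagonal entry is zero")
    inv = [1 / t0]
    for m in range(1, len(first_row)):
        # Substitution step: sum_{k=0..m} t_k * s_{m-k} = 0 for m >= 1.
        acc = sum(first_row[k] * inv[m - k] for k in range(1, m + 1))
        inv.append(-acc / t0)
    return inv


def toeplitz_upper(first_row):
    """Build the dense upper triangular Toeplitz matrix from its first row."""
    n = len(first_row)
    return [[first_row[j - i] if j >= i else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product, used only to check the result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


row = [2, 1 + 1j, 0.5]  # complex entries are allowed
inverse_row = invert_upper_triangular_toeplitz(row)
product = matmul(toeplitz_upper(row), toeplitz_upper(inverse_row))
```

Here `product` should equal the 3 × 3 identity up to rounding error. The cost is O(n²) operations for an n × n matrix.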

2.

Vector-sparse solver for unsymmetrical matrices

《Advances in Engineering Software》2000,31(8-9):563-569

In this work, efficient algorithms for sparse computations (reordering algorithms, storage schemes, symbolic factorization, master degree-of-freedom, L¹D¹U numerical factorization, forward and backward solutions) are developed and integrated into the proposed procedures. In order to exploit fast saxpy operations offered by many vector computers, take advantage of available cache in many workstations, and minimize data movements into fast memory, special storage schemes are designed to store the coefficient (unsymmetrical) matrix. Thus, the upper triangular portion of the coefficient matrix is stored in a compressed sparse row format, while the lower triangular portion of the same matrix is stored in a compressed sparse column format. A reordering algorithm is applied to one portion of the matrix to minimize fill-ins. Unsymmetrical matrix–vector multiplication has also been computed sparsely for “error-norm check” purposes. All of the sparse procedures have been coded in standard Fortran.
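The split storage scheme described in this abstract can be sketched as follows. This is an illustration in Python, not the paper's Fortran code. It assumes the diagonal is kept with the upper part (the abstract does not say where it is stored), and the function names are hypothetical.

```python
def split_storage(dense):
    """Store the upper triangle (with diagonal) in CSR and the strict lower triangle in CSC."""
    n = len(dense)
    u_vals, u_cols, u_ptr = [], [], [0]
    for i in range(n):                      # row by row for the upper part
        for j in range(i, n):
            if dense[i][j] != 0:
                u_vals.append(dense[i][j])
                u_cols.append(j)
        u_ptr.append(len(u_vals))
    l_vals, l_rows, l_ptr = [], [], [0]
    for j in range(n):                      # column by column for the lower part
        for i in range(j + 1, n):
            if dense[i][j] != 0:
                l_vals.append(dense[i][j])
                l_rows.append(i)
        l_ptr.append(len(l_vals))
    return (u_vals, u_cols, u_ptr), (l_vals, l_rows, l_ptr)


def matvec(upper_csr, lower_csc, x):
    """Compute y = A x from the two triangular parts."""
    u_vals, u_cols, u_ptr = upper_csr
    l_vals, l_rows, l_ptr = lower_csc
    n = len(x)
    y = [0] * n
    for i in range(n):                      # CSR part: one dot product per row
        for k in range(u_ptr[i], u_ptr[i + 1]):
            y[i] += u_vals[k] * x[u_cols[k]]
    for j in range(n):                      # CSC part: one saxpy-style update per column
        for k in range(l_ptr[j], l_ptr[j + 1]):
            y[l_rows[k]] += l_vals[k] * x[j]
    return y


upper, lower = split_storage([[4, 0, 1], [2, 3, 0], [0, 5, 6]])
y = matvec(upper, lower, [1, 2, 3])
```

A product of this kind is what a residual ("error-norm") check needs: multiply the computed solution by the original matrix and compare with the right-hand side.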

3.

A VLSI fast solver for tridiagonal linear systems

《Information Processing Letters》1986,23(3):111-114


4.

A fast adaptive solver for hierarchically semiseparable representations

S. Chandrasekaran M. Gu W. Lyons 《Calcolo》2005,42(3-4):171-185

We present a fast solver for matrices in hierarchically semi-separable form with highly non-uniform partition trees. The algorithm employed is based on an implicit ULV factorization. Numerical experiments are provided and demonstrate that the solution time scales linearly with problem size.

5.

A fast Poisson solver for distributed memory multiprocessors

D. Di Serafino A. Murli F. Perla 《Concurrency and Computation》1992,4(7):499-508

We present a parallel algorithm for distributed memory multiprocessors, which is based on generalized marching (GM), one of the fastest methods in the class of fast Poisson solvers. The GM algorithm is not suited for any but very coarse-grain parallel processing. The main difficulty with parallelization is that the number of independent processes and the amount of work in each process change exponentially and in inverse proportion to each other. To improve parallelism, the matrices involved in GM are diagonalized by performing multiple FFTs. In this way, independent processes extending across the whole algorithm are obtained. The parallel GM has been tested on an Ncube/10 and a Symult S2010, running the Express communication system. A performance evaluation has been carried out using a scaled efficiency model and some classical parameters.

6.

A definiteness test for Hankel matrices and their lower submatrices

I. Koltracht P. Lancaster 《Computing》1987,39(1):19-26


7.

HYMNISBLOCK — Eigenvalue solver for blocked matrices

R. Gruber 《Computer Physics Communications》1980,20(3):421-428


8.

A simpler formula for the singular values of a certain Hankel operator

Hitay Özbay 《Systems & Control Letters》1990,15(5):381-390

This paper deals with the problem of computing the singular values and vectors of a Hankel operator with symbol m^*W, where m ∈ H^∞ is an arbitrary inner function and W ∈ H^∞ is rational. A simplified version of the formula given in [6] is obtained for computing the singular values of the Hankel operator. This result is applied to the (one-block) H^∞ optimal control of SISO stable infinite dimensional plants and rational weights. Using this new formula, a simple expression is derived for the H^∞ optimal controller whose structure was observed in [9].

9.

A sixth order fast direct Helmholtz equation solver

E.N. Houstis R.E. Lynch T.S. Papatheodorou 《Mathematics and computers in simulation》1980,22(2):91-97

An O(h⁶) accurate difference approximation to solutions of the Helmholtz equation is derived. The discrete equations are solved using a reduction procedure and Fourier analysis. Its computational performance is compared with that of a similar fourth-order method over a set of linear and mildly nonlinear elliptic boundary value problems.

10.

A note on the inversion of complex matrices

Smith W. Jr. Erdman S. 《Automatic Control, IEEE Transactions on》1974,19(1):64-64

In this note a method is presented for inverting an n × n matrix M with complex elements. The approach consists in inverting a 2n × 2n matrix with only real elements.
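One standard way to realize this idea is sketched below; the note's exact construction may differ. Writing M = A + iB with A and B real, the real matrix [[A, -B], [B, A]] has the inverse [[C, -D], [D, C]], where M⁻¹ = C + iD. The Gauss-Jordan routine and both function names are illustrative assumptions, not taken from the note.

```python
def invert_real(matrix):
    """Invert a real square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augmented system [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def invert_complex_via_real(m):
    """Invert an n x n complex matrix using only real arithmetic on a 2n x 2n matrix."""
    n = len(m)
    z = [[complex(v) for v in row] for row in m]
    # Real embedding [[A, -B], [B, A]] of M = A + iB.
    top = [[z[i][j].real for j in range(n)] + [-z[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[z[i][j].imag for j in range(n)] + [z[i][j].real for j in range(n)]
              for i in range(n)]
    inv = invert_real(top + bottom)
    # The inverse is [[C, -D], [D, C]]; read C and D from the first block column.
    return [[complex(inv[i][j], inv[n + i][j]) for j in range(n)] for i in range(n)]


m = [[1 + 2j, 2], [3j, 4 - 1j]]
m_inv = invert_complex_via_real(m)
```

Multiplying `m` by `m_inv` should give the 2 × 2 identity up to rounding error.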

11.

A multilevel parallel solver for block tridiagonal and banded linear systems

Ibrahim N. Hajj Stig Skelboe 《Parallel Computing》1990,15(1-3):21-45

This paper describes an efficient algorithm for the parallel solution of systems of linear equations with a block tridiagonal coefficient matrix. The algorithm comprises a multilevel LU-factorization based on block cyclic reduction and a corresponding solution algorithm.

The paper includes a general presentation of the parallel multilevel LU-factorization and solution algorithms, but the main emphasis is on implementation principles for a message passing computer with hypercube topology. Problem partitioning, processor allocation and communication requirement are discussed for the general block tridiagonal algorithm.

Band matrices can be cast into block tridiagonal form, and this special but important problem is dealt with in detail. It is demonstrated how the efficiency of the general block tridiagonal multilevel algorithm can be improved by introducing the equivalent of two-way Gaussian elimination for the first and the last partitioning and by carefully balancing the load of the processors. The presentation of the multilevel band solver is accompanied by detailed complexity analyses.

The properties of the parallel band solver were evaluated by implementing the algorithm on an Intel iPSC hypercube parallel computer and solving a large number of banded linear equations using 2 to 32 processors. The results of the evaluation include speed-up over a sequential processor, and the measured values are in good agreement with the theoretical values resulting from complexity analysis. It is found that the maximum asymptotic speed-up of the multilevel LU-factorization using p processors and load balancing is approximated well by the expression (p + 6)/4.

Finally, the multilevel parallel solver is compared with solvers based on row and column interleaved organization.

12.

Fast quantum codes based on Pauli block jacket matrices

Ying Guo Jun Peng Moon Ho Lee 《Quantum Information Processing》2009,8(5):361-378

Jacket matrices, motivated by the center weight Hadamard matrices, have played an important role in signal processing, communications, image compression, cryptography, etc. In this paper, we suggest a design approach for the Pauli block jacket matrix, achieved by substituting some Pauli matrices for all elements of common matrices. Since the well-known Pauli matrices have been widely utilized for quantum information processing, the large-order Pauli block jacket matrix that contains commutative row operations is investigated in detail. After that, some special Abelian groups are elegantly generated from any independent rows of the resulting Pauli block jacket matrix. Finally, we show how the Pauli block jacket matrix can simplify the coding theory of quantum error-correction. The quantum codes we provide do not require the dual-containing constraint necessary for the standard quantum error-correction codes, thus allowing us to construct quantum codes of large codeword length. The proposed codes can be constructed structurally by using the stabilizer formalism of Abelian groups whose generators are selected from the row operations of the Pauli block jacket matrix, and hence have the advantages of fast construction and asymptotically good behavior.

13.

FABRIK: A fast, iterative solver for the Inverse Kinematics problem

Andreas Aristidou Joan Lasenby 《Graphical Models》2011,73(5):243-260

Inverse Kinematics is defined as the problem of determining a set of appropriate joint configurations for which the end effectors move to desired positions as smoothly, rapidly, and as accurately as possible. However, many of the currently available methods suffer from high computational cost and production of unrealistic poses. In this paper, a novel heuristic method, called Forward And Backward Reaching Inverse Kinematics (FABRIK), is described and compared with some of the most popular existing methods regarding reliability, computational cost and conversion criteria. FABRIK avoids the use of rotational angles or matrices, and instead finds each joint position via locating a point on a line. Thus, it converges in few iterations, has low computational cost and produces visually realistic poses. Constraints can easily be incorporated within FABRIK and multiple chains with multiple end effectors are also supported.
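The "point on a line" idea in this abstract can be sketched for the simplest case: a single unconstrained chain in 2D with a fixed root. This is an illustrative reading of the method, not the authors' implementation. The function name, the tolerance, and the iteration cap are assumptions, and joint constraints and multiple end effectors are left out.

```python
import math


def fabrik(joints, target, tolerance=1e-6, max_iterations=100):
    """Move the last joint of a 2D chain toward target while keeping segment lengths."""
    joints = [tuple(float(c) for c in p) for p in joints]
    target = tuple(float(c) for c in target)
    lengths = [math.dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    root = joints[0]

    def place(anchor, toward, length):
        # Point on the line from anchor toward `toward`, at distance `length` from anchor.
        d = math.dist(anchor, toward)
        if d == 0.0:
            # Degenerate case (coincident points): pick the +x direction arbitrarily.
            return (anchor[0] + length, anchor[1])
        ratio = length / d
        return tuple(a + ratio * (t - a) for a, t in zip(anchor, toward))

    if math.dist(root, target) > sum(lengths):
        # Target out of reach: stretch the chain in a straight line toward it.
        for i in range(len(lengths)):
            joints[i + 1] = place(joints[i], target, lengths[i])
        return joints

    for _ in range(max_iterations):
        if math.dist(joints[-1], target) <= tolerance:
            break
        # First pass: pin the end effector to the target and work back to the root.
        joints[-1] = target
        for i in range(len(joints) - 2, -1, -1):
            joints[i] = place(joints[i + 1], joints[i], lengths[i])
        # Second pass: pin the root again and work out to the end effector.
        joints[0] = root
        for i in range(len(joints) - 1):
            joints[i + 1] = place(joints[i], joints[i + 1], lengths[i])
    return joints


pose = fabrik([(0, 0), (1, 0), (2, 0)], (1, 1))
```

Each update only rescales a vector between two points, so no rotation angles or matrices are involved, which matches the low per-iteration cost the abstract describes.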

14.

A stiff ODE solver for an attached processor

J.D. Pryce J.W. Paine 《Computer Physics Communications》1982,27(1):97-100

We present an implementation of a stiff ODE solver on the AP120B, and discuss design considerations which should be generally applicable to ODE software on attached processors of this type.

15.

A new preconditioner for the interface system arising in a fast Helmholtz solver

Kui Du 《Computers & Mathematics with Applications》2012,63(4):794-806


16.

A discontinuous Galerkin method with block cyclic reduction solver for simulating compressible flows on GPUs

Behzad Baghapour Mohammad Torabzadeh Hossein Mahmoodi Darian 《International Journal of Computer Mathematics》2015,92(1):110-131

An optimized implementation of a block tridiagonal solver based on the block cyclic reduction (BCR) algorithm is introduced and its portability to graphics processing units (GPUs) is explored. The computations are performed on the NVIDIA GTX480 GPU. The results are compared with those obtained on a single core of Intel Core i7-920 (2.67 GHz) in terms of calculation runtime. The BCR linear solver achieves the maximum speedup of 5.84x with block size of 32 over the CPU Thomas algorithm in double precision. The proposed BCR solver is applied to discontinuous Galerkin (DG) simulations on structured grids via alternating direction implicit (ADI) scheme. The GPU performance of the entire computational fluid dynamics (CFD) code is studied for different compressible inviscid flow test cases. For a general mesh with quadrilateral elements, the ADI-DG solver achieves the maximum total speedup of 7.45x for the piecewise quadratic solution over the CPU platform in double precision.
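For context, the Thomas algorithm used as the CPU baseline in this abstract is sketched below in its scalar form. The paper works with block tridiagonal systems, so this is a simplified illustration of the baseline idea only, not of BCR and not of the authors' code. The function name and argument layout are assumptions, and no pivoting is done, so the sketch is meant for well-conditioned systems such as diagonally dominant ones.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a scalar tridiagonal system.

    lower[i] is the entry left of the diagonal in row i (lower[0] is unused),
    upper[i] is the entry right of the diagonal in row i (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination sweep.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution sweep.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Rows: [2 -1 0], [-1 2 -1], [0 -1 2]; the exact solution is [1, 2, 3].
x = thomas_solve([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4])
```

Both sweeps are inherently sequential, since each step depends on the previous one. Cyclic reduction reorganizes the elimination so that many rows can be processed independently, which makes it a better fit for GPUs.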

17.

A fast algorithm for the division of two polynomial matrices

《Automatic Control, IEEE Transactions on》1989,34(4):446-448

A modification of the algorithm shown by Q.G. Wang and C.H. Zhou (see ibid., vol.AC-31, p.165-6, 1968) is presented. The performance of their algorithm is improved by the use of convolutions and therefore of FFT techniques. The present method is based on the fast inversion of block triangular Toeplitz matrices, and it is amenable to parallel implementation.

18.

Image restoration based on the fast marching method and block based sampling

《Computer Vision and Image Understanding》2010,114(8):847-856

In this paper, we propose an efficient image inpainting algorithm by introducing important aspects and improvements corresponding to the filling order of the pixels in the target region and texture synthesis in a dynamic searching range. The algorithm consists of two parts. The first part decides the filling order of the pixels in the target regions based on the high accuracy fast marching method. The second part of the algorithm implicitly assumes a Markov random field model for textured image regions and computes blocks of texture using an efficient search process and the SSD (Sum of Squared Differences) measure. The algorithm is straightforward to implement and restores the target regions with visually plausible quality that is on par with or better than several existing methods, with a lower execution cost.
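The SSD measure named in this abstract can be illustrated with a small block-matching sketch. It assumes grayscale images stored as lists of rows and searches every position exhaustively, whereas the paper restricts the search to a dynamic range. The function names are hypothetical.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(image, template):
    """Return (score, row, col) of the image block with the lowest SSD to template."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            block = [row[c:c + tw] for row in image[r:r + th]]
            score = ssd(block, template)
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


image = [[0, 0, 0, 0],
         [0, 5, 6, 0],
         [0, 7, 8, 0],
         [0, 0, 0, 0]]
match = best_match(image, [[5, 6], [7, 8]])
```

A lower score means a closer match, and a score of zero means the block is identical to the template.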

19.

GPU-enhanced Finite Volume Shallow Water solver for fast flood simulations

《Environmental Modelling & Software》2014

In this paper a parallelization of a Shallow Water numerical scheme suitable for Graphics Processor Unit (GPU) architectures under the NVIDIA™'s Compute Unified Device Architecture (CUDA) framework is presented. In order to provide robust and accurate simulations of real flood events, the system features a state-of-the-art Finite Volume explicit discretization technique which is well balanced, second order accurate and based on positive depth reconstruction. The model is based on a Cartesian grid and boundary conditions are implemented by means of the implicit local ghost cell approach, which enables the discretization of a broad spectrum of boundary conditions including inflow/outflow conditions. A novel and efficient Block Deactivation Optimization procedure has also been adopted, in order to increase the efficiency of the numerical scheme in the presence of wetting-drying fronts. This led to speedups of two orders of magnitude with respect to a single-core CPU. The code has been validated against several severe benchmark test cases, and its capability of producing accurate fast simulations (with high ratios between physical and computing times) for different real world cases has been shown.

20.

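A parallel FPGA SAT solver based on an incomplete algorithm

Li Tiejun Ma Kefan Zhang Jianmin 《Computer Engineering & Science》2021,43(12):2126-2130

The satisfiability problem is a core problem in the theory and application of computing. A parallel solver based on an incomplete algorithm, pprobSAT+, is proposed on an FPGA. A multithreading strategy is used to reduce the waiting time of the relevant components, which improves the efficiency of the solver. In addition, the threads use a data storage structure with shared addresses and clause information to reduce the resource overhead of on-chip memory. The pprobSAT+ solver achieves its best performance when all data are stored in the on-chip memory of the FPGA. Experimental results show that, compared with a single-threaded solver, the proposed pprobSAT+ solver achieves a speedup of more than 2x.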