Similar Literature
20 similar documents found (search time: 31 ms).
1.
Achieving high sustained simulation performance is the most important concern in the HPC community. To this end, many kinds of HPC system architectures have been proposed, and the diversity of HPC systems is growing rapidly. Under these circumstances, the vector-parallel supercomputer SX-ACE has been designed to achieve high sustained performance on memory-intensive applications by providing a memory bandwidth commensurate with its high computational capability. This paper examines the potential of the modern vector-parallel supercomputer through performance evaluations of SX-ACE using practical engineering and scientific applications. To improve the sustained performance of practical applications, SX-ACE adopts an advanced memory subsystem with several new architectural features. This paper discusses how these features, such as MSHRs, a large on-chip memory, and novel vector processing mechanisms, help achieve high sustained performance for large-scale engineering and scientific simulations. Evaluation results clearly indicate that the high sustained memory performance per core enables the modern vector supercomputer to achieve performance levels that are unreachable by simply increasing the number of fine-grain scalar processor cores. This paper also discusses HPCG benchmark results to evaluate the potential of supercomputers with balanced memory and computational performance against heterogeneous and cutting-edge scalar parallel systems.

2.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined under an optimal schedule on a feasible number of processors. The important features of the algorithm are: (i) it achieves a speedup S_p of O(N/3) and an efficiency E_p of about 2/3 using P = ⌊N/2⌋ processors, where N is the number of equations; (ii) it contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method remains the same as that of its sequential counterpart; and (iii) it may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
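For reference, here is a minimal sketch of the sequential SOR sweep that the parallel algorithm decomposes into noninterfering row-update tasks. It assumes a dense NumPy matrix and is purely illustrative, not the paper's implementation:

```python
import numpy as np

def sor_sweep(A, b, x, omega):
    """One sequential SOR sweep for A x = b, updating x in place.
    Each row update is the unit task a parallel schedule distributes."""
    n = len(b)
    for i in range(n):
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] += omega * ((b[i] - sigma) / A[i, i] - x[i])
    return x
```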

3.
In this paper we discuss the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers, chosen to encompass a variety of architectures: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve the resulting set of tridiagonal systems on the Flex/32 and Cray/2, while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures, and performance measurements on each machine are given. Simple performance models are used to describe the performance; these models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
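As a reference for the tridiagonal solves mentioned above, a hedged sketch of Gaussian elimination specialised to tridiagonal systems (the Thomas algorithm); the interface is illustrative and assumes n >= 2:

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal (length n-1),
    b = diagonal (length n), c = super-diagonal (length n-1),
    d = right-hand side (length n). Assumes n >= 2, no pivoting."""
    n = len(b)
    cp, dp = np.empty(n - 1), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / m
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```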

4.
Atomistic simulations of thin film deposition, based on the lattice Monte Carlo method, provide insights into microstructure evolution at the atomic level. However, large-scale atomistic simulation is limited on a single computer by memory and speed constraints. Parallel computation, although promising in both memory and speed, has not been widely applied in these simulations because of the substantial communication overhead. The key issue in achieving optimal performance is therefore to reduce communication overhead among processors. In this paper, we propose a new parallel algorithm for the simulation of large-scale thin film deposition incorporating two optimization strategies: (1) domain decomposition with sub-domain overlapping and (2) asynchronous communication. This algorithm was implemented both on massively parallel processing (MPP) systems and on cluster computers. We found that both architectures are suitable for parallel Monte Carlo simulation of thin film deposition in either a distributed memory mode or a shared memory mode with message-passing libraries.

5.
In this paper, we present parallel simulations of three-dimensional complex flows obtained on an ORIGIN 3800 computer and on homogeneous and heterogeneous (processors of different speeds and RAM) computational grids. The solver under consideration, which is representative of modern numerics used in industrial computational fluid dynamics (CFD) software, is based on a mixed element-volume method on unstructured tetrahedrisations. The parallelisation strategy combines mesh partitioning techniques, a message-passing programming model and an additive Schwarz algorithm. The parallelisation performance is analysed on a two-phase compressible flow and a turbulent flow past a square cylinder.

6.
In modern processors, the memory controller manages and carries out the processor chip's accesses to off-chip memory, and its scheduling algorithm has a major impact on the memory-access performance actually achieved. To address the poor adaptivity of existing scheduling algorithms across different workload characteristics, this paper proposes ALHS, a reinforcement-learning-based algorithm that learns an optimal policy by adaptively adjusting the upper limit on consecutive page hits served under a page-hit-first scheduling policy. Simulation results on a variety of typical memory-access patterns show that ALHS runs 10.98% faster on average than the traditional FR-FCFS policy and approaches the performance gains of the optimal policy, demonstrating that the algorithm can autonomously explore its environment and optimize itself.
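A toy Q-learning sketch of the core idea, tuning the cap on consecutive page hits served under a hit-first policy. The state, action and reward design below is illustrative only and is not taken from the paper:

```python
import random

CAPS = [1, 2, 4, 8, 16, 32]   # candidate consecutive-page-hit limits (illustrative)
ACTIONS = [-1, 0, +1]         # lower / keep / raise the cap index
Q = {(s, a): 0.0 for s in range(len(CAPS)) for a in range(len(ACTIONS))}
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose(state):
    """Epsilon-greedy selection over the cap-adjustment actions."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])

def apply_action(state, action):
    """Move to the neighbouring cap, clipped to the candidate range."""
    return max(0, min(len(CAPS) - 1, state + ACTIONS[action]))

def update(state, action, reward, next_state):
    """One-step Q-learning update; the reward would be the measured
    memory throughput over the last scheduling window."""
    best = max(Q[(next_state, a)] for a in range(len(ACTIONS)))
    Q[(state, action)] += alpha * (reward + gamma * best - Q[(state, action)])
```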

7.
We present a polynomial preconditioner that can be used with the conjugate gradient method to solve symmetric and positive definite systems of linear equations. Each step of the preconditioning is achieved by simultaneously taking an iteration of the SOR method and an iteration of the reverse SOR method (equations taken in reverse order) and averaging the results. This yields a symmetric preconditioner that can be implemented on parallel computers by performing the forward and reverse SOR iterations simultaneously. We give necessary and sufficient conditions for additive preconditioners to be positive definite.
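A minimal sketch of one application of this preconditioner, with both sweeps started from zero (illustrative NumPy code, not the authors' implementation); note that the two sweeps are independent of each other and can run simultaneously:

```python
import numpy as np

def sor_additive_apply(A, r, omega):
    """Apply the additive preconditioner to a residual r: average one
    forward SOR sweep and one reverse-order SOR sweep for A z = r,
    each started from z = 0. For symmetric A the result is symmetric."""
    n = len(r)
    zf = np.zeros(n)
    for i in range(n):                     # forward sweep
        zf[i] = omega * (r[i] - A[i, :i] @ zf[:i]) / A[i, i]
    zr = np.zeros(n)
    for i in range(n - 1, -1, -1):         # reverse sweep
        zr[i] = omega * (r[i] - A[i, i + 1:] @ zr[i + 1:]) / A[i, i]
    return 0.5 * (zf + zr)                 # average the two results
```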

We find an optimal parameter, ω, for the SOR-Additive linear stationary iterative method applied to 2-cyclic matrices. We show this method is asymptotically twice as fast as SSOR when the optimal ω is used.

We compare our preconditioner to the SSOR polynomial preconditioner for a model problem. With the optimal ω, our preconditioner was found to be as effective as the SSOR polynomial preconditioner in reducing the number of conjugate gradient iterations. Parallel implementations of both methods are discussed for vector and multiple processors. Results show that if the same number of processors is used for both preconditioners, the SSOR preconditioner is more effective. If twice as many processors are used for the SOR-Additive preconditioner, it becomes more efficient than the SSOR preconditioner when the number of equations assigned to each processor is small. These results were confirmed on the Blue Chip emulator at the University of Washington.


8.
The Building-Cube Method (BCM), based on equally-spaced Cartesian meshes, has been proposed as a next-generation CFD method. Thanks to its equally-spaced meshes, it is well suited to highly parallel computation. This paper proposes a parallel implementation scheme of BCM on a GPU cluster system, which requires efficient hierarchical parallel processing to exploit the potential of the cluster. The proposed scheme employs the Red-Black SOR method for the pressure calculation, the most time-consuming part of BCM, to expose its massive data parallelism. By exploiting both the coarse-grain and fine-grain parallelism of BCM, the scheme hierarchically assigns equally-divided tasks to the GPU cluster system. Furthermore, to exploit the computational power of the GPUs, the scheme employs efficient data management such as coalesced data transfers and data reuse in on-chip memory. Experimental results show that the single-GPU implementation achieves about three times the performance of the single-CPU one, and the multi-GPU implementation achieves almost ideal scalability. Finally, the possibility of further accelerating not only the pressure calculation but the whole of BCM is discussed.
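A vectorised NumPy sketch of a Red-Black SOR kernel for a generic 2D Poisson problem (illustrative, not the BCM/GPU code): points of one colour depend only on points of the other colour, so each half-sweep is fully data-parallel, which is the property the GPU implementation exploits:

```python
import numpy as np

def red_black_sor(p, f, h, omega, sweeps):
    """Red-Black SOR for the 2D Poisson equation lap(p) = f on a grid
    with spacing h; boundary values of p are held fixed."""
    for _ in range(sweeps):
        for colour in (0, 1):              # each half-sweep is data-parallel
            for i in range(1, p.shape[0] - 1):
                j0 = 1 + (i + colour) % 2  # first interior column of this colour
                p[i, j0:-1:2] = (1 - omega) * p[i, j0:-1:2] + omega * 0.25 * (
                    p[i - 1, j0:-1:2] + p[i + 1, j0:-1:2]
                    + p[i, j0 - 1:-2:2] + p[i, j0 + 1::2]
                    - h * h * f[i, j0:-1:2])
    return p
```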

9.
In this paper the author makes a comprehensive comparison of different parallelizations of a sequential number-theoretic algorithm with large memory requirements. Brunotte's algorithm is one of the best currently known methods for deciding the canonical number system (or, more generally, shift radix system) property. Still, it can be very space-consuming in some cases. Pushing the algorithm to its limits may hopefully shed light on mathematical patterns that would otherwise not be discernible. The algorithm contains many n-dimensional vector operations and set operations such as insert, find, and clear. The parallel algorithms face two different kinds of concurrency problems: first, they need computationally intensive arithmetic vector operations; second, the set implementations require a huge amount of memory and general-purpose processors. The algorithms described in this article are designed for two platforms. The first is a generic symmetric multiprocessing (SMP) architecture without any vector processor extension; the second is the Cell Broadband Engine, whose processors include Synergistic vector processors.

10.
In a previous paper (Vidal et al., 2008, [21]), we presented a parallel solver for the symmetric Toeplitz eigenvalue problem, based on a modified version of the Lanczos iteration. However, its efficient implementation on modern parallel architectures is not trivial. In this paper, we present an efficient implementation on multicore processors which takes advantage of the features of this architecture. Several optimization techniques have been incorporated into the algorithm: improved Discrete Sine Transform routines, use of the Gohberg-Semencul formulas to solve the Toeplitz linear systems, optimization of the workload distribution among processors, and others. Although the algorithm follows a distributed-memory parallel programming paradigm dictated by the nature of the mathematical derivation, special attention has been paid to obtaining the best performance in multicore environments. Hybrid techniques merging OpenMP and MPI have been used to increase performance in these environments. Experimental results show that our implementation takes advantage of multicore architectures and clearly outperforms the results obtained with LAPACK or ScaLAPACK.

11.
A parallelized point rowwise Successive Over-Relaxation (SOR) iterative algorithm is developed for the Heterogeneous Element Processor (HEP) multiple-instruction-stream computer. The classical point SOR method is not easily vectorizable with rowwise ordering of the grid points, but it can be effectively parallelized on a multiple-instruction-stream machine without sacrificing computational efficiency or convergence rate. The details of the implementation are presented, including the restructuring of a serial FORTRAN program and the techniques needed to exploit the parallel processing architecture of the HEP. The parallelized algorithm is analyzed in detail. The lessons learned in this study are documented and may provide guidelines for similar future efforts, since programming a multiple-instruction-stream machine requires approaches and restructuring techniques entirely different from those used to program an algorithm on a vector processor. To assess the capabilities of the parallelized algorithm, it was used to solve Laplace's equation on a rectangular field with Dirichlet boundary conditions. Computer run times are presented which indicate a significant speed gain over a scalar version of the code. For a moderate to large problem, seventeen or more processes are required to make efficient use of the parallel processing hardware. To demonstrate the capability of the algorithm on a realistic problem, it was also used to obtain the numerical solution of a viscous incompressible fluid in a square cavity. Since point iterative relaxation schemes are at the core of many systems of elliptic as well as non-elliptic partial differential equations occurring in engineering and scientific applications, the present study suggests the possibility of both reducing real processing time and broadening the scope of computational modeling.

12.
Parallel Computing, 1988, 6(2): 157-164
A general algorithm for calculating finite element matrices efficiently on both serial and vector processors is presented. In contrast to other element calculation algorithms, the degree of vectorisation of the proposed algorithm is not limited by the order of the element approximation, so the potential for increased efficiency applies to any existing finite element code. To illustrate the degree of vectorisation and scalar optimisation that can be realised in practice, computation times on a serial computer and a CYBER 205 vector processor are compared for three distinct types of element matrix, each having physical significance.

13.
Contour (iso-line) maps are used extensively by the Meteorological Office for both routine and research tasks. The techniques employed in a new FORTRAN subroutine (CMAP) for drawing such maps are described. The algorithm employed is raster-based, suited to both scalar and vector processors, and inherently device-independent. In addition, some normally CPU-intensive operations, such as contour shading, chart masking (e.g. with respect to geographical areas) and isometric projections, are natural extensions of the method requiring comparatively little CPU time.

14.
The performance of memory and I/O systems is insufficient to keep pace with that of COTS (Commercial Off-The-Shelf) CPUs. PC clusters built from COTS CPUs have been widely employed for HPC, but a cache-based processor is far less effective than a vector processor in applications with low spatial locality. Moreover, for HPC, Google-like server farms and database processing, insufficient main memory capacity poses a serious problem, and the power consumption of a Google-like server farm or a high-end HPC PC cluster is huge. To overcome these problems, we propose the concept of a memory and network enhancer equipped with scatter and gather vector access functions, high-performance network connectivity, and capacity extensibility. Communication mechanisms named LHS and LHC are also proposed; these are architectures for reducing the latency of mixed messages carrying small control data and a large data body. Examples of killer applications for this new type of hardware are presented. This paper presents not only concepts and simulations but also real hardware prototypes named DIMMnet-2 and DIMMnet-3, and evaluates both memory and network issues. We evaluate the module with the NAS CG benchmark (class C) and the Wisconsin benchmarks as applications with memory issues. Although evaluating CG class C is difficult with conventional cycle-accurate simulation methods, we obtained results for class C with our own method. We find that the module can improve maximum performance by a factor of about 25 on the Wisconsin benchmarks. However, results on a cache-based PC show that cache-line flushing degrades the acceleration ratio. This shows the high potential of the proposed extended memory module in combination with processors using DMA-based main memory access, such as the SPUs on the Cell/B.E., which do not need cache-line flushing. The LHS and LHC communication mechanisms are also evaluated, showing their effects on latency.

15.
Advances in computer technology are now so profound that the arithmetic capability and repertoire of computers can and should be expanded. Nowadays the elementary floating-point operations +, −, ×, / give computed results that coincide with the rounded exact result for any operands. Advanced computer arithmetic extends this accuracy requirement to all operations in the usual product spaces of computation: the real and complex vector spaces as well as their interval correspondents. This enhances the mathematical power of the digital computer considerably. A new computer operation, the scalar product, is fundamental to the development of advanced computer arithmetic. This paper studies the design of arithmetic units for advanced computer arithmetic. Scalar product units are developed for different kinds of computers such as personal computers, workstations, mainframes, supercomputers and digital signal processors. The new expanded computational capability is gained at modest cost. The units bring into modern computer hardware a methodology that was available on old calculators before the electronic computer entered the scene. In general the new arithmetic units increase both the speed of computation and the accuracy of the computed result. The circuits developed in this paper show that there is no way to compute an approximation of a scalar product faster than the correct result. A collection of constructs through which a source language may accommodate advanced computer arithmetic is described, and the development of programming languages in the context of advanced computer arithmetic is reviewed. The simulation of the accurate scalar product on existing, conventional processors is discussed. Finally, the theoretical foundation of advanced computer arithmetic is reviewed and a comparison with other approaches to achieving higher accuracy in computation is given. Shortcomings of existing processors and standards are discussed.
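A small software analogue of the accurate scalar product (exact rational accumulation stands in for the long fixed-point accumulator the paper's hardware uses; the function name is illustrative): accumulate the exact value and round only once at the end.

```python
from fractions import Fraction

def dot_correctly_rounded(x, y):
    """Accumulate the dot product exactly, then round once at the end.
    Fraction(float) is exact, so no intermediate rounding occurs."""
    exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    return float(exact)

x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(sum(a * b for a, b in zip(x, y)))   # naive: 0.0 (catastrophic cancellation)
print(dot_correctly_rounded(x, y))        # 1.0 (correctly rounded)
```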

16.
Because of the ever-increasing size of the output data from scientific simulations, supercomputers are increasingly relied upon to generate visualizations. One such use is generating field lines from large-scale flow fields. When generating field lines in parallel, the vector field is generally decomposed into blocks, which are then assigned to processors. Since different regions of the vector field can have different flow complexity, processors require varying amounts of computation time to trace their particles, causing load imbalance and thus limiting the performance speedup. To achieve load-balanced streamline generation, we propose a workload-aware partitioning algorithm that decomposes the vector field into partitions with near-equal workloads. Since actual workloads are unknown beforehand, we propose a workload estimation algorithm to predict the workload in the local vector field, using a graph-based representation of the field to generate these estimates. Once the workloads have been estimated, our partitioning algorithm is applied hierarchically to distribute the workload across all partitions. We examine the performance of our workload estimation and workload-aware partitioning algorithms in several timing studies, which demonstrate that these methods achieve better scalability with little overhead.
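A simplified 1D stand-in for the partitioning step (illustrative, not the paper's hierarchical algorithm): given per-block workload estimates, group consecutive blocks so that each partition's estimated workload is close to the global average.

```python
def partition_by_workload(estimates, nparts):
    """Greedily cut a sequence of per-block workload estimates into
    nparts contiguous groups with near-equal total workload."""
    target = sum(estimates) / nparts
    parts, current, acc = [], [], 0.0
    for i, w in enumerate(estimates):
        current.append(i)
        acc += w
        if acc >= target and len(parts) < nparts - 1:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts

# Blocks with uneven flow complexity split into 3 balanced partitions.
print(partition_by_workload([5, 1, 1, 1, 8, 2, 2, 4], 3))
# [[0, 1, 2, 3], [4], [5, 6, 7]]  -- estimated workloads 8, 8, 8
```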

17.
One of the many interesting architectural features of the CRAY-2 supercomputer is that each processor has access to 16K 64-bit words of local memory. This is in addition to the extremely large, 268-million-word common memory that is accessible by all four processors. By using local memory judiciously, it is possible to achieve increased performance on the CRAY-2. This is partly because accesses to local memory can be done simultaneously with accesses to common memory and other operations. Also, it is slightly faster to start up a vector access to local memory, and a processor does not have to compete with other processors when accessing its local memory. In this paper, we present an algorithm for computing the fast Fourier transform that takes advantage of the CRAY-2's local memory. It operates by solving subproblems, which are themselves Fourier transforms, entirely within local memory. By doing so it achieves a performance increase of between 25 and 40 percent over an equivalent algorithm that uses only common memory, and for some input sizes is able to outperform the CRAY-2 library FFT.
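The subproblem decomposition described above is in the spirit of the classic four-step FFT, in which each small sub-transform can be solved entirely within a fast local memory. A hedged NumPy sketch of that decomposition (not the paper's CRAY-2 kernel):

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """FFT of length n1*n2 via n1 FFTs of length n2, twiddle factors,
    and n2 FFTs of length n1 -- each sub-FFT small enough to fit in
    a local memory in a scheme like the one the paper describes."""
    n = n1 * n2
    a = x.reshape(n2, n1).T.copy()             # a[j1, j2] = x[j1 + n1*j2]
    b = np.fft.fft(a, axis=1)                  # n1 FFTs of length n2
    j1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    b *= np.exp(-2j * np.pi * j1 * k2 / n)     # twiddle factors
    c = np.fft.fft(b, axis=0)                  # n2 FFTs of length n1
    return c.reshape(-1)                       # X[k2 + n2*k1] = c[k1, k2]

x = np.random.rand(32) + 1j * np.random.rand(32)
assert np.allclose(four_step_fft(x, 4, 8), np.fft.fft(x))
```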

18.
The implementation and performance of the multidimensional Fast Fourier Transform (FFT) on a distributed-memory Beowulf cluster is examined. We focus on the three-dimensional (3D) real transform, an essential computational component of Galerkin and pseudo-spectral codes. The approach studied is a 1D domain decomposition algorithm that relies on a communication-intensive transpose operation involving P processors. Communication is based upon the standard portable Message Passing Interface (MPI). We show that 1/P scaling of execution time at fixed problem size N³ (i.e., linear speedup) can be obtained provided that (1) the transpose algorithm is optimized for simultaneous block communication by all processors; and (2) communication is arranged as non-overlapping pairwise exchanges between processors, thus eliminating blocking when standard fast Ethernet interconnects are employed. This method provides the basis for scalable and efficient spectral-method computations of hydrodynamic and magneto-hydrodynamic turbulence on Beowulf clusters assembled from standard commodity components. An example is presented using a 3D passive scalar code.
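A serial NumPy emulation of the transpose-based decomposition (illustrative; the real code distributes slabs across processors with MPI): transform the two axes local to each slab, transpose globally, then transform the remaining axis.

```python
import numpy as np

def fft3d_transposed(u):
    """3D FFT via the 1D (slab) decomposition: x-y FFTs local to each
    z-slab, a global transpose (the communication-intensive step in the
    parallel code), then z FFTs. Returns the transform with axes reversed."""
    v = np.fft.fftn(u, axes=(1, 2))            # local x-y transforms
    v = np.transpose(v, (2, 1, 0)).copy()      # the communication-heavy step
    return np.fft.fft(v, axis=2)               # formerly distributed axis

u = np.random.rand(8, 8, 8)
assert np.allclose(fft3d_transposed(u), np.fft.fftn(u).transpose(2, 1, 0))
```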

19.
There is extensive literature on divisible load theory (DLT). According to DLT, a load can be divided into arbitrarily many independent parts, each of which can be processed independently by a processor. DLT has also been examined with processors that cheat the algorithm, i.e., processors that do not report their true computation rates. According to the literature, if the processors do not report their true computation rates, the divisible load scheduling model fails to achieve its optimal performance. This paper focuses on divisible load scheduling where the processors cheat the algorithm, and proposes a multi-objective method for it. The goal is to improve the performance of divisible load scheduling when the processors cheat. The proposed method has been examined on several function approximation problems, and the experimental results indicate approximately a 66% decrease in finish time in the best case.
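For intuition, a minimal DLT split (communication costs ignored; illustrative, not the paper's multi-objective method): each processor receives a fraction of the load proportional to its reported speed so that all finish simultaneously. A processor that misreports its speed skews exactly this split, which is the failure mode the paper addresses.

```python
def divisible_split(total_load, speeds):
    """Split a divisible load in proportion to reported speeds so that
    every processor finishes at the same time (no communication cost)."""
    s = sum(speeds)
    return [total_load * v / s for v in speeds]

chunks = divisible_split(100.0, [4.0, 2.0, 1.0])
print(chunks)                                            # [57.14..., 28.57..., 14.28...]
print([c / v for c, v in zip(chunks, [4.0, 2.0, 1.0])])  # equal finish times
```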

20.
Implementations of the domain decomposition, SOR, multigrid and conjugate gradient methods on the CM-5 and Cray C-90 are described for Laplace's equation on the unit square and an L-shaped region. The domain decomposition method uses the Schwarz alternating method. In each domain we take a one-dimensional FFT to convert the problem into tridiagonal systems, which are solved by the scientific libraries installed on the CM-5 and the Cray C-90. On the CM-5, V-cycle multigrid with symmetric smoothings on P-1 finite element spaces is run with red/black Gauss-Seidel relaxation; multigrid with natural-order Gauss-Seidel relaxation is used on the Cray C-90. Similarly, natural-order SOR is used on the Cray C-90 while red/black SOR is performed on the CM-5. Multigrid is the fastest method on the CM-5, and the three methods other than SOR give similar performance on the Cray C-90. This research was partially supported by the National Science Foundation under Grant No. CDA-9024618.
