Similar literature
1.
In this article, we accelerate the block successive overrelaxation (SOR) method for solving the rank-deficient least-squares problem. Santos and Silva proposed the two-block SOR method and the three-block SOR method. Here, we accelerate both methods using Chebyshev polynomials and derive what we term the C-2-block SOR method and the C-3-block SOR method. The advantage of our methods is that they give good results with a very small number of iterations. A comparison between the C-2-block method and the C-3-block method is presented. Finally, numerical examples are given.

2.
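By generalizing the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method, we obtain a generalized MHSS (GMHSS) iteration method for solving large sparse non-Hermitian positive definite linear systems. Based on the fixed-point equation, we also apply the successive overrelaxation (SOR) technique to the GMHSS iteration method, obtain an SOR acceleration of the GMHSS iteration method, and analyze its convergence. Numerical examples show that the SOR technique can greatly improve the convergence efficiency of the GMHSS iteration method.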

3.
This paper proposes a low-complexity vector core called LcVc for executing both scalar and vector instructions on the same execution datapath. A unified register file in the decode stage is used for storing both scalar operands and vector elements. The execution stage accepts a new set of operands each cycle and produces a new result. Rather than issuing a vector instruction (1-D operations) as a whole, each vector operation is issued sequentially with the existing scalar issue hardware. In the first implementation of LcVc, all loads and stores of registers take place from the data cache in the memory access stage at a rate of one element per clock cycle. The complete design of our proposed LcVc processor is implemented using VHDL targeting the Xilinx FPGA Spartan 3E, xc3s1600e-4-fg320 device. The total number of slices required for implementing LcVc is 1778, where the number of slice flip-flops is 538 and the number of 4-input LUTs is 3706: 1914 for logic and 1792 for RAMs. Moreover, our performance evaluation results show that the speedups of executing vector addition, vector scaling, SAXPY, and matrix–matrix multiplication on LcVc over scalar execution are 2.3, 2.5, 1.9, and 3, respectively. The hardware required to support the enhanced vector capability is insignificant (5%), which reduces the area per core and increases the number of cores available in a given chip area.

4.
5.
Many scientific codes can achieve significant performance improvement when executed on a computer equipped with a vector processor. Vector constructs in source code should be recognized by a vectorizing compiler or preprocessor. This paper discusses, from a general point of view, how a vectorizing compiler/preprocessor can be evaluated. The areas discussed include data dependence analysis, IF loop analysis, nested loops, loop interchanging, loop collapsing, indirect addressing, use of temporary storage, and order of arithmetic. The ideas presented are based on vectorization of over a million lines of production codes and an extensive test suite developed to evaluate preprocessors under varying degrees of code complexity. Areas for future research are also discussed.

6.
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule of a feasible number of processors. The important features of the algorithm are that it: (i) achieves a speedup Sp of O(N/3) and an efficiency Ep of 2/3 using P = [N/2] processors, where N is the number of equations; (ii) contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method is the same as that of its sequential counterpart; and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.

7.
International Journal of Computer Mathematics 2012,89(3-4):307-320
In this paper the 'Stride of 3' reduction method is compared with the well-known cyclic reduction method for solving tridiagonal systems derived from the discretized steady-state convection diffusion equation. The Stride of 3 algorithm is shown to be superior for moderate to large linear systems (e.g., of order > 20).

8.
In this paper, to obtain an efficient parallel algorithm to solve sparse block-tridiagonal linear systems, stair matrices are used to construct some parallel polynomial approximate inverse preconditioners. These preconditioners are suitable when the desired goal is to maximize parallelism. Moreover, some theoretical results concerning these preconditioners are presented and how to construct preconditioners effectively for any nonsingular block tridiagonal H-matrices is also described. In addition, the validity of these preconditioners is illustrated with some numerical experiments arising from the second order elliptic partial differential equations and oil reservoir simulations.

9.
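To address problems of sinusoidal pulse-width modulation (SPWM) such as low DC bus voltage utilization, an analysis method based on volt-second product vectors is proposed. Volt-second product vector analysis of SPWM shows that the improved SPWM inverter raises DC voltage utilization to a level close to that of space voltage vector modulation, while requiring much less computation than space voltage vector modulation and effectively reducing the switching frequency. Taking a three-phase full-bridge inverter as the object of study, simulation and prototype experiments were carried out on the proposed improved SPWM. The experimental results verify the correctness and effectiveness of the method.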

10.
In this paper we study a parallel form of the SOR method for the numerical solution of the Convection Diffusion equation suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we consider the fine grain parallelism model. This is achieved by considering the local relaxation version of SOR. More specifically, we use SOR with red-black ordering using two sets of parameters ω1ij and ω2ij for the 5 point stencil. The parameter ω1ij is associated with each red (i + j even) grid point (i,j), whereas the parameter ω2ij is associated with each black (i + j odd) grid point (i,j). The use of a parameter for each grid point avoids the global communication required in the adaptive determination of the best value of ω and also increases the convergence rate of the SOR method (Varga, 1962) [38] and (Young, 1971) [41]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, two parallel CPU programs utilizing manual SSE2 (Streaming SIMD Extensions 2) and AVX (Advanced Vector Extensions) vectorization were developed as performance references. The optimizations applied on the GPU version were also considered for the CPU version. Significant performance improvement was achieved with all three developed GPU kernels differentiated by the degree of recomputations thus affecting the flops per element access ratio.
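The red-black local relaxation idea in the abstract above can be illustrated with a minimal serial sketch in Python. It is not the paper's CUDA kernel: it assumes the simpler model problem -Δu = f (no convection term) on a square grid with Dirichlet boundaries, and the names `red_black_sor`, `demo`, `omega_red` and `omega_black` are hypothetical. The per-point grids `omega_red` and `omega_black` play the role of ω1ij and ω2ij; here they are filled with a constant, whereas the paper chooses them per point.

```python
def red_black_sor(u, f, h, omega_red, omega_black, sweeps):
    """Run red-black SOR sweeps for -Laplace(u) = f with the 5-point stencil.

    u holds the current iterate including fixed boundary values; omega_red and
    omega_black are per-point relaxation parameters (assumed given).
    """
    n = len(u)
    for _ in range(sweeps):
        # Red points (i + j even) first, then black points (i + j odd).
        for parity, omega in ((0, omega_red), (1, omega_black)):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != parity:
                        continue
                    # Gauss-Seidel value from the four neighbours.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 + h * h * f[i][j])
                    # Relax towards it with the local parameter.
                    u[i][j] += omega[i][j] * (gs - u[i][j])
    return u


def demo(n=9, omega=1.5, sweeps=200):
    """Solve Laplace's equation with boundary data x + y; return the max error."""
    h = 1.0 / (n - 1)
    exact = [[i * h + j * h for j in range(n)] for i in range(n)]
    on_edge = lambda i, j: i in (0, n - 1) or j in (0, n - 1)
    u = [[exact[i][j] if on_edge(i, j) else 0.0 for j in range(n)]
         for i in range(n)]
    f = [[0.0] * n for _ in range(n)]
    w = [[omega] * n for _ in range(n)]  # constant local parameters (assumption)
    red_black_sor(u, f, h, w, w, sweeps)
    return max(abs(u[i][j] - exact[i][j]) for i in range(n) for j in range(n))
```

Every point of one colour depends only on points of the other colour, so all updates within a colour are independent. That independence is what makes the ordering suitable for one-thread-per-point execution on a GPU.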

11.
International Journal of Computer Mathematics 2012,89(3-4):355-369
For the solution of the linear system Ax=b many iterative methods based on a splitting of A exist. Among them the Jacobi, the Gauss-Seidel and the Successive Overrelaxation (SOR) methods as well as their extrapolated counterparts are the most popular. This paper presents a new general method such that the aforementioned methods become special cases of it. Besides its four degrees of freedom, which make it a very flexible method, another of its main characteristics is that it is well-defined even when some elements on the diagonal of A are zero. The first results concerning the new method show that a proper exploitation of its basic properties will make it a very powerful technique.
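For reference, the three classical methods named in this abstract can be written with the usual splitting $A = D - L - U$, where $D$ is the diagonal of $A$ and $-L$, $-U$ are its strictly lower and strictly upper triangular parts. This notation is the common textbook convention and is assumed here rather than taken from the paper. The paper's new four-parameter method is not reproduced.

```latex
\begin{align*}
\text{Jacobi:}       \quad & D\,x^{(k+1)} = (L + U)\,x^{(k)} + b, \\
\text{Gauss-Seidel:} \quad & (D - L)\,x^{(k+1)} = U\,x^{(k)} + b, \\
\text{SOR:}          \quad & (D - \omega L)\,x^{(k+1)}
                             = \bigl[(1 - \omega) D + \omega U\bigr]\,x^{(k)} + \omega b .
\end{align*}
```

Setting $\omega = 1$ in SOR recovers Gauss-Seidel. All three require $D$ to be nonsingular, which is why a method that stays well-defined with zero diagonal entries is notable.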

12.
The IBM 3090 is a vector multiprocessor with a hierarchical memory system. We show with two examples (the LU and Householder factorizations) that the complex memory system and the vector hardware can be used efficiently by recasting the basic algorithms in terms of high-level matrix-matrix modules.

13.
To effectively utilize information stored in a digital image library, effective image indexing and retrieval techniques are essential. This paper proposes an image indexing and retrieval technique based on the compressed image data using vector quantization (VQ). By harnessing the characteristics of VQ, the proposed technique is able to capture the spatial relationships of pixels when indexing the image. Experimental results illustrate the robustness of the proposed technique and also show that its retrieval performance is higher compared with existing color-based techniques.

14.
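A vector projection method for designing decision-tree support vector machine multiclass classifiers   Total citations: 1, self-citations: 1, citations by others: 1
To address the key problem of how to design the hierarchical structure of a decision-tree support vector machine (SVM) multiclass classifier effectively, a between-class separability measure based on vector projection is proposed, together with a method for designing the hierarchical structure of the decision-tree SVM multiclass classifier from this measure. To speed up the training of each SVM sub-classifier while preserving its high generalization ability, a vector-projection-based support vector pre-selection method is used in training each sub-classifier. Simulation experiments on three large-scale data sets and on handwritten digit recognition show that the new method effectively improves the classification accuracy and speed of the decision-tree SVM multiclass classifier.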

15.
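Research on a support vector stepwise regression machine algorithm   Total citations: 2, self-citations: 2, citations by others: 2
Support vector machines are an important tool for solving nonlinear problems. By comparing the multiple linear regression model with the primal form of the support vector machine, we start from the multiple linear regression model of a sample subset, search for support vectors step by step, and propose a fast algorithm for building a support vector regression machine. The algorithm reduces the size of the kernel matrix and hence the complexity of solving the convex quadratic program. Finally, the complexity of the algorithm is analyzed and a numerical example is provided.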

16.
The termination of iterative algorithms on a distributed network of transputers is an important issue with the increasing usage of parallel computers.

In this paper we analyse the computational and communication costs of performing the convergence tests on the solution of the Laplace Equation on a two-dimensional region, i.e., the unit square.

Finally a strategy of terminating the iteration without convergence testing is demonstrated.

17.
G. Radicati  Y. Robert  P. Sguazzero 《Calcolo》1988,25(1-2):153-167
The IBM 3090 is a vector multiprocessor with a hierarchical memory system. We show with two examples (the LU and Householder factorizations) that the complex memory system and the vector hardware can be used efficiently by recasting the basic algorithms in terms of high-level matrix-matrix modules.

18.
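Vector measurement of weak signals based on a dual-phase lock-in amplifier   Total citations: 2, self-citations: 0, citations by others: 2
蔡屹 (Cai Yi) 《微计算机信息》 (Microcomputer Information) 2007,23(25):111-112,228
Noise is present in every system. When the signal to be detected is weak and buried in a strong noise background, it is very difficult to detect with traditional methods, so the problem of extracting a useful signal buried in strong noise has attracted growing attention. This paper uses a time-domain detection method: phase shifting is used to obtain mutually orthogonal reference square-wave signals, and a cross-correlation algorithm is used to implement a quadrature-vector lock-in amplifier (LIA) correlator. The method detects the amplitude and phase of weak signals, effectively suppresses interference, and improves system performance.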
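The quadrature correlation at the heart of a dual-phase lock-in amplifier can be sketched in a few lines of Python. This is an illustrative sketch rather than the paper's implementation: it uses sinusoidal references instead of the square-wave references described in the abstract, it assumes the record covers a whole number of reference periods, and the function name `dual_phase_lockin` and its parameters are hypothetical.

```python
import math


def dual_phase_lockin(samples, f_ref, f_s):
    """Estimate amplitude and phase of the component of `samples` at f_ref.

    The signal is modelled as A*sin(2*pi*f_ref*t + phi) plus noise, sampled at
    rate f_s. Sinusoidal references are an assumption of this sketch.
    """
    n = len(samples)
    x = 0.0  # in-phase correlation, converges to A*cos(phi)
    y = 0.0  # quadrature correlation, converges to A*sin(phi)
    for k, s in enumerate(samples):
        theta = 2.0 * math.pi * f_ref * k / f_s
        x += s * math.sin(theta)
        y += s * math.cos(theta)
    # Averaging over whole periods acts as the low-pass filter; the factor 2
    # undoes the 1/2 from the mean of sin^2 and cos^2.
    x *= 2.0 / n
    y *= 2.0 / n
    return math.hypot(x, y), math.atan2(y, x)
```

Uncorrelated noise averages towards zero in both sums as the record gets longer, which is why the amplitude and phase can be recovered even when the signal is well below the noise level.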

19.
20.
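Application of an improved support vector machine in intrusion detection systems   Total citations: 1, self-citations: 0, citations by others: 1
As a proactive defense technology, intrusion detection has become an important research topic in information security. Introducing statistical machine learning methods into intrusion detection is of real practical significance. However, when a support vector machine alone is used to classify network connection records, positive and negative instances far from the separating hyperplane can be distinguished with high confidence, whereas instances close to the separating surface are classified with lower confidence and may be misclassified owing to various subjective and objective factors. To address this, the K-nearest-neighbor method is introduced to reclassify instances near the separating surface. Simulation experiments were carried out using the 41 features with which the KDD CUP 99 public data set describes network connections. The results show that classification accuracy improves further compared with using a support vector machine alone.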
