首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 234 毫秒
1.
通过改变CR算法的计算次序。提出了一种改进的共轭剩余(ICR)算法.对比CR算法。ICR算法的数值稳定性和CR算法相同,几乎没有增加计算量。但考虑了在MIMD并行机上实现时并行算法的性能,其同步开销减少为CR算法的一半,并且所有内积计算以及矩阵向量乘是独立的,没有数据相关性。可以进行计算与通信的重叠.从理论和实验两个角度来讨论ICR算法的性能,当处理机台数较多时ICR算法的计算速度快于CR算法.在64台处理机机群上进行的数值实验表明,并行ICR算法的计算速度大约比CR算法快30%.  相似文献   

2.
本文根据影响并行蚁群算法性能的关键因素,提出了一种自适应的并行蚁群算法.首先提出了基于适应度和基于距离选择的两种不同的信息交流策略,使得各处理机自适应地选择与之进行信息交换的处理机,然后采用自适应的更新策略进行信息素的更新.为了增强该算法的搜索能力,还根据解的多样性给出了自适应地调节处理机之间的信息交流周期的方法.在MPP处理机深腾1800上对TSP问题的实验结果表明了该算法在保证有效的加速比的同时,具有很好的收敛性.  相似文献   

3.
并行蚁群算法中的自适应交流策略   总被引:6,自引:0,他引:6  
陈崚  章春芳 《软件学报》2007,18(3):617-624
提出了并行蚁群算法中处理机间信息交流的两种策略,使得各处理机能够自适应地选择其他处理机以进行信息交换和相应信息素的全局更新.还提出了一种确定处理机之间进行信息交流的时间的策略,可以根据解的分布情况自适应地确定信息交流的时间,以取得全局收敛速度和解的多样性之间的平衡.在算法每一次信息交换后,采用自适应的更新策略,根据信息素的均匀度进行信息素的更新,从而避免了早熟和局部收敛.在MPP处理机曙光2000上对TSP问题的实验结果,表明了基于该自适应信息交换策略的并行蚁群算法比其他算法具有更好的收敛性、更高的加速比  相似文献   

4.
在嵌入式并行计算系统中,任务调度是决定系统性能的关键。多任务调度中,启发式调度法是一种设计简单且性能良好的调度方法。目前的调度算法大多是基于任务复制的,没有充分考虑前驱任务与其后继任务间的相关性。该文提出了一种基于相关任务优化(DTO)的调度算法,通过分析已用处理机的负载和空闲时间,尽量减少系统的调度长度和处理机数目。算法分析结果表明,DTO算法在性能上优于其他算法,对嵌入式并行计算系统中的多任务调度是一个较好的选择。  相似文献   

5.
混合型实时容错调度算法的设计和性能分析   总被引:17,自引:2,他引:15  
以往文献中研究的实时容错调度算法都只能调度单一的具有容错需求的任务.该文建立了一个混合型实时容错调度模型,提出一种静态实时容错调度算法.该算法能同时调度具有容错需求的实时任务和无容错需求的实时任务.该文还提出了一个求解最小处理机个数的算法,用于对静态实时容错调度算法的性能进行模拟分析.为了提高静态调度算法的调度性能,提出了一种动态调度算法.最后,通过模拟实验分析了静态和动态调度算法的性能.实验表明,调度算法的性能与实时任务的个数、任务的计算时间、周期和处理机个数等系统参数相关.  相似文献   

6.
基于异构分布式系统的实时容错调度算法   总被引:26,自引:1,他引:26  
目前文献中研究的实时容错调度算法都是基于同构分布式系统,系统中的所有处理机完全相同。该文首先建立了一个基于异构分布式系统实时容错调度模型,异构分布式系统中的各个处理机均不相同。基于该异构分布式系统模型,该文引入了可靠性代价(reliability cost)概念,并提出两种静态实时容错调度算法(RTFTNO和RTFTRC)用于调度周期性实时容错任务。算法RTFTRC在调度任务时,尽量使系统的可靠性代价最小;而算法RTFTNO在调度实时任务时,没有考虑系统的可靠性代价。该文详细讨论了两种调度算法的性能。性能模拟实验分别比较了两个算法的可靠性代价,超时比率和可调度性;并研究了任务的计算时间与可靠性代价的关系以及调度长度阈值与最小处理机个数的关系。实验结果表明,算法RTFTRC的性能优于算法RTFTNO。  相似文献   

7.
该文针对经典雅可比算法求对称矩阵特征值不但要选主元素,而且还要同时进行行、列旋转变换、数据相关关系复杂、额外计算开销大、不易并行的缺点,提出了一种基于矩阵单侧旋转的算法并对此算法进行分析。最后通过该算法在PC机和分布式存储的大规模并行处理机曙光1000上的实验数据对比验证了该算法的性能较雅可比算法优越。  相似文献   

8.
针对二分图匹配算法在任务之间存在时序关系时无法进行有效调度以及EFT算法没有充分考虑各处理机性能及网络通信状况的问题,提出基于二分图匹配的改进ETF算法。该算法综合考虑任务之间的时序关系、处理机的性能、处理机之间的通信情况及已处理任务的调度情况,利用二分图最佳匹配思想对局部任务进行调度。实验表明该算法具有较小的调度长度和较好的负载均衡性。  相似文献   

9.
基于PVM的线性方程组的一种网上并行迭代算法   总被引:1,自引:0,他引:1  
针对基于PVM的桌面PC机联网而成的网络并行计算环境中,处理机的运算速度较快,而处理机间的通信相对较慢的实际情况,提出了求解线性方程组的一种分组Guass-Seidel并行迭代算法,该算法将线性方程组的增广矩阵按行分块储存在各处理机,每台处理机分别对各自的块采用Guass-Seidel迭代法进行迭代计算,其处理机间的通信较少,实现容易。并用1~24台桌面PC机联成的局域网,在PVM 3.4 on Windows2000,VC 6.0并行计算平台上编程对该算法进行了数值试验,试验结果表明,该算法较传统的Jacobi并行迭代算法和传统的Guass—Seidel并行迭代算法更优越。  相似文献   

10.
基于精确划分的思想提出了一种新的集合差并行算法DIFF—DL。利用DL子算法查找最终全局序列中等分位置上的划分点,将数据平均划分并分配给所有处理机,使每个处理机具有相同的工作负载。给出了网络并行计算环境下的实验结果,并与DIFF-S、DIFF-NS算法进行了对比。理论分析和实验的结果都表明,DIFF-DL算法具有很高的并行效率和扩展性,原因是划分类算法的性能和划分后区间数据量的平均程度正相关。  相似文献   

11.
A new multiresolution coarse-to-fine search algorithm for efficient computation of the Hough transform is proposed. The algorithm uses multiresolution images and parameter arrays. Logarithmic range reduction is proposed to achieve faster convergence. Discretization errors are taken into consideration when accumulating the parameter array. This permits the use of a very simple peak detection algorithm. Comparative results using three peak detection methods are presented. Tests on synthetic and real-world images show that the parameters converge rapidly toward the true value. The errors in ρ and &thetas;, as well as the computation time, are much lower than those obtained by other methods. Since the multiresolution Hough transform (MHT) uses a simple peak detection algorithm, the computation time will be significantly lower than other algorithms if the time for peak detection is also taken into account. The algorithm can be generalized for patterns with any number of parameters  相似文献   

12.
This note is a sequel of paper (Escudero et al. (2012) [1]), in which the sequential Branch-and-Fix Coordination referred to as the BFC-MS algorithm was introduced for solving large-scale multistage mixed 0–1 optimization problems up to optimality under uncertainty. The aim of the note is to present the parallelization version of the BFC algorithm, referred to as PC-BFCMS, so that the elapsed time reduction on problem solving is analyzed. The parallelization is performed at two levels. The inner level parallelizes the optimization of the MIP submodels attached to the set of scenario clusters that have been created by the modeler-defined break stage to decompose the original problem, where the nonanticipativity constraints are partially relaxed. Several strategies are presented for analyzing the performance of the inner parallel computation based on MPI (Message Passing Interface) threads to solve scenario cluster submodels versus the sequential version of the BFC-MS methodology. The outer level of parallelization defines a set of 0–1 variables, the combinations of whose 0–1 values, referred to as paths (one for each combination), allow independent models to be optimized in parallel, such that each one can itself be internally optimized with the inner parallelization approach. The results of using the outer–inner parallelization are remarkable: the larger the number of paths and MPI threads (in addition to the number of threads that the MIP solver allows to be used), the smaller the elapsed time to obtain the optimal solution. This new approach allows problems to be solved faster, and, can thus solve very large scale problems that could not otherwise be solved by plain use of a state-of the-art MIP solver, or could not be solved by the sequential version of the decomposition algorithm in acceptable elapsed time.  相似文献   

13.
This paper analyzes the performance and scalability of an iteration of the preconditioned conjugate gradient algorithm on parallel architectures with a variety of interconnection networks, such as the mesh, the hypercube, and that of the CM-5 parallel computer. It is shown that for block-tridiagonal matrices resulting from two-dimensional finite difference grids, the communication overhead due to vector inner products dominates the communication overheads of the remainder of the computation on a large number of processors. However, with a suitable mapping, the parallel formulation of a PCG iteration is highly scalable for such matrices on a machine like the CM-5 whose fast control network practically eliminates the overheads due to inner product computation. The use of the truncated Incomplete Cholesky (IC) preconditioner can lead to further improvement in scalability on the CM-5 by a constant factor,as a result, a parallel formulation of the PCG algorithm with IC preconditioner may execute faster than that with a simple diagonal preconditioner even if the latter runs faster in a serial implementation  相似文献   

14.
The conjugate gradient squared (CGS) algorithm is a Krylov subspace algorithm that can be used to obtain fast solutions for linear systems (Ax=b) with complex nonsymmetric, very large, and very sparse coefficient matrices (A). By considering electromagnetic scattering problems as examples, a study of the performance and scalability of this algorithm on two MIMD machines is presented. A modified CGS (MCGS) algorithm, where the synchronization overhead is effectively reduced by a factor of two, is proposed in this paper. This is achieved by changing the computation sequence in the CGS algorithm. Both experimental and theoretical analyses are performed to investigate the impact of this modification on the overall execution time. From the theoretical and experimental analysis it is found that CGS is faster than MCGS for smaller number of processors and MCGS outperforms CGS as the number of processors increases. Based on this observation, a set of algorithms approach is proposed, where either CGS or MGS is selected depending on the values of the dimension of the A matrix (N) and number of processors (P). The set approach provides an algorithm that is more scalable than either the CGS or MCGS algorithms. The experiments performed on a 128-processor mesh Intel Paragon and on a 16-processor IBM SP2 with multistage network indicate that MCGS is approximately 20% faster than CGS.  相似文献   

15.
16.
In this paper, a new algorithm is presented to compute the disparity map from a stereo pair of images by using Belief Propagation (BP). While many algorithms have been proposed in recent years, the real-time computation of an accurate disparity map is still a challenging task. The computation time and run-time memory requirements are two very important factors for all real-time applications. The proposed algorithm divides the matching process into two steps; they are initial matching and disparity map refinement. Initial matching is performed by memory efficient hierarchical belief propagation algorithm that uses less than half memory at run-time and minimizes the energy function at much faster rate as compare to other hierarchical BP algorithms that makes it more suitable for real-time applications. Disparity map refinement uses a simple but very effective single-pass approach that improves the accuracy without affecting the computation cost. Experiments by using Middlebury dataset demonstrate that the performance of our algorithm is the best among other real-time stereo matching algorithms.  相似文献   

17.
无线传感器网络中路由选择算法的研究   总被引:3,自引:2,他引:1  
针对无线传感器网络中采用多跳方式建立路由的特点,将蚁群算法用于在无线传感器网络中寻找多跳路由,通过一组"人工蚂蚁"采用并行搜索方式,寻找从源节点到目的节点的最少跳数路径;在算法中通过引入约束条件,既可降低算法的计算开销,又加快了算法的收敛速度;仿真结果说明将该算法用于无线传感器网络中搜寻路由是有效的,且具有鲁棒性特点,同时比传统的路由算法具有更低的时间复杂度。  相似文献   

18.
This paper presents a new cost-effective algorithm to compute exact loop bounds when multilevel tiling is applied to a loop nest having affine functions as bounds (nonrectangular loop nest). Traditionally, exact loop bounds computation has not been performed because its complexity is doubly exponential on the number of loops in the multilevel tiled code and, therefore, for certain classes of loops (i.e., nonrectangular loop nests), can be extremely time consuming. Although computation of exact loop bounds is not very important when tiling only for cache levels, it is critical when tiling includes the register level. This paper presents an efficient implementation of multilevel tiling that computes exact loop bounds and has a much lower complexity than conventional techniques. To achieve this lower complexity, our technique deals simultaneously with all levels to be tiled, rather than applying tiling level by level as is usually done. For loop nests having very simple affine functions as bounds, results show that our method is between 15 and 28 times faster than conventional techniques. For loop nests caving not so simple bounds, we have measured speedups as high as 2,300. Additionally, our technique allows eliminating redundant bounds efficiently. Results show that eliminating redundant bounds in our method is between 22 and 11 times faster than in conventional techniques for typical linear algebra programs.  相似文献   

19.
This paper presents new efficient shortest path algorithms to solve single origin shortest path problems (SOSP problems) and multiple origins shortest path problems (MOSP problems) for hierarchically clustered data networks. To solve an SOSP problem for a network with n nodes, the distributed version of our algorithm reaches the time complexity of O(log(n)), which is less than the time complexity of O(log 2 (n)) achieved by the best existing algorithm. To solve an MOSP problem, our algorithm minimizes the needed computation resources, including computation processors and communication links for the computation of each shortest path so that we can achieve massive parallelization. The time complexity of our algorithm for an MOSP problem is O(m log(n)), which is much less than the time complexity of O(M log2 (0)) of the best previous algorithm. Here, M is the number of the shortest paths to be computed and m is a positive number related to the network topology and the distribution of the nodes incurring communications, m is usually much smaller than M. Our experiment shows that m is almost a constant when the network size increases. Accordingly, our algorithm is significantly faster than the best previous algorithms to solve MOSP problems for large data networks  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号