首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
In this paper, an exhaustive parallel library of sparse iterative methods and preconditioners in HPF and MPI was developed, and a model for predicting the performance of these codes is presented. This model can be used both by users and by library developers to optimize the efficiency of the codes, as well as to simplify their use. The information offered by this model combines theoretical features of the methods and preconditioners in addition to certain practical considerations and predictions about aspects of the performance of their execution in distributed memory multiprocessors.  相似文献   

2.
In this paper we present two algorithms for the parallel solution of first-order linear recurrences, We show that the algorithms can be used to efficiently solve both scalar and blocked versions of the problem on vector and SIMD architectures. The first algorithm is a parallel approach whose resulting code can be explicitly vectorized, making it suitable for efficient execution on vector architectures such as the Cray 2. The second algorithm is a modified recursive approach designed to reduce the communication overhead encountered in SIMD architectures such as the Connection Machine 2 (CM-2). We present the performance exhibited by the parallel algorithm implementations on the Cray 2 and CM-2 for both scalar and blocked versions of the recurrence problem.  相似文献   

3.
The problem of solving tridiagonal linear systems on parallel distributed-memory environments is considered in this paper. In particular, two common direct methods for solving such systems are considered: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on execution time for solving tridiagonal linear systems are presented. Specifically, lower bounds are presented that (a) hold when the number of data items per processor is bounded, (b) are general lower bounds, and (c) for specific data layouts commonly used in designing parallel algorithms to solve tridiagonal linear systems. Furthermore, algorithms are presented that have running times within a constant factor of the lower bounds provided. Lastly, a comparison of bounds for odd-even cyclic reduction and prefix summing is given.  相似文献   

4.
测试并分析了高性能预条件库HYPRE的多重网格解法器SMG和BoomerAMG在某国产大规模并行机数千个处理器上的可扩展性能,得到若干对线性解法器算法研究和并行实现技术发展具有启示性意义的结论.这些结论对实际复杂物理系统数值模拟中线性解法器的应用和发展具有一定的指导意义.  相似文献   

5.
6.
In this paper we present a new parallel algorithm for the solution of the incompressible two- and three-dimensional Navier-Stokes equations. The parallelization is achieved via domain decomposition. The computational region is considered in the form of a 2-D or 3-D periodic box decomposed into parallel strips (slabs). For time discretization we use a third order multistep method of [11]. The time discretization procedure results in solving global elliptic problems of (monotonic) Helmholtz and Poisson types in each time step. For the space discretization we employ the multidomain local Fourier (MDLF) method that was developed in [9, 10, 13]. The discretization in the periodic directions is performed by the standard Fourier method. In the direction across the strips we use the Local Fourier Basis technique which involves the overlapping of the neighboring subdomains and smoothing of local functions across the interior boundaries (interfaces). The matching of the local solutions is performed by adding properly weighted interface Green's functions. Their amplitudes are found in terms of the jumps of the solution and its first derivatives at the interfaces. The present paper extends the results of our previous works [1, 9, 10, 13] on parallel use of the MDLF method in three-fold aspects: 1. In [1] a model Navier-Stokes type system was considered which does not include the pressure term. Correspondingly, in each time step only the Helmholtx type equations were solved. It was shown that the parallel solution of this equation can be accomplished using only local (neighbor-to-neighbor) communication due to localization properties of the Helmholtz operator. We consider the complete Navier-Stokes system including the pressure term. The solution of the Poisson equation for pressure has the potential to degrade the performance and the achieved speedup of a parallel algorithm due to the global nature of this equation that necessitates global communication among the processors. However, we show that only a few lowest harmonics require for the global data transfer whereas the rest of harmonics can be treated locally. Therefore, most of the communication that is required for parallelization of the Navier-Stokes solver using the MDLF method is mainly local between adjacent subdomains (processors). Moreover, the percentage of the time spent in global communication reduces as the size of the problem increases. Thus, the present parallel algorithm is highly scalable. 2. In [l] we considered only 2-D equations. In this paper we extend the previous technique to 3-D problems. 3. Previously, the MDLF solver was implemented only on the MEIKO parallel machine. In this paper the 2-D and 3-D Navier-Stokes solvers are implemented on three MIMD message-passing multiprocessors (a 60-processors IBM SP2, a 20-processors MOSIX [3], and a network of 10 Alpha workstations) and achieve an efficiency of more than 70% to 95%. The same code written with the PVM (parallel virtual machine [7]) software package was executed on all the above distinct computational platforms. Detailed performance results, which include scalability analysis, are presented. This revised version was published online in June 2006 with corrections to the Cover Date.  相似文献   

7.
介绍了一种日常参数智能控制器的研制及研制过程中的几个重要问题的解决方法。特别是用于城市照明及美化城市的彩灯的开、关时间智能控制方面,给出了依据当地绝对时间、经度、纬度进行昼夜时间计算的方法,使之在不接光电传感器情况下可跟踪季节的昼夜变化。控制系统采用89C2051单片机,配有时钟日历芯片DS12887,系统体积小、功能强、可靠性高。  相似文献   

8.
频繁子图挖掘是数据挖掘领域的一个重要问题,并且有着广泛的应用。在Hadoop平台上实现了一种基于MapReduce的高效频繁子图挖掘算法Cloud-GFSG(cloud-global frequent subgraph)。该算法基于Apriori思想,在扩展边生成新的子图时,使用已经挖掘出的k-1阶的频繁子图生成k阶的频繁子图。同时,检查是否存在待扩展生成的子图,设定生成的频繁子图表示规则,保证了频繁子图信息的唯一性。较同类算法相比,该算法在挖掘频繁子图时更具通用性,并且在扩展边时避免产生大量的复制图,从而使得算法的正确性得以保证,且运行效率显著提高。  相似文献   

9.
The complexity of characterizing both parallel hardware and software makes it very difficult to explain and predict the performances of parallel programs for real industrial CFD applications. A performance model based on a generalized Amdahl's formulation has been developed and applied to a flow solver. The present formulation allows us to explain the behavior of a typical CFD explicit multiblock solver when the program is run on a multiprocessor distributed-memory system. Using this approach, it is possible to gain an insight on the performance limitations of this class of parallel solvers, by considering the impact of larger and larger number of processors on fixed-scaled problems.  相似文献   

10.
《Real》2001,7(1):47-57
In this paper we present parallel algorithms to solve the problem of image restoration when the Point Spread Function is Space Variant. The problem has a very high computational complexity and it is very hard to solve it on scalar computers. The algorithms are based on the decomposition of the image spatial domain and on the solution of both constrained and unconstrained restoration subproblems of size smaller than the original. The main results can be summarized as follows: (a) the quality of restorations do not depend on the number of subdomains; (b) the unconstrained restoration is scalable and efficient even with a large number of processors while the constrained restoration is efficient for subdomains of more than 50×50 pixels. The numerical tests have been executed on a Cray T3E with 128 processors and on a network of workstations.  相似文献   

11.
Because of the inherent NP-completeness of SAT, many SAT problems currently cannot be solved in a reasonable time. Usually, in order to tackle a new class of SAT problems, new ad hoc algorithms must be designed. Another way to solve a new problem is to use a generic solver and employ parallelism to reduce the solve time. In this paper we propose a parallelization scheme for a class of SAT solvers based on the DPLL procedure. The scheme uses a dynamic load-balancing mechanism based on work-stealing techniques to deal with the irregularity of SAT problems. We parallelize Satz, one of the best generic SAT solvers, with our scheme to obtain a parallel solver called PSatz. The first experimental results on random 3-SAT problems and a set of well-known structured problems show the efficiency of PSatz. PSatz is freely available and runs on any networked workstations under Unix/Linux.  相似文献   

12.
ANTS: Agents on Networks, Trees, and Subgraphs   总被引:1,自引:0,他引:1  
Efficient exploration of large networks is a central issue in data mining and network maintenance applications. In most existing work there is a distinction between the active ‘searcher’ which both executes the algorithm and holds the memory and the passive ‘searched graph’ over which the searcher has no control at all. Large dynamic networks like the Internet, where the nodes are powerful computers and the links have narrow bandwidth and are heavily-loaded, call for a different paradigm, in which a noncentralized group of one or more lightweight autonomous agents traverse the network in a completely distributed and parallelizable way. Potential advantages of such a paradigm would be fault tolerance against network and agent failures, and reduced load on the busy nodes due to the small amount of memory and computing resources required by the agent in each node. Algorithms for network covering based on this paradigm could be used in today’s Internet as a support for data mining and network control algorithms. Recently, a vertex ant walk ( ) method has been suggested [I.A. Wagner, M. Lindenbaum, A.M. Bruckstein, Ann. Math. Artificial Intelligence 24 (1998) 211–223] for searching an undirected, connected graph by an a(ge)nt that walks along the edges of the graph, occasionally leaving ‘pheromone’ traces at nodes, and using those traces to guide its exploration. It was shown there that the ant can cover a static graph within time nd, where n is the number of vertices and d the diameter of the graph. In this work we further investigate the performance of the method on dynamic graphs, where edges may appear or disappear during the search process. In particular we prove that (a) if a certain spanning subgraph S is stable during the period of covering, then the method is guaranteed to cover the graph within time nds, where ds is the diameter of S, and (b) if a failure occurs on each edge with probability p, then the expected cover time is bounded from above by nd((logΔ/log(1/p))+((1+p)/(1−p))), where Δ is the maximum vertex degree in the graph. We also show that (c) if G is a static tree then it is covered within time 2n.  相似文献   

13.
并行编译中一种线性数据和计算划分算法   总被引:1,自引:1,他引:0       下载免费PDF全文
对于高性能并行计算机而言,如何找到一种好的计算和数据划分,对数据和计算进行合理划分,增加数据本地化来减少处理器间的通信是提高其并行性能的关键。该文讨论了一种线性的自动进行无数据重组的计算和数据划分算法。  相似文献   

14.
沈雁  戴瑜兴 《计算机工程》2019,45(2):284-289
在OpenCL并行计算框架的clMAGMA库中,Cholesky分解算法采用大尺寸分块并行方法,不能充分利用GPU的高速局部存储器,且在计算过程中存在多次GPU-CPU间的数据传递。为此,提出采用小尺寸分块并行方法,充分利用GPU中的高速局部存储器,使矩阵子块的逆矩阵得到复用,完成对称正定矩阵的高效Cholesky分解,并且其能够应用于三维视觉光束平差问题中的大型正定矩阵的分解。实验结果表明,该方法的Cholesky分解速度比clMAGMA提升50%以上,针对光束平差问题,比Ceres Solver中使用的Eigen库速度提升约38倍。  相似文献   

15.
Cholesky分解细粒度并行算法   总被引:1,自引:0,他引:1  
本文提出了一种Cholesky分解细粒度流水线并行算法,该算法可以处理任意规模的数据,可以充分开发FP-GA加速器提供的细粒度并行。实验表明,该算法具有很好的可扩展性,在Xilinx XC5 VLX330 FPGA上能够集成36个处理单元(PE),当矩阵的阶为16384、运行频率为200MHz时性能达到14.3GFLOPS。  相似文献   

16.
We present a strategy for parallelizing computations that use the transport method. It combines spatial domain decomposition with domain replication to realize the scaling benefits of replication while allowing for problems whose computational mesh will not fit in a single processor's memory. The mesh is decomposed into a small number of spatial domains—typically fewer domains than there are processors—and heuristics are used to estimate the computational effort required to generate the solution in each subdomain using Monte Carlo. That work estimate determines the number of times a subdomain is replicated relative to the others. Timing of runs for two problems show that the new method scales better than traditional domain decomposition.  相似文献   

17.
刘泓  李平  闻育 《控制与决策》2006,21(11):1214-1218
应用蚁群优化算法求解复杂大规模多阶段决策问题时,其计算量会随着阶段数和各阶段离散化容许决策集合规模的增加成指数增长,造成无法在单PC中进行计算.针对这一问题,提出了基于解构造图拆分的并行蚁群算法.该算法通过应用并行计算技术,将解构造图拆分成若干块,把每一块的计算任务放置在不同的PC上并行执行,互相合作完成整个计算任务.经实验验证,这种算法可以快速有效地进行问题的求解.  相似文献   

18.
19.
A geometric method is presented to determine the unmanipulable singular configurations of a general class of parallel mechanisms. In unmanipulable singular configurations, the composite Jacobian matrix that maps active joints velocities to end-effector velocity loses rank, indicating the loss of a task degree-of-freedom. Finding unmanipulable configurations is difficult due to the complexity of the Jacobian matrix. The problem is greatly simplified by a novel decomposition of the matrix presented in this paper. The method is used to find singularities in several example parallel machines.  相似文献   

20.
A parallel algorithm for computing the generalized singular value decomposition of two matrices A and B having the same number of columns is described in this paper. The algorithm is designed for efficient implementation on distributed-memory parallel computer architectures. The time cost is O(n2) units for parallel preprocessing, and O(n2/p) units for the GSVD of two upper trapezoidal matrices, where p is the dimension of the triangular array of processors.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号