Similar Articles
 20 similar articles found (search time: 149 ms)
1.
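Finite element analysis of large-scale structural nonlinear dynamic problems is very time-consuming. For a Message Passing Interface (MPI) cluster environment, several explicit parallel finite element algorithms built on parallel solution strategies are proposed. Using domain decomposition with explicit message passing (both overlapping and non-overlapping decomposition) and dynamic task allocation, and optimizing inter-processor communication by overlapping computation with communication, the study covers a non-overlapping-communication domain decomposition algorithm, an overlapping-communication domain decomposition algorithm, a group dynamic task allocation algorithm, a dynamic task allocation algorithm, and a dynamic load balancing algorithm. To carry out nonlinear dynamic finite element analysis on clusters, explicit parallel finite element algorithms based on effective parallel solution strategies were developed. A parallel finite element program based on the message-passing programming model was written, numerical examples were run on a workstation cluster, the performance of the algorithms was analyzed, and the results were compared with the conventional Newmark algorithm. The examples show that the group dynamic task allocation algorithm outperforms the dynamic task allocation algorithm but falls short of the domain decomposition algorithms, and that the dynamic load balancing algorithm performs best. For problems of the same size, the proposed algorithms are faster than the Newmark algorithm. The proposed parallel algorithms are feasible and effective for finite element analysis of structural nonlinear dynamic problems.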
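The abstract contrasts explicit schemes with the implicit Newmark method. A minimal sketch of why explicit integration parallelizes well, under stated assumptions: a linear spring chain with node 0 tied to a fixed wall, a lumped (diagonal) mass matrix, zero initial velocity, and no damping; the paper's nonlinear models and domain decomposition are not reproduced. Each central-difference step needs only an internal-force evaluation and a division by the diagonal mass, so every degree of freedom can be updated independently, whereas an implicit Newmark step must solve a coupled linear system.

```python
def internal_force(u, k):
    """K @ u for a hypothetical spring chain: wall - k - node0 - k - node1 - ... (free end)."""
    n = len(u)
    f = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0   # node 0 is attached to the fixed wall
        f[i] += k * (u[i] - left)
        if i + 1 < n:
            f[i] += k * (u[i] - u[i + 1])
    return f

def central_difference(m, k, f_ext, u0, dt, steps):
    """Explicit central-difference integration of M u'' + K u = f_ext (lumped M, v0 = 0 assumed)."""
    n = len(u0)
    a0 = [(f_ext[i] - fi) / m[i] for i, fi in enumerate(internal_force(u0, k))]
    u_prev = [u0[i] + 0.5 * dt * dt * a0[i] for i in range(n)]   # u(-dt) from a Taylor start
    u = list(u0)
    for _ in range(steps):
        fint = internal_force(u, k)
        # Diagonal mass: no linear solve, each DOF could live on a different processor
        u_next = [2 * u[i] - u_prev[i] + dt * dt * (f_ext[i] - fint[i]) / m[i] for i in range(n)]
        u_prev, u = u, u_next
    return u

# One unit mass on a unit spring, released from u = 1; exact solution is cos(t)
u = central_difference([1.0], 1.0, [0.0], [1.0], 0.01, 100)
```

After 100 steps of 0.01 the result should be close to cos(1) ≈ 0.5403. The step size must stay below the central-difference stability limit dt < 2/omega_max, which is the usual price paid for avoiding the linear solve.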

2.
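Numerical simulation of elasticity is widely used in engineering fields such as civil construction, mechanical engineering, chemical engineering, materials, and aerospace. As computational scale and accuracy requirements keep rising, ordinary serial programs can no longer meet application needs, and parallel applications must be developed. For unstructured meshes, a parallel finite element algorithm based on a hierarchical mesh data structure is proposed and used to solve the equations of elasticity. Numerical results verify the correctness and scalability of the mesh data structure and the parallel algorithm: the parallel elasticity program scales successfully to 4,080 processes on meshes of up to 1.5 billion elements.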

4.
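Application of the EBE-PCG Algorithm in Parallel Finite Element Computation   Cited by: 1 (self-citations: 1, other citations: 0)
Current research on parallel finite element computation mostly discusses parallel algorithm theory and the design and analysis of parallel algorithms; relatively little work implements parallel algorithms and applies them to practical problems. On a Beowulf cluster, an EBE-PCG algorithm based on the PVM platform was designed using the element-by-element (EBE) strategy, and its performance in finite element computation was tested on a typical engineering example from electrical prospecting. The experimental results show that the algorithm achieves satisfactory speedup and parallel efficiency; for problems of the same size, it offers a higher degree of parallelism and shorter run times than the CG and PCG algorithms.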
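The EBE (element-by-element) strategy never assembles the global stiffness matrix: inside PCG, the product K p is formed as a sum of element products K_e p_e scattered back to global degrees of freedom, so elements can be distributed across processes. A minimal serial sketch, assuming 1D two-node bar elements with stiffness k[[1, -1], [-1, 1]], node 0 fixed (marked as DOF -1), and a Jacobi (diagonal) preconditioner built element by element; the paper's actual preconditioner and PVM communication are not reproduced.

```python
def ebe_matvec(elements, p, ndof):
    """y = K p computed element by element; the global K is never assembled."""
    y = [0.0] * ndof
    for dofs, ke in elements:                      # dofs: global indices, -1 = fixed DOF
        pe = [p[d] if d >= 0 else 0.0 for d in dofs]
        for a, da in enumerate(dofs):
            if da >= 0:
                y[da] += sum(ke[a][b] * pe[b] for b in range(len(dofs)))
    return y

def ebe_diagonal(elements, ndof):
    """Diagonal of K accumulated from element diagonals (Jacobi preconditioner)."""
    diag = [0.0] * ndof
    for dofs, ke in elements:
        for a, da in enumerate(dofs):
            if da >= 0:
                diag[da] += ke[a][a]
    return diag

def ebe_pcg(elements, f, tol=1e-10, max_iter=200):
    n = len(f)
    dinv = [1.0 / d for d in ebe_diagonal(elements, n)]
    x = [0.0] * n
    r = list(f)
    z = [dinv[i] * r[i] for i in range(n)]
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        q = ebe_matvec(elements, p, n)
        alpha = rz / sum(p[i] * q[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * q[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [dinv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Hypothetical 4-element bar: element e joins nodes e and e+1; free DOF index = node - 1
ke = [[1.0, -1.0], [-1.0, 1.0]]
elements = [((e - 1, e), ke) for e in range(4)]
x = ebe_pcg(elements, [0.0, 0.0, 0.0, 1.0])       # unit load at the free tip
```

For a unit tip load and unit element stiffness the exact nodal displacements are 1, 2, 3, 4, which CG reaches in at most four iterations in exact arithmetic.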

5.
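This paper presents a numerical method for a class of boundary value problems with a minimal-area constraint and parallelizes the algorithm. In the numerical example, a parallel algorithm that uses Jacobi iteration to find the minimal area of the top surface of a curved-top cylinder is described, and the practical significance of the problem is discussed. The results show that the parallel algorithm achieves satisfactory parallel efficiency.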
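Jacobi iteration is naturally parallel because every grid point is updated only from the previous iterate, so the grid can be split among processors that exchange only boundary rows. A minimal sketch, assuming the small-slope (linearized) form of the minimal-surface problem, i.e. Laplace's equation for the height z(x, y) on a square grid with prescribed boundary heights, rather than the paper's exact formulation; the surface area is then estimated by splitting each grid cell into two triangles.

```python
def jacobi_laplace(z, iters):
    """Jacobi sweeps for the linearized minimal-surface (Laplace) equation; boundary values fixed."""
    n = len(z)
    for _ in range(iters):
        new = [row[:] for row in z]             # all updates read only the old iterate
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (z[i - 1][j] + z[i + 1][j] + z[i][j - 1] + z[i][j + 1])
        z = new
    return z

def tri_area(p, q, r):
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (cx * cx + cy * cy + cz * cz) ** 0.5

def surface_area(z, h):
    """Area of the piecewise-linear surface over a grid with spacing h."""
    n = len(z)
    area = 0.0
    for i in range(n - 1):
        for j in range(n - 1):
            a = (i * h, j * h, z[i][j])
            b = ((i + 1) * h, j * h, z[i + 1][j])
            c = ((i + 1) * h, (j + 1) * h, z[i + 1][j + 1])
            d = (i * h, (j + 1) * h, z[i][j + 1])
            area += tri_area(a, b, c) + tri_area(a, c, d)
    return area

# Hypothetical test: boundary heights follow the plane z = x, interior starts at 0
n, h = 11, 0.1
z0 = [[i * h if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)] for i in range(n)]
z = jacobi_laplace(z0, 2000)
area = surface_area(z, h)
```

Because a plane is discretely harmonic, the iteration should recover z = x in the interior and an area of sqrt(2) over the unit square.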

6.
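Wang Haibing. Journal of Computer Applications, 2011, 31(Z1): 172-173, 176
By overloading the MPI message-passing functions and calling the logging functions of the MPE library inside the overloaded functions, custom parallel performance monitoring was implemented for a large-scale object-oriented finite element program. A parallel finite element simulation of a typical impact dynamics problem was run on 16 CPUs, and its parallel finite element algorithm was analyzed using the performance monitoring data.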
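The monitoring technique wraps each message-passing call so that the wrapper records an event, calls the original function, and records a second event. In C this is usually done through MPI's standard profiling interface, where each MPI_ routine is also callable under a PMPI_ name. The sketch shows the same interposition idea in Python with a decorator; `send` is a hypothetical stand-in for an MPI call and the in-memory `EVENT_LOG` stands in for an MPE log file, so no MPI or MPE API is used.

```python
import functools
import time

EVENT_LOG = []   # (function name, phase, timestamp); stand-in for an MPE log file

def logged(fn):
    """Wrap a communication function so every call is bracketed by start/end events."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        EVENT_LOG.append((fn.__name__, "start", time.perf_counter()))
        try:
            return fn(*args, **kwargs)
        finally:
            EVENT_LOG.append((fn.__name__, "end", time.perf_counter()))
    return wrapper

@logged
def send(buf, dest):
    """Hypothetical stand-in for a point-to-point send; returns the message length."""
    return len(buf)

def time_per_call(log):
    """Pair start/end events and total the seconds spent in each function.
    Assumes calls of the same function are not nested."""
    totals, open_calls = {}, {}
    for name, phase, t in log:
        if phase == "start":
            open_calls[name] = t
        else:
            totals[name] = totals.get(name, 0.0) + (t - open_calls.pop(name))
    return totals

send([1, 2, 3], dest=1)
send([4, 5], dest=2)
totals = time_per_call(EVENT_LOG)
```

The wrapper leaves the caller's code unchanged, which is the main attraction of the approach: monitoring can be switched on by relinking (in C) or by re-decorating (here) without editing the finite element program.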

7.
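Fu Chaojiang, Chen Hongjun. Journal of Computer Applications, 2015, 35(12): 3387-3391
Because finite element analysis of elastoplastic problems is very time-consuming, a substructure preconditioned conjugate gradient parallel algorithm with residual smoothing is proposed for a Message Passing Interface (MPI) cluster environment. Using domain decomposition, each substructure is turned into an independent finite element model through interface conditions. During the global analysis, each processor stores only the information for its own substructures and generates the local stiffness matrices. Using diagonal storage and minimal residual smoothing, a parallel substructure preconditioned conjugate gradient (PCG) algorithm combined with minimal residual (MR) smoothing is designed. Load balancing in the parallel algorithm is discussed, and inter-processor communication is optimized. The elastoplastic stress-strain relations are integrated with a substepping method that automatically adjusts the size of each substep according to a prescribed tolerance to control the integration error. Numerical examples were run on a workstation cluster, the performance of the algorithm was analyzed, and it was compared with the conventional PCG algorithm. The examples show that the proposed algorithm achieves good speedup and efficiency, outperforms the conventional PCG algorithm, and is an effective parallel solver for finite element analysis of elastoplastic problems.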
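Minimal residual (MR) smoothing post-processes the iterates of a Krylov method so that the smoothed residual norm never increases. A minimal dense sketch, assuming plain unpreconditioned CG on a small symmetric positive definite matrix and the standard MR smoothing recurrence s_k = s_{k-1} + eta_k (r_k - s_{k-1}), with eta_k chosen to minimize ||s_k||; the paper's substructure preconditioner, diagonal storage, and parallel layout are not reproduced.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_mr_smoothing(A, b, iters):
    """CG with minimal residual smoothing; returns the smoothed iterate and ||s_k|| history."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    y, s = list(x), list(r)            # smoothed iterate y and its residual s = b - A y
    rr = dot(r, r)
    norms = [dot(s, s) ** 0.5]
    for _ in range(iters):
        if rr == 0.0:
            break
        q = matvec(A, p)
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        # MR smoothing: eta minimizes || s + eta * (r - s) ||, so ||s|| cannot grow
        d = [ri - si for ri, si in zip(r, s)]
        dd = dot(d, d)
        eta = -dot(s, d) / dd if dd > 0.0 else 0.0
        y = [yi + eta * (xi - yi) for yi, xi in zip(y, x)]
        s = [si + eta * di for si, di in zip(s, d)]
        norms.append(dot(s, s) ** 0.5)
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return y, norms

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical SPD test matrix
b = [1.0, 2.0, 3.0]
y, norms = cg_mr_smoothing(A, b, 3)
```

Since y and s are updated with the same weights as x and r, the identity s = b - A y is preserved, so the monotone smoothed residual is a true residual and not just a monitoring quantity.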

8.
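This paper proposes a data structure suited to information processing on parallel computers. The structure has two levels: the first level is a list whose nodes are vectors, and the second level consists of independent lists derived from the vector components; hence it is called a "vector two-level list". The paper describes the composition and characteristics of the vector two-level list in detail and presents parallel algorithms based on this data structure.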

9.
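This paper discusses the parallel implementation of the marking strategies commonly used in adaptive finite element computation and introduces the unified function interface through which these strategies are implemented in PHG, a parallel adaptive finite element software platform. In particular, it presents the parallel algorithms we designed for a class of strategies that are hard to implement on distributed-memory parallel computers, such as the GERS and MNS strategies.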
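Bulk-type strategies are awkward on distributed memory because the usual serial implementation sorts all error indicators globally. A minimal sketch of one workaround, assuming GERS denotes the guaranteed error reduction (Dörfler-type) criterion, i.e. mark elements until their squared indicators cover a fraction theta of the total: bisect on a threshold so that each step needs only a global sum, which would map onto an MPI_Allreduce. Processes are simulated as Python lists, and this is neither PHG's interface nor necessarily the algorithm in the paper.

```python
def global_sum(values_per_rank):
    """Stand-in for an all-reduce: sum of per-rank partial sums."""
    return sum(values_per_rank)

def bulk_mark(parts, theta, tol=1e-12, max_iter=100):
    """parts: per-rank lists of squared indicators eta_e^2 (assumed not all empty).
    Returns per-rank lists of marked local indices covering >= theta * total."""
    total = global_sum([sum(p) for p in parts])
    lo, hi = 0.0, max(max(p) for p in parts if p)
    # Invariant: marking every e >= lo covers enough error (true for lo = 0, theta <= 1)
    for _ in range(max_iter):
        if hi - lo <= tol * hi:
            break
        mid = 0.5 * (lo + hi)
        covered = global_sum([sum(e for e in p if e >= mid) for p in parts])
        if covered >= theta * total:
            lo = mid          # still enough: try a larger threshold, i.e. fewer elements
        else:
            hi = mid
    return [[i for i, e in enumerate(p) if e >= lo] for p in parts]

# Hypothetical indicators on three ranks; total = 1.0
parts = [[0.5, 0.1, 0.05], [0.2, 0.1], [0.05]]
marked = bulk_mark(parts, theta=0.6)
```

Here the largest indicator alone covers 0.5 of the error, so the next largest (0.2, on another rank) must also be marked; no rank ever needs to see another rank's indicators, only the reduced sums.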

10.
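This paper describes the structure and functions of the NPAB-1 parallel algorithm library and illustrates the design style of its parallel algorithms with two examples.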

11.
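Design and Implementation of Parallel Finite Element Programs   Cited by: 1 (self-citations: 0, other citations: 1)
1. Introduction. A major approach to parallel finite element computation is the substructure method [1]: static condensation is performed on the substructures in parallel, the interface equations are then solved in parallel, and finally the interior displacements are recovered by parallel back-substitution and the strains and stresses are computed. The design and efficient implementation of parallel programs depend strongly on the computational model of the parallel hardware. Network parallel computing offers enormous computing potential, a good price/performance ratio, scalability, and a flexible architecture; together with the emergence of message-passing parallel programming platforms such as PVM, MPI, and EXPRESS [2,3], this has made scalable distributed network parallel finite element computation an important direction of parallel finite element research. This paper describes in detail, for a PVM-based distributed network parallel environment, the parallel finite element…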
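The three phases of the substructure method (static condensation of each substructure, solution of the interface equations, back-substitution for interior unknowns) can be sketched on a tiny example. Assumptions: a 1D chain of unit springs with node 0 fixed, split into two substructures that share interface node 2, dense matrices, and a naive Gaussian elimination solver; in a parallel code each substructure's condensation and back-substitution would run on its own process.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def condense(Kii, Kib, Kbb, fi, fb):
    """Static condensation: S = Kbb - Kbi Kii^-1 Kib and g = fb - Kbi Kii^-1 fi (Kbi = Kib^T)."""
    ni, nb = len(Kii), len(Kbb)
    X = [solve(Kii, [row[c] for row in Kib]) for c in range(nb)]   # X[c] = Kii^-1 (column c of Kib)
    y = solve(Kii, fi)
    S = [[Kbb[a][c] - sum(Kib[k][a] * X[c][k] for k in range(ni)) for c in range(nb)] for a in range(nb)]
    g = [fb[a] - sum(Kib[k][a] * y[k] for k in range(ni)) for a in range(nb)]
    return S, g, y, X

def back_substitute(y, X, ub):
    """Interior unknowns u_i = Kii^-1 (fi - Kib ub) = y - X ub."""
    return [y[k] - sum(X[c][k] * ub[c] for c in range(len(ub))) for k in range(len(y))]

# Substructure A: interior node 1, interface node 2 (node 0 fixed)
SA, gA, yA, XA = condense([[2.0]], [[-1.0]], [[1.0]], [0.0], [0.0])
# Substructure B: interior nodes 3, 4 (unit load on node 4), interface node 2
SB, gB, yB, XB = condense([[2.0, -1.0], [-1.0, 1.0]], [[-1.0], [0.0]], [[1.0]], [0.0, 1.0], [0.0])
# Interface problem: assemble the condensed contributions and solve
ub = solve([[SA[0][0] + SB[0][0]]], [gA[0] + gB[0]])
uA = back_substitute(yA, XA, ub)
uB = back_substitute(yB, XB, ub)
```

The exact displacements of this chain are u = node index, so the interface value should be 2 and the interiors 1, 3, 4. Substructure B alone has a zero Schur complement because it is unsupported; only the assembled interface matrix needs to be nonsingular.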

12.
In this paper, fast parallel Preconditioned Conjugate Gradient (PCG) algorithms for the robot manipulator forward dynamics (dynamic simulation) problem are presented. By exploiting the inherent structure of the forward dynamics problem, suitable preconditioners are devised to accelerate the iterations. Also, based on the choice of preconditioners, a modified dynamic formulation is used to speed up both serial and parallel computation of each iteration. The implementation of the parallel algorithms on two interconnected processor arrays is discussed, and their computation and communication complexities are analyzed. The simulation results for a Puma arm are presented to illustrate the effectiveness of the proposed preconditioners. With faster convergence due to preconditioning and faster computation of iterations due to parallelization, the developed parallel PCG algorithms represent the fastest alternative for parallel computation of the problem with O(n) processors.

13.
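To increase the scale, accuracy, and efficiency of vibration analysis of large structures, a parallel finite element modal analysis program for large-scale structural vibration problems was developed on the basis of PANDA, an object-oriented parallel finite element computing framework, and high-performance parallel algorithms for matrix eigenvalue problems. The correctness and reliability of the program were verified with several examples on the Yinhe YH and Dawning 5000A supercomputers. The use of the program is demonstrated on a target chamber structure, and it is pointed out that in practical applications attention should be paid to the acceleration...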
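Modal analysis solves the generalized eigenproblem K phi = omega^2 M phi. As a small serial illustration (not PANDA's solver, whose parallel eigenvalue algorithms are not described here), inverse iteration with a Rayleigh quotient recovers the lowest natural frequency of a fixed-free spring-mass chain; the chain, unit masses, and unit stiffnesses are assumptions chosen so the answer can be checked against the closed form lambda_1 = 4 sin^2(pi / (2(2n+1))).

```python
import math

def thomas(a, b, c, d):
    """Tridiagonal solve: a = sub-diagonal (a[0] unused), b = diagonal, c = super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def lowest_mode(n, iters=200):
    """Inverse iteration for the smallest lambda = omega^2 of a fixed-free unit spring-mass chain."""
    a = [0.0] + [-1.0] * (n - 1)
    b = [2.0] * (n - 1) + [1.0]          # last mass has only one spring (free end)
    c = [-1.0] * (n - 1) + [0.0]
    m = [1.0] * n                        # lumped unit masses
    phi = [1.0] * n
    for _ in range(iters):
        v = thomas(a, b, c, [mi * p for mi, p in zip(m, phi)])   # solve K v = M phi
        scale = max(abs(x) for x in v)
        phi = [x / scale for x in v]
    Kphi = [b[i] * phi[i]
            + (a[i] * phi[i - 1] if i > 0 else 0.0)
            + (c[i] * phi[i + 1] if i < n - 1 else 0.0) for i in range(n)]
    lam = sum(p * q for p, q in zip(phi, Kphi)) / sum(mi * p * p for mi, p in zip(m, phi))
    return lam, phi

lam, phi = lowest_mode(5)
expected = 4 * math.sin(math.pi / (2 * (2 * 5 + 1))) ** 2
omega = lam ** 0.5                       # lowest natural circular frequency
```

Inverse iteration converges at the rate lambda_1 / lambda_2, which is why large modal solvers typically use shifted or block variants rather than this single-vector form.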

14.
We discuss the design of sequential and parallel algorithms working on a time-increasing data set, within two paradigms of computation. In Paradigm 1 the process terminates when all the data currently arrived have been treated, independently of future arrivals. In Paradigm 2 an endless process is divided in stages, and in each stage the computation is carried out on the data set updated up to the previous stage. A problem may be unsolvable because no algorithm is fast enough to cope with the increasing data set. The computational cost of succeeding algorithms is studied in a new perspective, in the sequential RAM and parallel PRAM models, with the running time possibly tending to infinity for proper values of the parameters. It is shown that the traditional time bounds of parallel versus sequential computation (i.e., speed-up and slow-down under the so-called Brent's principle) do not hold, and new bounds are provided. Several problems are examined in the new paradigms, and the new algorithms are compared with the known ones designed for time-invariant data. Optimal sequential and parallel algorithms are also defined, and given whenever possible. In particular it is shown that some problems do not gain anything from a parallel solution, while others can be practically solved only in parallel. Paradigm 1 is the most innovative, and the relative results on parallel speed-up and scaling are probably unexpected. Paradigm 2 opens a new perspective in dynamic algorithms, because processing batches of data may be more efficient than processing single incoming data on-line. Received July 1993, and in final form April 1997.
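A back-of-the-envelope model makes the claim that some problems "can be practically solved only in parallel" concrete. Assume, purely for illustration (the paper's cost model is more general), that n0 items are present at time 0, new items arrive at a constant rate k, and each processor treats one item in time c. With p processors working at combined rate p/c, Paradigm 1 terminates at the first t with (p/c) t >= n0 + k t, i.e. t = n0 c / (p - k c), which is finite only when p > k c.

```python
import math

def termination_time(n0, k, c, p):
    """Paradigm-1 termination time under the toy model above:
    n0 initial items, arrival rate k, cost c per item per processor, p processors.
    Returns math.inf when the processors can never catch up with the arrivals."""
    rate = p / c
    if rate <= k:
        return math.inf
    return n0 / (rate - k)

t1 = termination_time(1000, 2, 1, 1)   # sequential: never finishes
t4 = termination_time(1000, 2, 1, 4)
t8 = termination_time(1000, 2, 1, 8)
```

In this toy model a single processor never terminates, and doubling p from 4 to 8 cuts the time by a factor of 3 (500 to about 166.7), more than the factor of 2 that a fixed data set would allow; this is the flavor of the broken Brent-type bounds the abstract mentions.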

15.
Graphics processing units (GPUs) have an SIMD architecture and have been widely used recently as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors. H-tree is a hyper-linked tree structure used in both top-k H-cubing and the stream cube. Fast H-tree construction, update and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree based data cube operations. This has been made possible by taking effective methods, such as parallel primitives for segmented data and efficient memory access patterns, to achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms can often achieve more than an order of magnitude speedup when compared with their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors.
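Aggregation over many groups maps naturally onto segmented primitives: values are laid out in one flat array, a head-flag array marks where each group starts, and one segmented operation aggregates every group at once. A sequential reference sketch of the semantics, assuming a head-flag layout (the paper's GPU kernels and H-tree layout are not reproduced); on a GPU the segmented scan is the step that is computed in parallel, and each group's total is its value at the segment's last element.

```python
def segmented_sum(values, head_flags):
    """Sum each segment; head_flags[i] == 1 marks the first element of a segment."""
    if not values:
        return []
    if head_flags[0] != 1:
        raise ValueError("the first element must start a segment")
    sums = []
    for v, f in zip(values, head_flags):
        if f:
            sums.append(v)
        else:
            sums[-1] += v
    return sums

def segmented_inclusive_scan(values, head_flags):
    """Running sum that restarts at every segment head (the data-parallel building block)."""
    out, running = [], 0
    for v, f in zip(values, head_flags):
        running = v if f else running + v
        out.append(running)
    return out

# Hypothetical measure values for three cube cells: [3, 1], [4, 1, 5], [9, 2]
values = [3, 1, 4, 1, 5, 9, 2]
flags = [1, 0, 1, 0, 0, 1, 0]
sums = segmented_sum(values, flags)
scan = segmented_inclusive_scan(values, flags)
```

Keeping all groups in one flat array is what gives load balance on SIMD hardware: threads work on array positions, not on groups of very different sizes.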

16.
Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.
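The three-phase data-parallel pattern in the abstract (distribute, compute locally, collect) can be sketched for matrix-matrix addition and the Sum intrinsic with a row distribution. Assumptions: "processors" are simulated by contiguous row blocks, and the arrays use an ordinary row-major (TMR-style) 2D layout; the EKMR index mapping itself is not reproduced here.

```python
def row_blocks(nrows, nprocs):
    """Split row indices into nprocs contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(nrows, nprocs)
    blocks, start = [], 0
    for r in range(nprocs):
        size = base + (1 if r < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def distributed_add(A, B, nprocs):
    """Row-distributed C = A + B: each block adds its own rows, then results are gathered."""
    local = [[[a + b for a, b in zip(A[i], B[i])] for i in blk]   # local computation
             for blk in row_blocks(len(A), nprocs)]               # distribution
    return [row for part in local for row in part]                # collection (gather)

def distributed_sum(A, nprocs):
    """Row-distributed Sum intrinsic: local partial sums followed by a reduction."""
    partial = [sum(sum(A[i]) for i in blk) for blk in row_blocks(len(A), nprocs)]
    return sum(partial)

A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 1], [1, 1], [1, 1]]
C = distributed_add(A, B, 2)
total = distributed_sum(A, 2)
```

Addition needs a gather of full rows, while Sum needs only one number per processor; differences like this are why the distribution, computation, and collection phases are timed separately.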

17.
Efficient computation of dynamics parameters is one of the important issues in simulation and control of the multibody systems as these systems become more complex. Recent advances in computer architecture are toward multiple core systems rather than high-speed single core systems. Therefore, parallel computation algorithms for dynamics parameters should be designed to improve the performance on these multicore architectures. In this paper, a new dynamics computation algorithm is derived using the principle of dynamical balance, which provides explicit computation of dynamic parameters. This new algorithm has the structure to which parallel computation can be easily applicable. Parallel computation methods are then applied so that we can exploit the structure of the proposed dynamics computation algorithm based on the principle of dynamical balance. The parallel algorithm is designed based on task and data-parallelism. The performance of the proposed algorithm is verified on robots with various topologies. The improved speed of parallel computation is demonstrated through these experiments.

18.
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to “shortcut” pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions in an expression tree. We give O(lg^2 n)-step, linear-processor, linear-space, conservative algorithms for problems on graphs of size n, including finding a minimum-cost spanning forest, computing biconnected components, and constructing an Eulerian cycle. Most of these algorithms use as a subroutine a generalization of the prefix computation to trees. We show that any such treefix computation can be performed in O(lg n) steps using a conservative variant of Miller and Reif's tree-contraction technique.
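The "shortcut pointers" idea is closely related to classic pointer jumping, in which every node repeatedly replaces its successor with its successor's successor, so that a list of length n is ranked in about lg n synchronous rounds. A minimal synchronous simulation of this textbook list-ranking technique (Wyllie-style pointer jumping, not the paper's conservative DRAM algorithm; it ignores network congestion entirely):

```python
def list_rank(succ):
    """succ[i] is the node after i; the tail points to itself.
    Returns (rank, rounds), where rank[i] = number of links from i to the tail."""
    n = len(succ)
    nxt = list(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every node points directly at the tail
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Synchronous PRAM-style step: every read uses the arrays from the previous round.
        # The tail has rank 0, so adding rank[nxt[i]] is harmless once nxt[i] is the tail.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds

# Hypothetical list stored out of order: 2 -> 0 -> 4 -> 1 -> 6 -> 3 -> 5 -> 7 (tail)
succ = [4, 6, 0, 5, 1, 7, 3, 7]
rank, rounds = list_rank(succ)
```

Each round doubles the distance a pointer spans, so the 8-node list finishes in 3 rounds; the DRAM model asks the further question of how much network congestion those long-range pointer reads cause.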

19.
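This paper studies optimized design methods for parallel computation based on algorithm graphs. By introducing algorithm graphs, the parallel structure of an algorithm is described in terms of its underlying mathematics, and parallel optimization design methods for computing networks are proposed for different requirements, providing a new and effective approach to designing parallel algorithms.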
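One simple way to read parallel structure off an algorithm graph (a DAG whose nodes are operations and whose edges are data dependences) is ASAP levelling: each operation is placed one level after its latest predecessor, and all operations on the same level can run concurrently. This is a generic illustration under that assumption, not the optimization method of the paper; the example graph, which evaluates (a+b)*(c+d) - e*f, is hypothetical.

```python
from collections import deque

def asap_levels(deps):
    """deps maps each operation to the operations it depends on (all must be keys).
    Returns a list of levels; operations within a level are mutually independent."""
    indeg = {op: len(pre) for op, pre in deps.items()}
    users = {op: [] for op in deps}
    for op, pre in deps.items():
        for p in pre:
            users[p].append(op)
    level = {op: 0 for op in deps}
    queue = deque(op for op, d in indeg.items() if d == 0)
    seen = 0
    while queue:                                  # Kahn-style topological traversal
        op = queue.popleft()
        seen += 1
        for u in users[op]:
            level[u] = max(level[u], level[op] + 1)
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    if seen != len(deps):
        raise ValueError("algorithm graph has a cycle")
    levels = [[] for _ in range(max(level.values()) + 1)]
    for op in deps:
        levels[level[op]].append(op)
    return levels

deps = {"a+b": [], "c+d": [], "e*f": [], "mul": ["a+b", "c+d"], "sub": ["mul", "e*f"]}
levels = asap_levels(deps)
```

The number of levels is the critical-path length (3 steps here) and the widest level bounds the useful number of processors (3 here); optimizing a parallel design amounts to trading these two quantities against each other.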

20.
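Theory and Algorithms of Evolutionary Computation   Cited by: 7 (self-citations: 0, other citations: 7)
Evolutionary computation has recently been a research focus in information science, artificial intelligence, and computer science, and it is a powerful tool for solving hard problems. This paper describes the basic structure, theory, methods, and algorithms of evolutionary computation, and discusses in detail the basic theory and corresponding algorithms for the main genetic operations such as selection, recombination (crossover), mutation, migration, and parallel implementation.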
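The operators listed (selection, crossover, mutation, migration) fit together in an island-model genetic algorithm, which is also a common route to parallel implementation: each island evolves independently and periodically exchanges its best individuals. A minimal sketch on the toy OneMax problem (maximize the number of 1 bits); the problem, tournament selection, one-point crossover, bit-flip mutation, ring migration, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import random

def onemax(bits):
    return sum(bits)

def tournament(pop, rng, k=2):
    """Selection: best of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=onemax)

def evolve_island(pop, rng, p_mut):
    best = max(pop, key=onemax)
    children = [best[:]]                                # elitism: keep the best unchanged
    while len(children) < len(pop):
        a, b = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, len(a))                  # one-point crossover
        child = a[:cut] + b[cut:]
        child = [1 - g if rng.random() < p_mut else g for g in child]   # bit-flip mutation
        children.append(child)
    return children

def island_ga(n_islands=4, pop_size=10, length=20, generations=30, migrate_every=5, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    history = []
    for gen in range(1, generations + 1):
        # Islands are independent here, so this step could run on separate processors
        islands = [evolve_island(pop, rng, 1.0 / length) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the worst of the next island
            bests = [max(pop, key=onemax) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: onemax(pop[j]))
                pop[worst] = bests[i - 1][:]
        history.append(max(onemax(ind) for pop in islands for ind in pop))
    return history

history = island_ga()
```

Elitism guarantees that the best fitness in `history` never decreases; migration frequency controls the trade-off between island diversity and communication cost in a parallel run.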

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417