Similar literature
 18 similar documents found (search time: 156 ms)
1.
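Cai Yong, Li Sheng. Journal of Computer Applications (《计算机应用》), 2016, 36(3): 628-632
To address the high hardware cost and low development efficiency of traditional parallel computing methods for fast structural topology optimization, a full-process parallel computing strategy for the bi-directional evolutionary structural optimization (BESO) method is proposed, based on Matlab and the graphics processing unit (GPU). First, the advantages, disadvantages, and scope of application of three ways of implementing GPU parallel computing in the Matlab programming environment are discussed. Second, built-in functions are used to parallelize the vector and dense matrix computations of the topology optimization algorithm directly; MEX functions that call the CUSOLVER library are used for fast solution of the sparse finite element equations; and parallel thread execution (PTX) code is used to parallelize optimization decisions such as the element sensitivity analysis. Numerical examples show that developing GPU parallel programs directly in Matlab not only gives high programming efficiency but also avoids differences in computational precision between programming languages, so that the GPU parallel program achieves a considerable speedup while leaving the computed results unchanged.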

2.
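Han Qi, Cai Yong. Computer Simulation (《计算机仿真》), 2015, 32(4): 221-226, 304
To address the huge computational cost and low efficiency of large-scale topology optimization, a parallel topology optimization method based on the graphics processing unit (GPU) is designed and implemented. Bi-directional evolutionary structural optimization (BESO) is used as the underlying optimization algorithm, and a node-based conjugate gradient method is used to solve the finite element equations. By studying the original serial algorithm and taking the computational characteristics of the GPU into account, the entire iteration process is computed in parallel. The program is designed and written with the Compute Unified Device Architecture (CUDA), and two parallel strategies, element-based and node-based, are proposed. The mathematical libraries shipped with CUDA are used extensively, which ensures the stability and ease of use of the program. Numerical examples show that the parallel method is stable and efficient: with identical optimization results, a GTX580 graphics card achieves a very large speedup.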
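The abstract does not spell out the BESO design update, so the following is only a minimal Python sketch of one common hard-kill form of it, not the authors' CUDA code. The function name `beso_update`, the `evolution_ratio` parameter, and the sample numbers are illustrative assumptions, and sensitivity filtering and history averaging are omitted. The solid-or-void decision per element is the kind of work an element-based parallel strategy would spread over GPU threads.

```python
def beso_update(sensitivity, current_volume, target_volume, evolution_ratio=0.02):
    """One hard-kill BESO design update (illustrative sketch).

    sensitivity: element sensitivity numbers (higher = more important).
    current_volume, target_volume: volume fractions in (0, 1].
    Returns (design, next_volume), where design[i] is 1 (solid) or 0 (void).
    """
    n = len(sensitivity)
    # Move the volume fraction towards the target by at most the evolution ratio.
    if current_volume > target_volume:
        next_volume = max(target_volume, current_volume * (1.0 - evolution_ratio))
    else:
        next_volume = min(target_volume, current_volume * (1.0 + evolution_ratio))
    n_solid = round(next_volume * n)
    # Rank elements by sensitivity and keep the n_solid most important ones.
    ranked = sorted(range(n), key=lambda i: sensitivity[i], reverse=True)
    design = [0] * n
    for i in ranked[:n_solid]:
        design[i] = 1
    return design, next_volume


# Hypothetical sensitivities for a 10-element design that starts fully solid.
sens = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0]
design, volume = beso_update(sens, current_volume=1.0, target_volume=0.5,
                             evolution_ratio=0.2)
```

With these sample values the volume fraction drops from 1.0 to 0.8 in one step, and the two elements with the lowest sensitivity are switched to void.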

3.
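Design and implementation of parallel finite element programs   Total citations: 1, self-citations: 0, citations by others: 1
1. Introduction. A main approach to parallel finite element computation is the substructure method: static condensation is performed on the substructures in parallel, the interface equations are then solved in parallel, and back-substitution is finally carried out in parallel to obtain the interior displacements and to compute strains and stresses. The design and effective implementation of a parallel program depend strongly on the computational model of the parallel hardware. Because network parallel computing offers enormous computing potential, a good price/performance ratio, scalability, and a flexible architecture, and because a number of message-passing parallel programming platforms such as PVM, MPI, and EXPRESS [2,3] have appeared, scalable distributed network-parallel finite element analysis has become an important direction in parallel finite element computation. This paper describes in detail, for a PVM-based distributed network parallel environment, parallel finite element…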

4.
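Parallel computing is becoming a new trend in scientific and engineering computation. A parallel finite element method based on domain decomposition is applied to the distributed parallel environment of a workstation cluster. A parallel conjugate gradient algorithm based on element-wise domain decomposition is proposed. A dam structure is solved on the workstation cluster, and the parallel performance is analyzed.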
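To illustrate what an element-wise domain decomposition means for the matrix-vector product inside a conjugate gradient iteration, here is a minimal serial Python sketch. It assumes a one-dimensional bar of 2-node elements; the function names and the sample data are hypothetical and are not taken from the paper. Each subdomain owns a set of elements and computes its own contribution to K·u without a global stiffness matrix; the contributions are then summed, which in a distributed code corresponds to exchanging values at interface nodes only.

```python
def element_matvec(elements, stiffness, u, n_nodes):
    """Return the contribution of a set of 2-node bar elements to K @ u."""
    y = [0.0] * n_nodes
    for (a, b), k in zip(elements, stiffness):
        # Element stiffness k * [[1, -1], [-1, 1]] applied to (u[a], u[b]).
        f = k * (u[a] - u[b])
        y[a] += f
        y[b] -= f
    return y


def partitioned_matvec(subdomains, u, n_nodes):
    """Sum the subdomain contributions (the serial stand-in for communication)."""
    total = [0.0] * n_nodes
    for elements, stiffness in subdomains:
        # Each subdomain's product is independent of the others.
        local = element_matvec(elements, stiffness, u, n_nodes)
        total = [t + l for t, l in zip(total, local)]
    return total


# Four elements on five nodes, split into two subdomains that share node 2.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
stiffness = [1.0, 2.0, 3.0, 4.0]
u = [0.0, 1.0, 3.0, 6.0, 10.0]
subdomains = [(elements[:2], stiffness[:2]), (elements[2:], stiffness[2:])]
y_split = partitioned_matvec(subdomains, u, 5)
y_full = element_matvec(elements, stiffness, u, 5)
```

Both `y_split` and `y_full` hold the same product; only node 2 receives contributions from both subdomains.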

5.
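Parallel solution of finite element linear systems on workstation clusters   Total citations: 2, self-citations: 0, citations by others: 2
With the development of high-speed computer networking, workstation clusters are becoming a main platform for parallel computing. Finite element linear systems are among the most common problems in structural analysis in civil engineering. The preconditioned conjugate gradient method (PCGM) is an iterative method for solving linear systems. The method is parallelized and implemented on two different clusters, the storage scheme is analyzed in detail, and an optimized sparse matrix-vector multiplication is used in the code. Numerical results show that the parallel algorithm achieves good speedup and parallel efficiency, indicating that parallel computing can solve large-scale problems faster.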
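For readers unfamiliar with the method, a minimal serial Python sketch of the preconditioned conjugate gradient iteration follows. The abstract does not say which preconditioner or storage scheme the authors use; the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix, and all names here are assumptions made for illustration. In a parallel version the matrix-vector product and the dot products are the operations that get distributed, with the dot products requiring a global reduction.

```python
def matvec(A, x):
    """Dense matrix-vector product (a sparse format would be used in practice)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(A, p)                         # dominant cost per iteration
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small tridiagonal test system whose exact solution is [1, 1, 1, 1].
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 0.0, 0.0, 1.0]
solution, iterations = pcg(A, b)
```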

6.
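Parallel elastoplastic finite element computation with domain decomposition is studied in an MPI cluster environment. An algorithm is proposed that integrates the stress-strain relation using third- and fourth-order Runge-Kutta methods; the substep size is adjusted automatically during integration to control the integration error. A parallel substructure preconditioned conjugate gradient solver with minimal residual smoothing is developed. The algorithm is implemented in a parallel environment based on a workstation cluster. The results show that the algorithm has good parallel speedup and efficiency and is an effective parallel solution algorithm.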

7.
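A preconditioned conjugate gradient solver for parallel structural finite element computation is implemented based on smoothed aggregation algebraic multigrid. The computational domain is partitioned uniformly, the subdomains are assigned to processes that compute the element stiffness matrices concurrently, and the results are assembled into a global equilibrium equation held in distributed storage. The global equilibrium equation is solved in parallel by the conjugate gradient method preconditioned with smoothed aggregation algebraic multigrid. Numerical experiments are carried out on the Tianhe-2 supercomputer to analyze the influence of the main algebraic multigrid parameters on the performance of the algorithm and to test the parallel performance of the program. The results show that the method has good parallel performance and scalability and is suitable for large-scale practical applications.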

8.
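The Poisson-Boltzmann equation (PBE) is an implicit solvation model widely used in the electrostatic analysis of solvated biomolecules. Building on existing finite element software, this paper designs and implements an efficient parallel computing method for the recently proposed unconditionally stable method [9] for solving the PBE with high-order finite elements. The unconditionally stable method solves the PBE by pseudo-time iteration, which avoids the instability caused by strong nonlinearity. On unstructured tetrahedral meshes, an efficient parallel model based on algebraic decomposition is designed and implemented for solving the sparse linear systems. The method scales to 6400 CPU cores with a parallel efficiency of nearly 86%. Large-scale parallel iterative solution of linear systems is a common problem in computational science; an efficient parallel implementation not only provides a good basis for practical biomolecular electrostatic analysis but can also be extended to other application areas.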

9.
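Based on an analysis of the standard particle swarm optimization algorithm, the algorithm is combined with the BSP parallel computing model, and a parallel particle swarm optimization algorithm based on the BSP model is designed and implemented. The BSP-based parallel algorithm changes the structure of the standard particle swarm algorithm and improves its solution efficiency. Experimental results show that the performance of the parallel algorithm is greatly improved compared with the standard particle swarm algorithm.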

10.
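Solving systems of partial differential equations is one of the key problems in the numerical simulation of many fluid mechanics problems, but designing and implementing the corresponding parallel algorithms involves long development cycles and considerable difficulty. The Portable, Extensible Toolkit for Scientific Computation (PETSc) described here offers a breakthrough solution to this problem, since it handles parallel processing automatically. A tridiagonal system is solved as an example and compared with hand-written parallel code based on MPI (Message Passing Interface), and an analysis of the parallel performance is given.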

11.
In order to exploit the computing power of the many integrated cores on heterogeneous clusters, a multi-level and multi-granularity collaborative parallel computing method is proposed for finite element structural mechanical analysis. Computing tasks are divided into three levels: inter-node parallelism, inter-device parallelism and inter-core parallelism. By mapping decomposable computing jobs to different hardware layers of the heterogeneous MIC system, the proposed method not only effectively resolves the load balancing problem between CPU and MIC devices, but also significantly reduces the communication overhead of the system. Engineering simulation experiments for large-scale parallel computing were conducted on the “Tianhe 2” supercomputer. Up to 39000 CPU+MIC cores were employed, and the finite element model contained more than 100 million elements. Test results show that the proposed method achieves good speedup and parallel efficiency in large-scale parallel finite element structural analysis. The finite element structural analysis is thereby adapted to and optimized for the heterogeneous MIC computing platform, which can provide a reference for parallel porting and performance optimization of similar applications.

12.
Based on structural finite element analysis of discrete models, a neurocomputing strategy is developed in this paper. Dynamic iterative equations are constructed in terms of neural networks of discrete models. Determination of the iterative step size, which is important for convergence, is investigated based on the positive definiteness of the finite element stiffness matrix. Consequently, a method of choosing the step size of the dynamic equations is proposed and the computational formula of the best step size is derived. The analysis of the computing model shows that the solution of the finite element system equations can be obtained efficiently by neural network computation. The proposed method can be used for parallel computation of structural finite element problems in a large-scale integrated circuit (LSI).
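The abstract does not give the iteration or the step-size formula, so the Python sketch below substitutes a classical stand-in: the Richardson iteration x ← x + h (f − K x) for a symmetric positive definite K, with the textbook optimal step h = 2 / (λ_min + λ_max). The 2×2 matrix, the function names, and the choice of this formula are assumptions for illustration and should not be read as the paper's derivation. Each component of x is updated independently from the current residual, which is what makes this kind of scheme attractive for neuron-like parallel hardware.

```python
import math


def sym2x2_eigenvalues(K):
    """Eigenvalues (min, max) of a symmetric 2x2 matrix, in closed form."""
    a, b, d = K[0][0], K[0][1], K[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def richardson(K, f, h, n_iter):
    """Iterate x <- x + h * (f - K x), starting from x = 0."""
    x = [0.0] * len(f)
    for _ in range(n_iter):
        residual = [fi - sum(kij * xj for kij, xj in zip(row, x))
                    for fi, row in zip(f, K)]
        x = [xi + h * ri for xi, ri in zip(x, residual)]
    return x


# Hypothetical 2x2 stiffness matrix with eigenvalues 1 and 3.
K = [[2.0, -1.0], [-1.0, 2.0]]
f = [1.0, 0.0]
lam_min, lam_max = sym2x2_eigenvalues(K)
h_best = 2.0 / (lam_min + lam_max)   # classical optimum for Richardson iteration
x = richardson(K, f, h_best, 60)     # exact solution is [2/3, 1/3]
```

For this matrix the step size is 0.5 and the error shrinks by a factor of (λ_max − λ_min) / (λ_max + λ_min) = 0.5 per iteration; any step size at or above 2 / λ_max would fail to converge.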

13.
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized on equally spaced grids by means of a linearly implicit finite difference method that is second-order accurate in both space and time. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors affecting the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

14.
Large-scale scientific simulations are nowadays fully integrated in many scientific and industrial applications. Many of these simulations rely on models based on PDEs that lead to the solution of huge linear or nonlinear systems of equations involving millions of unknowns. In that context, the use of large high performance computers in conjunction with advanced fully parallel and scalable numerical techniques is mandatory to efficiently tackle these problems. In this paper, we consider a parallel linear solver based on a domain decomposition approach. Its implementation naturally exploits two levels of parallelism, which offers the flexibility to combine the numerical and the parallel implementation scalabilities. The combination of the two levels of parallelism enables an optimal usage of the computing resource while preserving attractive numerical performance. Consequently, such a numerical technique appears as a promising candidate for intensive simulations on massively parallel platforms. The robustness and parallel numerical performance of the solver are investigated on large challenging linear systems arising from the finite element discretization in structural mechanics applications.

15.
This paper presents parallel computational strategies to implement explicit nonlinear finite element analysis code on distributed memory parallel computers for solving large-scale problems in structural dynamics. Implementation details for both homogeneous and heterogeneous parallel processing environments are considered. Implementation of an explicit nonlinear finite element dynamic analysis code on homogeneous systems is discussed first, and the code is later moved onto heterogeneous systems. Domain decomposition with explicit message passing is preferred for parallel implementation. The message passing implementation in the parallel algorithm is based on MPI (Message Passing Interface) libraries. Implementation aspects of overlapped and non-overlapped domain decomposition techniques, Dynamic Task Allocation (DTA) and clustering techniques for DTA, and their relative merits are presented. The interprocessor communications are optimised by overlapping them with computations to improve the performance of the domain decomposition based explicit dynamic analysis finite element code. The issues related to implementation of finite element code for nonlinear dynamic analysis on a heterogeneous parallel computing environment are presented later. A new dynamic load-balancing algorithm is developed for this purpose and is integrated with the domain decomposition based parallel explicit finite element code to test the algorithms on a coarse grain heterogeneous cluster of workstations. Numerical experiments have been carried out on PARAM-10000, an Indian parallel computer, and also on a cluster of Unix workstations.

16.
A class of specialised data structures designed for the distributed solution of non-conventional finite element formulations, which are equally effective when used in conjunction with conventional formulations, is presented. We begin by briefly discussing how the non-conventional finite element formulations being developed within the structural analysis group at IST [Freitas JAT, Almeida JPM, Pereira EMBR. Non-conventional formulations for the finite element method. Comput Mech 1999;23(5–6):488–501] lead to systems of equations that appear to be naturally suited for parallel processing, but we also recognise that to take full advantage of the characteristics of these systems – large dimension, non-overlapping block structure and sparsity – it is necessary to use appropriate data structures. The approach presented, which references the logical subdivisions of the system matrices, was designed to fulfil these objectives. Examples of parallel performance and efficiency on a homogeneous distributed platform are presented.

17.
Summary: The alternative stress and displacement models of the hybrid-Trefftz finite element formulation for the analysis of linear boundary value problems are derived in parallel form to emphasise the complementary nature of the fundamental concepts they develop from. In the stress model the stresses in the structural domain and the boundary displacements are independently approximated and inter-element stress continuity is enforced explicitly. Conversely, in the displacement model the displacements in the structural domain and the boundary tractions are independently approximated and inter-element linkage is enforced in the form of displacement continuity. In both models the approximation in the domain is constrained to satisfy locally all field equations, a feature typical of the Trefftz method. Duality is used to interpret physically the finite element equations, which are derived from the fundamental relations of elastostatics. Numerical tests are presented to compare the relative performance of the alternative stress and displacement models.

18.
A parallel finite element solution method   Total citations: 9, self-citations: 0, citations by others: 9
New parallel computer architectures have revolutionized the design of computer algorithms, and promise to have significant influence on algorithms for structural engineering computations. In this paper, a parallel finite element solution method is presented. The proposed solution method does not require the formation of global system equations, but computes the element distortions directly, as opposed to solving a system of nodal equations. An element or substructure is mapped onto a processor of an MIMD multiprocessing system. Each processor stores only the information relevant to the element or substructure that it represents. The finite element computations can be performed in parallel, in that a processor generates the local stiffness, computes the element distortions and determines the stress-strain characteristics for the element or substructure associated with the processor.
