
1.

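Parallel Algorithms for Boundary Value Problems

李磊 《工程数学学报》 1989, 6(1): 26-32

§1. A fast parallel algorithm for two-point boundary value problems. For the parallel computation of first-order linear boundary value problems, a parallel shooting method has already been proposed. This section presents a fast parallel iterative method for a class of second-order nonlinear two-point boundary value problems; using a parallel FFT algorithm effectively increases the computation speed of this method. Consider two-point boundary value problems of the following type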

2.

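MPI+OpenMP Hybrid Parallel Computing for Contact Problems (total citations: 1; self-citations: 0; citations by others: 1)

肖永浩, 莫则尧 《振动与冲击》 2012, 31(15): 36-40

Contact computations require a large amount of global communication. To address this on today's widely used multiprocessor cluster systems, a hybrid MPI+OpenMP parallel model is adopted to compute contact problems in parallel. Building on a dual domain decomposition parallel algorithm, the internal force computation is parallelized with MPI and, on top of that, with block-structured OpenMP programming, so that the global communication time involved in the parallel contact force computation does not increase, which further improves parallel efficiency. Numerical experiments show that this approach can solve contact problems with ten million degrees of freedom in parallel on more than a hundred processors.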

3.

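Application of a GPU-Based Parallel Algorithm for the Member Discrete Element Method to Large Engineering Structures

叶继红, 王佳 《工程力学》 2021, 38(2): 1-7

The member DEM (discrete element method) is an effective method for solving strongly nonlinear structural problems, but as the scale of structural numerical computation grows, the computation time it requires increases sharply. To improve its computational efficiency, this study proposes element-level and node-level parallel computation methods, builds a parallel computing framework for the member DEM on a CPU-GPU heterogeneous platform, and develops the corresponding geometrically nonlinear analysis program, achieving multi-threaded GPU parallel computation of the member DEM. The design of the parallel algorithm mainly covers the data storage scheme, the GPU thread computation pattern, the way nodal physical quantities are assembled, and the optimization of data transfer. Finally, a large three-dimensional frame model and a spherical shell structure model are used to verify the accuracy of the parallel algorithm, and its computational performance is tested; the results show a maximum speedup of 12.7.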

4.

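Research Status of SKELETON-Based Parallel Programming Methods

雷利桂, 郭景娟 《硅谷》 2009, (1)

Parallel programming is one of the difficulties of parallel computing. SKELETON-based parallel programming provides programmers with a framework for parallel programs, offering a higher level of abstraction and greater generality than parallel libraries such as PVM and MPI. This paper briefly introduces three models or projects currently developed internationally with this approach, together with the DPAPD model studied by the authors, and compares them.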

5.

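Parallel Computation of Structural Dynamic Response Based on the EBE Strategy

聂旭涛, 范大鹏 《振动与冲击》 2007, 26(10): 51-55

Parallel algorithms based on the EBE (Element by Element) strategy do not need to form the global stiffness matrix and require no domain decomposition of the three-dimensional model, which improves the speed and efficiency of parallel computation and makes them an effective route to fast analysis of structural dynamic response. The Newmark method is combined with the EBE parallel algorithm and Jacobi preconditioning to solve the structural dynamic equations in parallel. On this basis, the pseudo-excitation method is used to compute structural random vibration in parallel. Finally, in a networked cluster environment, using several programming languages and analysis tools, the parallel algorithm is applied to simulate the impact response and random vibration of a three-dimensional part, and the results are compared with those of Ansys and the precise time-integration method. The results show that the parallel algorithm has small computational error and high parallel efficiency and is suitable for engineering computation.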

6.

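Parallel Programming for Finite Element Analysis: A Distributed Parallel Environment Based on Networks and PVM

余天堂, 姜弘道 《工程力学》 1999, 1(A01): 108-116

Finite element parallel computation in a distributed parallel environment based on a network and PVM is an important direction in parallel finite element analysis. This approach has the advantages of low investment, high flexibility, and high achievable speedup. This paper presents a parallel programming method for finite element analysis in a network- and PVM-based distributed parallel environment.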

7.

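Parallel Processing of the Simulated Annealing Algorithm in Three-Dimensional Rock Model Reconstruction

钱旭, 黄耀辉, 鞠杨, 王金波 《中国科技博览》 2014, (3): 443-444

The parallelism of the simulated annealing algorithm used for three-dimensional rock model reconstruction is analysed, and a parallel algorithm tailored to its particular characteristics is proposed. The paper describes how simulated annealing is used in three-dimensional rock model reconstruction and the performance problems it suffers from, explores the feasibility of parallelizing it, and identifies how its parallelism differs from that of classical simulated annealing applications. It then proposes the conditions under which the algorithm can be parallelized, together with an improved parallel algorithm, and verifies the result.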

8.

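Design and Implementation of a Polynomial-Preconditioned EBE-PCG Parallel Algorithm on Networked Clusters (total citations: 5; self-citations: 0; citations by others: 5)

乐志华, 程建钢, 姚振汉 《工程力学》 2002, 19(5): 150-155

For large-scale structural dynamics problems that are difficult to solve on a single machine and computationally expensive, this paper adopts the EBE strategy, which breaks the global operations down to the element level, together with an SBS storage and task-allocation strategy based on domain decomposition, to design a coarse-grained EBE-PCG parallel algorithm, which is implemented in a networked cluster environment. Jacobi and polynomial preconditioning matrices are each used in the PCG iteration, and their iterative solution efficiency is compared. Two numerical examples, a cantilever beam under impact loading and the vibration response of a jeep frame, demonstrate that the algorithm not only significantly increases the size of problem that can be solved, making it suitable for large-scale structural analysis, but also achieves good parallel efficiency, making it an effective coarse-grained parallel algorithm for networked cluster environments.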
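As a rough illustration of the EBE idea with Jacobi preconditioning mentioned in the abstract, the sketch below runs preconditioned conjugate gradients where the product K·p is accumulated element by element, so the global stiffness matrix is never assembled. It is not the authors' implementation: the 1D bar model (node 0 fixed, every element with stiffness k·[[1, -1], [-1, 1]]), the function names, and the serial loop over elements are all assumptions made for illustration. In a parallel version, each processor would loop over its own subset of elements.

```python
# Sketch only: Jacobi-preconditioned CG with an element-by-element (EBE)
# matrix-vector product. The 1D bar model is an illustrative assumption.

def ebe_matvec(x, n_elems, k):
    """Return K @ x without assembling K.

    Unknown i is the displacement of node i + 1; node 0 is fixed.
    Element e joins nodes e and e + 1 with stiffness k * [[1, -1], [-1, 1]].
    """
    y = [0.0] * n_elems
    for e in range(n_elems):
        left, right = e - 1, e              # unknown indices; -1 is the fixed node
        xl = x[left] if left >= 0 else 0.0
        f = k * (x[right] - xl)             # element contribution
        if left >= 0:
            y[left] -= f
        y[right] += f
    return y


def ebe_diagonal(n_elems, k):
    """Accumulate diag(K) element by element for the Jacobi preconditioner."""
    d = [0.0] * n_elems
    for e in range(n_elems):
        if e - 1 >= 0:
            d[e - 1] += k
        d[e] += k
    return d


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pcg(n_elems, k, b, tol=1e-10, max_iter=1000):
    """Solve K x = b; returns (x, iterations used)."""
    d = ebe_diagonal(n_elems, k)
    x = [0.0] * n_elems
    r = list(b)
    z = [ri / di for ri, di in zip(r, d)]   # Jacobi preconditioning: z = D^-1 r
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = ebe_matvec(p, n_elems, k)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, d)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: a 5-element bar with a unit load at the free end.
x, iterations = pcg(n_elems=5, k=2.0, b=[0.0, 0.0, 0.0, 0.0, 1.0])
```

For this test case the exact displacement of node i is i·F/k, which gives a simple check on the solver.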

9.

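A Communication-Free Parallel Method with Block-Overlapping Partitioning for Block Diagonally Dominant Block Tridiagonal Systems

张衡, 张武 《工程数学学报》 2007, 24(6): 1080-1090

Based on the divide-and-conquer idea of parallel computing, an efficient and scalable parallel algorithm with block-overlapping partitioning and no communication (the PBOPUC algorithm) is proposed for solving block tridiagonal linear systems. When the system is strictly block diagonally dominant, the algorithm yields an approximate solution that is equivalent to the exact solution to within machine precision. An accuracy analysis gives the relationship between the order of the subsystems and the accuracy, which is used to control accuracy and parallel efficiency. The algorithm has been implemented on the "自强3000" (Ziqiang 3000) high-performance parallel computer at Shanghai University; the results show a parallel efficiency close to 100% and nearly linear speedup.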

10.

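A Multilevel Hierarchical Parallel Computing Method for Structural Modal Analysis (indexed in EI, 北大核心, CSCD)

喻高远, 楼云锋, 李俊杰, 金先龙 《振动与冲击》 2023, (16): 19-25

Based on sparse storage techniques and the conventional parallel component mode synthesis method, a multilevel hierarchical parallel computing method for finite element structural modal analysis is proposed. Building on a strategy of two-level partitioning and four transformations, the method not only achieves distributed sparse storage of large amounts of data, improving memory access efficiency, but also effectively reduces the size of the generalized eigenvalue equation after overall system reduction, greatly shortening the time needed to solve it. In addition, by mapping the computational tasks onto the hardware architecture of a heterogeneous many-core cluster, it parallelizes the computation at multiple levels, which both improves load balance across levels and raises communication efficiency by separating communication. It can therefore exploit the architectural features of heterogeneous many-core distributed-memory parallel computers to improve the efficiency of large-scale parallel finite element modal computation. Numerical examples show that, compared with the conventional parallel component mode synthesis method, the multilevel hierarchical parallel method with sparse storage greatly reduces memory usage and improves computational efficiency.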

11.

A parallel genetic algorithm for dynamic cell formation in cellular manufacturing systems

F. M. Defersha 《International Journal of Production Research》 2013, 51(22): 6389-6413

Instead of using expensive multiprocessor supercomputers, parallel computing can be implemented on a cluster of inexpensive personal computers. Commercial access to high-performance parallel computing is also available on a pay-per-use basis. However, literature on the use of parallel computing in production research is limited. In this paper, we present a dynamic cell formation problem in manufacturing systems solved by a parallel genetic algorithm approach. This method improves on our previous work using a sequential genetic algorithm (GA). Six parallel GAs for the dynamic cell formation problem were developed and tested. The parallel GAs are all based on the island model using migration of individuals but differ in their connection topologies. The performance of the parallel GA approach was evaluated against a sequential GA as well as off-the-shelf optimization software. The results are very encouraging. The dynamic manufacturing cell formation problem considered incorporates several design factors, including dynamic cell configuration, alternative routings, sequence of operations, multiple units of identical machines, machine capacity, workload balancing, production cost and other practical constraints.
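To make the island model with migration concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm: the OneMax fitness function, the ring connection topology, the operator choices, and all names and parameter values are hypothetical stand-ins for the cell formation encoding and the six topologies the abstract mentions. The islands are evolved one after another in a single process; in a parallel run each island would live on its own processor and only the migrants would be communicated.

```python
import random


def fitness(bits):
    """Toy objective (OneMax): number of ones. Stand-in for the real cost function."""
    return sum(bits)


def evolve(pop, rng, mutation_rate=0.02):
    """One generation: elitism, binary tournaments, one-point crossover, bit-flip mutation."""
    new_pop = [max(pop, key=fitness)[:]]            # keep the island's best
    while len(new_pop) < len(pop):
        a = max(rng.sample(pop, 2), key=fitness)
        b = max(rng.sample(pop, 2), key=fitness)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        new_pop.append(child)
    return new_pop


def migrate_ring(islands):
    """Send a copy of each island's best to the next island, replacing its worst."""
    bests = [max(pop, key=fitness)[:] for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = incoming


def island_ga(n_islands=4, pop_size=20, n_bits=30, generations=60, interval=10, seed=0):
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        islands = [evolve(pop, rng) for pop in islands]   # independent per island
        if gen % interval == 0:
            migrate_ring(islands)                         # the only coupling step
    return max((ind for pop in islands for ind in pop), key=fitness)


best = island_ga(seed=1)
```

Changing the connection topology amounts to changing which island each migrant is sent to in `migrate_ring`.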

12.

Dynamic analysis of large-scale SSI systems for layered unbounded media via a parallelized coupled finite-element/boundary-element/scaled boundary finite-element model

M. Cemal Genes 《Engineering Analysis with Boundary Elements》2012,36(5):845-857

An algorithm for a parallelized coupled model based on the finite element method (FEM), boundary element method (BEM), and scaled boundary FEM (SBFEM) for the harmonic and transient dynamic response of large-scale 2D structures embedded in or on layered soil media is presented. The BEM and SBFEM are used for modelling the dynamic response of the unbounded media. The standard FEM is used for modelling the finite region and the embedded structure. The objective of developing this parallelized coupled model is to use the power of high-performance computing and to exploit the advantages and avoid the disadvantages of the above-mentioned numerical methods for modelling the unbounded media in soil-structure interaction (SSI) systems. The development of the parallel algorithm for this model is essential for solving arbitrarily shaped large-scale SSI problems, which cannot be solved within reasonable elapsed times by a serial algorithm. The efficiency of the proposed parallel algorithm and the validity of the coupled model are shown by means of three numerical examples, indicating the excellent accuracy and applicability of the parallel algorithm, with considerable time savings in large-scale problems.

13.

Parallelized FVM algorithm for three-dimensional viscoelastic flows (total citations: 1; self-citations: 0; citations by others: 1)

H.-S. Dou, N. Phan-Thien 《Computational Mechanics》 2003, 30(4): 265-280

A parallel implementation of the finite volume method (FVM) for three-dimensional (3D) viscoelastic flows is developed on a distributed computing environment through Parallel Virtual Machine (PVM). The numerical procedure is based on the SIMPLEST algorithm using a staggered FVM discretization in Cartesian coordinates. The final discretized algebraic equations are solved with the TDMA method. The parallelisation of the program is implemented by a domain decomposition strategy, with a master/slave style programming paradigm and message passing through PVM. A load balancing strategy is proposed to reduce the communications between processors. The three-dimensional viscoelastic flow in a rectangular duct is computed with this program. The modified Phan-Thien–Tanner (MPTT) constitutive model is employed for the equation system closure. Computing results are validated on the secondary flow problem due to the non-zero second normal stress difference N₂. Three sets of meshes are used, and the effect of domain decomposition strategies on the performance is discussed. It is found that parallel efficiency is strongly dependent on the grid size and the number of processors for a given block number. The convergence rate as well as the total efficiency of domain decomposition depends upon the flow problem and the boundary conditions. The parallel efficiency increases with increasing problem size for a given block number. Compared with two-dimensional flow problems, the 3D parallelized algorithm has a lower efficiency owing to largely overlapped block interfaces, but the parallel algorithm is indeed a powerful means for large-scale flow simulations. Received: 2 July 2002 / Accepted: 15 November 2002. This research is supported by an A*STAR Grant EMT/00/011.
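The TDMA (tridiagonal matrix algorithm, also known as the Thomas algorithm) named in the abstract can be sketched in a few lines. This is a generic textbook version, not the paper's code: the argument layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) is an assumption, and no pivoting is done, so it is only appropriate for systems such as diagonally dominant ones where the elimination is stable.

```python
def tdma(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    All four lists have the same length n. No pivoting is performed.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n

    # Forward elimination: reduce to an upper bidiagonal system.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: rows are (2, -1, 0), (-1, 2, -1), (0, -1, 2); the solution is all ones.
x = tdma(a=[0.0, -1.0, -1.0], b=[2.0, 2.0, 2.0], c=[-1.0, -1.0, 0.0], d=[1.0, 0.0, 1.0])
```

Because the forward sweep is inherently sequential along a grid line, a domain-decomposed solver typically applies it within each block rather than across the whole domain.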

14.

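Parallel Computing for Finite Element Simulation of Impact-Contact Problems (total citations: 4; self-citations: 5; citations by others: 4)

亓文果, 金先龙, 张晓云 《振动与冲击》 2006, 25(4): 68-72

Impact-contact problems arise widely in simulations such as vehicle crashes. The explicit finite element method for solving such problems is briefly introduced, and its parallelism is discussed. Based on the computational characteristics of the explicit finite element method and of impact-contact problems, a contact-balanced partitioning algorithm is designed and implemented. The results of the numerical examples show that the parallel algorithm achieves good speedup and parallel efficiency.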

15.

Cooperative parallel adaptive neighbourhood search for the disjunctively constrained knapsack problem

Zhe Quan 《Engineering Optimization》 2017, 49(9): 1541-1557

This article investigates the use of parallel computing for solving the disjunctively constrained knapsack problem. The proposed parallel computing model can be viewed as a cooperative algorithm based on a multi-neighbourhood search. The cooperation system is composed of a team manager and a crowd of team members. Each team member applies its own search strategy to explore the solution space. The team manager collects the solutions from the members and shares the best one with them. The performance of the proposed method is evaluated on a group of benchmark data sets. The results obtained are compared to those reached by the best methods from the literature. The results show that the proposed method is able to provide the best solutions in most cases. In order to highlight the robustness of the proposed parallel computing model, a new set of large-scale instances is introduced. Encouraging results have been obtained.

16.

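Design and Implementation of a Shared-Memory Parallel RGA

郑金华, 蔡自兴 《高技术通讯》 2000, 10(3): 23-27

A shared-memory parallel narrow-sense genetic algorithm (RGA) is discussed. By combining the narrow-sense genetic algorithm with a region search method, the algorithm achieves data-level parallel operation and a high degree of parallelism. It requires only a small communication overhead and therefore runs very efficiently.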

17.

Parallel algorithm for setting WIP levels for multi-product CONWIP systems

L. Wang, V. Prabhu 《International Journal of Production Research》 2013, 51(21): 4681-4693

Reducing work-in-process (WIP) inventory continues to be an important business need because of several factors, including the need to reduce working capital. Numerous techniques have been suggested for WIP reduction, and CONWIP is a competitive algorithm for WIP reduction. Prior CONWIP algorithms have been primarily sequential and can potentially incur significant computing time, especially when dealing with inventories for multiple products. The paper proposes a card-setting algorithm for multiple product types subject to routing and throughput requirements. The proposed algorithm searches the WIP space iteratively, and the step size is adaptively selected based on the known properties of multi-chain, multi-class, closed queuing networks. Furthermore, parallelization of this search algorithm across multiple processors is proposed, where each processor searches a different segment of the WIP space while adaptively adjusting its step size for all product types to ensure fast convergence. The proposed parallel algorithm can take advantage of distributed computing architectures to speed up the overall computation. An experimental implementation of the parallel algorithm using Message Passing Interface (MPI) over a high-speed network is described. Computational results demonstrate that the proposed algorithm can be parallelized over eight to ten processors to obtain a speed-up of three to five.

18.

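Molecular Dynamics Simulation with Hundreds of Millions of Particles on an MPP Parallel Computer (total citations: 3; self-citations: 0; citations by others: 3)

曹小林, 莫则尧, 张景琳 《高技术通讯》 2004, 14(5): 10-13

A parallel molecular dynamics algorithm based on a "block-cell-linked list" data structure and HSFC dynamic load balancing is proposed. It achieves scalable MPI-based parallel computation for large-scale, non-uniform molecular dynamics simulations, helping physicists carry out nanoscale simulations of experimental significance. Specifically, on 240 CPUs of an MPP parallel computer, metal micro-jetting problems with 384 million (two-dimensional) and 276 million (three-dimensional) particles were computed, both achieving speedups of more than 209.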
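The reported figures can be related through the standard definition of parallel efficiency, E = S / p, where S is the speedup and p the number of processors. The helper below is only a worked-arithmetic sketch; the function names are made up for illustration and do not come from the paper.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency E = S / p."""
    return speedup / n_procs


def speedup_from_efficiency(efficiency, n_procs):
    """Inverse relation: S = E * p."""
    return efficiency * n_procs


# A speedup of 209 on 240 CPUs corresponds to an efficiency of 209 / 240.
eff = parallel_efficiency(209, 240)
print(f"{eff:.3f}")
```

Since the abstract states speedups of more than 209, the value 209 / 240 ≈ 0.871 is a lower bound on the efficiency of those runs.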

19.

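Parallel Computing for Large-Scale Modal Analysis Based on the Jacobi-Davidson Algorithm

范宣华, 陈璞, 吴瑞安, 肖世富 《振动与冲击》 2014, 33(1): 203-208

The Jacobi-Davidson (J-D) algorithm is improved and its parallel computation is studied. By adding strategies such as spectral transformation, deflation, and restarting, the J-D algorithm is adapted to large-scale modal analysis. Using the improved algorithm and various numerical solver packages, a parallel modal analysis solution system based on the PANDA framework is established. With this system and a parallel cluster, the parallel scalability of large-scale modal analysis of an engineering structure is studied, with test sizes ranging from hundreds of thousands to ten million degrees of freedom and up to 128 parallel CPU cores. The influence of control parameters of the improved J-D algorithm, such as the number of inner iteration steps and the number of restart vectors, on the convergence rate of the outer iteration is investigated, and speedups for parallel computations of different sizes are obtained. The results show that the improved J-D algorithm is fully capable of modal analysis at and beyond the scale of ten million degrees of freedom. Memory usage grows linearly with problem size, and a modal analysis with 10.25 million degrees of freedom uses only 39.4 GB of memory. The algorithm also shows excellent parallel scalability: speedup is nearly linear within the 128 CPU cores tested, and the larger the problem, the closer the curve is to the ideal speedup curve, with a parallel efficiency of 88.1% on 128 cores for the problem with 10.25 million degrees of freedom.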