
1.

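A load-balanced partitioning parallel algorithm for reservoir simulation on non-uniform domains  (Total citations: 1; self-citations: 0; by others: 1)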

Shu Jiwu, Zhao Jinxi, Zhou Weisi, Zhang Defu. Journal of Software, 1999, 10(2): 187-192

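Targeting distributed parallel computer systems, this paper solves a class of reservoir numerical simulation problems on non-uniform domains in parallel using a domain decomposition method. It presents a load-balancing model for the parallel solution and an effective partitioning algorithm that balances the load across subdomains, so that such reservoir simulation problems can be mapped evenly onto the parallel environment and solved efficiently. In research on parallel software for black-oil reservoir simulation, experimental results show that the algorithm helps improve the speedup.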
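The abstract does not describe the partitioning algorithm itself. Purely to illustrate the load-balancing goal, the sketch below splits a one-dimensional strip of cells with non-uniform per-cell cost (a hypothetical `costs` list) into `p` contiguous subdomains by walking prefix sums toward equal targets. It is a simple greedy heuristic chosen for illustration, not the authors' method.

```python
from itertools import accumulate


def partition_contiguous(costs, p):
    """Split cells 0..n-1 into p contiguous blocks of roughly equal total cost.

    Greedy heuristic (illustrative assumption): block k ends at the first
    cell boundary where the running cost reaches k/p of the total.
    Returns a list of (start, end) pairs, end exclusive.
    """
    n = len(costs)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= len(costs)")
    prefix = [0] + list(accumulate(costs))  # prefix[j] = sum(costs[:j])
    total = prefix[-1]
    bounds = [0]
    for k in range(1, p):
        target = total * k / p
        j = bounds[-1] + 1  # every block gets at least one cell
        # Stop early enough to leave one cell for each remaining block.
        while j < n - (p - k) and prefix[j] < target:
            j += 1
        bounds.append(j)
    bounds.append(n)
    return list(zip(bounds[:-1], bounds[1:]))


costs = [1, 1, 1, 1, 4, 4]  # hypothetical: the right end is more expensive
blocks = partition_contiguous(costs, 3)
print(blocks, [sum(costs[a:b]) for a, b in blocks])
```

With these costs the three blocks each carry a load of 4, even though they contain 4, 1 and 1 cells respectively; a real reservoir partitioner would work on 2D/3D grids and also weigh communication.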

2.

Advances in parallel Delaunay tetrahedral mesh generation algorithms  (Total citations: 3; self-citations: 0; by others: 3)

Wang Lei, Nie Yufeng, Li Yiqiang. Journal of Computer-Aided Design & Computer Graphics, 2011, 23(6)

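Surveying nearly 20 years of work in China and abroad on parallel Delaunay tetrahedral mesh generation, this paper classifies the algorithms by parallel framework into a domain decomposition mode and a node-based mode, and further divides the domain decomposition mode into coupled and decoupled variants according to communication complexity. It analyzes the performance of typical algorithms in terms of mesh quality inheritance, embedding rate of serial code, scalability, load balance and fault tolerance, and summarizes the strengths and weaknesses of each class using numerical test results. Finally, based on the characteristics of each class, it discusses development trends in parallel Delaunay tetrahedral mesh generation.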

3.

Research on parallel computing for the OVALS ocean data assimilation system


Lu Fengshun, Song Junqiang, Zhu Xiaoqian. Computer Engineering & Science, 2010, 32(1): 113-116

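The development of ocean numerical forecasting is closely tied to high-performance computing. To improve the timeliness of the OVALS ocean data assimilation system, this paper parallelizes OVALS. For the temperature-salinity data assimilation module, it proposes a layer-first processor partitioning algorithm and studies parallel I/O, global communication and related implementation methods based on it; for the altimeter data assimilation module, it designs and implements a preprocessing-based irregular domain decomposition algorithm, which achieves good load balance in the parallel OVALS computation. Numerical experiments show that the parallel OVALS system reaches a speedup of 17.45 at a parallel scale of 36.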
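The abstract reports speedup only. Converting it to the usual parallel efficiency, E = S / p with S = T_serial / T_parallel, puts the figure in context: 17.45 / 36 is roughly 0.485, i.e. a bit under half of ideal linear speedup. The helper below is a trivial illustration of that arithmetic; the function name is ours.

```python
def parallel_efficiency(speedup, workers):
    """Parallel efficiency E = S / p, where S = T_serial / T_parallel."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures reported in the abstract: speedup 17.45 at a parallel scale of 36.
print(f"{parallel_efficiency(17.45, 36):.3f}")  # 0.485
```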

4.

Explicit finite element parallel algorithms based on effective parallel solution strategies

Fu Chaojiang, Wang Tianqi, Lin Yuerong. Journal of Computer Applications, 2018, 38(4): 1072-1077

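Finite element analysis of large-scale nonlinear structural dynamics problems is very time-consuming. For Message Passing Interface (MPI) cluster environments, this paper proposes several explicit finite element parallel algorithms based on parallel solution strategies. Building on domain decomposition with explicit message passing, it adopts overlapping and non-overlapping domain decomposition as well as dynamic task allocation, overlaps computation with communication and optimizes inter-processor communication, and studies five algorithms: non-overlapping-communication domain decomposition, overlapping-communication domain decomposition, group dynamic task allocation, dynamic task allocation, and dynamic load balancing. To carry out nonlinear dynamic finite element analysis on clusters, explicit finite element parallel algorithms based on effective parallel solution strategies were developed, and a parallel finite element program was written in the message-passing programming model. Numerical examples were run on a workstation cluster, the algorithms' performance was analyzed, and they were compared with the conventional Newmark algorithm. The examples show that group dynamic task allocation outperforms dynamic task allocation but falls short of the domain decomposition algorithms, and that dynamic load balancing performs best. For problems of the same size, the proposed algorithms are faster than the Newmark algorithm. The proposed parallel algorithms are feasible and effective for finite element analysis of nonlinear structural dynamics problems.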
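Explicit finite element codes of this kind usually advance in time with the central-difference scheme, which, with a lumped (diagonal) mass matrix, needs no global system solve per step; that is a large part of why domain decomposition parallelizes them well compared with implicit Newmark integration. The sketch below assumes a single undamped degree of freedom (mass `m`, stiffness `k`) instead of an assembled mesh and only illustrates the time-stepping rule; it is not the authors' code.

```python
import math


def central_difference(m, k, u0, v0, dt, steps, force=lambda t: 0.0):
    """Integrate m*u'' + k*u = f(t) with the explicit central-difference scheme.

    u_{n+1} = 2 u_n - u_{n-1} + dt^2 * (f(t_n) - k u_n) / m
    Conditionally stable: requires dt < 2 / omega, with omega = sqrt(k / m).
    Returns the displacements u_0 .. u_steps.
    """
    omega = math.sqrt(k / m)
    if dt >= 2.0 / omega:
        raise ValueError("time step exceeds the stability limit 2/omega")
    a0 = (force(0.0) - k * u0) / m
    # Fictitious displacement at t = -dt from a Taylor expansion about t = 0.
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0
    u = u0
    history = [u0]
    for n in range(steps):
        accel = (force(n * dt) - k * u) / m  # explicit: uses known u_n only
        u_next = 2.0 * u - u_prev + dt * dt * accel
        u_prev, u = u, u_next
        history.append(u)
    return history


# Free vibration with omega = 1, so the exact solution is cos(t).
hist = central_difference(m=1.0, k=1.0, u0=1.0, v0=0.0, dt=1e-3, steps=1000)
print(hist[-1], math.cos(1.0))
```

The stability check reflects the main trade-off against implicit Newmark: each explicit step is cheap, but the step size is bounded by the highest frequency in the model.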

5.

Load balancing in parallel adaptive finite element computation

Journal on Numerical Methods and Computer Applications, 2015, (3)

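A key issue in solving partial differential equations in parallel is mesh partitioning, which must give every process an equal computational load while also achieving good partition quality to reduce inter-process communication. In adaptive finite element computation the mesh and basis functions are adjusted continually, which causes load imbalance, so the mesh distribution must be adjusted dynamically to restore balance. This paper studies several load-balancing methods and implements them in the parallel adaptive finite element platform PHG. Numerical experiments show that our dynamic load-balancing algorithm produces high-quality partitions, runs fast, partitions meshes effectively and reduces running time.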

6.

Parallel volume rendering of large-scale datasets using the Master-Slave-Collector model


Tang Min. Computer Engineering and Applications, 2011, 47(16): 176-178

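Using an intranet of ordinary computers as the experimental platform, this paper studies how to implement parallel volume rendering of large-scale datasets in order to improve rendering speed and algorithm efficiency. It introduces the key techniques: parallel visualization, the Master-Slave-Collector model, load balancing, the task pool and the result pool. Master-Slave-Collector, an improvement on the traditional Master-Slave model, reduces computation time, achieves load balance and improves rendering efficiency. Experimental results show that the method largely resolves the two main obstacles of computation speed and memory space, performs well in real time, and plays an important role in clinical diagnosis and scientific research.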
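The abstract names a task pool and a result pool but not their implementation. Below is a minimal sketch with Python threads and queues, assuming a hypothetical `render_tile` function in place of real ray casting. The master fills the task pool, slaves pull tasks on demand (which balances load when tiles differ in cost), and a separate collector drains the result pool and assembles the image, so the master is not a bottleneck for gathering results.

```python
import queue
import threading

_STOP = object()  # sentinel telling a slave to exit


def render_tile(tile_id):
    """Hypothetical stand-in for rendering one image tile."""
    return tile_id * tile_id


def slave(tasks, results):
    # Pull tasks until the stop sentinel arrives; push (id, result) pairs.
    while True:
        tile_id = tasks.get()
        if tile_id is _STOP:
            break
        results.put((tile_id, render_tile(tile_id)))


def collector(results, n_tiles, image):
    # Results arrive in arbitrary order; place each one by its tile id.
    for _ in range(n_tiles):
        tile_id, pixels = results.get()
        image[tile_id] = pixels


def master(n_tiles, n_slaves=4):
    tasks, results = queue.Queue(), queue.Queue()  # task pool, result pool
    image = [None] * n_tiles
    slaves = [threading.Thread(target=slave, args=(tasks, results))
              for _ in range(n_slaves)]
    coll = threading.Thread(target=collector, args=(results, n_tiles, image))
    for t in slaves:
        t.start()
    coll.start()
    for tile_id in range(n_tiles):  # fill the task pool
        tasks.put(tile_id)
    for _ in slaves:                # one stop signal per slave
        tasks.put(_STOP)
    for t in slaves:
        t.join()
    coll.join()
    return image


print(master(8))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Python threads share one interpreter lock, so for CPU-bound rendering the same structure would be built with processes or, as in the paper's setting, separate networked machines.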

7.

Design and implementation of a parallel algorithm for a global (z) bicubic numerical model

Zhao Jun, Wu Jianping, Song Junqiang, Zhang Lei. Application Research of Computers, 2013, 30(5): 1337-1339

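This paper studies parallel algorithms for a bicubic numerical weather prediction model. Using one-dimensional domain decomposition and drawing on the block-checkerboard matrix transpose algorithm, it designs and implements a data-transpose communication algorithm, and overlaps computation with communication to reduce the impact of communication time on parallel efficiency. The resulting parallel algorithm for the bicubic model was tested and evaluated on a cluster system. Results show that it achieves high parallel efficiency and that both the data-transpose communication algorithm and the computation-communication overlap are effective.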

8.

Parallel implementation of the BFGS algorithm on a multi-Transputer system

Wang Yanchun, Xu Youxin. Journal of Computer Applications, 1994, 14(6): 21-24

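This paper discusses the parallel implementation of the BFGS algorithm on a multi-Transputer system. Using vector and matrix decomposition and taking full account of the hardware characteristics of the multi-Transputer system, it constructs a parallel BFGS algorithm with good load balance and low communication volume.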
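For reference, the standard sequential BFGS iteration is written out below, with $H_k$ the inverse-Hessian approximation and $\alpha_k$ a line-search step length. Its per-iteration work is dominated by matrix-vector products and rank-one updates of the $n \times n$ matrix $H_k$, which are natural targets for the vector and matrix decomposition the abstract mentions; how the paper actually distributes them over the Transputers is not stated here.

```latex
\begin{aligned}
p_k &= -H_k \nabla f(x_k), \qquad x_{k+1} = x_k + \alpha_k p_k, \\
s_k &= x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad \rho_k = \frac{1}{y_k^{\top} s_k}, \\
H_{k+1} &= \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top}.
\end{aligned}
```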

9.

Application of load-balancing methods in molecular dynamics simulation

Xiao Yonghao, Zhang Yalin, Li Lijuan. Computer Engineering and Applications, 2006, 42(8): 56-57

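To address the load imbalance that arises in parallel numerical simulation of three-dimensional molecular dynamics, this article builds on static load balancing and proposes a simple, effective dynamic load-balancing algorithm. In parallel simulation experiments on three-dimensional molecular dynamics, the algorithm keeps the load essentially in dynamic balance and further improves parallel efficiency.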

10.

PVM-based parallel solution of dense linear systems over a network  (Total citations: 3; self-citations: 1; by others: 3)

Shang Yueqiang, Yang Yidu. Computer Engineering and Design, 2006, 27(9): 1591-1594

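By combining Gauss-Jordan elimination with Gaussian elimination with partial (column) pivoting, this paper proposes an algorithm that uses the parallel computing support software PVM to solve dense linear systems efficiently in parallel over a local area network. The algorithm has low inter-processor communication overhead, achieves load balance, and keeps all processors working fully in parallel. Numerical experiments were programmed on a PVM 3.4 on Windows 2000 / VC 6.0 platform, using local area networks of 1 to 24 desktop PCs connected in two network layouts, and produced correct results.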
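As a reference point for the combination described above, here is a sequential sketch of Gauss-Jordan elimination with partial (column) pivoting on a dense system. In a PVM version the rows would presumably be distributed among processes, with the pivot row broadcast at each step; that distribution is an assumption here, not the paper's scheme.

```python
def gauss_jordan_solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial (column) pivoting.

    a: list of n rows of numbers; b: list of n numbers. Inputs are not modified.
    Raises ValueError if the matrix is singular or nearly singular.
    """
    n = len(a)
    # Augmented matrix [A | b] as floats.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: choose the row with the largest |entry| in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row, above and below
        # (this is what distinguishes Gauss-Jordan from plain Gaussian elimination).
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    # The left block is now the identity, so the last column holds x.
    return [row[n] for row in m]


x = gauss_jordan_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)  # approximately [2, 3, -1]
```

Because each elimination step updates every row independently once the pivot row is known, the row updates within a step are the natural unit of parallel work.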

11.

A parallel CFD rotor code using OpenMP

Advances in Engineering Software, 2001, 32(8): 665-671

The reduced two-dimensional version of the Fortran extended full-potential (FPX) helicopter rotor computational fluid dynamics (CFD) code has been successfully converted into a parallel version for multiprocessing. The FPX code, which has an internal grid generator, solves the compressible full-potential equation using an approximately factored finite-difference scheme with numerous added physical modeling enhancements, including viscous boundary layers, shock-induced entropy corrections and wake-vortex embedding. The parallel version uses open multi-processing (OpenMP) directives as the parallel programming tool in a shared-memory (SM) environment. The OpenMP code is portable and scalable and runs on various computer platforms, including UNIX and Windows NT. The performance of the parallel code is studied on an SGI Origin 2000 UNIX platform. The results show that parallelization yields reasonable speedups and that OpenMP is an easy-to-use, efficient parallel programming tool for the present problem.

12.

Design and analysis of kinematically redundant parallel manipulators with configurable platforms

M.G. Mohamed, C.M. Gosselin. IEEE Transactions on Robotics, 2005, 21(3): 277-287

Redundancy can, in general, improve the ability and performance of parallel manipulators by using the redundant degrees of freedom to optimize a secondary objective function. Almost all published research on redundancy in parallel manipulators has focused on the design and analysis of redundant parallel manipulators with rigid (nonconfigurable) platforms and on grasping hands attached to those platforms. Conventional grippers are usually not suitable for grasping irregular or large objects, and very few studies have considered using a configurable platform as the grasping device. This paper develops the idea of using configurable platforms in both planar and spatial redundant parallel manipulators and generalizes their analysis. The configurable platform is a closed kinematic chain whose mobility equals the degree of redundancy of the manipulator; the additional redundant degrees of freedom are used to reconfigure the shape of the platform itself. Several designs of kinematically redundant planar and spatial parallel manipulators with configurable platforms are presented. Such designs can serve as grasping devices, especially for irregular or large objects, or even as micro-positioning devices after the object has been grasped. Screw algebra is used to develop a general framework for analyzing the kinematics of any general-geometry planar or spatial kinematically redundant parallel manipulator with a configurable platform.

13.

Benchmarking parallel processing platforms: an applications perspective

R.B. Mueller-Thuns, D.G. Saab, R.F. Damiano, J.A. Abraham. IEEE Transactions on Parallel and Distributed Systems, 1993, 4(8): 947-954

Given the increased availability of general-purpose parallel computers, two issues arise: the performance of the available platforms must be compared using realistic examples, and application software must be written so that it can be ported easily to take advantage of different platforms. The authors address these issues from an applications point of view, focusing on the use of general-purpose parallel computers for the simulation tasks needed during the design of very large scale integrated (VLSI) circuits. They characterize the simulation task as a useful benchmark and introduce a high-level process view of parallel simulation that helps in deriving portable parallel programs. Details of the partitioning strategy and the simulation algorithm used in the application are given. They discuss their implementation on different parallel machines and give statistics from various experiments.

14.

Fast prototyping of parallel-vision applications using functional skeletons

Jocelyn Sérot, Dominique Ginhac, Roland Chapuis, Jean-Pierre Dérutin. Machine Vision and Applications, 2001, 12(6): 271-290

We present a design methodology for real-time vision applications that aims to significantly reduce the design-implement-validate cycle time on dedicated parallel platforms. The methodology is based on the concept of algorithmic skeletons, i.e., higher-order program constructs that encapsulate recurring forms of parallel computation and hide their low-level implementation details. Parallel programs are built by simply selecting and composing instances of skeletons chosen from a predefined basis. A complete parallel programming environment was built to support the methodology. It comprises a library of vision-specific skeletons and a chain of tools capable of turning an architecture-independent skeletal specification of an application into an optimized, deadlock-free distributed executive for a wide range of parallel platforms. The skeleton basis was defined after a careful analysis of a large corpus of existing parallel vision applications. The source program is a purely functional specification of the algorithm in which the structure of a parallel application is expressed only as a combination of a limited number of skeletons. This specification is compiled down to a parametric process graph, which is then mapped onto the actual physical topology using third-party CAD software. It can also be executed on any sequential platform to check the correctness of the parallel algorithm. The applicability of the methodology and associated tools has been demonstrated by parallelizing several realistic real-time vision applications, both on a multiprocessor platform and on a network of workstations. It is illustrated here with a complete road-tracking algorithm based on white-line detection. This experiment showed a dramatic reduction in development time (hence the term fast prototyping), while keeping performance on par with that of the handcrafted parallel version. Received: 22 July 1999 / Accepted: 9 November 2000
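To make the skeleton idea concrete: a skeleton is a higher-order function that fixes a parallel structure and takes the sequential computation as an argument. The sketch below defines hypothetical `farm` and `pipe` combinators in Python; it does not reproduce the paper's vision-specific skeleton basis or its compilation to a process graph. `farm` maps a worker over independent items with a thread pool, while `pipe` here simply composes stages one after another (a true pipeline skeleton would overlap stages on a stream of inputs).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def farm(worker, n_workers=4):
    """Skeleton: apply `worker` to independent items in parallel, preserving order."""
    def run(items):
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(worker, items))
    return run


def pipe(*stages):
    """Skeleton: compose stages left to right (each output feeds the next stage)."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical toy "vision" stages operating on a list of pixel rows.
threshold = farm(lambda row: [1 if px > 127 else 0 for px in row])
count_on = farm(sum)

detect = pipe(threshold, count_on, sum)
print(detect([[0, 200, 255], [100, 130, 10]]))  # 3
```

Because the program is written only as a composition of skeletons, swapping `farm` for a plain sequential `map` gives a version that runs on one processor, which mirrors the abstract's point about checking correctness sequentially.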

15.

Mapping pipeline skeletons onto heterogeneous platforms

Anne Benoit, Yves Robert. Journal of Parallel and Distributed Computing, 2008

Mapping applications onto parallel platforms is a challenging problem that becomes even more difficult when the platforms are heterogeneous (nowadays a standard assumption). A high-level approach to parallel programming not only eases the application developer's task but also provides additional information that can help realize an efficient mapping of the application.

16.

Testing Moderately Parallel Environments for an Ocean Modeling Application

Jerry L. Bickham, Germana Peggion, Benjamin R. Seyfarth. Journal of Scientific Computing, 1998, 13(2): 185-200

Because of the high cost of accessing massively parallel and vector environments, and heavy demand on high-performance computers, a different approach to parallel computing is needed. The feasibility of ocean modeling in a moderately parallel environment is tested using a 2-D (vertically integrated) ocean circulation model. The parallel algorithm is based on the Glenda message-passing software and follows the master-worker paradigm. It is evaluated in both internal and external communication environments. The numerical experiments show that the internal communication environment is only slightly more efficient than the external one, owing to a combination of shared-memory problems in the internal environment and inefficiencies in the message-passing software. The tests also show how efficiency depends on the domain subdivisions. Most importantly, they show that both environments outperform their sequential counterparts, reducing program elapsed time and giving quicker access to the model outputs. On both internal and external communication platforms, the parallel version saved time relative to the sequential version of the same model. This research supports the conclusion that both environments are viable alternatives to single-CPU machines and that moderately parallel environments are feasible computing platforms for ocean modeling applications.

17.

An efficient method for inverse dynamics of kinematically defective parallel platforms

Jianfeng Li, Jinsong Wang, Xinjun Liu. Journal of Field Robotics, 2002, 19(2): 45-61

In addition to general six-degree-of-freedom (DOF) parallel platforms, parallel platforms with fewer than six DOF can also be used in the structural design of robotic manipulators. The common property of these platforms is that six motion parameters are used to describe the position and orientation of the movable platform, but fewer than six of them are independent. In their general configurations, the actuators mounted on the legs cannot produce arbitrary six-dimensional motion of the platform, so these platforms are kinematically defective. Because of this defect, the inverse dynamics method that applies to general six-DOF parallel platforms cannot be used directly for kinematically defective parallel platforms (KDPPs). This paper presents an effective method for formulating the inverse dynamics of KDPPs. Using the proposed method, three different KDPPs are studied and their inverse dynamics formulas are derived. © 2002 John Wiley & Sons, Inc.

18.

Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA

P. Alonso, R. Cortina, F. J. Martínez-Zaldívar, J. Ranilla. The Journal of Supercomputing, 2011, 58(2): 215-225

This paper describes several parallel algorithmic variants of Neville elimination. This elimination solves a system of linear equations by producing zeros in a matrix column, adding to each row an adequate multiple of the preceding one. The parallel algorithms are run and compared on different multi- and many-core platforms using parallel programming techniques such as MPI, OpenMP and CUDA.
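A sequential sketch of the elimination rule stated above, followed by back substitution, is given below; the parallel variants compared in the paper are not reproduced. Column k is cleared bottom-up so that each row is combined only with the row directly above it, as that row stood before the pass; consequently the row updates for one column are mutually independent, which is one source of parallelism. Neville elimination is well suited to totally positive matrices such as the Pascal matrix used in the example. Practical implementations move rows with zero pivots to the bottom; this sketch simply raises instead, which is an assumption of the sketch.

```python
def neville_solve(a, b):
    """Solve A x = b by Neville elimination followed by back substitution.

    Column k is zeroed bottom-up: row i has row i-1 times a[i][k]/a[i-1][k]
    subtracted from it. Sketch assumption: no row exchanges, so a zero entry
    directly above a nonzero one raises ValueError.
    """
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n - 1):
        # Bottom-up, so row i-1 has not yet been updated in this pass.
        for i in range(n - 1, k, -1):
            if m[i][k] == 0.0:
                continue
            if m[i - 1][k] == 0.0:
                raise ValueError("zero pivot; row exchanges not implemented")
            factor = m[i][k] / m[i - 1][k]
            m[i] = [vi - factor * vp for vi, vp in zip(m[i], m[i - 1])]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if m[i][i] == 0.0:
            raise ValueError("singular matrix")
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


pascal = [[1, 1, 1], [1, 2, 3], [1, 3, 6]]  # a totally positive matrix
print(neville_solve(pascal, [6, 14, 25]))   # [1.0, 2.0, 3.0]
```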

19.

Parallel Algorithms for Dynamic Shortest Path Problems

Ismail Chabini, Sridevi Ganugapati. International Transactions in Operational Research, 2002, 9(3): 279-302

The development of intelligent transportation systems (ITS), and the resulting need to solve a variety of dynamic traffic network models and management problems, require faster-than-real-time computation of shortest path problems in dynamic networks. Recently, a sequential algorithm was developed to compute shortest paths in discrete-time dynamic networks from all nodes and all departure times to one destination node. Known as algorithm DOT, it has optimal worst-case running-time complexity, which implies that no algorithm with a better worst-case computational complexity can be discovered. Consequently, to solve all-to-one shortest path problems in dynamic networks faster, one must look beyond the design of sequential algorithms alone. Using commercially available high-performance computing platforms to develop parallel implementations of sequential algorithms is one such avenue. This paper reports on the design, implementation and computational testing of parallel dynamic shortest path algorithms. We develop two shared-memory and two message-passing dynamic shortest path implementations derived from algorithm DOT using two parallelization strategies: decomposition by destination and decomposition by transportation network topology. The algorithms are coded for two types of parallel computing environments: a message-passing environment based on the Parallel Virtual Machine (PVM) library and a multi-threading environment based on the SUN Microsystems Multi-Threads (MT) library. We also develop a time-based parallel version of algorithm DOT for the case of minimum-time paths in FIFO networks, and a theoretical parallelization of algorithm DOT on an 'ideal' theoretical parallel machine.
The performance of the implementations is analyzed and evaluated using large transportation networks and two types of parallel computing platforms: a distributed network of Unix workstations and a SUN shared-memory machine with eight processors. Satisfactory speedups over the sequential algorithms are achieved, particularly on shared-memory machines. Numerical results indicate that shared-memory computers are the most appropriate type of parallel computing platform for computing dynamic shortest paths in real-time ITS applications.
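The abstract does not spell out algorithm DOT. The sketch below follows its decreasing-order-of-time idea under stated assumptions: time is discretized into `horizon` integer steps, every travel time `d_ij(t)` is a positive integer, travel times stay constant from the last step on, and waiting at nodes is not allowed. Labels for the final step come from a static shortest-path computation (Bellman-Ford here); earlier steps are then filled backward, since a departure at `t` only needs labels at strictly later times. The function name and the `arcs` format are hypothetical. Decomposition by destination, one of the paper's strategies, would run this function independently for each destination.

```python
import math


def dot_all_to_one(n, arcs, dest, horizon):
    """All-to-one dynamic shortest path labels, computed in decreasing order of time.

    n: number of nodes (0..n-1); dest: destination node.
    arcs: dict (i, j) -> list of integer travel times d_ij(t), t = 0..horizon-1;
          for t >= horizon-1 the travel time is assumed constant.
    Returns labels[t][i]: minimum travel time from i to dest departing at t
    (math.inf if unreachable). Waiting at nodes is not allowed.
    """
    last = horizon - 1
    labels = [[math.inf] * n for _ in range(horizon)]

    # Step 1: static shortest paths with travel times frozen at t = last
    # (Bellman-Ford toward the destination).
    static = labels[last]
    static[dest] = 0
    for _ in range(n - 1):
        changed = False
        for (i, j), d in arcs.items():
            if static[j] + d[last] < static[i]:
                static[i] = static[j] + d[last]
                changed = True
        if not changed:
            break

    # Step 2: sweep t = last-1 down to 0. Since d_ij(t) >= 1, the label
    # needed at the arrival time has already been computed.
    for t in range(last - 1, -1, -1):
        labels[t][dest] = 0
        for (i, j), d in arcs.items():
            if i == dest:
                continue
            if d[t] < 1:
                raise ValueError("travel times must be positive integers")
            arrival = min(t + d[t], last)
            candidate = d[t] + labels[arrival][j]
            if candidate < labels[t][i]:
                labels[t][i] = candidate
    return labels


# Hypothetical 3-node network, destination 2, horizon of 3 time steps.
arcs = {
    (0, 2): [5, 5, 5],
    (0, 1): [1, 1, 1],
    (1, 2): [4, 1, 4],  # arc 1->2 is fast only when entered at t = 1
}
labels = dot_all_to_one(3, arcs, dest=2, horizon=3)
print(labels[0])  # [2, 4, 0]
```

Departing node 0 at t = 0 takes 2 time units via node 1 because arc 1->2 is entered exactly when it is fast; the static route would cost 5.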

20.

Dimensional optimization of 6-DOF 3-CCC type asymmetric parallel manipulator

Metin Toz. Advanced Robotics, 2014, 28(9): 625-637

In this paper, dimensional optimization of a six-degree-of-freedom (DOF) 3-CCC (C: cylindrical joint) type asymmetric parallel manipulator (APM) is performed using particle swarm optimization (PSO). The 3-CCC APM, constructed by defining three angle constraints and three distance constraints between the base and moving platforms, is a member of the 3D3A generalized Stewart-Gough platform (GSP) type parallel manipulators. The dimensional optimization aims to find the optimum limb lengths, the lengths of the line segments on the base and moving platforms, the attachment points of the line segments on the base platform, the orientation angles of the moving platform, and the position of the end-effector in the reachable workspace, in order to maximize the translational and orientational dexterous workspaces of the 3-CCC APM separately. The dexterous workspaces are obtained using the condition number and minimum singular value of the Jacobian matrix. The optimization results are compared with those of the traditional GSP manipulator to illustrate the kinematic performance of the 3-CCC APM. The optimizations show that the 3-CCC APM has better dexterous workspace characteristics than the traditional GSP manipulator.
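For readers unfamiliar with PSO, a basic global-best variant is sketched below on a hypothetical test function (the sphere function). The coefficients `w`, `c1` and `c2` are common textbook choices, not the paper's settings, and the paper's actual objective (dexterous workspace size derived from Jacobian condition numbers) is not reproduced.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lo, hi]^dim with a basic global-best PSO.

    w: inertia weight; c1: pull toward each particle's own best (cognitive);
    c2: pull toward the swarm's best (social). Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, value = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, value)
```

In a dimensional-synthesis setting, each particle position would instead encode design parameters such as limb lengths and attachment points, and `f` would return the negated workspace measure to be maximized.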