1.
An efficient parallel iterative method for the finite-element method has been developed for symmetric multiprocessor (SMP) cluster architectures with vector processors, such as the Earth Simulator. The method is based on a three-level hybrid parallel programming model: message passing for inter-SMP node communication, OpenMP loop directives for intra-SMP node parallelization, and vectorization for each processing element (PE). Simple 3D linear elastic problems with more than 2.2 × 10^9 DOF have been solved using the 3 × 3 block ICCG(0) method with additive Schwarz domain decomposition and PDJDS/CM-RCM reordering on 176 nodes of the Earth Simulator, achieving a performance of 3.80 TFLOPS. Furthermore, the effect of the number of colors used in reordering has been evaluated on various types of computers.
2.
Clusters of symmetric multiprocessors (SMPs) are popular platforms for parallel programming since they provide large computational power for a reasonable price. For irregular application programs with dynamically changing computation and data access behavior, a flexible programming model is needed to achieve efficiency. In this paper we propose Task Pool Teams as a hybrid parallel programming environment to realize irregular algorithms on clusters of SMPs. Task Pool Teams combine task pools on single cluster nodes through an explicit message-passing layer. They offer load balance together with multi‐threaded, asynchronous communication. Appropriate communication protocols and task pool implementations are provided and are accessible through an easy‐to‐use application programmer interface. As application examples we present a branch and bound algorithm and the hierarchical radiosity algorithm. Copyright © 2006 John Wiley & Sons, Ltd.
3.
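This paper discusses the characteristics of the MPI+OpenMP hybrid programming model and how it is implemented. A hybrid parallel algorithm for solving the Laplace partial differential equation is developed and compared in performance with a pure MPI algorithm on the HL-2A high-performance computing system. The results show that the hybrid parallel algorithm achieves better scalability and speedup.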
4.
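Based on the divide-and-conquer method for the symmetric tridiagonal eigenproblem, a multilevel hybrid parallel algorithm suited to SMP cluster environments is proposed. Parallel solution within an SMP node uses two kinds of OpenMP parallelism, coarse-grained and fine-grained. To mitigate the load imbalance of the pure MPI algorithm, the hybrid parallel algorithm uses dynamic task assignment. Experiments on the DeepComp 6800 (深腾6800) show that the hybrid parallel algorithm has good scalability and speedup. Keywords: SMP cluster; MPI+OpenMP; hybrid parallelism; parallel solver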
5.
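Hybrid programming models suitable for SMP clusters are studied and divided into two classes, OpenMP+MPI and Thread+MPI. The study indicates that OpenMP+MPI is superior to Thread+MPI. On this basis, the paper focuses on the implementation mechanism of OpenMP+MPI, coarse-grained and fine-grained parallelization methods, loop selection, optimization measures, and caveats, and concludes that fine-grained OpenMP+MPI is a good choice of programming model for SMP clusters.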
6.
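To speed up molecular dynamics simulation on symmetric multiprocessing (SMP) clusters, an MPI+TBB hybrid parallel programming model is introduced into the parallel molecular dynamics method. Based on this model, a hybrid parallel algorithm is designed and implemented in the molecular dynamics package LAMMPS: between nodes, MPI with spatial decomposition provides process-level parallelism; within a node, TBB with critical sections provides thread-level parallelism. Tests on an SMP cluster show that, for large systems and larger node counts, the method markedly reduces communication time and raises the speedup by 45% over the pure MPI model. The results indicate that the MPI+TBB hybrid parallel programming model benefits parallel molecular dynamics simulation and clearly improves efficiency.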
7.
Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem, and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
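The core-count weighting described in this abstract can be illustrated with a minimal sketch. It is not the authors' scheduler: it assumes the performance weight of a node is simply its number of cores, and it performs a single static split rather than repeated self-scheduling steps. The function name `partition_iterations` is hypothetical.

```python
def partition_iterations(total_iterations, cores_per_node):
    """Split loop iterations among nodes in proportion to their core counts.

    Assumption: the weight of a node equals its number of cores.
    """
    total_cores = sum(cores_per_node)
    # Each node first receives the floor of its proportional share.
    shares = [total_iterations * c // total_cores for c in cores_per_node]
    # Iterations lost to rounding go to the nodes with the most cores.
    leftover = total_iterations - sum(shares)
    order = sorted(range(len(cores_per_node)), key=lambda i: -cores_per_node[i])
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Three nodes with 2, 4 and 8 cores sharing 1000 iterations.
    print(partition_iterations(1000, [2, 4, 8]))
```

In a real hybrid code each share would be handed to one MPI process and then divided among that node's OpenMP threads; the sketch covers only the inter-node split.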
8.
The Earth Simulator (ES) is a large-scale, distributed-memory parallel computer system consisting of 640 processor nodes (PNs) with shared-memory vector multiprocessors (64 GFLOPS/PN, 5120 APs in total; AP: arithmetic processor). All the nodes are connected via a high-speed (16 GB/s) single-stage crossbar network called the Interconnection Network (IN). The operating system for the Earth Simulator is based on SUPER-UX, the UNIX operating system for the SX series scientific supercomputers. In order to realize high-performance parallel processing on this highly parallel machine, the operating system is enhanced for scalability. The Earth Simulator system is managed as a two-level cluster system called the Super Cluster System. In the Super Cluster System, the Earth Simulator system is divided into 40 clusters (16 PNs/cluster). A single controller called the Super Cluster Control Station (SCCS) manages all these clusters. This management system provides Single System Image (SSI) operation, management, and job control for the large-scale multi-node system. The Job Scheduler (JS) and NQS running on the SCCS control all jobs of the system. They schedule resources such as processing nodes and files, which have not usually been treated as scheduling resources. This allows efficient scheduling of large-scale jobs. The MPI library (MPI/ES) and the HPF compiler (HPF/ES) are available for distributed parallel programming on the Earth Simulator. MPI/ES conforms to the MPI 2.0 standard and is optimized to exploit the hardware features. HPF/ES conforms to the core part of HPF 2.0 and supports some features of the HPF 2.0 approved extensions and HPF/JA 1.0 extensions. HPF/ES suitably handles the three-level parallelism of the Earth Simulator system, that is, vectorization, shared-memory parallelization, and distributed-memory parallelization. Moreover, HPF/ES extends the language to easily handle irregular problems.
9.
NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded manycore GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster that consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.
10.
In this paper we study the solution to optimal control problems for constrained discrete-time linear hybrid systems based on quadratic or linear performance criteria. The aim of the paper is twofold. First, we give basic theoretical results on the structure of the optimal state-feedback solution and of the value function. Second, we describe how the state-feedback optimal control law can be constructed by combining multiparametric programming and dynamic programming.
11.
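To address the heavy computational cost of underwater acoustic propagation models, which makes it hard to meet the demand for real-time, fine-grained underwater sound propagation information, a hybrid parallel computing method for the WKBZ normal-mode model was studied based on MPI+OpenMP hybrid parallel programming, achieving two-level hybrid parallel computation of the underwater sound field. By passing messages between nodes and sharing memory within a node, the method effectively overcomes the high communication overhead of the MPI parallel programming model and the poor scalability of the OpenMP parallel programming environment, and largely solves the problem of fast computation of underwater sound propagation. Test results show that the method makes good use of the multilevel parallelism between and within SMP cluster nodes, exploits the respective strengths of the message-passing and shared-memory programming models, substantially reduces the time cost of MPI inter-process communication, and effectively improves program scalability and parallel efficiency.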
12.
This paper describes the ideas and developments of the project EP-CACHE. Within this project, new methods and tools are developed to improve the analysis and optimization of programs for cache architectures, especially for SMP clusters. The tool set comprises the semi-automatic instrumentation of user programs, the monitoring of cache behavior, the visualization of the measured data, and optimization techniques for improving the user program for better cache usage. As current hardware performance counters do not give sufficient user-relevant information, new hardware monitors are designed that provide more detailed information about the cache utilization related to the data structures and code blocks in the user program. The expense of the hardware and software realization will be assessed to minimize the risk of a real implementation of the investigated monitors. The usefulness of the hardware monitors is evaluated by a cache simulator.
13.
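Tridiagonalization of a symmetric matrix and the eigenvalue solution of the resulting symmetric tridiagonal matrix are the key steps of a parallel solver for dense symmetric eigenproblems. Targeting the multilevel architecture of SMP cluster systems, MPI+OpenMP hybrid parallel algorithms are given for Householder-based matrix tridiagonalization and for the divide-and-conquer algorithm for the tridiagonal eigenvalue problem. The study concentrates on load balance, communication overhead, and performance evaluation in the SMP cluster environment. The design of the hybrid parallel algorithms combines a coarse-grained thread-parallel mode with a task-sharing dynamic invocation method, which improves the load balance of the MPI algorithm and reduces communication overhead. Experiments on the DeepComp 6800 (深腾6800) show that the solver based on the hybrid parallel algorithms has better performance and scalability than the pure MPI version.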
14.
The conventional unconstrained binary quadratic programming (UBQP) problem is known to be a unified modeling and solution framework for many combinatorial optimization problems. This paper extends the single-objective UBQP to the multiobjective case (mUBQP) where multiple objectives are to be optimized simultaneously. We propose a hybrid metaheuristic which combines an elitist evolutionary multiobjective optimization algorithm and a state-of-the-art single-objective tabu search procedure by using an achievement scalarizing function. Finally, we define a formal model to generate mUBQP instances and validate the performance of the proposed approach in obtaining competitive results on large-size mUBQP instances with two and three objectives.
15.
In this paper, the simultaneous order acceptance and scheduling problem is developed by considering the variety of customers’ requests. To that end, two agents with different scheduling criteria are considered: the total weighted lateness for the first agent and the weighted number of tardy orders for the second. The objective is to maximize the sum of the total profit of the first agent's orders and the total revenue of the second agent's orders when the weighted number of tardy orders of the second agent is bounded by an upper bound value. In this study, it is shown that this problem is NP-hard in the strong sense, and then, to solve it optimally, an integer linear programming model is proposed based on the properties of the optimal solution. This model is capable of solving problem instances of up to 60 orders in size. Also, the LP-relaxation of this model was used to propose a hybrid meta-heuristic algorithm, which was developed by employing a genetic algorithm and linear programming. Computational results reveal that the proposed meta-heuristic can achieve near-optimal solutions so efficiently that, for instances of up to 60 orders in size, the average deviation of the model from the optimal solution is lower than 0.2%, and for instances of up to 150 orders in size, the average deviation from the problem upper bound is lower than 1.5%.
17.
Heterogeneous multiprocessor systems, where commodity multicore processors are coupled with graphics processing units (GPUs), have been widely used in high performance computing (HPC). In this work, we focus on the design and optimization of Computational Fluid Dynamics (CFD) applications on such HPC platforms. In order to fully utilize the computational power of such heterogeneous platforms, we propose to design the performance-critical part of CFD applications, namely the linear equation solvers, in a hybrid way. A hybrid linear solver includes both a CPU version and a GPU version of the code for solving a system of linear equations. When a hybrid linear equation solver is invoked during the CFD simulation, the CPU portion and the GPU portion run in parallel on their corresponding processing devices according to the execution configuration. Furthermore, we propose to build functional performance models (FPMs) of processing devices and to use an FPM-based heterogeneous decomposition method to distribute workload between heterogeneous processing devices, in order to ensure balanced workload and optimized communication overhead. The efficiency of this approach is demonstrated by experiments with numerical simulation of lid-driven cavity flow on both a hybrid server and a hybrid cluster.
18.
In this paper, a linear programming method is proposed to solve model predictive control for a class of hybrid systems. First, using the (max, +) algebra, a typical subclass of hybrid systems called max-plus-linear (MPL) systems is obtained. Then the model predictive control (MPC) framework is extended to MPL systems. In general, the nonlinear optimization approach or the extended linear complementarity problem (ELCP) has been applied to solve the MPL-MPC optimization problem. A new optimization method is presented, based on canonical forms for max-min-plus-scaling (MMPS) functions (using the operations maximization, minimization, addition and scalar multiplication) with linear constraints on the inputs. The proposed approach consists in solving several linear programming problems and is more efficient than nonlinear optimization. The validity of the algorithm is illustrated by an example.
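As a small illustration of the (max, +) algebra this abstract refers to, the sketch below evolves an autonomous max-plus-linear system x(k+1) = A ⊗ x(k), where ⊗ replaces ordinary addition with max and ordinary multiplication with +. The matrix `A`, the initial state, and the function names are assumptions chosen for the example; the input term and the MPC optimization from the paper are not modeled.

```python
NEG_INF = float("-inf")  # the "zero" element of max-plus algebra


def maxplus_matvec(A, x):
    """Max-plus product: component i is the max over j of A[i][j] + x[j]."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def simulate(A, x0, steps):
    """Return the state trajectory [x(0), x(1), ..., x(steps)]."""
    states = [x0]
    for _ in range(steps):
        states.append(maxplus_matvec(A, states[-1]))
    return states


if __name__ == "__main__":
    # Hypothetical 2-state system; NEG_INF marks a missing connection.
    A = [[2, 5], [3, NEG_INF]]
    for k, x in enumerate(simulate(A, [0, 0], 3)):
        print(k, x)
```

Each state component can be read as the earliest time an event can occur given the preceding events and their delays, which is the usual interpretation of such systems.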
20.
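Traditional genetic algorithms evolve slowly on large-scale combinatorial optimization problems and struggle to meet real-time requirements. To meet this challenge, a model is proposed for implementing a "coarse-grained/master-slave" hybrid parallel genetic algorithm on multicore PC cluster systems. The "coarse-grained/master-slave" parallel genetic algorithm is mapped onto a multicore PC cluster by combining the message-passing and shared-memory parallel programming models: between nodes the message-passing model (MPI) is used, corresponding to a coarse-grained parallel genetic algorithm, and within a node the shared-memory model (OpenMP) is used, corresponding to a master-slave parallel genetic algorithm. Hybrid MPI and OpenMP programming thus realizes the hybrid parallel genetic algorithm on a multicore cluster with two levels of parallelism, processes and threads. Theoretical analysis and experimental results show that the proposed implementation model performs well and can greatly mitigate the shortcomings of traditional genetic algorithms. It offers an effective and feasible way to handle large-scale combinatorial optimization problems with parallel genetic algorithms on ordinary multicore PC clusters.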