1.
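Parallelizing the symmetric successive over-relaxation preconditioned conjugate gradient (SSOR-PCG) method is difficult because every iteration requires solving two triangular systems in parallel. To address this, a multi-color ordering technique is used to increase the degree of parallelism. A parallel program suited to distributed shared-memory computers is developed on the MPI+OpenMP hybrid programming model, effective MPI communication functions are selected through testing, and three measures for avoiding races on shared data are given, to be chosen according to problem size and available memory.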
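The multi-color ordering idea mentioned above can be illustrated with a small sketch. It assumes a plain greedy coloring of the matrix adjacency graph; the abstract does not say which ordering the authors use, and the name `greedy_coloring` and the dictionary-based graph format are hypothetical. Unknowns that share a color are not coupled to each other, so within a triangular sweep they can be updated in parallel.

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest color not used by an already-colored neighbor.

    adjacency: dict mapping node -> iterable of neighboring nodes
    (the off-diagonal nonzero pattern of a symmetric matrix).
    """
    colors = {}
    for node in sorted(adjacency):
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# A 1D chain 0-1-2-3 (tridiagonal matrix) needs only two colors.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
coloring = greedy_coloring(chain)
```

For the chain, nodes 0 and 2 receive one color and nodes 1 and 3 the other, which is the familiar red-black ordering.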
2.
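To study the general-purpose computing capability of GPUs and a programming model suited to SMP clusters, a new multi-granularity hybrid parallel programming method, MPI+CUDA, is proposed for the first time: MPI provides coarse-grained parallelism between nodes, and CUDA provides fine-grained parallelism within a node. Using this method, the parallel performance of large-scale matrix multiplication was tested on a purpose-built 3-node SMP cluster. The experimental results show that the method significantly improves parallel efficiency. They also show that the MPI+CUDA hybrid programming model fully exploits distributed storage between nodes and shared memory within a node, providing an effective parallel strategy for SMP clusters equipped with CUDA-enabled GPUs.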
3.
A method is presented that eliminates the problem of the conventional quadratic performance criterion not being effective for some real-world systems because the performance parameters are seldom related to meaningful quantities. Globally searching the performance index allows the index to have local minima as well as discontinuities, so it can be defined in meaningful terms. This ability to define meaningful performance indexes can potentially reduce the design time and produce better controls for nonlinear systems. The method has been implemented and tested with a simulated nonlinear system. Comparison to optimal control theory shows that the methodology has merit. Gains occur because the controller can be nonlinear and the system can be efficiently optimized to have the desired characteristics.
4.
James Dinan Pavan Balaji Darius Buntinas David Goodell William Gropp Rajeev Thakur 《Concurrency and Computation》2016,28(17):4385-4404
The Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI's remote memory access (RMA) interface, which provides support for one‐sided communication. MPI‐3 RMA is expected to greatly enhance the usability and performance of MPI RMA. We present the first complete implementation of MPI‐3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging‐based networks and is publicly available in the latest release of the MPICH MPI implementation. Using this implementation, we explore the performance impact of new MPI‐3 functionality and semantics. Results indicate that the MPI‐3 RMA interface provides significant advantages over the MPI‐2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations. Copyright © 2016 John Wiley & Sons, Ltd.
5.
6.
This paper describes the Java MPI bindings that have been included in the Open MPI distribution. Open MPI is one of the most popular implementations of MPI, the Message-Passing Interface, which is the predominant programming paradigm for parallel applications on distributed memory computers. We have added Java support to Open MPI, exposing MPI functionality to Java programmers. Our approach is based on the Java Native Interface, and has similarities with previous efforts, as well as important differences. This paper serves as a reference for the application program interface, and in addition we provide details of the internal implementation to justify some of the design decisions. We also show some results to assess the performance of the bindings.
7.
Wen Long Ximing Liang Yafei Huang Yixiong Chen 《Neural computing & applications》2014,24(3-4):911-926
In recognition of high-quality wideband speech codecs, several standardization activities have been conducted, resulting in the selection of a wideband speech codec called adaptive multi-rate wideband (AMR-WB). The algebraic code-excited linear prediction (ACELP) technique is recommended in AMR-WB, and it is noted that most of the complexity in the ACELP structure comes from the codebook search. In this paper, a new method is proposed for codebook search based on the behavior of backward filtered target signal, d(n), introduced in ITU-T G.722.2 recommendation. To optimize the proposed scheme, five optimization algorithms (i.e., modified genetic algorithm, particle swarm optimization with dynamic inertia weight, bee colony optimization, modified differential evolution, and imperialist competition algorithm) are investigated. Experimental results show that the reduction in codebook search operations of the proposed method is able to reach up to 59 percent as compared with ITU-T G.722.2 recommendation. Meanwhile, BCO-based codebook search scheme has better convergence speed without significant degradation in quality metrics, such as segmental signal-to-noise ratio, mean opinion score, and perceptual evaluation of speech quality, when used in an AMR-WB speech codec.
8.
Jianjun Liu K. L. Teo Xiangyu Wang Changzhi Wu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(4):1305-1313
Differential search (DS) is a recently developed derivative-free global heuristic optimization algorithm for solving unconstrained optimization problems. In this paper, by applying the idea of the exact penalty function approach, a DS algorithm is developed for constrained global optimization problems; an S-type dynamical penalty factor is introduced to achieve a better balance between exploration and exploitation. To illustrate the applicability and effectiveness of the proposed approach, a comparison study is carried out by applying the proposed algorithm and other widely used evolutionary methods to 24 benchmark problems. The results obtained clearly indicate that the proposed method is more effective and efficient than the other widely used evolutionary methods for most of these benchmark problems.
9.
Mojtaba Shivaie Mohammad T. Ameli 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(8):1615-1630
In this paper a new scenario-based framework is presented for transmission expansion planning (TEP) under normal and N–1 conditions. The proposed framework takes into account the cost of network losses and the cost of the transmission circuits and substations as objective functions in the optimization process, while considering short-term and long-term constraints under normal and N–1 conditions as problem constraints. The proposed model is a non-convex optimization problem with a non-linear mixed-integer nature. A new improved harmony search algorithm (IHSA) is used to obtain the final optimal solution. The IHSA is a recently developed optimization algorithm which imitates the music improvisation process. In this process, the harmonists improvise their instrument pitches searching for the perfect state of harmony. The new planning methodology has been applied to the well-known Garver’s 6-bus test system and a real-life network of the south Brazilian electric power grid in order to demonstrate the feasibility and capabilities of the proposed algorithm. The detailed results of the case studies are presented and thoroughly analyzed. The obtained TEP results illustrate the sufficiency and profitability of the newly developed method in expansion planning when compared with other methods.
10.
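As scientific and engineering computations become increasingly large-scale, high-dimensional, and long-running, the cumulative effect of floating-point rounding errors often makes computed results unreliable, and improving computational accuracy has become one of the active research topics in parallel computing. Based on the MPICH3 framework, error-free transformation techniques are used to construct a new data format and the corresponding operators, and a high-precision reduction function, MPI_ACCU_REDUCE, is designed, implementing high-precision summation, product, and L2-norm computation, three kinds of...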
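As a rough illustration of what an error-free transformation is, the sketch below uses Knuth's TwoSum, which returns the rounded sum of two doubles together with the exact rounding error, and accumulates those errors in a compensated sum. This is a generic textbook technique, not the implementation behind MPI_ACCU_REDUCE; the function names `two_sum` and `compensated_sum` are hypothetical.

```python
def two_sum(a, b):
    """Error-free transformation: return (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def compensated_sum(values):
    """Sum floats while accumulating the rounding error of every addition."""
    total = 0.0
    correction = 0.0
    for v in values:
        total, err = two_sum(total, v)
        correction += err
    return total + correction


values = [1e16, 1.0, -1e16]
naive = sum(values)               # the 1.0 is lost to rounding
accurate = compensated_sum(values)
```

In a parallel reduction, one plausible design is for each rank to carry the pair (sum, correction) as its data item and merge pairs with the same transformation, which is presumably why a new data format and custom operators are needed.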
11.
Arrays that are distributed in a block-cyclic fashion are important for many applications in the computational sciences since they often lead to parallel algorithms with good load balancing properties. We consider the problem of redistributing such an array to a new block size. This operation is directly expressible in High Performance Fortran (HPF) and will arise in applications written in this language. Efficient message passing algorithms are given for the redistribution operation, expressed in the standardized message passing interface, MPI. The algorithms are analyzed and performance results from the IBM SP-1 and Intel Paragon are given and discussed. The results show that redistribution can be done in time comparable to other collective communication operations, such as broadcast and MPI_ALLTOALL.
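To make the redistribution problem concrete, the following sketch computes which process owns each element of a one-dimensional block-cyclic array and counts how many elements must move between each pair of processes when the block size changes. It is a brute-force illustration under the standard block-cyclic mapping, not one of the paper's algorithms; the names `owner` and `redistribution_counts` are hypothetical.

```python
from collections import Counter


def owner(index, block_size, num_procs):
    """Process that owns a global index under a 1D block-cyclic distribution."""
    return (index // block_size) % num_procs


def redistribution_counts(n, old_block, new_block, num_procs):
    """Count the elements each (source, destination) pair must exchange."""
    counts = Counter()
    for i in range(n):
        src = owner(i, old_block, num_procs)
        dst = owner(i, new_block, num_procs)
        if src != dst:
            counts[(src, dst)] += 1
    return counts


# 8 elements on 2 processes, block size 1 -> block size 2.
plan = redistribution_counts(8, 1, 2, 2)
```

Here half of the elements stay in place and each process sends two elements to the other, which is the kind of all-to-all exchange pattern the abstract compares against MPI_ALLTOALL.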
12.
13.
14.
15.
16.
During the past few years the interest paid to global optimization has rapidly increased. One of the main reasons is the new technology of parallel computers, which offer computational power capable of solving global optimization problems in reasonable time. The method studied in this work is based on interval analysis, which provides a reliable way of solving the problem. Although the method contains a high degree of potential parallelism, it is not straightforward to parallelize because of its irregular and unpredictable computational behaviour. This paper deals with the problem of balancing the load dynamically, with respect to both the quantity and the quality of the tasks. Efficient strategies are proposed and implemented on an Intel iPSC/2 hypercube. Since the sequential algorithm is used as a base, it will be modified to suit the parallel algorithm.
17.
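To address the low communication efficiency of single-level (flat) MPI clusters, the differences between the flat structure and a tree structure in cluster collective communication are compared and analyzed, and a design for a tree-structured MPI cluster system is proposed. The design reduces global communication traffic and balances the load on the master node, thereby improving cluster communication efficiency and making cluster expansion more flexible. Experiments verify the feasibility of the scheme.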
18.
We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary and metadata on massively parallel architectures. Motivated by the demands of the Lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message Encapsulation format. This format allows for storing large blocks of binary data and corresponding metadata in the same file. Even if designed for LQCD needs, this format might be useful for any application with this type of data profile. The design, implementation and application of Lemon are described. We conclude with presenting the excellent scaling properties of Lemon on state-of-the-art high performance computers.
Program summary
Program title: Lemon
Catalogue identifier: AELP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 3
No. of lines in distributed program, including test data, etc.: 32 860
No. of bytes in distributed program, including test data, etc.: 223 762
Distribution format: tar.gz
Programming language: MPI and C
Computer: Any which supports MPI I/O
Operating system: Any
Has the code been vectorised or parallelised?: Yes. Includes MPI directives.
RAM: Depending on input used
Classification: 11.5
External routines: MPI
Nature of problem: Distributed file I/O with metadata
Solution method: MPI parallel I/O based implementation of LIME format
Running time: Varies depending on file and architecture size, in the order of seconds
19.
This contribution presents a new procedure for quantifying valve stiction in control loops based on global optimisation. Measurements of the controlled variable (PV) and controller output (OP) are used to estimate the parameters of a Hammerstein system, consisting of a connection of a two-parameter stiction model and a linear low-order process model. As the objective function is non-smooth, gradient-free optimisation algorithms, i.e., pattern search (PS) methods or genetic algorithms (GA), are used for fixing the global minimum of the parameters of the stiction model, subordinated with a least-squares estimator for identifying the linear model parameters. Some approaches for selecting the model structure of the linear model part are discussed. Results show that this novel optimisation-based technique recovers accurate and reliable estimates of the stiction model parameters, dead-band plus stick band (S) and slip jump (J), from normal (closed-loop) operating data for self-regulating and integrating processes. The robustness of the proposed approach was proven considering a range of test conditions including different process types, controller settings and measurement noise. Numerous simulation and industrial case studies are described to demonstrate the applicability of the presented techniques for different loops and for different amounts of stiction.