20 similar documents were found.
1.
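To study the general-purpose computing capability of GPUs and a programming model suited to SMP clusters, this paper proposes, for the first time, a new MPI+CUDA multi-granularity hybrid parallel programming method: MPI provides coarse-grained parallelism between nodes, and CUDA provides fine-grained parallelism within each node. Using this method on a purpose-built 3-node SMP cluster, the authors tested parallel performance on large-scale matrix multiplication. The experimental results show that the method significantly improves parallel efficiency, and they demonstrate that the MPI+CUDA hybrid model can fully exploit both distributed memory between nodes and shared memory within nodes, providing an effective parallel strategy for SMP clusters equipped with CUDA-enabled GPUs.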
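As a rough illustration of the two-level decomposition described above, the sketch below splits the rows of A into contiguous blocks, one per node (the coarse-grained MPI level), and computes each block with an element-wise kernel that stands in for the fine-grained CUDA level. It is plain, single-process Python: the names `split_rows`, `node_kernel`, and `hybrid_matmul` are hypothetical, and no real MPI or CUDA calls are made, so it only shows how the work is partitioned, not how it runs on a cluster.

```python
from typing import List

Matrix = List[List[float]]


def split_rows(n_rows: int, n_nodes: int) -> List[range]:
    """Contiguous row ranges, one per node; the first nodes absorb the remainder."""
    base, extra = divmod(n_rows, n_nodes)
    ranges, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def node_kernel(a_rows: Matrix, b: Matrix) -> Matrix:
    """Fine-grained level (hypothetical stand-in for a GPU kernel):
    every C[i][j] is independent, so each could map to one GPU thread."""
    n_inner, n_cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
            for row in a_rows]


def hybrid_matmul(a: Matrix, b: Matrix, n_nodes: int) -> Matrix:
    """Coarse-grained level: scatter row blocks of A, give every node all of B,
    compute locally, then gather the row blocks of C in rank order."""
    c: Matrix = []
    for rows in split_rows(len(a), n_nodes):  # one iteration per simulated node
        c.extend(node_kernel([a[i] for i in rows], b))
    return c
```

Row-block partitioning is only one choice. It keeps each node's slice of C contiguous, so gathering the result is a simple concatenation.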
2.
A method is presented that addresses a limitation of the conventional quadratic performance criterion: it is not effective for some real-world systems because its performance parameters are seldom related to meaningful quantities. Searching the performance index globally allows the index to have local minima as well as discontinuities, so it can be defined in meaningful terms. This ability to define meaningful performance indexes can potentially reduce design time and produce better controls for nonlinear systems. The method has been implemented and tested on a simulated nonlinear system. A comparison with optimal control theory shows that the methodology has merit. The gains arise because the controller can be nonlinear and the system can be efficiently optimized to have the desired characteristics.
4.
Wen Long Ximing Liang Yafei Huang Yixiong Chen 《Neural computing & applications》2014,24(3-4):911-926
In recognition of the need for high-quality wideband speech codecs, several standardization activities have been conducted, resulting in the selection of a wideband speech codec called Adaptive Multi-Rate Wideband (AMR-WB). AMR-WB adopts the algebraic code-excited linear prediction (ACELP) technique, and most of the complexity of the ACELP structure comes from the codebook search. In this paper, a new codebook search method is proposed based on the behavior of the backward-filtered target signal d(n) introduced in ITU-T Recommendation G.722.2. To optimize the proposed scheme, five optimization algorithms (a modified genetic algorithm, particle swarm optimization with dynamic inertia weight, bee colony optimization (BCO), modified differential evolution, and the imperialist competition algorithm) are investigated. Experimental results show that the proposed method reduces codebook search operations by up to 59 percent compared with ITU-T G.722.2. Meanwhile, the BCO-based codebook search scheme converges faster without significant degradation in quality metrics such as segmental signal-to-noise ratio, mean opinion score, and perceptual evaluation of speech quality when used in an AMR-WB speech codec.
5.
Jianjun Liu K. L. Teo Xiangyu Wang Changzhi Wu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2016,20(4):1305-1313
Differential search (DS) is a recently developed derivative-free global heuristic optimization algorithm for solving unconstrained optimization problems. In this paper, applying the exact penalty function approach, a DS algorithm for constrained global optimization problems is developed, in which an S-type dynamic penalty factor is introduced to achieve a better balance between exploration and exploitation. To illustrate the applicability and effectiveness of the proposed approach, a comparative study is carried out by applying the proposed algorithm and other widely used evolutionary methods to 24 benchmark problems. The results clearly indicate that the proposed method is more effective and efficient than the other widely used evolutionary methods on most of these benchmark problems.
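The exact penalty idea above can be sketched as follows, assuming constraints are written as g_i(x) <= 0 and h_j(x) = 0 and the violation is measured with the non-squared (L1) terms typical of exact penalties. The abstract does not give the form of the S-type factor, so `s_type_factor` uses a logistic (sigmoid) schedule as a stand-in assumption: a small penalty early favors exploration and a large one late favors exploitation. This is not the authors' formula, and the DS search loop itself is omitted.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def s_type_factor(t: int, t_max: int, mu_min: float = 1.0,
                  mu_max: float = 1e4, steepness: float = 10.0) -> float:
    """Hypothetical S-shaped schedule rising from ~mu_min to ~mu_max over t_max iterations."""
    z = steepness * (t / t_max - 0.5)
    return mu_min + (mu_max - mu_min) / (1.0 + math.exp(-z))


def exact_penalty(f: Callable[[Vector], float],
                  ineq: Sequence[Callable[[Vector], float]],
                  eq: Sequence[Callable[[Vector], float]],
                  x: Vector, mu: float) -> float:
    """f(x) plus mu times the L1 constraint violation; feasible points pay nothing."""
    violation = sum(max(0.0, g(x)) for g in ineq) + sum(abs(h(x)) for h in eq)
    return f(x) + mu * violation
```

A search loop would evaluate `exact_penalty(f, ineq, eq, x, s_type_factor(t, t_max))` in place of `f(x)`, so the same unconstrained DS machinery can be reused unchanged.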
6.
Mojtaba Shivaie Mohammad T. Ameli 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(8):1615-1630
In this paper, a new scenario-based framework is presented for transmission expansion planning (TEP) under normal and N–1 conditions. The proposed framework includes the cost of network losses and the cost of transmission circuits and substations as objective functions in the optimization process, while treating short-term and long-term constraints under normal and N–1 conditions as problem constraints. The proposed model is a non-convex optimization problem of mixed-integer nonlinear nature. A new improved harmony search algorithm (IHSA) is used to obtain the final optimal solution. The IHSA is a recently developed optimization algorithm that imitates the music improvisation process, in which musicians improvise the pitches of their instruments in search of a perfect state of harmony. The new planning methodology is demonstrated on the well-known Garver 6-bus test system and on a real-life network from the south Brazilian electric power grid to show the feasibility and capabilities of the proposed algorithm. The detailed results of the case studies are presented and thoroughly analyzed. Compared with other methods, the TEP results illustrate the sufficiency and profitability of the newly developed method for expansion planning.
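For readers unfamiliar with the improvisation metaphor, the sketch below is the basic harmony search of Geem et al. for a continuous, box-constrained minimization problem. It is not the improved IHSA used in the paper, and it ignores the mixed-integer structure of TEP. The parameter names (`hms` for harmony memory size, `hmcr` for the memory considering rate, `par` for the pitch adjusting rate, `bw` for the bandwidth) and their default values are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def harmony_search(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    hms: int = 10,        # harmony memory size
    hmcr: float = 0.9,    # harmony memory considering rate
    par: float = 0.3,     # pitch adjusting rate
    bw: float = 0.05,     # bandwidth, as a fraction of each variable's range
    iters: int = 5000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic harmony search (minimization); not the paper's IHSA."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(h) for h in memory]
    for _ in range(iters):
        new: List[float] = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this "pitch" from a stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small random shift around the reused value.
                    value += rng.uniform(-bw, bw) * (hi - lo)
            else:
                # Random improvisation anywhere in the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # clip to the bounds
        score = f(new)
        worst = max(range(hms), key=lambda i: scores[i])
        if score < scores[worst]:  # replace the worst harmony if the new one is better
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=lambda i: scores[i])
    return memory[best], scores[best]
```

For example, `harmony_search(lambda v: v[0]**2 + v[1]**2, [(-5.0, 5.0), (-5.0, 5.0)])` drives the memory toward the origin; the fixed `seed` makes runs repeatable.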
7.
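As scientific and engineering computations become ever larger in scale, higher in dimension, and longer in time span, the accumulation of floating-point rounding errors often makes results unreliable, and improving computational accuracy has become a research focus in parallel computing. Based on the MPICH3 framework, error-free transformation techniques are used to construct a new data format and corresponding operators, and a high-precision reduction function, MPI_ACCU_REDUCE, is designed, implementing high-precision summation, product, and L2-norm computation...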
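The error-free transformation at the heart of such a reduction can be illustrated with Knuth's TwoSum, which returns both the rounded sum and its exact rounding error. The sketch below, assuming each process holds a list of doubles, computes a compensated (sum, error) pair per process and merges pairs with a hypothetical `combine` operator, as a user-defined reduction operation would. It is not the MPI_ACCU_REDUCE implementation, which the abstract does not describe in detail, and it simulates the ranks inside a single Python process.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Partial = Tuple[float, float]  # (rounded sum, accumulated rounding error)


def two_sum(a: float, b: float) -> Partial:
    """Knuth's TwoSum: s = fl(a + b) and e such that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def local_partial(values: Iterable[float]) -> Partial:
    """Compensated partial sum on one (simulated) process."""
    s, c = 0.0, 0.0
    for x in values:
        s, e = two_sum(s, x)
        c += e  # keep the error that plain summation would discard
    return s, c


def combine(p: Partial, q: Partial) -> Partial:
    """Hypothetical reduction operator that merges two (sum, error) partials."""
    s, e = two_sum(p[0], q[0])
    return s, e + p[1] + q[1]


def accurate_reduce(per_rank: List[List[float]]) -> float:
    """Reduce the per-rank partials, then fold the error term back in once."""
    s, c = reduce(combine, (local_partial(v) for v in per_rank))
    return s + c
```

With the inputs `[[1e16, 1.0], [-1e16]]`, a plain left-to-right loop returns 0.0 because the 1.0 is lost when it is added to 1e16, while `accurate_reduce` recovers 1.0.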
8.
Arrays that are distributed in a block-cyclic fashion are important for many applications in the computational sciences, since they often lead to parallel algorithms with good load balancing properties. We consider the problem of redistributing such an array to a new block size. This operation is directly expressible in High Performance Fortran (HPF) and will arise in applications written in this language. Efficient message passing algorithms are given for the redistribution operation, expressed in the standardized message passing interface, MPI. The algorithms are analyzed and performance results from the IBM SP-1 and Intel Paragon are given and discussed. The results show that redistribution can be done in time comparable to other collective communication operations, such as broadcast and MPI_ALLTOALL.
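The index arithmetic behind a block-cyclic redistribution can be sketched in one dimension. Under a block-cyclic distribution with block size b over P processes, global index i lives on process (i // b) mod P; grouping elements by (old owner, new owner) yields the data each pair of processes must exchange, which in an MPI code could become the counts and displacements of a collective such as MPI_Alltoallv. This is a simplified illustration assuming a 1-D array, not the paper's algorithms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def owner(i: int, block: int, nprocs: int) -> int:
    """Process that holds global index i under a block-cyclic layout."""
    return (i // block) % nprocs


def local_index(i: int, block: int, nprocs: int) -> int:
    """Position of global index i within its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def redistribution_plan(
    n: int, old_block: int, new_block: int, nprocs: int
) -> Dict[Tuple[int, int], List[int]]:
    """Group global indices by (source, destination) process pair.
    Each off-diagonal group is one message; (p, p) groups are local copies."""
    plan: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i in range(n):
        plan[(owner(i, old_block, nprocs), owner(i, new_block, nprocs))].append(i)
    return dict(plan)
```

For example, redistributing 8 elements on 2 processes from block size 2 to block size 1 gives four groups, so each process sends one message and keeps two elements locally.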
12.
During the past few years, interest in global optimization has increased rapidly. One of the main reasons is the new technology of parallel computers, which offer the computational power needed to solve global optimization problems in reasonable time. The method studied in this work is based on interval analysis, which provides a reliable way of solving the problem. Although the method contains a high degree of potential parallelism, it is not straightforward to parallelize because of its irregular and unpredictable computational behaviour. This paper deals with the problem of balancing the load dynamically, with respect to both the quantity and the quality of the tasks. Efficient strategies are proposed and implemented on an Intel iPSC/2 hypercube. Since the sequential algorithm is used as a base, it is modified to suit the parallel algorithm.
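A minimal sequential sketch of interval branch-and-bound for a one-dimensional problem shows where the irregularity comes from: boxes are split or discarded depending on bounds known only at run time. The work pool is a priority queue ordered by each box's lower bound, which loosely corresponds to the "quality" of a task. The helper names are hypothetical, the objective (x^2 - 2)^2 used below is an assumed example with two global minimizers, and directed (outward) rounding is ignored, so unlike a real interval method the result is not rigorously guaranteed.

```python
import heapq
from typing import Callable, List, Tuple

Interval = Tuple[float, float]


def isub(x: Interval, c: float) -> Interval:
    """Interval minus a constant."""
    return (x[0] - c, x[1] - c)


def isqr(x: Interval) -> Interval:
    """Tight enclosure of {v * v : v in x}."""
    lo, hi = x
    if lo >= 0:
        return (lo * lo, hi * hi)
    if hi <= 0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def interval_minimize(
    f_point: Callable[[float], float],
    f_interval: Callable[[Interval], Interval],
    box: Interval,
    tol: float = 1e-6,
) -> Tuple[float, List[Interval]]:
    best_upper = f_point((box[0] + box[1]) / 2)  # any sample gives an upper bound
    pool = [(f_interval(box)[0], box)]           # tasks ordered by lower bound
    leaves: List[Interval] = []
    while pool:
        lower, (lo, hi) = heapq.heappop(pool)
        if lower > best_upper:
            continue  # this box cannot contain a global minimizer
        mid = (lo + hi) / 2
        best_upper = min(best_upper, f_point(mid))
        if hi - lo < tol:
            leaves.append((lo, hi))
            continue
        for child in ((lo, mid), (mid, hi)):
            child_lower = f_interval(child)[0]
            if child_lower <= best_upper:
                heapq.heappush(pool, (child_lower, child))
    # Drop leaves that were accepted before the upper bound improved.
    leaves = [b for b in leaves if f_interval(b)[0] <= best_upper]
    return best_upper, leaves
```

With `f_point = lambda x: (x * x - 2.0) ** 2` and `f_interval = lambda b: isqr(isub(isqr(b), 2.0))` on the box (-2, 3), the surviving leaves cluster around -sqrt(2) and sqrt(2). How many boxes survive at each step cannot be predicted in advance, which is why a parallel version needs dynamic load balancing.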
13.
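Because communication in single-level MPI clusters is inefficient, this paper compares single-level and tree structures for collective communication in clusters and proposes a tree-based MPI cluster system design. The design reduces global communication traffic and balances the load on the master node, thereby improving cluster communication efficiency and making the cluster more flexible to scale. Experiments verify the feasibility of the scheme.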
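As one common way to realize a tree-based collective, a binomial-tree broadcast can be scheduled as below: in each round, every rank that already holds the data forwards it to one new rank, so P ranks are covered in ceil(log2 P) rounds and the root sends only ceil(log2 P) messages instead of the P - 1 it sends in a single-level (flat) scheme. The abstract does not say which tree the authors use, so the binomial shape and the function name are assumptions.

```python
from typing import List, Tuple


def binomial_broadcast_schedule(nprocs: int, root: int = 0) -> List[List[Tuple[int, int]]]:
    """Return one list of (sender, receiver) pairs per communication round."""
    rounds: List[List[Tuple[int, int]]] = []
    have = 1  # ranks (counted relative to the root) that already hold the data
    while have < nprocs:
        step = []
        for rel_src in range(have):
            rel_dst = rel_src + have
            if rel_dst < nprocs:
                # Map relative ranks back to real ranks so any root works.
                step.append(((rel_src + root) % nprocs, (rel_dst + root) % nprocs))
        rounds.append(step)
        have *= 2
    return rounds
```

For 8 ranks this gives three rounds with 1, 2, and 4 messages; the root appears as a sender once per round, while the other forwarding work is spread across intermediate ranks.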
14.
We introduce Lemon, an MPI parallel I/O library that provides efficient parallel I/O of both binary data and metadata on massively parallel architectures. Motivated by the demands of the Lattice Quantum Chromodynamics community, the data is stored in the SciDAC Lattice QCD Interchange Message Encapsulation format. This format allows large blocks of binary data and the corresponding metadata to be stored in the same file. Although designed for LQCD needs, this format might be useful for any application with this type of data profile. The design, implementation and application of Lemon are described. We conclude by presenting the excellent scaling properties of Lemon on state-of-the-art high performance computers.
Program summary
Program title: Lemon
Catalogue identifier: AELP_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELP_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 3
No. of lines in distributed program, including test data, etc.: 32 860
No. of bytes in distributed program, including test data, etc.: 223 762
Distribution format: tar.gz
Programming language: MPI and C
Computer: Any which supports MPI I/O
Operating system: Any
Has the code been vectorised or parallelised?: Yes. Includes MPI directives.
RAM: Depending on input used
Classification: 11.5
External routines: MPI
Nature of problem: Distributed file I/O with metadata
Solution method: MPI parallel I/O based implementation of LIME format
Running time: Varies depending on file and architecture size, in the order of seconds
15.
This contribution presents a new procedure for quantifying valve stiction in control loops based on global optimisation. Measurements of the controlled variable (PV) and controller output (OP) are used to estimate the parameters of a Hammerstein system, consisting of a two-parameter stiction model connected in series with a linear low-order process model. As the objective function is non-smooth, gradient-free optimisation algorithms, i.e., pattern search (PS) methods or genetic algorithms (GA), are used to find the global minimum with respect to the stiction model parameters, combined with a subordinate least-squares estimator that identifies the linear model parameters. Some approaches for selecting the model structure of the linear model part are discussed. Results show that this novel optimisation-based technique recovers accurate and reliable estimates of the stiction model parameters, dead-band plus stick band (S) and slip jump (J), from normal (closed-loop) operating data for self-regulating and integrating processes. The robustness of the proposed approach was demonstrated under a range of test conditions, including different process types, controller settings and measurement noise. Numerous simulation and industrial case studies are described to demonstrate the applicability of the presented techniques to different loops and different amounts of stiction.
16.
An Assessment of MPI Environments for Windows NT
Takeda K. Allsopp N. K. Hardwick J. C. Macey P. C. Nicole D. A. Cox S. J. Lancaster D. J. 《The Journal of supercomputing》2001,19(3):315-323
In this paper we evaluate the MPI environments currently available for Windows NT on the Intel IA32 and Compaq/DEC Alpha architectures. We present benchmark results for low-level communication and for the NAS Parallel Benchmarks to allow comparison with other systems, but our primary interest is determining real application performance and robustness in production cluster environments. For this we use PAFEC-FE, a large FORTRAN code for finite-element analysis. We present results from three MPI implementations, two architectures, and three networking technologies (10 and 100 Mbit/s Ethernet and 1 Gbit/s Myrinet).
18.
An efficient digital search algorithm is presented that is based on an internal array structure called a double array, which combines the fast access of a matrix form with the compactness of a list form. Each arc of a digital search tree, called a DS-tree, can be computed from the double array in $O(1)$ time; that is, the worst-case time complexity for retrieving a key becomes $O(k)$ for a key of length $k$. The double array is modified to make its size compact while maintaining fast access, and algorithms for retrieval, insertion, and deletion are presented. If the size of the double array is $n + cm$, where $n$ is the number of nodes of the DS-tree, $m$ is the number of input symbols, and $c$ is a constant particular to each double array, then it is proved theoretically that the worst-case times of deletion and insertion are proportional to $cm$ and $cm^2$, respectively, and are independent of $n$. Experimental results of building the double array incrementally for various sets of keys show that $c$ has an extremely small value, ranging from 0.17 to 1.13.
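The base/check mechanism can be illustrated with a small static double-array trie. The transition from state s on symbol c goes to t = base[s] + code(c) and is valid only if check[t] == s, which gives the constant per-arc cost and the O(k) lookup described above. This sketch assumes the keys are built once from a fixed set, uses '#' as an assumed end-of-key marker, and picks each base greedily; it does not reproduce the paper's incremental insertion, deletion, or compaction.

```python
from collections import deque
from typing import Dict, List


class DoubleArray:
    """Static double-array trie built from a fixed key set (illustrative sketch)."""

    END = "#"  # assumed end-of-key marker; keys must not contain it

    def __init__(self, keys: List[str]) -> None:
        alphabet = sorted({ch for key in keys for ch in key} | {self.END})
        # Symbol codes start at 1 so that base + code never lands on slot 0.
        self.code: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(alphabet)}
        trie: Dict = {}  # ordinary nested-dict trie used only during construction
        for key in keys:
            node = trie
            for ch in key + self.END:
                node = node.setdefault(ch, {})
        self.base: List[int] = [0, 0]
        self.check: List[int] = [-1, 0]  # slot 1 is the root; -1 marks a free slot
        queue = deque([(1, trie)])
        while queue:  # breadth-first placement of each node's children
            state, children = queue.popleft()
            if not children:
                continue  # node reached via END: no outgoing arcs
            codes = [self.code[ch] for ch in children]
            b = 1
            while not all(self._free(b + c) for c in codes):
                b += 1  # greedy: smallest base whose target slots are all free
            self.base[state] = b
            for ch, sub in children.items():
                t = b + self.code[ch]
                self._grow(t)
                self.check[t] = state  # slot t now belongs to parent `state`
                queue.append((t, sub))

    def _free(self, i: int) -> bool:
        return i >= len(self.check) or self.check[i] == -1

    def _grow(self, i: int) -> None:
        while len(self.check) <= i:
            self.base.append(0)
            self.check.append(-1)

    def contains(self, key: str) -> bool:
        """One base/check probe per symbol, plus one for the end marker."""
        state = 1
        for ch in key + self.END:
            c = self.code.get(ch)
            if c is None:
                return False  # symbol never appears in any key
            t = self.base[state] + c
            if t >= len(self.check) or self.check[t] != state:
                return False
            state = t
        return True
```

For example, after `DoubleArray(["bachelor", "jar", "badge", "baby"])`, every stored key is found, while prefixes such as "ba" and "bad" are rejected because no end-marker arc leaves their states.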
19.
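This paper introduces CAPTCHA recognition technology, illustrates how to implement CAPTCHAs with the open-source JCAPTCHA framework, and, building on that framework, designs and implements a CAPTCHA component that encapsulates the CAPTCHA implementation. The component can be easily integrated into Java EE projects, and practice shows that it does make CAPTCHA development simpler and more effective.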