Similar literature
1.
Particle-in-cell (PIC) simulations with Monte Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run-time of such simulations. Even on modern computer systems, PIC codes take a considerable amount of time to converge. Most of the computations can be massively parallelized, since particles behave independently of each other within one time step. Current graphics processing units (GPUs) offer an attractive means of executing the parallelized code. In this contribution we show a one-dimensional PIC code running on NVIDIA® GPUs using the CUDA environment. A distinctive feature of the code is that the size of the cells used to sort the particles by their coordinates is comparable to the size of the grid cells used to discretize the electric field. Hence, we call the corresponding algorithm “fine-sorting”. Implementation details and optimization of the code are discussed, and the speed-up compared to classical CPU approaches is computed.

2.
A parallel CUDA implementation of the dynamic programming method for the knapsack problem on an NVIDIA GPU is presented. A GTX 260 card with 192 cores (1.4 GHz) is used for computational tests, and the processing times obtained with the parallel code are compared to those of the sequential code on a 3.0 GHz Intel Xeon CPU. The results show a speedup factor of 26 for large problems. Furthermore, to limit communication between the CPU and the GPU, a compression technique is presented that significantly decreases memory occupancy.
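The stage-by-stage structure that makes this dynamic program a candidate for parallelization can be sketched as follows. This is a minimal plain-Python sketch, not the paper's CUDA code; it assumes the 0/1 variant of the problem with integer weights, and the function name `knapsack_max_value` is hypothetical.

```python
def knapsack_max_value(weights, values, capacity):
    """0/1 knapsack solved by dynamic programming, one stage per item."""
    best = [0] * (capacity + 1)  # best[c]: best value achievable with capacity c
    for w, v in zip(weights, values):
        prev = best
        # Every entry of the new row reads only the previous row, so all
        # capacities c could be updated at the same time (for example,
        # one GPU thread per capacity value).
        best = [
            max(prev[c], prev[c - w] + v) if c >= w else prev[c]
            for c in range(capacity + 1)
        ]
    return best[capacity]
```

Because each stage only reads the previous row, a device implementation would presumably need to keep just two rows of the table resident at a time.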

3.
We consider an inverse linear programming (LP) problem in which the parameters in both the objective function and the constraint set of a given LP problem need to be adjusted as little as possible so that a known feasible solution becomes the optimal one. We formulate this problem as a linear complementarity constrained minimization problem. With the help of the smoothed Fischer–Burmeister function, we propose a perturbation approach to solve the inverse problem and demonstrate its global convergence. An inexact Newton method is constructed to solve the perturbed problem and numerical results are reported to show the effectiveness of the approach.

4.
This paper considers the problem of choosing a single constant linear state feedback control law that produces satisfactory performance at each of several operating points of a system. The model for each operating point is assumed to be linear, and the criterion for satisfactory performance is taken to be an infinite-horizon quadratic cost functional. This problem is reformulated as a finite-dimensional optimization over the linear feedback gains, which can be readily solved using standard nonlinear optimization techniques provided a stabilizing initial value of the gains can be found. Although the direct solution of this problem is discussed briefly, the major portion of the paper is devoted to solution techniques for the case in which an initial stabilizing guess is not available.

5.
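Traditional multi-objective evolutionary algorithms are mostly quasi-random search algorithms based on the concept of Pareto optimality, and they are slow. The slowdown becomes more pronounced as problem dimensionality grows and larger populations are required, and this issue has drawn increasing attention from researchers and practitioners. Simulation experiments show that two tasks, constructing the non-dominated set and maintaining population diversity, account for more than 99% of the algorithm's execution time. An effective remedy is to parallelize these parts of the algorithm. This paper proposes a parallel solution based on the CUDA platform that uses a niching technique to implement fitness sharing and thereby maintain the diversity of the candidate solution set. The entire multi-objective evolutionary algorithm runs on the GPU, unlike previous studies in which part of the non-dominated sorting and all of the diversity maintenance were still executed on the CPU. Simulation results on the ZDT family of test functions show that the proposed algorithm far outperforms NSGA-II and NPGA. Finally, solving an oil product blending problem, which is a constrained multi-objective optimization problem, shows that the algorithm still delivers excellent speed-up on constrained multi-objective problems in chemical engineering applications.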
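The non-dominated set construction named above as one of the two dominant costs can be illustrated with a short sketch. It is plain Python rather than CUDA, it assumes every objective is minimized, and the names `dominates` and `non_dominated` are hypothetical, not taken from the paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates."""
    # The check for each point is independent of the checks for all other
    # points, which is why this step maps naturally onto one GPU thread
    # per individual.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For a population of N individuals this brute-force check performs on the order of N² comparisons, which is consistent with the step dominating run time as the population grows.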

6.
Many engineering and scientific problems require solving boundary value problems for partial differential equations or systems of them. In most cases, the only practical way to obtain a solution with the desired precision in an acceptable time is to harness the power of parallel processing. In this paper, we present some effective applications of parallel processing based on a hybrid CPU/GPU domain decomposition method. Within the family of domain decomposition methods, the so-called optimized Schwarz methods have proven to have good convergence behaviour compared to classical Schwarz methods. The price for this feature is the need to transfer more physical information between subdomain interfaces. For solving the large systems of linear algebraic equations that result from the finite element discretization of the subproblem for each subdomain, a Krylov method is often a good choice. Since the overall efficiency of such methods depends on the effective calculation of sparse matrix–vector products, approaches that use a graphics processing unit (GPU) instead of a central processing unit (CPU) for this task look very promising. We discuss the effective implementation of algebraic operations for iterative Krylov methods on a GPU. To ensure good performance of the non-overlapping Schwarz method, we propose to use optimized conditions obtained by a stochastic technique based on the covariance matrix adaptation evolution strategy. The performance, robustness, and accuracy of the proposed approach are demonstrated by solving the gravitational potential equation for data acquired from the geological survey of the Chicxulub crater.

7.
The finite-difference time-domain (FDTD) method is commonly used for electromagnetic field simulations. Recently, successful hardware acceleration using graphics processing units (GPUs) has been reported for large-scale FDTD simulations. In this paper, we present a performance analysis of three-dimensional (3D) FDTD on a GPU using the roofline model. We find that the theoretical predictions of maximum performance agree well with the experimental results. We also suggest suitable optimization methods for obtaining the best FDTD performance on a GPU. In particular, the optimized 3D FDTD program on a GPU (NVIDIA GeForce GTX 480) is shown to be 64 times faster than the naively implemented program on a CPU (Intel Core i7 2600).
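The roofline model bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. A minimal sketch of that bound follows; the function name `roofline_gflops` and the figures in the usage line are placeholders for illustration, not measurements from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Upper bound on attainable performance under the roofline model."""
    # Memory-bound kernels sit on the sloped part of the roof
    # (bandwidth * intensity); compute-bound kernels hit the flat ceiling.
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


# Hypothetical device: 1000 GFLOP/s peak, 150 GB/s memory bandwidth.
bound = roofline_gflops(1000.0, 150.0, 0.5)
```

With these placeholder figures, a kernel performing 0.5 floating-point operations per byte moved is memory bound, which is the regime where data-reuse optimizations are generally expected to matter most.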

8.
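Based on the compute unified device architecture (CUDA) for graphics processing units (GPUs), this paper describes the principles and methods of using GPUs for general-purpose computing. A matrix multiplication experiment was carried out on a GeForce 8800 GT. The results show that processing slows down on both the GPU and the CPU as the matrix order increases. When the amount of data increased 100-fold, the GPU computation time increased by a factor of only 3.95, whereas the CPU computation time increased by a factor of 216.66.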

9.
Examples of homogeneous linear programming problems are investigated. The objective function of such a problem is not bounded below on a feasible set. A starting point is presented such that the affine scaling method generates a sequence of vectors that converges to zero. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 178–179, January–February 2006.

10.
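Numerical methods for partial differential equations (including the finite difference and finite element methods), as well as many numerical methods for the equations of mathematical physics, ultimately reduce to solving large systems of linear equations. Fast, stable, and accurate solvers for large linear systems have therefore long been an active and particularly important research topic in numerical computing. Among iterative methods, the conjugate gradient method is widely regarded as one of the best. Its main drawback, however, is that it applies only when the coefficient matrix of the linear system is symmetric positive definite, and conventional CPU implementations are very time-consuming. To address this, the coefficient matrix is first transformed into a symmetric matrix, and a fast conjugate gradient method based on GPU-CUDA is then applied to solve general large linear systems. Experimental results show that, in terms of efficiency, the GPU-CUDA conjugate gradient method runs fast, with a speed-up exceeding 14 once the order of the linear system exceeds 3000; in terms of solution accuracy and the stability of the solution process, it is comparable to Gaussian elimination with column pivoting. The GPU-CUDA fast conjugate gradient method is a fast and highly effective way to solve general large linear systems.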
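One common way to make a general nonsingular system symmetric positive definite is to apply the conjugate gradient method to the normal equations AᵀAx = Aᵀb. Whether the paper uses exactly this transformation is an assumption; the sketch below is plain Python on the CPU, and the function name `cg_normal_equations` and its helpers are hypothetical.

```python
def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_normal_equations(a, b, tol=1e-12, max_iter=1000):
    """Solve a x = b (a square, nonsingular) by CG on (a^T a) x = a^T b."""
    at = transpose(a)
    x = [0.0] * len(b)
    r = matvec(at, b)        # residual of the normal equations at x = 0
    p = r[:]                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old <= tol:    # squared residual norm small enough
            break
        ap = matvec(at, matvec(a, p))   # (a^T a) p without forming a^T a
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The matrix-vector products dominate each iteration and are the natural part to offload to a GPU. Note that forming the normal equations squares the condition number of the system, so convergence can be slow for ill-conditioned matrices.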

11.
Liu Jinfeng (刘进锋), Guo Lei (郭雷). 《微型机与应用》 (Microcomputer & Its Applications), 2011, 30(18): 69-71, 75
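A neural network forward propagation algorithm is implemented on the GPU using the CUDA architecture. The algorithm exploits the parallelism among the neurons within each layer: one kernel function per layer computes the values of that layer's neurons in parallel, and each kernel is optimized for the characteristics of the neural network and of the CUDA architecture. Experiments show that the algorithm is about 7 times faster than an ordinary CPU implementation. The results are a useful reference both for speeding up neural network computation and for identifying applications suited to CUDA.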
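The layer-by-layer structure described here can be sketched in plain Python. The sigmoid activation, the fully connected layout, and the name `forward` are assumptions made for illustration; the abstract does not specify them.

```python
import math


def forward(layers, inputs):
    """Forward propagation through fully connected layers.

    layers is a list of (weights, biases) pairs, where weights[j] holds
    the input weights of neuron j in that layer.
    """
    activations = inputs
    for weights, biases in layers:
        # Neurons in the same layer do not depend on each other, so a CUDA
        # version could launch one kernel per layer with one thread per neuron.
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations
```

Layers still have to run in sequence because each one consumes the previous layer's outputs, which matches the one-kernel-per-layer arrangement in the abstract.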

12.
Graphics processing units (GPUs) suitable for general-purpose numerical computation are now available with performance in excess of 1 teraflops, faster by one to two orders of magnitude than conventional desktop CPUs. Monte Carlo particle transport algorithms are ideally suited to parallel processing architectures and so are good candidates for acceleration using a GPU. We have developed a general-purpose code that computes the transport of high-energy (>1 keV) photons through arbitrary three-dimensional geometry models, simulates their physical interactions, and performs tallying and variance reduction. We describe a new algorithm, the particle-per-block technique, that provides a good match with the underlying GPU multiprocessor hardware design. Benchmarking against an existing CPU-based simulation running on a single core of a commodity desktop CPU demonstrates that our code can accurately model X-ray transport, with an approximately 35-fold speed-up factor.

13.
This paper is an amendment to Hop’s paper [N.V. Hop, Solving linear programming problems under fuzziness and randomness environment using attainment values, Information Sciences 177 (2007) 2971-2984] on solving linear programming problems under fuzziness and randomness environments. Hop introduced a new characterization of relationship, attainment values, to enable the conversion of fuzzy (stochastic) linear programming models into corresponding deterministic linear programming models. The purpose of this paper is to provide a correction and an improvement of Hop’s analytical work through rationalization and simplification. More importantly, it is shown that Hop’s analysis does not support his demonstration or the solution-finding mechanism; the attainment values approach as he proposed it does not result in superior performance compared to other existing approaches because it neglects some relevant and inevitable theoretical essentials. Two numerical examples from Hop’s paper are also employed to show that his approach to converting fuzzy (stochastic) linear programming problems into corresponding problems is questionable and can find neither the maximum nor the minimum in the examples. The models of the examples are subsequently amended in order to derive the correct optimal solutions.

14.
This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on NVIDIA graphics processing units (GPUs). Using the STREAM benchmarks, we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI Express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.

15.
Spherical harmonic transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have recently been proposed that require simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both of these aspects pose a formidable challenge for the existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and graphics processing units (GPUs). It also discusses their performance, alone and embedded within a top-level, Message Passing Interface (MPI)-based parallelisation layer ported from the S2HAT library, in terms of their accuracy, overall efficiency and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs with the latest Compute Unified Device Architecture (CUDA) architecture (Fermi) outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors for problem sizes motivated by our target applications. Performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in the transforms calculation, emphasising those with a major impact on overall performance and elucidating the sources of the dichotomy between the direct and the inverse operations. Copyright © 2013 John Wiley & Sons, Ltd.

16.
Fractional linear programming (FLP) has many applications in management science as well as in engineering. We have developed a microcomputer program to solve linear and FLP problems. It is written in TURBO PASCAL, which can be used on a wide variety of microcomputers. Because data entry constitutes a large proportion of the total computer solution time, careful attention was paid to the human factors of human-computer interaction at that stage of program development. A test example is presented to demonstrate the usefulness of this program.

17.
Hybrid methods are promising tools in integer programming, as they combine the best features of different methods in a complementary fashion. This paper presents such a framework, integrating the notions of genetic algorithm, linear programming, and ordinal optimization in an effort to shorten computation times for large and/or difficult integer programming problems. Capitalizing on the central idea of ordinal optimization and on the learning capability of genetic algorithms to quickly generate good feasible solutions, and then using linear programming to solve the problem that results from fixing the integer part of the solution, one may be able to obtain solutions that are close to optimal. Indeed ordinal optimization guarantees the quality of the solutions found. Numerical testing on a real-life complex scheduling problem demonstrates the effectiveness and efficiency of this approach.

18.
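To address the high power consumption of GPUs in mobile terminals, a GPU dynamic frequency scaling scheme based on the Android system is proposed. Based on the GPU performance requirements of various applications, the scheme introduces a frequency-performance model for the GPU, including methods for selecting operating frequencies and measuring relative performance. The dynamic frequency scaling algorithm computes a predicted load from the historical load and feeds it into the frequency-performance model to predict the GPU frequency for the next period. Experimental results show that the scheme quickly tracks changes in GPU load in typical scenarios and predicts the GPU frequency with an accuracy above 95%.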

19.
In this study, an integration of the analytic hierarchy process (AHP) and a multi-objective possibilistic linear programming (MOPLP) technique is developed to account for all tangible, intangible, quantitative, and qualitative factors which are used to evaluate and select suppliers and to define the optimum order quantities assigned to each. A multi-objective linear programming technique is first employed to solve the problem. To model the uncertainties encountered in the integrated supplier evaluation and order allocation methodology, fuzzy theory is adopted. Hence, possibilistic linear programming (PLP) is proposed for solving the problem, as it is believed to be the best approach for absorbing the imprecise nature of the real world. In the supplier evaluation phase, environmental criteria are also considered.

20.
In this paper, we revisit the mean-variance model of Markowitz and the construction of the risk-return efficient frontier. A few other models that use alternative metrics for risk, such as the mean absolute deviation, the minimax and maximin, and models with diagonal quadratic forms as objectives, are also introduced. We then present a neurodynamic model for solving these kinds of problems. By employing a Lyapunov function approach, it is also shown that the proposed neural network model is stable in the sense of Lyapunov and is globally convergent to an exact optimal solution of the original problem. The validity and transient behavior of the neural network are demonstrated using several examples of portfolio selection.
