Modern graphics processing units (GPUs) offer much more computational power than modern central processing units (CPUs). It is therefore natural that GPUs are used not only for their original purpose but also for general-purpose computing (GPGPU). In sequence processing, one of the most important problems is measuring sequence similarity. There are many sequence similarity measures, e.g. edit distance, longest common subsequence length, and their derivatives. We examine the possibility of speeding up the algorithms that compute some of them, choosing three measures that are useful in different situations. The experimental results show that the GPU versions of the examined algorithms are faster than their serial counterparts by a factor of between 4 and 65. Copyright © 2010 John Wiley & Sons, Ltd.
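Edit distance and LCS length are both computed by dynamic programming, where each table cell depends on its left, upper and upper-left neighbours. One common way to expose parallelism in such tables is the anti-diagonal wavefront; the abstract does not say which scheme the authors used, so this is an assumption. The minimal sequential sketch below fills the LCS table one anti-diagonal at a time; the function name is illustrative, and the inner loop is the part a GPU version would spread across threads.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """LCS length, filling the DP table one anti-diagonal (i + j = d) at a time."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Cells with i + j == d depend only on diagonals d - 1 and d - 2, so all
    # cells of one anti-diagonal could be computed by separate GPU threads.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

print(lcs_length_wavefront("ABCBDAB", "BDCABA"))  # 4
```

The same wavefront order works for edit distance, since its recurrence has the same three dependencies.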

Particle-in-cell (PIC) simulations with Monte Carlo collisions are used in plasma science to explore a variety of kinetic effects. One major problem is the long run time of such simulations: even on modern computer systems, PIC codes take a considerable amount of time to converge. Most of the computations can be massively parallelized, since particles behave independently of each other within one time step. Current graphics processing units (GPUs) offer an attractive means of executing the parallelized code. In this contribution we present a one-dimensional PIC code running on NVIDIA® GPUs using the CUDA environment. A distinctive feature of the code is that the size of the cells used to sort the particles by their coordinates is comparable to the size of the grid cells used to discretize the electric field; hence, we call the corresponding algorithm “fine-sorting”. Implementation details and optimization of the code are discussed, and the speed-up compared with classical CPU approaches is computed.
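Sorting particles into cells whose width equals the field grid spacing dx can be sketched as a counting sort: histogram, exclusive prefix sum, scatter. This is a common GPU sorting pattern, assumed here for illustration rather than taken from the authors' kernels; fine_sort and its parameters are hypothetical names. A GPU scatter using atomic counters would not preserve this sequential order within a cell.

```python
import math

def fine_sort(positions, x_min, dx, n_cells):
    """Counting sort of 1D particle positions into cells of width dx.

    Returns (order, offsets): the particles of cell c are
    order[offsets[c]:offsets[c + 1]].
    """
    cells = [min(max(int(math.floor((x - x_min) / dx)), 0), n_cells - 1)
             for x in positions]
    # 1) histogram of particles per cell (on a GPU: atomic increments)
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1
    # 2) exclusive prefix sum gives each cell's start offset (GPU: parallel scan)
    offsets = [0] * (n_cells + 1)
    for c in range(n_cells):
        offsets[c + 1] = offsets[c] + counts[c]
    # 3) scatter particle indices into the slot range of their cell
    cursor = offsets[:-1]
    order = [0] * len(positions)
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, offsets

print(fine_sort([0.35, 0.05, 0.95, 0.15, 0.55], 0.0, 0.25, 4))
# ([1, 3, 0, 4, 2], [0, 2, 3, 4, 5])
```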

Since the layered low-density parity-check (LDPC) decoding algorithm was proposed, it has been possible to exploit the diversity gain to achieve performance comparable to traditional two-phase message passing (TPMP) decoding, with decoding convergence about twice as fast as TPMP. To reduce the decoding time of a layered LDPC decoder, a graphics processing unit (GPU) is used as the modem processor so that decoding can proceed in parallel on the GPU's numerous threads. In this paper, we present parallel algorithms and efficient GPU implementations for two layered message-passing schemes: row-layered and column-layered decoding. In the experiments, the quasi-cyclic LDPC codes for WiFi (802.11n) and WiMAX (802.16e) are decoded by the proposed layered LDPC decoders. The experimental results show that our decoder achieves bit error ratio (BER) performance comparable to a TPMP decoder. The peak throughput is 712 Mbps, about two orders of magnitude faster than a CPU implementation and comparable to dedicated hardware solutions. Compared with the fastest existing GPU-based implementation, the presented decoder achieves a 2.3-fold performance improvement. Copyright © 2013 John Wiley & Sons, Ltd.
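The idea behind row-layered decoding is that each check row updates the posterior LLRs immediately, so the next row already sees the new values. The sketch below assumes the min-sum check-node rule (the abstract does not say which rule is used) and uses the tiny Hamming (7,4) parity-check matrix as a stand-in for the much larger 802.11n/802.16e QC-LDPC codes; layered_min_sum is an illustrative name.

```python
def layered_min_sum(H, llr, max_iter=10):
    """Row-layered min-sum LDPC decoding (LLR > 0 means bit 0)."""
    rows = [[v for v, h in enumerate(row) if h] for row in H]
    L = list(llr)                                  # posterior LLRs
    R = [{v: 0.0 for v in row} for row in rows]    # check-to-variable messages

    def hard(values):
        return [1 if x < 0 else 0 for x in values]

    def syndrome_ok(bits):
        return all(sum(bits[v] for v in row) % 2 == 0 for row in rows)

    for _ in range(max_iter):
        for c, row in enumerate(rows):             # one layer = one check row
            q = {v: L[v] - R[c][v] for v in row}   # variable-to-check messages
            sign = 1
            for v in row:
                if q[v] < 0:
                    sign = -sign
            mags = sorted((abs(q[v]), v) for v in row)
            (min1, arg1), (min2, _) = mags[0], mags[1]
            for v in row:
                mag = min2 if v == arg1 else min1  # smallest magnitude among the others
                s = sign * (-1 if q[v] < 0 else 1) # sign product of the others
                R[c][v] = s * mag
                L[v] = q[v] + R[c][v]              # used immediately by the next layer
        bits = hard(L)
        if syndrome_ok(bits):
            return bits
    return hard(L)

H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
print(layered_min_sum(H, [-0.5] + [2.0] * 6))  # [0, 0, 0, 0, 0, 0, 0]
```

Here the weakly flipped first bit of the all-zero codeword is corrected within the first iteration.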

Finding similar items in a large, unstructured dataset is a challenging task in many data science applications, such as searching, indexing, and retrieval. With growing data volumes and demand for real-time responses, similarity search has attracted considerable attention. This paper proposes a parallel computational approach for similarity search using Bloom filters (PCASSB), which uses a Bloom filter to represent a document's features and to compare them with the user's query. Query features are stored in an integer query array (IQA). PCASSB, an approximate similarity search technique, has been implemented on a graphics processing unit with the Compute Unified Device Architecture (CUDA) as the programming platform. As a baseline method, the Dice coefficient is used to compute the similarity score between the query and the reference dataset. The accuracy of PCASSB's results is compared with the baseline method and with other state-of-the-art methods. The experimental results show that the proposed technique is effective for processing large numbers of text documents because it requires less computation time.
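To make the comparison concrete, here is a minimal sketch of a Bloom filter over document terms and a Dice estimate that replaces the exact intersection with membership tests. The SHA-256-based hash family, the class and function names, and the filter size are assumptions for illustration, not PCASSB's actual design.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int, k_hashes: int):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, item: str):
        # k hash functions derived from SHA-256 with different prefixes (assumed scheme)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def dice(a: set, b: set) -> float:
    """Exact Dice coefficient 2|A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def approx_dice(query: set, doc_filter: BloomFilter, doc_size: int) -> float:
    """Dice estimate using Bloom-filter membership for the intersection.
    Bloom filters have no false negatives, so this never underestimates."""
    hits = sum(1 for term in query if term in doc_filter)
    return 2 * hits / (len(query) + doc_size)


doc = {"gpu", "cuda", "filter", "search"}
bf = BloomFilter(1024, 3)
for term in doc:
    bf.add(term)
query = {"gpu", "cuda", "bloom"}
print(dice(query, doc))  # 4/7
print(approx_dice(query, bf, len(doc)))
```

Each query-versus-document test is independent, which is what makes this kind of scoring easy to spread over GPU threads.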

A parallel CUDA implementation of the dynamic programming method for the knapsack problem on an NVIDIA GPU is presented. A GTX 260 card with 192 cores (1.4 GHz) is used for the computational tests, and the processing times of the parallel code are compared with those of the sequential code on an Intel Xeon 3.0 GHz CPU. The results show a speedup factor of 26 for large problems. Furthermore, to limit communication between the CPU and the GPU, a compression technique is presented that significantly decreases memory occupancy.
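Here is a minimal sketch of the 0/1 knapsack recurrence that such an implementation parallelizes. For each item, every capacity entry of the new row reads only the previous row, so the capacities can be updated concurrently. The function name is illustrative, and the sketch omits the paper's compression technique.

```python
def knapsack_max_value(weights, values, capacity):
    """0/1 knapsack by dynamic programming over capacities 0..capacity."""
    best = [0] * (capacity + 1)          # best[c]: max value with capacity c
    for w, v in zip(weights, values):
        # Each entry of the new row depends only on the previous row, so the
        # capacities could be updated in parallel (one GPU thread per c).
        best = [best[c] if c < w else max(best[c], best[c - w] + v)
                for c in range(capacity + 1)]
    return best[capacity]

print(knapsack_max_value([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9
```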

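Building on a mathematical model and related definitions for the grey linear bilevel assignment problem, a drifting-type model of the original problem is obtained using the positioned programming technique from grey theory. For this drifting-type model, an intelligent global optimization method based on a genetic algorithm is proposed, and the detailed algorithm steps are given. To verify the effectiveness of the algorithm, small-scale and medium-to-large-scale test problems were constructed, confirming the correctness and real-time performance of the proposed algorithm.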

He Huaiqing, Sun Xidong. Journal of Computer Applications, 2012, 32(7): 1939-1942
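To address the slow speed of the photon mapping algorithm when run serially, the feasibility of parallelizing photon mapping is analyzed, and the algorithm is parallelized by fully exploiting the parallel computing power of the graphics processing unit (GPU) through the Compute Unified Device Architecture (CUDA). In the photon emission and tracing stage, launching as many GPU threads as there are photons has shortcomings, and distributing the work evenly wastes resources. To address this, a method in which threads cooperate with one another is proposed and combined with dynamic load balancing, nearly doubling photon rendering speed. Experimental results demonstrate the effectiveness of combining multithread cooperation with dynamic balancing.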
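One way to picture dynamic balancing is a fixed pool of workers that repeatedly claim the next chunk of photon indices from a shared counter. Fast workers then simply take more chunks, instead of launching one thread per photon or splitting the work evenly in advance. The sketch below emulates the CUDA atomicAdd-on-a-global-counter pattern with Python threads and a lock. The function name, the chunk parameter and trace_photon are hypothetical, and this is not the paper's exact scheme.

```python
import threading

def trace_photons_dynamic(n_photons, n_workers, chunk, trace_photon):
    """Workers repeatedly claim the next chunk of photon indices from a shared
    counter (a stand-in for atomicAdd on a global counter in CUDA)."""
    results = [None] * n_photons
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            with lock:                      # atomic fetch-and-add
                start = next_index
                next_index += chunk
            if start >= n_photons:
                return
            for i in range(start, min(start + chunk, n_photons)):
                results[i] = trace_photon(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(trace_photons_dynamic(8, 3, 2, lambda i: i * i))  # [0, 1, 4, 9, 16, 25, 36, 49]
```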

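Traditional multi-objective evolutionary algorithms are mostly stochastic search algorithms based on the concept of Pareto optimality, and they are slow; this becomes even more pronounced as the problem dimension grows and a larger population is needed. The issue has attracted increasing attention from researchers and practitioners. Simulation experiments show that constructing the non-dominated set and maintaining population diversity together account for more than 99% of the algorithm's execution time. An effective remedy is to parallelize these parts of the algorithm. This paper proposes a parallel solution on the CUDA platform that uses niching to implement fitness sharing, which maintains the diversity of the candidate solution set, and places the entire multi-objective evolutionary algorithm on the GPU. This differs from previous work, in which part of the non-dominated sorting and all of the diversity maintenance were still performed on the CPU. Simulation results on the ZDT test functions show that the proposed algorithm performs far better than NSGA-II and NPGA. Finally, solving an oil blending process, a constrained multi-objective optimization problem, shows that the algorithm also achieves excellent speedup on constrained multi-objective optimization problems in chemical engineering applications.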
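The two costly parts named above, building the non-dominated set and fitness sharing, are both dominated by independent pairwise computations. The sketch below assumes minimization, Euclidean distance in objective space, and the classic triangular sharing function sh(d) = 1 - (d/sigma)^alpha. These choices are assumptions, and the function names are illustrative.

```python
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(objs):
    # Each (i, j) comparison is independent: a natural GPU thread grid.
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]

def shared_fitness(raw, objs, sigma, alpha=1.0):
    """Divide raw fitness by the niche count sum_j sh(d_ij), where
    sh(d) = 1 - (d / sigma) ** alpha for d < sigma and 0 otherwise."""
    shared = []
    for i, a in enumerate(objs):
        niche = 0.0
        for b in objs:                       # includes j == i, where sh(0) = 1
            d = math.dist(a, b)
            if d < sigma:
                niche += 1.0 - (d / sigma) ** alpha
        shared.append(raw[i] / niche)
    return shared

print(non_dominated([(1, 4), (2, 2), (4, 1), (3, 3)]))  # [0, 1, 2]
```

Crowded individuals get a larger niche count and therefore a smaller shared fitness, which is what pushes the population to spread along the front.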

We consider an inverse linear programming (LP) problem in which the parameters in both the objective function and the constraint set of a given LP problem need to be adjusted as little as possible so that a known feasible solution becomes the optimal one. We formulate this problem as a linear complementarity constrained minimization problem. With the help of the smoothed Fischer–Burmeister function, we propose a perturbation approach to solve the inverse problem and demonstrate its global convergence. An inexact Newton method is constructed to solve the perturbed problem and numerical results are reported to show the effectiveness of the approach.
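The smoothing idea can be shown on the smallest possible complementarity problem. The sketch uses one common form of the smoothed Fischer–Burmeister function, phi_mu(a, b) = a + b - sqrt(a^2 + b^2 + 2 mu^2); the abstract does not give the exact form, so this is an assumption. It then applies Newton's method to a scalar linear complementarity problem while driving mu toward zero. This is not the paper's inexact Newton method for the full inverse LP, and the function names are illustrative.

```python
import math

def phi_mu(a, b, mu):
    """Smoothed Fischer-Burmeister function. For mu > 0, phi_mu(a, b) = 0
    iff a > 0, b > 0 and a * b = mu**2; at mu = 0 it encodes
    a >= 0, b >= 0, a * b = 0."""
    return a + b - math.sqrt(a * a + b * b + 2 * mu * mu)

def solve_scalar_lcp(m, q, x0=1.0, mus=(1.0, 1e-1, 1e-2, 1e-4, 1e-8)):
    """Find x >= 0 with w = m*x + q >= 0 and x*w = 0 (m > 0) by Newton's
    method on phi_mu(x, m*x + q) = 0 while driving mu toward zero."""
    x = x0
    for mu in mus:
        for _ in range(100):
            w = m * x + q
            r = math.sqrt(x * x + w * w + 2 * mu * mu)
            f = x + w - r
            if abs(f) < 1e-14:
                break
            df = 1 + m - (x + m * w) / r   # derivative of phi_mu(x, m*x + q)
            x -= f / df
    return x

print(solve_scalar_lcp(2.0, -4.0))  # close to 2.0, where w = 0
```

For mu > 0 the equation is smooth, so plain Newton steps apply; each smoothed solution warm-starts the next, smaller mu.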

Many engineering and scientific problems require solving boundary value problems for partial differential equations or systems of them. In most cases, the only practical way to obtain a solution with the desired precision in acceptable time is to harness parallel processing. In this paper, we present some effective applications of parallel processing based on a hybrid CPU/GPU domain decomposition method. Within the family of domain decomposition methods, the so-called optimized Schwarz methods have been shown to converge better than classical Schwarz methods; the price for this is the need to transfer more physical information across subdomain interfaces. For solving the large systems of linear algebraic equations that result from the finite element discretization of each subdomain's subproblem, a Krylov method is often a good choice. Since the overall efficiency of such methods depends on efficient computation of the sparse matrix–vector product, approaches that use the graphics processing unit (GPU) instead of the central processing unit (CPU) for this task look very promising. We discuss efficient implementation of the algebraic operations for iterative Krylov methods on the GPU. To ensure good performance of the non-overlapping Schwarz method, we propose using optimized conditions obtained by a stochastic technique based on the covariance matrix adaptation evolution strategy. The performance, robustness, and accuracy of the proposed approach are demonstrated by solving the gravitational potential equation for data acquired from the geological survey of the Chicxulub crater.
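The sparse matrix–vector product at the heart of the Krylov iterations is usually stored in compressed sparse row (CSR) form. The simplest GPU mapping assigns one thread per row, and the sketch below mirrors that row independence. The CSR layout is standard, but the choice of format and kernel in the paper is not stated, and the function names are illustrative.

```python
def dense_to_csr(A):
    """Compressed sparse row (CSR) arrays for a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                data.append(a)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr

def spmv_csr(data, indices, indptr, x):
    """y = A @ x; each row is independent (the "one thread per row" CSR kernel)."""
    return [sum(data[k] * x[indices[k]] for k in range(indptr[r], indptr[r + 1]))
            for r in range(len(indptr) - 1)]

A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
print(spmv_csr(*dense_to_csr(A), [1, 2, 3]))  # [7, 6, 17]
```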

The finite-difference time-domain (FDTD) method is commonly used for electromagnetic field simulations. Recently, successful hardware acceleration of large-scale FDTD simulations using graphics processing units (GPUs) has been reported. In this paper, we present a performance analysis of three-dimensional (3D) FDTD on the GPU using the roofline model. We find that the theoretical predictions of maximum performance agree well with the experimental results. We also suggest suitable optimization methods for the best FDTD performance on the GPU. In particular, the optimized 3D FDTD program on a GPU (NVIDIA GeForce GTX 480) is shown to be 64 times faster than a naively implemented program on a CPU (Intel Core i7 2600).
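The roofline model bounds attainable performance by the smaller of the peak compute rate and arithmetic intensity (FLOPs per byte of memory traffic) times memory bandwidth. The device and kernel numbers below are hypothetical and are not the GTX 480 figures or the FDTD operation counts from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gb_s, flops, bytes_moved):
    """Attainable performance = min(peak compute, arithmetic intensity * bandwidth)."""
    intensity = flops / bytes_moved          # FLOP per byte of DRAM traffic
    return min(peak_gflops, intensity * bandwidth_gb_s)

# Hypothetical device and kernel numbers, for illustration only.
print(roofline_gflops(1000.0, 150.0, flops=2.0, bytes_moved=4.0))   # 75.0: memory-bound
print(roofline_gflops(1000.0, 150.0, flops=40.0, bytes_moved=4.0))  # 1000.0: compute-bound
```

Stencil codes such as FDTD typically perform only a few operations per value loaded, which places them on the bandwidth-limited slope. That is why optimizations that reduce memory traffic matter most there.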

This paper proposes a parallel scheme for accelerating parameter sweep applications on a graphics processing unit. Using hundreds of cores on the graphics processing unit, our scheme processes multiple parameters simultaneously rather than one parameter at a time. The simultaneous sweeps exploit the similarity of computing behavior across different parameters, allowing memory accesses to be coalesced into a single access when similar irregularities appear among the parameters' computational tasks. In addition, our scheme reduces off-chip memory accesses by unifying the data commonly referenced by multiple parameters and placing the unified data in fast on-chip memory. In several experiments on practical applications, our scheme performed up to 8.5 times faster than a naive scheme that processes one parameter at a time. We also discuss the application characteristics required for our scheme to outperform the naive scheme. Copyright © 2013 John Wiley & Sons, Ltd.
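The data-unification idea can be illustrated by counting reads. A naive sweep re-reads the shared data once per parameter, while a fused sweep reads each element once and applies it to every parameter; on a GPU, that element would sit in on-chip memory. The read counter stands in for off-chip traffic, and the names and the per-element function f are hypothetical.

```python
def sweep_naive(data, params, f):
    """Process one parameter at a time: data is re-read for every parameter."""
    reads, results = 0, []
    for p in params:
        acc = 0
        for d in data:
            reads += 1
            acc += f(d, p)
        results.append(acc)
    return results, reads

def sweep_fused(data, params, f):
    """Process all parameters together: each data element is read once and
    shared by every parameter (cf. placing common data in on-chip memory)."""
    reads, results = 0, [0] * len(params)
    for d in data:
        reads += 1
        for k, p in enumerate(params):
            results[k] += f(d, p)
    return results, reads

print(sweep_fused(list(range(10)), [1, 2, 3], lambda d, p: d * p))  # ([45, 90, 135], 10)
```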

This paper considers the problem of choosing a single constant linear state-feedback control law that produces satisfactory performance at each of several operating points of a system. The model at each operating point is assumed to be linear, and the criterion for satisfactory performance is an infinite-horizon quadratic cost functional. The problem is reformulated as a finite-dimensional optimization over the linear feedback gains, which can readily be solved with standard nonlinear optimization techniques provided that a stabilizing initial value of the gains can be found. Although the direct solution of this problem is discussed briefly, the major portion of the paper is devoted to solution techniques for when an initial stabilizing guess is not available.
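A scalar toy version shows the reformulation. For x' = a x + b u with u = -k x, the infinite-horizon cost has the closed form (q + r k^2) x0^2 / (2(b k - a)) whenever a - b k < 0, so a single gain k can be chosen by minimizing the summed cost over operating points. The sketch assumes scalar operating points with b > 0 and a known stabilizing interval k > max(a/b). That second assumption is exactly the one the paper's main techniques avoid. The points and names are hypothetical.

```python
import math

def lq_cost(a, b, q, r, k, x0=1.0):
    """Infinite-horizon cost of x' = a*x + b*u with u = -k*x:
    integral of q*x**2 + r*u**2 = (q + r*k**2) * x0**2 / (2*(b*k - a))."""
    closed = a - b * k
    if closed >= 0:
        return math.inf                     # gain does not stabilize this point
    return (q + r * k * k) * x0 * x0 / (-2.0 * closed)

def total_cost(points, k):
    return sum(lq_cost(a, b, q, r, k) for a, b, q, r in points)

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d
        else:
            lo = c
        c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

# Two hypothetical operating points (a, b, q, r); any k > 1 stabilizes both.
points = [(1.0, 1.0, 1.0, 1.0), (0.5, 2.0, 1.0, 1.0)]
k = golden_section(lambda g: total_cost(points, g), 1.001, 10.0)
print(k)
```

With a single point (a = b = q = r = 1), the search recovers the LQR gain 1 + sqrt(2). With both points, it returns a compromise between the two individual optima.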

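Most image processing algorithms can be accelerated on the GPU for better performance, but the scheduling strategy between data transfer operations and kernel execution remains the main bottleneck to further speedup. To address this, GPU task streams are commonly used to overlap kernel execution with data transfers, hiding part of the transfer and kernel execution time. However, owing to characteristics of the CUDA programming model and limited GPU hardware resources, in some cases tasks within each stream still execute serially even when many streams are created for overlap, so the speedup cannot be improved further. Therefore, CSS is used to merge the images to be processed, thereby merging the operator kernels and data transfer operations within a single stream and reducing the fixed overhead and invocation gaps of data transfers and kernel executions. Experimental results show that the proposed CSS structure not only improves the performance of GPU image processing algorithms with a single stream but also further improves speedup with multiple streams. It offers good practicality and scalability and is suited to workloads with many operators or to batch processing of small images. In addition, the proposed method provides a new research direction for GPU acceleration of image processing algorithms.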
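The abstract does not define CSS, so the following is only a sketch of the general idea of merging images so that fixed per-call costs are paid once. Images are flat pixel lists packed into one buffer with offsets, processed by a single call, and then split. The call count stands in for transfer-plus-launch overhead. This is valid as written only for pointwise operators; neighbourhood operators would also need border handling between packed images. All names are hypothetical.

```python
def run_per_image(images, op):
    """One transfer-in, kernel launch and transfer-out per image."""
    calls, outputs = 0, []
    for img in images:
        calls += 1
        outputs.append([op(p) for p in img])
    return outputs, calls

def run_merged(images, op):
    """Pack all images into one buffer, process it with a single call, then split."""
    offsets = [0]
    for img in images:
        offsets.append(offsets[-1] + len(img))
    buffer = [p for img in images for p in img]   # one host-to-device copy
    processed = [op(p) for p in buffer]           # one kernel launch
    outputs = [processed[offsets[i]:offsets[i + 1]] for i in range(len(images))]
    return outputs, 1

images = [[0, 10, 20], [30, 40], [250]]
invert = lambda p: 255 - p
print(run_merged(images, invert))  # ([[255, 245, 235], [225, 215], [5]], 1)
```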

Application of Graphics Processors in General-Purpose Computing
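Based on the Compute Unified Device Architecture (CUDA) of graphics processing units (GPUs), the principles and methods of using GPUs for general-purpose computing are described. Matrix multiplication experiments were carried out on a GeForce 8800 GT. The results show that, as the matrix order increases, computation slows down on both the GPU and the CPU. When the data size increased 100-fold, the GPU computation time increased by only 3.95 times, whereas the CPU computation time increased by 216.66 times.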
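Matrix multiplication suits the GPU because every output element is an independent dot product. CUDA kernels usually add tiling, in which blocks of A and B are staged in fast shared memory and reused. The pure-Python sketch below shows both the direct form and the tiled loop order. It is illustrative only (it does not reproduce the paper's kernel), and the tile size is an arbitrary assumption.

```python
def matmul_naive(A, B):
    # One independent dot product per output element (one GPU thread each).
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matmul_tiled(A, B, tile=2):
    """Same product, computed tile by tile as a CUDA kernel would with
    shared-memory tiles: each output tile accumulates over tiles of the
    shared dimension."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):               # one thread block per output tile
            for t0 in range(0, k, tile):
                a_tile = [row[t0:t0 + tile] for row in A[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in B[t0:t0 + tile]]
                for di, a_row in enumerate(a_tile):
                    for dj in range(len(b_tile[0])):
                        C[i0 + di][j0 + dj] += sum(
                            a_row[dt] * b_tile[dt][dj] for dt in range(len(a_row)))
    return C

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
print(matmul_tiled(A, B))  # [[10, 2, 5], [22, 5, 14], [34, 8, 23]]
```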

Examples of homogeneous linear programming problems are investigated. The objective function of such a problem is not bounded below on the feasible set. A starting point is presented from which the affine scaling method generates a sequence of vectors that converges to zero. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 178–179, January–February 2006.
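For reference, one step of the primal affine scaling method can be written compactly for a problem with a single homogeneous equality constraint a·x = 0, x > 0. The sketch uses its own small unbounded example, min x1 - x2 subject to x1 + x2 - 2x3 = 0, which is not the paper's example. The step-length factor gamma = 0.5 and the function name are assumptions.

```python
def affine_scaling_step(a, c, x, gamma=0.5):
    """One primal affine scaling step for min c.x s.t. a.x = 0 (one homogeneous
    equality), x > 0. Returns the new point, or None if the direction has no
    negative component (the objective is then unbounded along it)."""
    d2 = [xi * xi for xi in x]                              # D^2 = diag(x)^2
    y = (sum(ai * di * ci for ai, di, ci in zip(a, d2, c))
         / sum(ai * di * ai for ai, di in zip(a, d2)))      # dual estimate
    r = [ci - ai * y for ci, ai in zip(c, a)]               # reduced costs
    dx = [-di * ri for di, ri in zip(d2, r)]                # direction -D^2 r
    ratios = [-dxi / xi for dxi, xi in zip(dx, x) if dxi < 0]
    if not ratios:
        return None
    alpha = gamma / max(ratios)     # keeps x strictly positive for gamma < 1
    return [xi + alpha * dxi for xi, dxi in zip(x, dx)]

a, c = [1.0, 1.0, -2.0], [1.0, -1.0, 0.0]
print(affine_scaling_step(a, c, [1.0, 1.0, 1.0]))  # [0.5, 1.5, 1.0]
```

The direction satisfies a·dx = 0, so feasibility is preserved, and c·dx = -sum(d_i r_i^2) <= 0, so the objective does not increase.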

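Numerical methods for partial differential equations (including finite difference and finite element methods), as well as numerical methods for many other equations of mathematical physics, ultimately reduce to solving large systems of linear equations. Developing fast, stable, and accurate solvers for large linear systems has therefore long been an important research topic in numerical computing. Among iterative methods, the conjugate gradient method is widely regarded as one of the best. However, its main drawback is that it applies only when the coefficient matrix is symmetric positive definite, and conventional CPU implementations are very time-consuming. To address this, the coefficient matrix of the linear system is transformed into a symmetric matrix, and a fast GPU-CUDA-based conjugate gradient method is then applied to solve general large linear systems. Test results show that the GPU-CUDA-based conjugate gradient method is highly efficient: when the order of the system exceeds 3000, its speedup exceeds 14. In terms of solution accuracy and stability of the solution process, it is comparable to Gaussian elimination with partial pivoting. The fast GPU-CUDA-based conjugate gradient method is thus a fast and highly effective method for solving general large linear systems.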
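The abstract says only that the coefficient matrix is "transformed into a symmetric matrix". A standard way to do that is to apply conjugate gradient to the normal equations A^T A x = A^T b (CGNR), which is assumed here. The matrix-vector products and dot products inside the loop are the operations a CUDA version would offload. Note that forming A^T A squares the condition number. The function name is illustrative, and a square, nonsingular A is assumed.

```python
def cg_normal_equations(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for square, nonsingular, possibly nonsymmetric A by running
    conjugate gradient on the SPD system A^T A x = A^T b (CGNR)."""
    n = len(A)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(len(v)))
                           for i in range(len(M))]
    At = [list(col) for col in zip(*A)]
    normal = lambda v: matvec(At, matvec(A, v))       # v -> A^T A v
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = matvec(At, b)                 # residual of the normal equations at x = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if rs ** 0.5 < tol:
            break
        Ap = normal(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

print(cg_normal_equations([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))  # approximately [0.1, 0.6]
```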

An Efficient Branching Algorithm for Solving Integer Linear Programming Problems
Gao Peiwang. Journal of Computer Applications, 2010, 30(4): 1019-1021
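To improve the efficiency of solving general integer linear programming problems, a branching algorithm based on moving the objective function hyperplane is proposed. For a given integer value of the objective function, the upper and lower bounds of the variables are first determined from the optimal simplex tableau of the linear programming relaxation. These bound conditions are then added to the constraints to cut the corresponding objective function hyperplane. Finally, the branching method of the branch-and-bound algorithm is applied to search for feasible solutions on the objective function hyperplane. Computations on several classical numerical examples, compared with the classical branch-and-bound algorithm, show that the algorithm reduces the number of branches and simplex iterations and has considerable practical value.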
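The hyperplane-moving idea can be sketched for a maximization problem with integer objective coefficients. Start from an integer upper bound z on the optimum, such as the floor of the LP-relaxation value. For each z, search by depth-first branching for an integer point on c·x = z that satisfies the constraints; the first z that admits one is optimal. Two simplifications are assumed here. The variable bounds lo and hi are supplied directly instead of being derived from the optimal simplex tableau, and the branching is a plain enumeration with range pruning, not the paper's procedure. All names are hypothetical.

```python
def search_on_hyperplane(c, A, b, lo, hi, z):
    """Depth-first branching for an integer x with c.x == z, A x <= b, lo <= x <= hi."""
    n = len(c)
    x = list(lo)

    def feasible():
        return all(sum(aij * xj for aij, xj in zip(row, x)) <= bi
                   for row, bi in zip(A, b))

    def branch(i, partial):
        if i == n:
            return partial == z and feasible()
        # prune if the remaining variables cannot reach the hyperplane
        rest_min = sum(min(c[j] * lo[j], c[j] * hi[j]) for j in range(i, n))
        rest_max = sum(max(c[j] * lo[j], c[j] * hi[j]) for j in range(i, n))
        if not rest_min <= z - partial <= rest_max:
            return False
        for v in range(lo[i], hi[i] + 1):
            x[i] = v
            if branch(i + 1, partial + c[i] * v):
                return True
        return False

    return list(x) if branch(0, 0) else None

def maximize_by_hyperplanes(c, A, b, lo, hi, z_upper):
    """Move the hyperplane c.x = z down from an integer upper bound z_upper;
    the first z with a feasible integer point is optimal."""
    z_lower = sum(min(c[j] * lo[j], c[j] * hi[j]) for j in range(len(c)))
    for z in range(z_upper, z_lower - 1, -1):
        x = search_on_hyperplane(c, A, b, lo, hi, z)
        if x is not None:
            return z, x
    return None

# max 3x1 + 2x2  s.t.  2x1 + 2x2 <= 9,  0 <= x1 <= 3,  0 <= x2 <= 4 (LP optimum 12)
print(maximize_by_hyperplanes([3, 2], [[2, 2]], [9], [0, 0], [3, 4], 12))  # (11, [3, 1])
```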
