Similar literature
1.
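Application Research on the CUDA Programming Model for Graphics Processors   Total citations: 5, self-citations: 0, citations by others: 5
Owing to the rapid development of graphics processing units (GPUs) in recent years, GPU-based general-purpose computing has become a new research field. By studying CUDA, nVIDIA's latest general-purpose GPU programming model, this paper explains the structure and characteristics of CUDA applications, and discusses and analyzes how CUDA programming differs from ordinary CPU programming. As an example it takes the deblocking filter module in H.264 digital video coding, whose main purpose is to remove blocking artifacts at macroblock boundaries, and describes the CUDA programming method and its features in detail. Finally, a performance comparison with a CPU implementation of the deblocking filter module reveals CUDA's advantage in computing power and offers new methods and ideas for further optimizing codec performance and for general-purpose GPU computing.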

2.
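Zhang Runmei (张润梅), Wang Xiao (王霄). Computer Science (《计算机科学》), 2011, 38(2): 302-305
Because of limits on memory, computing speed, and disk space, brute-force cracking of MD5 is almost infeasible on a PC. CUDA aims to bring the very high computing performance of GPUs to general-purpose computing fields such as data processing and scientific computing. This paper studies an MD5 cracking method based on the CUDA architecture, using VS2005 and NVCC for mixed compilation. In the experiments, the proposed program and a standard C version were run on a GeForce 9600 GT graphics card and a quad-core Q6600 CPU, respectively. The results show that under heavy computational load and very large data volumes, a low- to mid-range graphics card computes 30 to 500 times faster than a high-end CPU. CUDA lets the GPU's stream processor array reach its full performance and greatly improves the efficiency of parallel programs.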

3.
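Research on Parallel Image Processing Based on Compute Unified Device Architecture Technology   Total citations: 1, self-citations: 0, citations by others: 1
This paper studies the Compute Unified Device Architecture (CUDA) technology, analyzes the distinctive features of CUDA GPUs, summarizes the general parallel programming pattern of CUDA, describes in detail how histogram equalization is implemented with CUDA, and then briefly introduces the application of CUDA to other image processing algorithms. Finally, it compares the time taken by the CPU and the GPU to compute 256-level histogram equalization. The experimental results show that as the number of image pixels grows, CUDA can speed up the computation more than 40-fold, and in other image algorithms the speedup can even reach a hundredfold or more.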
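As a point of reference for the 256-level histogram equalization mentioned in this abstract, the following is a minimal CPU-side sketch in pure Python. It is not the paper's CUDA implementation: the function name `equalize_histogram` and the flat-list image representation are assumptions made for illustration. The three stages (histogram, cumulative distribution, lookup-table remap) are the ones a GPU version would typically parallelize.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer gray values in [0, levels).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not pixels:
        return []
    # Stage 1: count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Stage 2: cumulative distribution (running sum of the histogram).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    # Smallest non-zero CDF value, so the darkest occupied level maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return list(pixels)
    # Stage 3: build the lookup table, then remap every pixel through it.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]
```

For example, `equalize_histogram([50, 50, 100, 150])` stretches the occupied levels so that the darkest maps to 0 and the brightest to 255.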

4.
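To meet the high-performance computing demands of massive data matching in fields such as text retrieval and computational biology, a bit-parallel approximate string matching algorithm based on the Compute Unified Device Architecture (CUDA) is proposed. Drawing on the highly parallel computing structure and memory bandwidth characteristics of the graphics processing unit (GPU), it optimizes the data storage layout to accelerate the parallelized dynamic-programming matrix algorithm (BPM), and comparative tests of the speedup are carried out. The experimental results show that the BPM algorithm achieves a speedup of about 20 times with GPU acceleration.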
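To make the bit-parallel idea concrete, here is a small sequential sketch, assuming that BPM refers to Myers' bit-vector formulation of the dynamic-programming matrix (in the variant commonly attributed to Hyyrö). The function name `bpm_search` is hypothetical, and nothing here reflects the paper's GPU data layout. Each text character updates a whole matrix column with a handful of bitwise operations.

```python
def bpm_search(pattern, text, k):
    """Return end positions in `text` where `pattern` matches with <= k edits.

    Sketch of a bit-parallel dynamic-programming scan; one Python int
    holds the vertical deltas of a whole matrix column.
    """
    m = len(pattern)
    if m == 0:
        return []
    mask = (1 << m) - 1      # keep only the m low bits
    high = 1 << (m - 1)      # bit of the last pattern row
    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    pv, mv, score = mask, 0, m   # +1 / -1 vertical deltas, bottom-row value
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask   # rows whose horizontal delta is +1
        mh = pv & xh                    # rows whose horizontal delta is -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            hits.append(j)
    return hits
```

Calling `bpm_search("abc", "xxabcxx", 0)` reports the single exact occurrence ending at index 4. Because Python integers are unbounded, the pattern length is not limited to a machine word here; a GPU version would have to split long patterns across words.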

5.
This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.

6.
Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with adequate C-like APIs to better exploit the parallel power of the GPU. Data mining is widely used and has significant applications in various domains. However, current data mining toolkits cannot meet the speed requirements of applications with large-scale databases. In this paper, we propose three techniques to speed up fundamental problems in data mining algorithms on the CUDA platform: a scalable thread scheduling scheme for irregular patterns, a parallel distributed top-k scheme, and a parallel high-dimension reduction scheme. They play a key role in our CUDA-based implementation of three representative data mining algorithms, CU-Apriori, CU-KNN, and CU-K-means. These parallel implementations significantly outperform the other state-of-the-art implementations on an HP xw8600 workstation with a Tesla C1060 GPU and a quad-core Intel Xeon CPU. Our results show that the GPU + CUDA parallel architecture is feasible and promising for data mining applications.

7.
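Zhang Dandan (张丹丹), Xu Ying (徐莹), Xu Lei (徐磊). Computer Science (《计算机科学》), 2012, 39(4): 296-298, 303
Several parallel programming models on CPU+GPU heterogeneous platforms are studied, and multilevel parallel algorithms using CUDA, MPI+CUDA, and MPI+OpenMP+CUDA are implemented for the lattice Boltzmann method. The results show that the algorithms achieve good speedup. The proposed method of balancing the load between the CPU and the GPU by adjusting a workload-ratio parameter has some reference and practical value for implementing multilevel parallel processing and using resources effectively on heterogeneous platforms.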

8.
Methods for implementing variable surface tension in the multiphase Lattice Boltzmann model with the color model and Shan-Chen scheme are tested by analyzing the models’ abilities to reproduce a theoretical result by Levich and Kuznetzov. If the surface tension around a droplet is asymmetrical, the droplet moves towards the side where the surface tension is lower. The droplet’s velocity is proportional to the surface tension gradient, the droplet’s radius, and the inverse of the viscosity. The model is tested to determine whether the simulated droplets move in the manner predicted by theory. Although the discreteness of the underlying lattice causes a spurious oscillation in the velocity, the numerical results for the average velocity show good agreement between theory and the model with regard to the surface tension gradient and droplet size. The color model also produces good simulations in the scenarios with different viscosities, while the diffusive properties and unknown relationships between the parameters and surface tension in the Shan-Chen model make the numerical results of that model more dubious, even though several of the results are qualitatively in agreement.

9.
The implementation of a proof-of-concept Lattice Quantum Chromodynamics kernel on the Cell processor is described in detail, illustrating issues encountered in the porting process. The resulting code performs up to 45 GFlop/s per socket (without inter-node parallel communications), indicating that the Cell processor is likely to be a good platform for future Lattice QCD calculations.

10.
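Chen Yanyan (陈妍妍). Computer Security (《计算机安全》), 2007, (7): 38-39, 44
This paper analyzes the security requirements of enterprise networks and the shortcomings of the security policies they adopt, and explains what a unified security architecture is, what its benefits are, and how it is currently applied.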

11.
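A class of partial differential equations sharing the same form, including the convection-diffusion equation, the Burgers equation, and the modified Burgers equation, is studied, and a D1Q3 lattice Boltzmann model with a correction function term is constructed to solve them. To recover the macroscopic equation accurately, the Chapman-Enskog expansion and multiscale analysis are used to derive explicit expressions for the equilibrium distribution functions and the correction function in each direction. Numerical results show that the model is stable and effective.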
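For orientation, the sketch below shows a plain BGK D1Q3 scheme for the linear convection-diffusion equation with a constant velocity `u`. It is an assumption-laden illustration rather than the paper's model: it omits the correction function term, and the weights (2/3, 1/6, 1/6), the lattice sound speed squared of 1/3, and all function names are standard or hypothetical choices not stated in the abstract. With these choices the diffusivity in lattice units is (tau - 1/2)/3.

```python
# Standard D1Q3 lattice: rest, right-moving, and left-moving populations.
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)   # weights (assumed, not from the paper)
C = (0, 1, -1)                          # lattice velocities
CS2 = 1.0 / 3.0                         # lattice sound speed squared

def equilibrium(phi, u):
    """Linear equilibrium for a scalar phi advected at constant velocity u."""
    return [w * phi * (1.0 + c * u / CS2) for w, c in zip(W, C)]

def density(f):
    """Macroscopic scalar: sum of the three populations at each node."""
    return [f[0][x] + f[1][x] + f[2][x] for x in range(len(f[0]))]

def lbm_step(f, u, tau):
    """One BGK collision followed by periodic streaming."""
    n = len(f[0])
    phi = density(f)
    new = [[0.0] * n for _ in range(3)]
    for x in range(n):
        feq = equilibrium(phi[x], u)
        for i in range(3):
            post = f[i][x] - (f[i][x] - feq[i]) / tau   # relax toward equilibrium
            new[i][(x + C[i]) % n] = post               # move to the neighbor node
    return new
```

A typical use initializes `f` from `equilibrium` of a starting profile and calls `lbm_step` repeatedly; because the equilibrium populations sum to `phi`, the total of `density(f)` is conserved up to rounding.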

12.
Suitable surface modification of microfluidic channels can enable a neutral electrolyte solution to develop an electric double layer (EDL). The ions contained within the EDL can be moved by applying an external electric field, inducing electroosmotic flows (EOFs) that result in associated stirring. This provides a solution for the rapid mixing required for many microfluidic applications. We have investigated EOFs generated by applying a steady electric field across a square cavity that has homogeneous electric potentials along its walls. The flow field is simulated using the lattice Boltzmann method. The extent of mixing is characterized for different electrode configurations and electric field strengths. We find that this simple configuration can achieve rapid mixing, which is enhanced as the electric field strength increases. The mixing time for water-soluble organic molecules can be decreased by four orders of magnitude by a suitable choice of wall zeta potential and electric field. We dedicate this paper to the memory of our colleagues Professors Kevin Granata and Liviu Librescu who fell tragically on April 16, 2007 while answering their call to serve higher education. They continue to inspire us. AM gratefully acknowledges support from Jadavpur University under the World Bank funded Technical Education Quality Improvement Programme of the Government of India and the hospitality of the Virginia Tech ESM Department where he conducted a portion of this work.

13.
The efficiency of the valve-less rectification micropump depends primarily on the microfluidic diodicity (the ratio of the backward pressure drop to the forward pressure drop). In this study, different rectifying structures, including the conventional structures (nozzle/diffuser and Tesla structures), were investigated at very low Reynolds numbers (between 0.2 and 60). The rectifying structures were characterized with respect to their design, and a numerical approach for calculating their diodicity was illustrated. The microfluidic diodicity was evaluated numerically for different rectifying structures including half circle, semicircle, heart, triangle, bifurcation, nozzle/diffuser, and Tesla structures. The Lattice Boltzmann Method (LBM) was used to simulate the fluid flow at the microscale. The results suggest that in very low Reynolds number flows, rectification and multifunction micropumping may be achievable by using a number of the presented structures. The results for the conventional structures agree with the reported results.

14.
This work is concerned with the computation of two- and four-sided lid-driven square cavity flows and also two-sided rectangular cavity flows with parallel wall motion by the Lattice Boltzmann Method (LBM) to obtain multiple stable solutions. In the two-sided square cavity, two of the adjacent walls move with equal velocity, and in the four-sided square cavity all four walls move in such a way that parallel walls move in opposite directions with the same velocity; in the two-sided rectangular lid-driven cavity flow the longer facing walls move in the same direction with equal velocity. Conventional numerical solutions show that symmetric solutions exist for all Reynolds numbers for all the geometries, whereas multiple stable states exist only above certain critical Reynolds numbers. Here we demonstrate that the Lattice Boltzmann Method can be effectively used to capture multiple steady solutions for all the aforesaid geometries. The strategy employed to obtain these solutions is also described.

15.
Owing to its kinetic nature and distinctive computational features, the lattice Boltzmann method for simulating rarefied gas flows has attracted significant research interest in recent years. In this article, a lattice Boltzmann (LB) model is presented to study microchannel flows in the transition flow regime, which have gained much attention because of fundamental scientific issues and technological applications in various micro-electro-mechanical system (MEMS) devices. In the model, a Bosanquet-type effective viscosity is used to account for the rarefaction effect on gas viscosity. To match the introduced effective viscosity and to gain an accurate simulation, a modified second-order slip boundary condition with a new set of slip coefficients is proposed. Numerical investigations demonstrate that the results, including the velocity profile, the non-linear pressure distribution along the channel, and the mass flow rate, are in good agreement with the solution of the linearized Boltzmann equation, the direct simulation Monte Carlo (DSMC) results, and the experimental results over a broad range of Knudsen numbers. It is shown that taking the rarefaction effect on gas viscosity into consideration and employing an appropriate slip boundary condition can lead to a significant improvement in the modeling of rarefied gas flows with moderate Knudsen numbers in the transition flow regime.
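A Bosanquet-type effective viscosity is usually written as the bulk viscosity divided by (1 + a·Kn), where Kn is the Knudsen number. The sketch below assumes that form; the function name `effective_viscosity` and the default coefficient `a=2.0` are placeholders, since the abstract gives neither the exact expression nor the coefficient used.

```python
def effective_viscosity(mu0, knudsen, a=2.0):
    """Bosanquet-type correction: mu_e = mu0 / (1 + a * Kn).

    `a` is a rarefaction coefficient; 2.0 is only a placeholder default,
    not a value taken from the abstract.
    """
    if knudsen < 0:
        raise ValueError("Knudsen number must be non-negative")
    return mu0 / (1.0 + a * knudsen)
```

In the continuum limit (Kn approaching 0) the function returns the bulk viscosity unchanged, and the effective viscosity falls as the gas becomes more rarefied, which is the qualitative behavior the abstract relies on.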

16.
In the last few years, the applications of the support vector machine (SVM) have substantially increased owing to its high generalization performance and its ability to model non-linear relationships. However, whether an SVM behaves well largely depends on its adopted kernel function. The most commonly used kernels include the linear and polynomial inner product functions and the Radial Basis Function (RBF). Since the nature of the data is usually unknown, it is very difficult to choose properly among these kernels beforehand. Usually, more than one kernel is tried and the one that gives the best prediction performance is selected, at the cost of a very time-consuming optimization procedure. This paper presents a kernel function based on the Lorentzian function, which is well known in the field of statistics. The presented kernel can properly deal with a large variety of mapping problems because of its flexibility. The applicability, suitability, performance, and robustness of the presented kernel are investigated on the bi-spiral benchmark data set as well as seven data sets from the UCI benchmark repository. The experimental results demonstrate that the presented kernel is robust, has stronger mapping ability than the standard kernel functions, and can obtain better generalization performance. In general, the proposed kernel can serve as a generic alternative to the common linear, polynomial, and RBF kernels.
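The abstract does not give the kernel's formula, so the sketch below assumes the textbook Lorentzian (Cauchy) shape, 1 / (1 + d²/γ²), applied to the Euclidean distance d between two points. The names `lorentzian_kernel`, `gram_matrix`, and the width parameter `gamma` are hypothetical, and the paper's actual parameterization may differ.

```python
def lorentzian_kernel(x, y, gamma=1.0):
    """Assumed Lorentzian-shaped kernel: 1 / (1 + ||x - y||^2 / gamma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 / (1.0 + sq_dist / (gamma * gamma))

def gram_matrix(points, gamma=1.0):
    """Pairwise kernel values for a list of equal-length vectors."""
    return [[lorentzian_kernel(p, q, gamma) for q in points] for p in points]
```

A Gram matrix built this way could be handed to an SVM implementation that accepts precomputed kernels, for instance scikit-learn's `SVC(kernel="precomputed")`. Compared with the RBF kernel, this shape decays polynomially rather than exponentially with distance, so distant points keep more influence.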

17.
An implementation of the lattice Boltzmann method on a homogeneous cluster of IBM RISC System/6000 superscalar workstations is presented.

18.
The lattice Boltzmann method (LBM) for multicomponent immiscible fluids is applied to simulations of the deformation and breakup of a particle-cluster aggregate in shear flows. In the simulations, the solid particle is modeled by a droplet with strong interfacial tension and large viscosity. The van der Waals attraction force is taken into account for the interaction between the particles. The ratio of the hydrodynamic drag force to cohesive force, I, is introduced, and the effect of I on the aggregate deformation and breakup in shear flows is investigated. It is found that the aggregate deforms and disperses more easily when I exceeds 100.

19.
In this paper, the viscous fingering phenomenon of two immiscible fluids in a channel is studied by applying the lattice Boltzmann method (LBM). The fundamental physical mechanisms of finger formation, or the interface evolution between immiscible fluids, are described in terms of the relative importance of viscous forces, surface tension, and gravity, which are quantifiable via dimensionless quantities, namely the capillary number, the Bond number, and the viscosity ratio between the displaced fluid and the displacing fluid. In addition, the effect of wettability on the flow behaviour of the fluids is investigated both with and without gravity. The numerical results provide a good understanding of the mechanisms of viscous fingering from a mesoscopic point of view and confirm that the LBM can be viewed as a promising tool for investigating fluid behaviour and other immiscible displacement problems.

20.
In this paper, the discrete velocity model proposed by Kataoka and Tsutahara (Phys. Rev. E 69(5):056702, 2004) for simulating inviscid flows is employed. Three approaches for improving the stability and the accuracy of this model, especially for high Mach numbers, are suggested and implemented in this research. First, the TVD scheme (Harten in J. Comput. Phys. 49:357-393, 1983) is used for space discretization of the convective term in the Lattice Boltzmann equation. Next, the modified Lax-Wendroff scheme with artificial viscosity is employed to increase the robustness of the method in supersonic flows. Finally, a combination of TVD and the second-order derivative of the distribution function is employed using a differentiable switch. It is found that this last technique is the more suitable approach for a wide range of Mach numbers. Moreover, the WENO scheme for space discretization has been applied and compared with these newly applied methods.
