Similar literature
1.
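To improve the efficiency of H.264 video encoding, this paper presents a parallel full-search motion estimation algorithm based on the Compute Unified Device Architecture (CUDA). It uses the computing power of the GPU and CUDA's optimized memory hierarchy to accelerate motion estimation in H.264 encoding. Unlike traditional methods that trade video quality for motion estimation performance, the algorithm preserves video quality: because motion estimation is compute-intensive, it exploits the parallelism of the CUDA architecture to speed up motion estimation and thereby raise real-time encoding speed. Experimental results on a GTX280 platform show that the algorithm achieves a speedup of up to 70× over an optimized CPU implementation.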
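As a rough illustration of what "full search" means in the abstract above, the following sequential sketch scores every candidate displacement inside a search window by its sum of absolute differences (SAD) and keeps the best one. The function names, the list-of-rows frame layout, and the SAD cost are assumptions made for illustration; the paper's actual CUDA kernel and cost function are not given in the abstract.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement in the search window.

    Returns (best_cost, dx, dy), or None if no candidate fits inside `ref`.
    Each candidate is scored independently of the others, which is the
    property a GPU implementation can exploit (for example, one thread per
    candidate); that mapping is an assumption, not the paper's design.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Full search never skips a candidate, so it finds the minimum-cost displacement within the window; the price is a cost evaluation for each of the (2r+1)² positions, which is why the workload is a natural fit for massive parallelism.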

2.
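To address the heavy computation and low efficiency of the traditional boundary element method (BEM), this paper takes the BEM for three-dimensional elastostatics as its subject, applies CUDA-based GPU parallel computing to the boundary element computation, and proposes a GPU parallel algorithm based on the CUDA architecture. The algorithm first analyzes the parallelism of the different types of boundary element coefficient integrals and describes the corresponding GPU parallel algorithms, then explains how the boundary element system of equations is solved and how that solution is parallelized. Experimental results show that the algorithm is significantly faster than the traditional algorithm.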

3.
Research on GPU-based bit-parallel multi-pattern string matching   Total citations: 1 (self-citations: 0, citations by others: 1)
赵光南, 吴承荣. 《计算机工程》 (Computer Engineering), 2011, 37(14): 265-267
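Graphics processing units (GPUs) offer strong performance on simple, uniform operations and a highly parallel architecture. Given these characteristics, the bit-parallel multi-pattern string matching algorithm M-BNDM is selected, ported to the GPU, and optimized. By preprocessing the input data, the string matching process is reduced to bit operations that are better suited to CUDA. The factors that affect the performance of the CUDA-based parallel string matching algorithm are analyzed. Experimental results show that the algorithm achieves a speedup of more than ten times over an equivalent CPU algorithm.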
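To make "bit-parallel" matching concrete, here is a minimal sketch of the classic Shift-And algorithm for a single pattern. This is deliberately a simpler relative of the technique named in the abstract, not M-BNDM itself; the function name is an assumption. It shows the core idea the abstract relies on: after a preprocessing step that builds one bitmask per character, each text character is handled with a shift, an OR, and an AND.

```python
def shift_and_search(text, pattern):
    """Return the start indices of every occurrence of `pattern` in `text`.

    Bit i of `state` is set when pattern[0..i] matches the text ending at
    the current position, so all prefixes are tracked in one integer.
    """
    m = len(pattern)
    if m == 0:
        return []
    # Preprocessing: masks[ch] has bit i set wherever pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character, start a new one,
        # then keep only those consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_and_search("abracadabra", "abra")` returns `[0, 7]`. On fixed-width hardware the pattern length is bounded by the machine word size; Python integers hide that limit.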

4.
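A new method for accelerating seismic prestack time migration (PSTM) is implemented using general-purpose high-performance GPU programming. PSTM is a routine step in seismic exploration processing, and its core algorithm is compute-intensive, highly data-independent, and highly parallel. Performance profiling identifies the computational hotspots, which are then parallelized with CUDA, and CUDA streams are used for asynchronous transfer from CPU to GPU. Performance tests in a cluster environment show that the GPU-parallelized PSTM program significantly shortens the running time.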

5.
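Research on the parallel implementation of genetic algorithms on the CUDA platform   Total citations: 2 (self-citations: 0, citations by others: 2)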
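CUDA makes general-purpose computing on GPUs convenient for programmers, but it does not provide an application interface for random number generation. This paper therefore proposes and implements an algorithm that generates uniform random numbers in parallel on the CUDA development platform; tests confirm that the algorithm is feasible. On this basis, the basic genetic algorithm is optimized and all of its operations are implemented in parallel on the GPU, improving its running speed and accuracy. The effects of population size and number of generations on the algorithm's speedup and accuracy are analyzed, and the algorithm is compared with the MATLAB toolbox. Experiments show that, compared with the MATLAB genetic algorithm toolbox, the CUDA-based genetic algorithm delivers higher performance and better accuracy.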

6.
High-speed FFT computation based on CUDA   Total citations: 1 (self-citations: 0, citations by others: 1)
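Given the importance of the fast Fourier transform (FFT) in graphics and image processing and in scientific computing, a CUDA-based high-speed FFT computation method is proposed. Building on an analysis of the execution model of the GPU hardware platform and of the parallelism inherent in the FFT, the algorithm is implemented with a multithreaded parallel mapping and optimized with respect to the memory hierarchy. Experimental results demonstrate the efficiency of the algorithm: the speedup of the optimized FFT reaches 2 to 6 times that of the CUFFT library.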
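The parallelism the abstract above refers to is visible in the textbook radix-2 Cooley-Tukey recursion, sketched below as a sequential reference. This is the standard algorithm, not the paper's optimized kernel; the function name is an assumption. Within each level, the butterfly operations for different `k` are independent of one another, which is what a multithreaded mapping can exploit.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: each k is independent of the others at this level.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

A unit impulse `[1, 0, 0, 0]` transforms to a flat spectrum of ones, and a constant signal `[1, 1, 1, 1]` concentrates all energy in bin 0.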

7.
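To address parallel decoding of H.264 video streams, a CPU/GPU cooperative computing algorithm is proposed. With the Compute Unified Device Architecture (CUDA) language as the GPU programming model, the inverse DCT and intra prediction are accelerated on the GPU. CUDA mixed programming is used to improve the system's computing performance while maintaining high computational precision. Using the CUDA language provided by NVIDIA, the inverse DCT and intra prediction are executed in parallel on the GPU during decoding; the parallel algorithm is compared with a CPU-only implementation, and its speedup is verified with different numbers of video streams. Experimental results show that the algorithm substantially improves the encoding and decoding efficiency of video streams, with an average speedup of 10× over the CPU-only implementation.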

8.
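Design and implementation of a CUDA-based parallel particle swarm optimization algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)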
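To address the excessive computation time of the particle swarm optimization (PSO) algorithm when processing large amounts of data and solving large-scale complex problems, a fine-grained parallel PSO algorithm on the graphics processing unit (GPU) is studied. Based on an analysis of the traditional PSO algorithm and on the widely used GPU-based parallel computing technology, a parallel PSO method is designed and implemented. The method runs on the Compute Unified Device Architecture (CUDA) and uses a large number of GPU threads to process the search of each particle in parallel, accelerating the convergence of the whole swarm. The program makes full use of the mathematical libraries that ship with CUDA, which keeps it stable and easy to write. Results on several benchmark optimization test functions show that, at equivalent convergence, the CUDA-based parallel PSO method achieves a speedup of up to 90× over a CPU-based serial method.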
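For readers unfamiliar with PSO, the following sequential sketch shows the per-particle update that the abstract above runs in parallel across GPU threads. The parameter values (`w`, `c1`, `c2`), the bounds, and the sphere benchmark are common textbook choices assumed here for illustration; the abstract does not state the paper's settings.

```python
import random


def sphere(x):
    """Benchmark objective with its minimum value 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective`; returns (best_position, best_value, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iters):
        # The body of this loop is the work a GPU version would give to
        # one thread (or group of threads) per particle.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

Note that this sketch updates the global best immediately inside the particle loop; a parallel version would typically evaluate all particles first and then reduce to a new global best, which can change the trajectory slightly.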

9.
张硕, 何发智, 周毅, 鄢小虎. 《计算机应用》 (Journal of Computer Applications), 2016, 36(12): 3274-3279
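This paper studies improvements to the parallel particle swarm optimization (PSO) algorithm on the graphics processing unit (GPU) based on the Compute Unified Device Architecture (CUDA). Given the CUDA hardware architecture, blocks are executed serially, and the warp is the basic unit of scheduling and execution on a streaming multiprocessor (SM). To make full use of the parallelism of the threads within a block, a GPU parallel PSO algorithm based on adaptive warps is proposed: the dimensions of a particle are mapped to threads; using the GPU's warp-level parallelism, each particle is adaptively mapped to one or more warps according to its dimensionality; and one or more particles are adaptively mapped to each block. The method is compared with the existing coarse-grained parallel method (one particle per thread) and fine-grained parallel method (one particle per block). Experimental results show that, relative to those two methods, the proposed method improves the speedup over the CPU by up to 40.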
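The mapping described in the abstract above can be illustrated with a little index arithmetic. The sketch below is one plausible reading, not the authors' exact scheme: it assumes a warp size of 32 threads (the size on NVIDIA GPUs), a hypothetical block size of 256 threads, and that a particle must fit inside a single block. The function names are invented for this example.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def adaptive_mapping(dim, block_threads=256):
    """Return (warps_per_particle, particles_per_block) for a given dimensionality.

    Assumption: each particle occupies a whole number of warps, and a
    particle is never split across blocks.
    """
    warps_per_particle = math.ceil(dim / WARP_SIZE)
    warps_per_block = block_threads // WARP_SIZE
    if warps_per_particle > warps_per_block:
        raise ValueError("particle does not fit in one block")
    return warps_per_particle, warps_per_block // warps_per_particle


def thread_role(thread_idx, dim, block_threads=256):
    """Map a thread index within a block to (particle_in_block, dimension).

    Returns None for padding lanes that have no dimension to process.
    """
    warps_per_particle, particles_per_block = adaptive_mapping(dim, block_threads)
    lanes = warps_per_particle * WARP_SIZE  # threads reserved per particle
    particle, d = divmod(thread_idx, lanes)
    if particle >= particles_per_block or d >= dim:
        return None
    return particle, d
```

Under these assumptions a 20-dimensional particle takes one warp and eight particles share a block, while a 50-dimensional particle takes two warps and four particles share a block, leaving 14 padding lanes per particle idle.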

10.
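GPUs are highly parallel and programmable, and they are widely used for large-scale data-parallel computing. NTRU is a public-key cryptographic algorithm that offers high security and is easy to parallelize. This paper studies a CUDA-based parallel implementation of NTRU: the convolution operation, the most time-consuming part of the computation, is split across multiple threads that run in parallel, and a large number of independent, concurrent encryption and decryption thread blocks carry out the whole encryption and decryption process. The paper also gives the specific data encoding and storage structures, the thread organization, and performance optimizations based on coalesced access and shared memory. Experimental results show that the CUDA-based NTRU encryption and decryption algorithm achieves hardware acceleration: compared with a CPU implementation of NTRU, the CUDA implementation reaches a throughput of 12.38 MB/s and a maximum speedup of 95×.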
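The convolution that the abstract above identifies as the bottleneck is multiplication in the ring Z_q[x]/(x^N - 1), that is, a cyclic convolution of coefficient vectors. A minimal sequential sketch follows; the function name is an assumption, and the comment about one thread per output coefficient is only one plausible way to split the work, not necessarily the paper's.

```python
def cyclic_convolution(a, b, q):
    """Multiply two polynomials modulo (x^N - 1) with coefficients modulo q.

    `a` and `b` are coefficient lists of equal length N, lowest degree first.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length")
    c = [0] * n
    for k in range(n):
        # Each output coefficient depends only on the inputs, so the
        # iterations of this loop could run as independent threads.
        acc = 0
        for i in range(n):
            acc += a[i] * b[(k - i) % n]
        c[k] = acc % q
    return c
```

For example, `cyclic_convolution([1, 2, 3], [4, 5, 6], 7)` returns `[3, 3, 0]`, since the unreduced coefficients are 31, 31, and 28.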

11.
This paper proposes a novel approach to structuring behavioral knowledge based on symbolization of human whole body motions, hierarchical classification of the motions, and extraction of the causality among the motions. The motion patterns are encoded into parameters of corresponding Hidden Markov Models (HMMs), where each HMM abstracts the dynamics of a motion pattern and is hereafter referred to as a “motion symbol”. The motion symbols allow motion recognition and synthesis. The motion symbols are organized into a hierarchical tree structure representing the spatial similarity among the motion patterns, and this tree is referred to as the “motion symbol tree”. Seamless motion is segmented into a sequence of motion primitives, each of which is classified as a motion symbol based on the motion symbol tree. The seamless motion thus yields a sequence of motion symbols, which is stochastically represented as transitions between the motion symbols by an N-gram model. The motion symbol N-gram model is referred to as the “motion symbol graph”. The motion symbol graph extracts the temporal causality among human behaviors. The integration of the motion symbol tree and the motion symbol graph makes it possible to recognize motion patterns quickly and to predict human behavior during observation. Experiments on a motion dataset of radio calisthenics and on a large motion dataset provided by the CMU motion database validate the proposed framework.
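The N-gram step in the abstract above can be sketched for the simplest case, N = 2: count how often one symbol follows another and normalize the counts into transition probabilities. The symbol labels and function names below are hypothetical; in the paper the symbols are HMMs obtained from motion data, and the order of the N-gram is not stated in the abstract.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Estimate P(next | previous) from sequences of symbol labels."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, nxts in counts.items():
        total = sum(nxts.values())
        model[prev] = {sym: c / total for sym, c in nxts.items()}
    return model


def predict_next(model, symbol):
    """Most probable successor of `symbol`, or None if it was never followed."""
    successors = model.get(symbol)
    if not successors:
        return None
    return max(successors, key=successors.get)
```

Given the hypothetical sequences `["walk", "run", "jump"]`, `["walk", "run", "walk"]`, and `["walk", "sit"]`, the model assigns "run" a probability of 2/3 after "walk", so `predict_next(model, "walk")` returns `"run"`.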

12.
The multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results.
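As background for the abstract above, here is a minimal sequential k-d tree builder that splits at the median and cycles through the axes by depth. It is the textbook construction, not any of the parallel algorithms the paper evaluates; the dictionary node layout and function name are assumptions.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of equal-length tuples.

    Returns a nested dict with keys "point", "axis", "left", "right",
    or None for an empty point set.
    """
    if not points:
        return None
    k = len(points[0])
    axis = depth % k  # cycle through the coordinate axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # median element becomes the splitting node
    return {
        "point": pts[mid],
        "axis": axis,
        # The two subtrees are built from disjoint point sets, so these
        # recursive calls are independent of each other.
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }
```

Sorting at every level keeps the sketch short at the cost of O(n log² n) total work; the independence of the left and right subtrees is the property that parallel construction schemes build on.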

13.
A computational strategy is presented for calculating sensitivity coefficients for the non-linear large-deflection and postbuckling responses of laminated composite structures on distributed-memory parallel computers. The strategy is applicable to any message-passing distributed computational environment. The key elements of the proposed strategy are: (a) a multiple-parameter reduced basis technique; (b) a parallel sparse equation solver based on a nested dissection (or multilevel substructuring) node ordering scheme; and (c) a multilevel parallel procedure for evaluating hierarchical sensitivity coefficients. The hierarchical sensitivity coefficients measure the sensitivity of the composite structure response to variations in three sets of interrelated parameters; namely, laminate, layer and micromechanical (fiber, matrix and interface/interphase) parameters. The effectiveness of the strategy is assessed by performing hierarchical sensitivity analysis for the large-deflection and postbuckling responses of stiffened composite panels with cutouts on three distributed-memory computers. The panels are subjected to combined mechanical and thermal loads. The numerical studies presented demonstrate the advantages of the reduced basis technique for hierarchical sensitivity analysis on distributed-memory machines.

14.
Databases of human motion have been widely used for recognizing human motion and synthesizing humanoid motions. In this paper, we propose a data structure for storing and extracting human motion data and demonstrate that the database can be applied to the recognition and motion synthesis problems in robotics. We develop an efficient method for building a human motion database from a collection of continuous, multi-dimensional motion clips. The database consists of a binary tree representing the hierarchical clustering of the states observed in the motion clips, as well as node transition graphs representing the possible transitions among the nodes in the binary tree. Using databases constructed from real human motion data, we demonstrate that the proposed data structure can be used for human motion recognition, state estimation and prediction, and robot motion planning.

15.
Expert knowledge is the key to modeling milling fault detection systems based on the belief rule base (BRB). The construction of the initial expert knowledge base seriously affects the accuracy and interpretability of the milling fault detection model. However, due to the complexity of the milling system structure and the uncertainty of the milling failure index, it is often impossible to construct the model's expert knowledge effectively. Therefore, a milling system fault detection method based on fault tree analysis and hierarchical BRB (FTBRB) is proposed. Firstly, the proposed method combines fault tree and hierarchical BRB modeling. Through fault tree analysis (FTA), the logical correspondence between FTA and BRB is established, which effectively embeds the FTA mechanism into the BRB expert knowledge base. The hierarchical BRB model is used to handle the large number of indexes and avoid combinatorial explosion. Secondly, evidence reasoning (ER) is used to ensure the transparency of the model's reasoning process. Thirdly, the projection covariance matrix adaptation evolutionary strategy (P-CMA-ES) is used to optimize the model. Finally, the validity of the model and the feasibility of the method are verified on milling data sets.

16.
We present a new motion-compensated hierarchical compression scheme (HMLFC) for encoding light field images (LFI) that is suitable for interactive rendering. Our method combines two different approaches, motion compensation schemes and hierarchical compression methods, to exploit redundancies in LFI. The motion compensation schemes capture the redundancies in local regions of the LFI efficiently (local coherence) and the hierarchical schemes capture the redundancies present across the entire LFI (global coherence). Our hybrid approach combines the two schemes effectively, capturing both local and global coherence to improve the overall compression rate. We compute a tree from LFI using a hierarchical scheme and use phase shifted motion compensation techniques at each level of the hierarchy. Our representation provides random access to the pixel values of the light field, which makes it suitable for interactive rendering applications using a small run-time memory footprint. Our approach is GPU friendly and allows parallel decoding of LF pixel values. We highlight the performance on the two-plane parameterized light fields and obtain a compression ratio of 30–800× with a PSNR of 40–45 dB. Overall, we observe a ~2–5× improvement in compression rates using HMLFC over prior light field compression schemes that provide random access capability. In practice, our algorithm can render new views of resolution 512 × 512 on an NVIDIA GTX-980 at ~200 fps.

17.
A product information model integrating axiomatic design and DFA   Total citations: 9 (self-citations: 2, citations by others: 7)
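Based on axiomatic design theory, this paper discusses the decomposition of functional requirements during product design and the zigzag mapping from the functional domain to the structural domain, and it establishes a product function-structure hierarchy model and a product information model for design for assembly (DFA). The product information model describes not only the hierarchical structure of the product but also the assembly relationships among its parts and components. It enables information integration across CAD/CAE/CAPP/CAM and lays a sound foundation for information integration in concurrent product development. Taking an engine reducer as an example, the paper illustrates how the product's function tree, structure tree, assembly model, and assembly incidence matrix are formed.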

18.
Real-time simulation of dynamic forest scenes under a wind field   Total citations: 4 (self-citations: 1, citations by others: 3)
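To realistically simulate trees swaying in the wind, the wind field is treated as a stochastic process (field), and spectral analysis is used to generate the wind velocity vector field under given conditions. Simplified dynamics equations are then used to compute, in real time, the displacement and deformation of trees under wind force. A hybrid geometry- and image-based representation is used for parametric modeling of the trees, with multiple levels of detail built into the model to accelerate view-dependent rendering of the scene. An interactive walkthrough system that combines these algorithms simulates, in real time, the dynamic behavior of forest scenes containing thousands of trees under a wind field, and it preliminarily meets the requirements of applications such as virtual reality, battlefield simulation, and digital entertainment.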

19.
This paper presents an efficient parallel implementation of matrix multiplication on three parallel architectures, namely a linear array, a binary tree, and a mesh-of-trees.

20.
Visual analytics of multidimensional multivariate data is a challenging task because of the difficulty in understanding metrics in attribute spaces with more than three dimensions. Frequently, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records with similar attribute values. A large number of (typically hierarchical) clustering algorithms have been developed to group individual records to clusters of statistical significance. However, only few visualization techniques exist for further exploring and understanding the clustering results. We propose visualization and interaction methods for analyzing individual clusters as well as cluster distribution within and across levels in the cluster hierarchy. We also provide a clustering method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional multivariate space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. To visually represent the cluster hierarchy, we present a 2D radial layout that supports an intuitive understanding of the distribution structure of the multidimensional multivariate data set. Individual clusters can be explored interactively using parallel coordinates when being selected in the cluster tree. Furthermore, we integrate circular parallel coordinates into the radial hierarchical cluster tree layout, which allows for the analysis of the overall cluster distribution. This visual representation supports the comprehension of the relations between clusters and the original attributes. The combination of the 2D radial layout and the circular parallel coordinates is used to overcome the overplotting problem of parallel coordinates when looking into data sets with many records. We apply an automatic coloring scheme based on the 2D radial layout of the hierarchical cluster tree encoding hue, saturation, and value of the HSV color space. The colors support linking the 2D radial layout to other views such as the standard parallel coordinates or, in case data is obtained from multidimensional spatial data, the distribution in object space.
