Similar literature
1.
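In recent years, general-purpose computing on graphics processing units (GPUs) has attracted wide attention and made progress in many fields. In-memory OLAP reduces disk I/O, but the computing power of single-core or multi-core CPUs and cache misses then become the new performance bottleneck, so good efficiency cannot be guaranteed. GPUs, with their many cores and high bandwidth, are well suited to the computational characteristics of OLAP. The GPU is used to accelerate the computation of any cuboid, thereby improving the performance of the whole in-memory OLAP system. A GPU-based block-parallel algorithm is proposed and optimized, and variants for conditions such as sparse data and skewed data distributions are discussed. With extension, the algorithm can break through the memory limit by forming a three-stage pipeline of disk, main memory and GPU memory, which accommodates massive data volumes; it can also serve as the basis for computing the entire cube. Experimental comparison shows that the GPU-based algorithm clearly outperforms the quad-core CPU algorithm.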
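The block-wise idea can be illustrated with a small sequential sketch. It is not the paper's GPU algorithm: the fact table, the function names `aggregate_block` and `cuboid`, and the choice of a sum aggregate are all assumptions made for illustration. Each block is aggregated independently (the part a GPU could run in parallel) and the partial results are then merged.

```python
from collections import defaultdict

def aggregate_block(rows, dims):
    """Partial aggregate of one block: sum the measure per group key."""
    partial = defaultdict(int)
    for *keys, measure in rows:
        group = tuple(keys[d] for d in dims)
        partial[group] += measure
    return partial

def cuboid(rows, dims, block_size):
    """Compute one cuboid by aggregating fixed-size blocks and merging them."""
    # Blocks are independent of each other, so they could be processed in parallel.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    partials = [aggregate_block(block, dims) for block in blocks]
    # Merge step: combine the per-block partial sums into the final cuboid.
    result = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            result[group] += value
    return dict(result)

# Hypothetical fact table: (region, product, year, sales)
facts = [
    ("east", "tv", 2010, 5),
    ("east", "pc", 2010, 3),
    ("west", "tv", 2011, 7),
    ("east", "tv", 2011, 2),
    ("west", "pc", 2010, 4),
]
by_region = cuboid(facts, dims=(0,), block_size=2)
by_region_product = cuboid(facts, dims=(0, 1), block_size=2)
```

Because the merge only adds partial sums, the result does not depend on the block size; that property is what would let blocks be streamed through a disk, memory and GPU-memory pipeline.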

2.
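A survey of applications of graphics processing units in data management   Total citations: 1 (self-citations: 0, citations by others: 1)
The similarities and differences between CPU and GPU architectures are compared, and the latest general-purpose GPU computing platforms, together with the differences between parallel algorithms on different architectures, are briefly introduced. The technical characteristics of GPU applications in spatial databases, relational databases, data streams, data mining and information retrieval are described in detail; GPU-based internal and external sorting algorithms and their performance are discussed; GPU-based data structures and indexing techniques are described; and work on GPU algorithm optimization is reviewed. Finally, the prospects for GPUs in data management are outlined and the challenges facing the field are analyzed.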

3.
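Three new GPU-parallel adaptive-neighborhood simulated annealing algorithms are proposed: a GPU-parallel genetic simulated annealing algorithm, an annealing algorithm that runs multiple Markov chains in parallel, and a GPU-parallel simulated annealing algorithm based on BLOCK partitioning. Performance is further improved on the GPU side through coalesced memory access, avoidance of bank conflicts, and reduction. Eleven typical benchmark functions were selected for the experiments; the results show that the three GPU-parallel annealing algorithms achieve better accuracy and faster convergence than the nonu-SA algorithm.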

4.
In this paper, we present the analysis and development of a cross-platform OpenCL implementation of the box-counting algorithm, which is one of the most widely used methods for estimating the Fractal Dimension. The Fractal Dimension is a relevant image analysis method used in several disciplines, but computing it is in general a time-consuming process, especially when working with 3D images. Unlike parallel programming models that strictly depend on the hardware type and manufacturer, like CUDA, OpenCL allows us to provide an implementation suitable for execution on both GPUs and multi-core CPUs, whatever the hardware manufacturer. Sorting is a key part of the fast box-counting algorithm and the final speedup is highly conditioned by the efficiency of the sorting algorithm used. Our study reveals that current OpenCL implementations of sorting algorithms are clearly slower when compared with both CUDA for GPU and specific multi-core CPU implementations. Our OpenCL algorithm has been specifically optimized according to the type of the target device and the results show an average speedup of up to 7.46× and 4×, when executed on the GPU and the multi-core CPU respectively, both compared with the single-threaded (sequential) CPU implementation.
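To show why sorting matters for box counting, here is a minimal sequential sketch for 2D point sets. It is an illustration under assumptions, not the paper's OpenCL implementation: the function names, the 2D input and the least-squares fit are chosen here for clarity. Each point is mapped to the key of the box containing it, the keys are sorted, and occupied boxes are counted as the number of key changes.

```python
import math

def box_counts(points, sizes):
    """Return the number of occupied boxes for each box size."""
    counts = []
    for s in sizes:
        # Map every point to the key of its enclosing box, then sort the keys.
        keys = sorted((x // s, y // s) for x, y in points)
        # After sorting, equal keys are adjacent, so count the key changes.
        occupied = sum(1 for i, k in enumerate(keys) if i == 0 or k != keys[i - 1])
        counts.append(occupied)
    return counts

def fractal_dimension(points, sizes):
    """Least-squares slope of log N(s) against log(1/s)."""
    counts = box_counts(points, sizes)
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Sanity checks on shapes with known dimension: a filled square and a straight line.
sizes = [1, 2, 4, 8, 16]
square = [(x, y) for x in range(64) for y in range(64)]
line = [(x, 0) for x in range(64)]
dim_square = fractal_dimension(square, sizes)
dim_line = fractal_dimension(line, sizes)
```

The sort dominates the cost of each box size, which is consistent with the abstract's remark that the final speedup depends heavily on the sorting routine.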

5.
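The Retinex-based illumination compensation algorithm is combined with each of two common, simple preprocessing algorithms, histogram equalization and gamma gray-level transformation, to obtain two new illumination-compensation preprocessing algorithms. Experimental results show that the combined algorithms markedly reduce the uneven correction of side lighting produced by the Retinex illumination compensation algorithm and can substantially improve face recognition rates under complex lighting. Because the combination relies on simple preprocessing algorithms, preprocessing efficiency is preserved.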

6.
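A novel color-based particle filter tracking algorithm   Total citations: 3 (self-citations: 0, citations by others: 3)
Traditional histogram-based particle filter algorithms often have to compromise between representing the color distribution accurately and computational efficiency, which degrades tracking performance or even causes tracking to fail. To address this problem, the paper proposes a novel color-based particle filter tracking algorithm. The algorithm uses a probabilistic model that adaptively partitions the color space and can represent the target's color distribution accurately with relatively few subspaces. The paper further proposes a generalized integral image; array indexing operations on this integral image yield the pixel count, mean vector and covariance matrix of each subspace, so the color model can be computed quickly. Computing the integral image on the CPU is very time-consuming, however, so the paper proposes a GPU-based parallel algorithm to compute it quickly. The parallel algorithm creates three thread grids on the graphics card's GPU and executes three kernel functions in sequence, which create the original integral image and then run prefix sums over its rows and its columns. Compared with traditional histogram-based particle filter algorithms, the new algorithm significantly reduces the average tracking time per frame while substantially improving tracking accuracy and robustness.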
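The three-pass structure described above (create the image, prefix-sum the rows, prefix-sum the columns) can be sketched sequentially as follows. This is a plain integral image of scalar pixel values, not the paper's generalized version, and the function names are assumptions; on a GPU each row in the second pass and each column in the third pass could be handled by its own thread group.

```python
def integral_image(img):
    """Build an integral image with a row pass followed by a column pass."""
    height, width = len(img), len(img[0])
    # Pass 1: create the initial image (a copy of the input values).
    ii = [row[:] for row in img]
    # Pass 2: inclusive prefix sum along every row (rows are independent).
    for y in range(height):
        for x in range(1, width):
            ii[y][x] += ii[y][x - 1]
    # Pass 3: inclusive prefix sum along every column (columns are independent).
    for x in range(width):
        for y in range(1, height):
            ii[y][x] += ii[y - 1][x]
    return ii

def region_sum(ii, top, left, bottom, right):
    """Sum of the inclusive rectangle [top..bottom] x [left..right] from four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = region_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Once the integral image exists, any rectangular sum costs four array lookups regardless of the rectangle's size, which is what makes per-particle statistics cheap to evaluate.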

7.
In this paper, we propose a high-speed vision system that can be applied to real-time face tracking at 500 fps using GPU acceleration of a boosting-based face tracking algorithm. By assuming a small image displacement between frames, which is a property of high-frame-rate vision, we develop an improved boosting-based face tracking algorithm for fast face tracking by enhancing the Viola–Jones face detector. In the improved algorithm, face detection can be efficiently accelerated by reducing the number of window searches for Haar-like features, and the tracked face pattern can be localized pixel-wise even when the window is sparsely scanned for a larger face pattern by introducing skin color extraction in the boosting-based face detector. The improved boosting-based face tracking algorithm is implemented on a GPU-based high-speed vision platform, and face tracking can be executed in real time at 500 fps for an 8-bit color image of 512 × 512 pixels. In order to verify the effectiveness of the developed face tracking system, we install it on a two-axis mechanical active vision system and perform several experiments for tracking face patterns.

8.
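Person tracking is one of the core techniques of current intelligent surveillance systems. Because face motion is nonlinear and non-Gaussian, a particle filter is introduced for motion prediction and estimation, which resists occlusion. Based on the structure of the face, a block-wise color histogram is proposed to describe facial features. The target's motion velocity and the process noise variance used in prediction are updated adaptively according to prediction accuracy. Experimental results show high tracking accuracy under face rotation, skin-color interference and partial occlusion, and strong robustness to changes in lighting and face size.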

9.
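Feature point detection is widely used in object recognition, tracking and 3D reconstruction. Because feature point detection in 3D reconstruction algorithms is computationally heavy and time-consuming, the Difference-of-Gaussian (DoG) algorithm is improved and a parallel DoG feature point detection algorithm is proposed. The parallel algorithm is designed and implemented for OpenMP on multi-core CPUs and for CUDA and OpenCL on GPUs. Comparative experiments on the hallFeng image set across different platforms show that the OpenMP multi-core CPU version scales well with the number of cores, while the CUDA and OpenCL GPU versions achieve high speedups, up to 96.79, with good data and platform scalability.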

10.
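To address the low efficiency of current parallel Prim minimum spanning tree algorithms, and based on an analysis of existing parallel Prim algorithms, a compressed adjacency list graph representation suited to the GPU architecture is proposed, a GPU-based minreduction data-parallel primitive is developed, and a parallel minimum spanning tree algorithm based on Prim's algorithm is designed and implemented on NVIDIA GPUs. The algorithm uses the primitive to shorten the search time of the key step and thereby gains efficiency. Experiments show that it has a clear performance advantage over a traditional CPU implementation and over an algorithm that does not use the primitive.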
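The two ingredients named above, a compressed adjacency list and a minimum reduction for the vertex-selection step, can be sketched sequentially. Everything here is an assumption for illustration: the function names `to_csr`, `min_reduction` and `prim_mst_weight` are hypothetical, the reduction is a plain loop standing in for a parallel primitive, and the graph is assumed to be connected and undirected.

```python
def to_csr(n, edges):
    """Build a compressed adjacency list (offsets, targets, weights)."""
    degree = [0] * n
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + degree[i]
    targets = [0] * offsets[n]
    weights = [0] * offsets[n]
    cursor = offsets[:n]  # next free slot per vertex (the slice is a copy)
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):  # store both directions
            targets[cursor[a]] = b
            weights[cursor[a]] = w
            cursor[a] += 1
    return offsets, targets, weights

def min_reduction(keys, in_tree):
    """Index of the smallest key among vertices outside the tree.
    Sequential stand-in for a data-parallel minimum reduction."""
    best = -1
    for v, k in enumerate(keys):
        if not in_tree[v] and (best == -1 or k < keys[best]):
            best = v
    return best

def prim_mst_weight(n, edges):
    """Total weight of a minimum spanning tree (graph assumed connected)."""
    offsets, targets, weights = to_csr(n, edges)
    keys = [float("inf")] * n
    keys[0] = 0
    in_tree = [False] * n
    total = 0
    for _ in range(n):
        u = min_reduction(keys, in_tree)  # the key step the primitive accelerates
        in_tree[u] = True
        total += keys[u]
        # Relax the neighbours of u; they are contiguous in the compressed layout.
        for i in range(offsets[u], offsets[u + 1]):
            v = targets[i]
            if not in_tree[v] and weights[i] < keys[v]:
                keys[v] = weights[i]
    return total

edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
mst_weight = prim_mst_weight(4, edges)
```

Storing each vertex's neighbours contiguously gives the regular memory access that GPUs favour, and the selection step is a pure reduction over `keys`, which is why it is a natural candidate for a parallel primitive.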

11.
Feature tracking and matching in video using programmable graphics hardware   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper describes novel implementations of the KLT feature tracking and SIFT feature extraction algorithms that run on the graphics processing unit (GPU) and are suitable for video analysis in real-time vision systems. While significant acceleration over standard CPU implementations is obtained by exploiting parallelism provided by modern programmable graphics hardware, the CPU is freed up to run other computations in parallel. Our GPU-based KLT implementation tracks about a thousand features in real time at 30 Hz on 1,024 × 768 resolution video, which is a 20 times improvement over the CPU. The GPU-based SIFT implementation extracts about 800 features from 640 × 480 video at 10 Hz, which is approximately 10 times faster than an optimized CPU implementation.

12.
Graphics processing units (GPUs) have an SIMD architecture and have been widely used recently as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors. H-tree is a hyper-linked tree structure used in both top-k H-cubing and the stream cube. Fast H-tree construction, update and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree based data cube operations. This has been made possible by using effective methods, such as parallel primitives for segmented data and efficient memory access patterns, to achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms can often achieve more than an order of magnitude speedup when compared with their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors.

13.
14.
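Moving-target tracking algorithms that use color histograms as features are easily disturbed by lighting changes and by other targets of the same color in the field of view. Using the edge orientation histogram of the moving target as the feature, together with sequential importance sampling and particle filtering, tracking of moving human targets is implemented. Experiments show that the algorithm tracks targets effectively under lighting changes and in the presence of same-colored distractors. In the implementation, the edge orientation histogram is computed with an integral image, which reduces computation time and achieves real-time tracking.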

15.
This work presents an effective approach to visual tracking using a graphics processing unit (GPU) for computation purposes. In order to get a performance improvement over other platforms, it is convenient to select suitable algorithms such as population-based ones. They are parallel-friendly by nature, requiring many independent evaluations that map well to the parallel architecture of the GPU. To this end we propose a particle filter (PF) hybridized with a memetic algorithm (MA) to produce a MAPF tracking algorithm for single and multiple object tracking problems. Previous experimental results demonstrated that the MAPF algorithm showed more accurate tracking results than the standard PF, and now we extend those results with the first complete adaptation of the PF and the MAPF for visual tracking to the NVIDIA CUDA architecture. Results show a GPU speedup between 5×–16× for different configurations.

16.
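To overcome the high time complexity and long running time of the cross-correlation extrapolation algorithm, a fast GPU-based parallel algorithm is proposed and applied to extrapolation forecasting of cloud-to-ground lightning strike locations. The serial algorithm flow is analyzed first, and the algorithm is then analyzed and designed for parallel execution. Targeting the hardware architecture of AMD GPUs, OpenCL is then used to optimize the algorithm further in aspects such as data transfer between host memory and device memory and the device memory access pattern. Finally, observed cloud-to-ground lightning monitoring data are compared with the extrapolation results, and the computational efficiency of the serial and parallel algorithms at different precisions is analyzed. Experimental results show that the algorithm makes full use of the GPU's parallel computing power and improves computation speed nearly 17-fold.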

17.
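Research and application of a parallel design paradigm for multi-core image processing   Total citations: 1 (self-citations: 0, citations by others: 1)
Wang Chengliang (王成良), Xie Kejia (谢克家), Liu Xin (刘昕). Computer Engineering (《计算机工程》), 2011, 37(14): 220-222
In a multi-core computing environment, parallel image processing algorithms can speed up image processing, but existing parallel designs target only specific algorithms such as edge detection and image projection, and no general design paradigm for parallel algorithms has emerged. Therefore, based on a study of the parallelizable mechanisms of image processing algorithms and the characteristics of multi-core architectures, a parallel design paradigm for image processing algorithms in multi-core environments is proposed, with five design steps: analysis, modeling, mapping, debugging and performance evaluation, and test and release. Taking the design of a parallel image Fourier transform algorithm as an example, the paradigm is validated in single-core, dual-core, quad-core and eight-core environments. Experimental results show that the paradigm can broaden the application space of image processing in parallel design.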

18.
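Particle filter face tracking using color and shape histograms as cues   Total citations: 2 (self-citations: 0, citations by others: 2)
Tracker design and the choice and representation of tracking cues are the two key factors in face tracking. General face tracking algorithms commonly describe the face shape cue with a simple ellipse, which is susceptible to background interference, and the dynamic and observation models in video target tracking are nonlinear and non-Gaussian. To address both issues, a particle filter face tracking algorithm using color and shape histograms as cues is proposed. Within the basic particle filter framework, the algorithm introduces a new method of describing face shape with a histogram, improves it, and uses it as the shape cue for face tracking. To reduce background interference, a method for detecting empirically effective edges is also proposed. Experiments show that the method handles face rotation, skin-colored background interference and partial occlusion effectively, and can promptly recapture a target that has been lost because of, for example, large-area occlusion.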

19.
Speeding up the evaluation phase of GP classification algorithms on GPUs   Total citations: 2 (self-citations: 1, citations by others: 1)
The efficiency of evolutionary algorithms has become a studied problem since it is one of the major weaknesses of these algorithms. Specifically, when these algorithms are employed for the classification task, the computational time they require grows excessively as the problem complexity increases. This paper proposes an efficient, scalable and massively parallel evaluation model using the NVIDIA CUDA GPU programming model to speed up the fitness calculation phase and greatly reduce the computational time. Experimental results show that our model significantly reduces the computational time compared to the sequential approach, reaching a speedup of up to 820×. Moreover, the model is able to scale to multiple GPU devices and can be easily extended to any evolutionary algorithm.

20.
This paper presents a new algorithm for force-directed graph layout on the GPU. The algorithm, whose goal is to compute layouts accurately and quickly, makes two contributions. The first is a general multi-level scheme based on spectral partitioning. The second is computing the layout on the GPU. Since the GPU requires a data-parallel programming model, the challenge is devising a mapping of a naturally unstructured graph into a well-partitioned structured one. This is done by computing a balanced partitioning of a general graph. The multi-level scheme has the potential to be used not only for computation on the GPU, but also on emerging multi-core architectures. The algorithm manages to compute high-quality layouts of large graphs in a fraction of the time required by existing algorithms of similar quality. An application for visualization of the topologies of ISP (Internet Service Provider) networks is presented.
