Similar Documents
20 similar documents were retrieved.
1.
Metric-space similarity search has proven suitable in a number of application domains, such as multimedia retrieval and computational biology, to name a few. These applications usually work on very large databases that are often indexed to speed up on-line searches. To achieve efficient throughput, it is essential to exploit the intrinsic parallelism of the respective search query processing algorithms. Many strategies have been proposed in the literature to parallelize these algorithms on either shared- or distributed-memory multiprocessor systems. Lately, GPUs have been used to implement brute-force parallel search strategies instead of using index data structures, because indexing poses difficulties when it comes to achieving efficient exploitation of GPU resources. In this paper we propose single- and multi-GPU metric-space techniques that efficiently exploit GPU-tailored index data structures for parallel similarity search in large databases. The experimental results show that our proposal outperforms previous index-based sequential and OpenMP search strategies.
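As a rough illustration of the kind of GPU workload such systems exploit, the sketch below shows the brute-force baseline mentioned in the abstract: one CUDA thread per database object computing its Euclidean distance to the query. It is not the authors' index-based technique; the names, sizes, and the choice of Euclidean distance are illustrative assumptions.

```cuda
// Minimal brute-force GPU distance sweep (a sketch, not the paper's index).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void bruteForceDistances(const float *db, const float *query,
                                    float *dist, int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int d = 0; d < dim; ++d) {              // Euclidean distance to the query
        float diff = db[i * dim + d] - query[d];
        acc += diff * diff;
    }
    dist[i] = sqrtf(acc);
}

int main() {
    const int n = 1 << 16, dim = 64;             // illustrative sizes
    float *dDb, *dQuery, *dDist;
    cudaMalloc(&dDb, (size_t)n * dim * sizeof(float));
    cudaMalloc(&dQuery, dim * sizeof(float));
    cudaMalloc(&dDist, n * sizeof(float));
    // In a real search the database vectors and the query would be copied in
    // with cudaMemcpy before the launch; here the buffers stay uninitialized.
    int threads = 256, blocks = (n + threads - 1) / threads;
    bruteForceDistances<<<blocks, threads>>>(dDb, dQuery, dDist, n, dim);
    cudaDeviceSynchronize();
    printf("computed %d distances\n", n);
    cudaFree(dDb); cudaFree(dQuery); cudaFree(dDist);
    return 0;
}
```

An index-based GPU method, as proposed in the paper, would prune most of these distance computations; the value of the sketch is only to show the one-thread-per-object mapping.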

2.
This paper presents an effective scheme for clustering a huge data set using a PC cluster system in which each PC is equipped with a commodity programmable graphics processing unit (GPU). The proposed scheme achieves three-level hierarchical parallel processing of massive data clustering. A divide-and-conquer approach performs the coarse-grain parallel processing across multiple PCs with a message-passing mechanism. By taking advantage of the GPU's parallel processing capability, the scheme further exploits two types of fine-grain data parallelism at different levels of the nearest-neighbor search, which is the most computationally intensive part of the data-clustering process. The performance of the scheme is compared with that of an implementation running entirely on CPUs. Experimental results clearly show that the proposed hierarchical parallel processing remarkably accelerates the data-clustering task. In particular, GPU co-processing is very effective in improving the computational efficiency of parallel data clustering on a PC cluster: although data transfer from GPU to CPU is generally costly, GPU co-processing significantly reduces the total execution time of data clustering.
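The fine-grain data parallelism the abstract refers to centers on the nearest-neighbor (nearest-centroid) step of clustering. The kernel below is a minimal sketch of that step, with one CUDA thread assigning one data point to its closest centroid; it is not the authors' implementation, the host-side allocation and launch follow the same pattern as the earlier sketch, and all names are assumptions.

```cuda
// One thread per point: find the nearest centroid (fine-grain GPU step of
// k-means-style clustering). Coarse-grain merging across PCs happens elsewhere.
__global__ void assignNearestCentroid(const float *points, const float *centroids,
                                      int *label, int n, int k, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float best = 3.4e38f;                    // effectively +infinity for float
    int bestC = 0;
    for (int c = 0; c < k; ++c) {
        float acc = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[i * dim + d] - centroids[c * dim + d];
            acc += diff * diff;              // squared distance suffices for argmin
        }
        if (acc < best) { best = acc; bestC = c; }
    }
    label[i] = bestC;
}
```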

3.
Similarity search and content-based retrieval have become widely used in multimedia database systems that often manage huge data collections. Unfortunately, many effective content-based similarity models cannot be fully utilized for larger datasets, as they are computationally demanding and require massive parallel processing for both feature extraction and query evaluation tasks. In this work, we address the performance issues of effective similarity models based on feature signatures, where we focus on fast feature extraction from image thumbnails using affordable hardware. More specifically, we propose a multi-GPU implementation that increases the extraction speed by two orders of magnitude with respect to a single-threaded CPU implementation. Since the extraction algorithm is not directly parallelizable, we propose a modification of the algorithm embracing the SIMT execution model. We have experimentally verified that our GPU extractor can be successfully used to index large image datasets comprising millions of images. In order to obtain optimal extraction parameters, we employed the GPU extractor in an extensive empirical investigation of the parameter space. The experimental results are discussed from the perspectives of both performance and similarity precision.

4.
Fast Level Set Image Segmentation Based on GPU
The level set method is an important approach to image segmentation, but its heavy computational load often prevents real-time processing. This paper presents an accelerated level set algorithm implemented on a new generation of programmable graphics processing units (GPUs). It first describes how fragment shader programs perform the grid-based linear operations and finite-difference PDE computations on the GPU, mapping the discretized level set operators onto the graphics hardware. Because the stream-processing GPU offers fast memory access and parallel computation, and because displaying the evolving level set no longer requires transferring data from the CPU to the GPU, both the algorithm's speed and its interactive visualization are substantially improved. A two-dimensional level set operator that is independent of the initialization state is implemented and tested for image segmentation, and comparisons of its results and performance show that the method is considerably faster.
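For readers unfamiliar with the discretized operators mentioned above, the following kernel-only sketch shows one explicit finite-difference update of the level set function per pixel, written as a CUDA kernel rather than the fragment shader programs used in the paper; the speed function F, the time step dt, and the simple central-difference update are illustrative assumptions.

```cuda
// One explicit evolution step: phi_new = phi + dt * F * |grad(phi)|, with the
// gradient approximated by central differences (interior pixels only).
__global__ void levelSetStep(const float *phi, const float *F, float *phiNew,
                             int w, int h, float dt) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;
    int i = y * w + x;
    float dx = 0.5f * (phi[i + 1] - phi[i - 1]);
    float dy = 0.5f * (phi[i + w] - phi[i - w]);
    float gradMag = sqrtf(dx * dx + dy * dy);
    phiNew[i] = phi[i] + dt * F[i] * gradMag;   // evolve the interface
}
```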

5.
Gabor wavelet transform is one of the most effective texture feature extraction techniques and has resulted in many successful practical applications. However, real-time applications cannot benefit from this technique because of the high computational cost arising from the large number of small-sized convolutions, which require over 10 min to process an image of 256 × 256 pixels on a dual-core CPU. As the computation in Gabor filtering is parallelizable, it is possible and beneficial to accelerate the feature extraction process using the GPU. Conventionally, this can be achieved simply by accelerating the 2D convolution directly, or by expediting the CPU-efficient FFT-based 2D convolution. However, the latter approach, when implemented with small-sized Gabor filters, cannot fully exploit the parallel computation power of the GPU due to the architecture of graphics hardware. This paper proposes a novel approach tailored for GPU acceleration of the texture feature extraction algorithm that uses separable 1D Gabor filters to approximate the non-separable Gabor filter kernels. Experimental results show that the approach improves the timing performance significantly with minimal error introduced. The method is specifically designed and optimized for the Compute Unified Device Architecture (CUDA) and achieves a speed of 16 fps on modest graphics hardware for an image of 256 × 256 pixels and a filter kernel of 32 × 32 pixels. It is potentially applicable for real-time applications in areas such as motion tracking and medical image analysis.
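The separable approximation replaces one 2D convolution with a horizontal 1D pass followed by a vertical 1D pass. The kernel below is a sketch of the horizontal pass only, written for clarity rather than performance (no shared-memory tiling); the kernel name, radius parameter, and border clamping are assumptions, not the paper's exact filters.

```cuda
// Horizontal 1D convolution pass of a separable filter; a second pass with the
// vertical 1D kernel (stepping in y instead of x) completes the approximation.
__global__ void convolveRows(const float *in, const float *k1d, float *out,
                             int w, int h, int radius) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float acc = 0.0f;
    for (int t = -radius; t <= radius; ++t) {
        int xs = min(max(x + t, 0), w - 1);   // clamp at the image border
        acc += in[y * w + xs] * k1d[t + radius];
    }
    out[y * w + x] = acc;
}
```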

6.
To achieve stable real-time stitching of video streams, a GPU-based video stitching algorithm is proposed that exploits the massive parallel computing power of graphics processing units. Frames are extracted from the video streams, and the scale-invariant feature transform (SIFT) algorithm is run on the GPU to perform feature extraction and matching between frames, producing the stitched images and thus a stable, real-time stitched video stream. The GPU-based SIFT implementation makes full use of the GPU's parallel processing capability and accelerates the stitching pipeline, enabling fast and stable stitching of several video streams that differ considerably but share a common field of view.

7.
Color quantization is one of the most important preprocessing stages in many computer graphics and image processing applications. In this article, a new algorithm for color image quantization based on the harmony search (HS) algorithm is proposed. The proposed algorithm follows the clustering approach, one of the methods most extensively applied to the color quantization problem. Two variants are examined: the first is based on a standalone HS algorithm, and the second is a hybrid of k-means (KM) and HS, whose objective is to strengthen the local search process and balance quantization quality against computational complexity. In the first stage, the high-resolution color space is condensed to a lower-dimensional color space by multilevel thresholding. In the second stage, the compressed colors are clustered to a palette using the hybrid KM-HS to obtain the final quantization results. The algorithm thus performs postclustering quantization at the color-space level instead of the pixel level, which significantly reduces the computational complexity while maintaining quantization quality. Experimental results on some of the most commonly used test images in the quantization literature demonstrate that the proposed method is powerful, offering higher precision and robustness than existing algorithms.

8.
In particle methods, a neighbor-particle search algorithm is used to quickly obtain, for each particle, information about its neighboring particles. Because simulating the behavior of a system with particle methods involves very large amounts of particle data, it places heavy demands on computing speed. This work studies the computing capability of GPUs and the CUDA development environment and, using the GPU's parallel multi-threaded processing, proposes a parallel neighbor-particle search algorithm. Experimental results show that the CUDA-based parallel search accelerates the neighbor-search process, significantly reduces computation time, and successfully achieves hardware acceleration, reaching speedups above 290 and handling large-scale particle systems efficiently.
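The abstract does not detail the search algorithm itself, so the sketch below only illustrates the basic one-thread-per-particle mapping with a naive O(N²) radius test; a production SPH-style implementation would first bin particles into a uniform grid. The cutoff radius and output layout are assumptions.

```cuda
// Naive parallel neighbor search: each thread counts the neighbors of one
// particle within a cutoff radius (grid binning omitted in this sketch).
__global__ void radiusNeighbors(const float3 *pos, int *neighborCount,
                                int n, float cutoff) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r2 = cutoff * cutoff;
    int count = 0;
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        if (dx * dx + dy * dy + dz * dz < r2) ++count;
    }
    neighborCount[i] = count;
}
```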

9.
The Signature Quadratic Form Distance (SQFD) on feature signatures represents a flexible distance-based similarity model for effective content-based multimedia retrieval. Although metric indexing approaches are able to speed up query processing by two orders of magnitude, their applicability to large-scale multimedia databases containing billions of images is still a challenging issue. In this paper, we propose a parallel approach that balances the utilization of the CPU and many-core GPUs for efficient similarity search with the Signature Quadratic Form Distance. In particular, we show how to process multiple distance computations and other parts of the search procedure in parallel, achieving maximal performance of the combined CPU/GPU system. The experimental evaluation demonstrates that our approach implemented on a common workstation with 2 GPU cards outperforms a traditional parallel implementation on a high-end 48-core NUMA server in terms of efficiency by almost an order of magnitude. If we also consider that the price of the high-end server is ten times higher than that of the GPU workstation, then, based on the price/performance ratio, the GPU-based similarity search beats the CPU-based solution by almost two orders of magnitude. Although proposed for the SQFD, our approach to fast GPU-based similarity search is applicable to any distance function that is efficiently parallelizable in the SIMT execution model.
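The parallel core of the SQFD is a quadratic form over the concatenated signature weights. In the sketch below, the two signatures are assumed to be merged into one centroid list c and one weight vector w (query weights positive, object weights negated); each thread accumulates one w_i·sim(c_i, c_j)·w_j term, and the host takes the square root of the accumulated sum. The Gaussian ground similarity with parameter alpha is a common choice assumed here, not necessarily the paper's.

```cuda
// One thread per (i, j) pair of signature centroids (launched with a 2D grid);
// `result` accumulates the quadratic form w * A * w^T, whose square root is the SQFD.
__global__ void sqfdTerms(const float *c, const float *w, float *result,
                          int m, int dim, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= m || j >= m) return;
    float d2 = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = c[i * dim + d] - c[j * dim + d];
        d2 += diff * diff;
    }
    float sim = expf(-alpha * d2);            // Gaussian ground similarity
    atomicAdd(result, w[i] * sim * w[j]);
}
```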

10.
李卓  徐哲  陈昕  李淑琴 《计算机应用》2018,38(8):2393-2397
Existing hyperspectral image compression algorithms that pursue high compression quality generally suffer from high computational complexity, offline processing, and difficulty of implementation on embedded platforms, so they are rarely used in practice. To address these problems, an embedded real-time hyperspectral image compression method based on KLT and HEVC is designed. The Karhunen-Loève transform (KLT) first removes the inter-band spectral correlation, and HEVC then removes the spatial correlation and performs quantization and coding. A heterogeneous CPU/GPU parallel compression system is designed and implemented on the NVIDIA Jetson TX1 platform, and the algorithm and platform are validated for performance and feasibility on real data sets. The experimental results show that, at the same compression ratio, the system clearly improves reconstruction accuracy over the discrete wavelet transform (DWT)+JPEG2000 approach, raising the peak signal-to-noise ratio (PSNR) by 1.36 dB on average; moreover, performing the KLT on the GPU rather than the CPU reduces the running time by up to 33%.
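As a rough sketch of the spectral-decorrelation step, the kernel below projects each pixel's (mean-centred) spectral vector onto a KLT basis that is assumed to have been computed beforehand from the band covariance matrix; the data layout and names are illustrative assumptions and do not reproduce the paper's Jetson TX1 implementation.

```cuda
// One thread per pixel: multiply the pixel's spectral vector by the KLT basis
// (rows of `basis` are eigenvectors of the band covariance matrix).
__global__ void kltProject(const float *pixels,   // [nPixels][bands], mean-centred
                           const float *basis,    // [bands][bands]
                           float *out, int nPixels, int bands) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPixels) return;
    for (int k = 0; k < bands; ++k) {
        float acc = 0.0f;
        for (int b = 0; b < bands; ++b)
            acc += basis[k * bands + b] * pixels[p * bands + b];
        out[p * bands + k] = acc;                 // k-th transform coefficient
    }
}
```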

11.
Hyperspectral image classification is a hot topic in remote-sensing information processing. Within the kernel sparse representation classification framework, combining spectral information with per-pixel spatial information (joint spatial-spectral kernel sparse representation) achieves good classification results, but its high computational complexity and the large volume of hyperspectral data limit its use when strong real-time requirements apply. Based on the GPU/CUDA architecture, a parallel optimization method for joint spatial-spectral kernel sparse representation classification is proposed: a memory-access optimization strategy is designed for host-device data transfers; the GPU's parallel computing power is fully exploited to accelerate the computation of the kernel matrices during classification; and matrix operations implemented according to the GPU's parallel characteristics are used to optimize the solution of the classification model based on the alternating direction method of multipliers (ADMM). Experiments on real hyperspectral image data verify the effectiveness and efficiency of the method.
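One of the steps singled out above is the kernel-matrix computation. The sketch below fills an RBF kernel matrix with one thread per entry; the RBF choice and the gamma parameter are assumptions for illustration, and the memory-access optimizations and ADMM solver described above are not shown.

```cuda
// One thread per matrix entry (2D grid): K[i][j] = exp(-gamma * ||x_i - x_j||^2).
__global__ void rbfKernelMatrix(const float *X, float *K,
                                int n, int dim, float gamma) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || j >= n) return;
    float d2 = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = X[i * dim + d] - X[j * dim + d];
        d2 += diff * diff;
    }
    K[i * n + j] = expf(-gamma * d2);
}
```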

12.
Peng  Lizhi  Zhang  Haibo  Hassan  Houcine  Chen  Yuehui  Yang  Bo 《The Journal of supercomputing》2019,75(6):2930-2949

The data gravitation-based classification (DGC) model, a new classification model inspired by a physical law, has been demonstrated to be effective for both standard and imbalanced tasks. However, due to the large amount of gravitational computation during the feature weighting process, DGC suffers from high computational complexity, especially for large data sets. In this paper, we address the problem of speeding up gravitational computation using the graphics processing unit (GPU). We design a GPU parallel algorithm, GPU-DGC, to accelerate the feature weighting process of the DGC model. GPU-DGC distributes the gravitational computing process across parallel GPU threads so that gravitation is computed simultaneously. We use 25 open classification data sets to evaluate the parallel performance of the algorithm. The relationship between the speedup ratio and the number of GPU threads is discovered and discussed in the empirical studies. The experimental results show the effectiveness of GPU-DGC, with a maximum speedup of 87 times over the serial DGC; its sensitivity to the number of GPU threads is also examined.
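As a minimal sketch of the gravitational computation being parallelized, the kernel below lets each thread compute the "data gravitation" that one training particle exerts on a test sample, using the physics-inspired mass/r² form with per-feature weights; the exact weighting scheme of GPU-DGC is not reproduced, and all names are assumptions.

```cuda
// One thread per training particle: gravitation = mass / weighted squared distance.
__global__ void dataGravitation(const float *train, const float *mass,
                                const float *featWeight, const float *test,
                                float *grav, int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r2 = 1e-12f;                        // guard against division by zero
    for (int d = 0; d < dim; ++d) {
        float diff = featWeight[d] * (train[i * dim + d] - test[d]);
        r2 += diff * diff;
    }
    grav[i] = mass[i] / r2;
}
```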


13.
This paper introduces a parallel iterated tabu search heuristic for solving four different routing problems: the classical vehicle routing problem (VRP), the periodic VRP, the multi-depot VRP, and the site-dependent VRP. In addition, it is applicable to the time-window-constrained variants of these problems. Using the iterated local search framework, the heuristic combines tabu search with a simple perturbation mechanism to ensure a broad exploration of the search space. We also describe a parallel implementation of the heuristic that takes advantage of multi-core processors. Extensive computational results show that the proposed heuristic outperforms tabu search alone and is competitive with recent heuristics designed for each particular problem.

14.
GPUs and integrated CPU-GPU architectures, with their powerful parallel processing capability and programmable pipelines, have become a research focus in the database field. To fully exploit the parallel computing power of heterogeneous platforms and improve the query performance of column-store systems, and building on a study of the structural characteristics of such platforms, this work first proposes ICMD (Improved CMD), a data-partitioning strategy for joins on a multi-threaded GPU platform in which the GPU's streaming processors process the joins of the individual subspaces in parallel. A task evaluation and assignment model then distributes the query load dynamically, so that query operations execute efficiently and in parallel across the multi-core CPU and the GPU. The ICMD join algorithm is further optimized with an on-chip global synchronization mechanism and local-memory reuse. Tests on the SSB benchmark show that the parallel join query on an Intel HD Graphics 4600 platform achieves a 35% performance improvement over the CPU version and an 18% improvement over the Ocelot GPU query engine.

15.

The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. The algorithm is used in a large number of acoustic applications such as automatic camera steering systems, human-machine interaction, video gaming, and audio surveillance. SRP-PHAT implementations have to handle a high number of signals coming from a microphone array and a huge search grid that influences the localization accuracy of the system. In this context, high performance in the localization process can only be achieved by using massively parallel computational resources. Different types of multi-core machines based either on multiple CPUs or on GPUs are commonly employed in diverse fields of science to accelerate applications, mainly using OpenMP and CUDA as the respective programming frameworks. This implies the development of multiple source codes, which limits portability and application possibilities. In contrast, OpenCL has emerged as an open standard for parallel programming that is nowadays supported by a wide range of architectures. In this work, we evaluate an OpenCL-based implementation of the SRP-PHAT algorithm on two state-of-the-art CPU and GPU platforms. Results demonstrate that OpenCL achieves close-to-CUDA performance on the GPU (considered as the upper bound) and outperforms OpenMP in most of the CPU configurations.
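The grid evaluation that dominates SRP-PHAT can be parallelized with one candidate position per thread. The sketch below is written as a CUDA kernel for consistency with the other sketches (the paper itself evaluates OpenCL); the GCC-PHAT correlations and per-position integer delays are assumed to be precomputed, and the array layout is an assumption.

```cuda
// One thread per candidate source position: sum, over microphone pairs, the
// GCC-PHAT value at the delay that this position would produce; the host then
// picks the grid point with the maximum steered response power.
__global__ void srpGrid(const float *gccPhat,     // [nPairs][corrLen]
                        const int *delaySamples,  // [nPoints][nPairs]
                        float *power, int nPoints, int nPairs, int corrLen) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPoints) return;
    float acc = 0.0f;
    for (int q = 0; q < nPairs; ++q) {
        int lag = delaySamples[p * nPairs + q];
        if (lag >= 0 && lag < corrLen)
            acc += gccPhat[q * corrLen + lag];
    }
    power[p] = acc;
}
```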


16.
Digital image halftoning is a widely used technique. However, achieving high-fidelity tone reproduction and structure preservation at low computational cost remains a challenging problem. This paper presents a highly parallel algorithm to enable real-time application of serial structure-preserving error diffusion. The contrast-aware halftoning approach offers superior structure preservation but provides only limited opportunity for graphics processing unit (GPU) acceleration. Our method integrates contrast-aware halftoning into a new parallelizable error-diffusion halftoning framework. To eliminate the visually disturbing artifacts resulting from parallelization, we propose a novel multiple-quantization model and space-filling curve that maintain tone consistency, the blue-noise property, and structure consistency. Our GPU implementation on a commodity personal computer achieves real-time performance for a moderately sized image. We demonstrate the high quality and performance of the proposed approach with a variety of examples and provide comparisons with state-of-the-art methods.

17.
Image Retrieval Based on Spatial Features
史婷婷  李岩 《计算机应用》2008,28(9):2292-2296
A new spatial-feature image descriptor, SCH, is proposed. The MCVAE algorithm, based on color vector angle combined with Euclidean distance, detects edges in the original color image, while a new "max-min component color invariant model" quantizes the image; an edge correlation matrix is built for edge pixels, and a color histogram describes the local color distribution of non-edge pixels. A new sine-based similarity measure then evaluates the similarity between image features. For the experiments, a content-based image retrieval prototype system, "SttImageRetrieval", was developed in VC++ 6.0, and a comprehensive image database, "IMAGEDB", was built on an Oracle 9i database. The experimental analysis shows that retrieval accuracy with the SCH descriptor is clearly higher than retrieval based only on color statistics.

18.
Objective: Large-scale image retrieval is one of the hot research topics in computer vision. A basic approach is to extract features for all images in the database, define a feature similarity measure, and perform nearest-neighbor search; the key is to design a nearest-neighbor search algorithm that satisfies both storage and efficiency requirements. To improve the approximation accuracy of visual features and reduce their storage cost, a multi-index additive quantization method is proposed. Method: Linear search has high complexity, and keeping all image descriptors in memory to meet real-time requirements cannot satisfy the needs of large-scale retrieval systems. Building on the advantages of non-exhaustive search, this paper investigates a non-exhaustive multi-index structure combined with quantization coding. The multi-index structure partitions the original data space into multiple subspaces and assigns the items of each subspace to different inverted lists; additive quantization with compressed codes then encodes the residual items in the inverted lists, further reducing the quantization loss with respect to the original space. At query time, a non-exhaustive strategy searches only a few inverted lists, which greatly reduces retrieval time; the original data need not be stored, only the index of each item's codeword in the additive-quantization codebook, which greatly reduces memory consumption. Results: The method was tested on three data sets, SIFT, GIST, and MNIST; recall improves by 4%-15% over recent algorithms, mean precision improves by about 12%, and retrieval time matches the fastest algorithms. Conclusion: The proposed multi-index additive quantization algorithm effectively improves the approximation accuracy and storage requirements of visual features and raises retrieval precision and recall on large-scale data sets. The algorithm targets nearest-neighbor search over features and is applicable to massive image collections and other multimedia data.

19.
Color quantization is a process that compresses the color space of an image while minimizing visual distortion. Quantization based on preclustering has low computational complexity but cannot guarantee quantization precision, whereas quantization based on postclustering can produce high-quality results but has to traverse the image pixels iteratively and therefore carries a heavy computational burden; revised versions have improved its precision without reducing its complexity. Balancing quantization quality against computational complexity is thus a long-standing challenge in color quantization. In this paper, a two-stage quantization framework is proposed to achieve this balance. In the first stage, the high-resolution color space is initially compressed to a condensed color space by thresholding roughness indices; instead of linear compression, we propose a generic roughness measure to generate a delicate segmentation of the image colors, which causes less distortion to the image. In the second stage, the initially compressed colors are further clustered to a palette using Weighted Rough K-means to obtain the final quantization results. Our objective is to design a postclustering quantization strategy at the color-space level rather than the pixel level: applying the quantization in the precisely compressed color space greatly reduces the computational cost while maintaining quantization quality. Substantial experimental results validate the high efficiency of the proposed quantization method, which produces high-quality color quantization with low computational complexity.

20.
In biological research, computer alignment of protein sequences is often needed to find similarities between them. Although results can be computed in a reasonable time for the alignment of two sequences, massive alignment problems such as protein database search remain very CPU time-consuming. In this paper, an optimized protein database search method is presented and tested with the Swiss-Prot database on graphics processing unit (GPU) devices; the power of CPU multi-threaded computing is also involved to realize GPU-based heterogeneous parallelism. In the proposed method, a hybrid alignment approach combines the Smith-Waterman local alignment algorithm with the Needleman-Wunsch global alignment algorithm, and parallel database search is realized with the compute unified device architecture (CUDA) parallel computing framework. In the experiments, the algorithm is tested on a lower-end and a higher-end personal computer equipped with GeForce GTX 750 Ti and GeForce GTX 1070 graphics cards, respectively. The results show that the proposed parallel method can achieve a speedup of up to 138.86 times over the serial counterpart, significantly improving the efficiency and convenience of protein database search.
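A common way to organize such a search on the GPU (and the pattern implied here) is inter-task parallelism: each thread aligns the query against one database sequence. The sketch below scores a plain Smith-Waterman alignment with a linear gap penalty and a fixed match/mismatch score instead of a real substitution matrix; MAX_QUERY, the scoring constants, and the sequence layout are assumptions, and the Needleman-Wunsch stage of the paper's hybrid approach is not shown.

```cuda
// One thread per database sequence: local alignment score of `query` against
// that sequence, using a single rolling DP row kept in per-thread local memory.
#define MAX_QUERY 128

__global__ void swScoreOneVsMany(const char *query, int qlen,
                                 const char *db, const int *seqOffset,
                                 const int *seqLen, int *bestScore, int nSeqs) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= nSeqs || qlen > MAX_QUERY) return;
    const char *seq = db + seqOffset[s];
    int H[MAX_QUERY + 1];                     // rolling row of the DP matrix
    for (int j = 0; j <= qlen; ++j) H[j] = 0;
    int best = 0;
    for (int i = 1; i <= seqLen[s]; ++i) {
        int diag = 0;                         // H[i-1][0] is always 0
        for (int j = 1; j <= qlen; ++j) {
            int up = H[j];                    // still holds H[i-1][j]
            int match = diag + (seq[i - 1] == query[j - 1] ? 2 : -1);
            int score = max(0, max(match, max(up - 2, H[j - 1] - 2)));
            diag = up;                        // becomes H[i-1][j-1] for column j+1
            H[j] = score;
            if (score > best) best = score;
        }
    }
    bestScore[s] = best;
}
```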
