期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Real-time virtual environment signal extraction and denoising using programmable graphics hardware

Yang Su Zhi-Jie Xu Xiang-Qian Jiang 《国际自动化与计算杂志》2009,6(4):326-334

The sense of being within a three-dimensional (3D) space and interacting with virtual 3D objects in a computer-generated virtual environment (VE) often requires essential image, vision and sensor signal processing techniques such as differentiating and denoising. This paper describes novel implementations of the Gaussian filtering for characteristic signal extraction and wavelet-based image denoising algorithms that run on the graphics processing unit (GPU). While significant acceleration over standard CPU implementations is obtained through exploiting data parallelism provided by the modern programmable graphics hardware, the CPU can be freed up to run other computations more efficiently such as artificial intelligence (AI) and physics. The proposed GPU-based Gaussian filtering can extract surface information from a real object and provide its material features for rendering and illumination. The wavelet-based signal denoising for large size digital images realized in this project provided better realism for VE visualization without sacrificing real-time and interactive performances of an application. 相似文献

2.

Real-time object detection on CUDA

Adam Herout Radovan Jo?th Roman Jur��nek Ji?�� Havel Michal Hradi? Pavel Zem?��k 《Journal of Real-Time Image Processing》2011,6(3):159-170

The aim of the research described in this article is to accelerate object detection in images and video sequences using graphics processors. It includes algorithmic modifications and adjustments of existing detectors, constructing variants of efficient implementations and evaluation comparing with efficient implementations on the CPUs. This article focuses on detection by statistical classifiers based on boosting. The implementation and the necessary algorithmic alterations are described, followed by experimental measurements of the created object detector and discussion of the results. The final solution outperforms the reference efficient CPU/SSE implementation, by approximately 6–8× for high-resolution videos using nVidia GeForce 9800GTX and Intel Core2 Duo E8200. 相似文献

3.

Fully pipelined FPGA-based architecture for real-time SIFT extraction

《Microprocessors and Microsystems》2016

相似文献

4.

FPGA accelerator for real-time SIFT matching with RANSAC support

《Microprocessors and Microsystems》2017

相似文献

5.

基于GPU的视频流拼接算法研究

张燕赵新灿谭同德《计算机工程与设计》2012,33(4):1472-1476

为解决视频流的稳定实时拼接,结合图形处理器GPU强大的并行计算能力,提出了一种基于GPU的视频流拼接算法.提取视频流的帧图像,利用尺度不变特征变换(scale invariant feature transform,SIFT)算法在GPU上实现帧图像的特征提取与匹配,实现图像拼接,进而实现视频流的稳定实时拼接.基于GPU的SIFT算法充分利用了GPU的并行处理能力,加快了视频流拼接算法执行的速度,真正意义上实现了几个差异较大但具有公共视野的视频流快速稳定的拼接. 相似文献

6.

Deployment of parallel linear genetic programming using GPUs on PC and video game console platforms

Garnett Wilson Wolfgang Banzhaf 《Genetic Programming and Evolvable Machines》2010,11(2):147-184

相似文献

7.

A GPU implementation for LBG and SOM training 总被引：1，自引：1，他引：0

Yi Xiao Chi Sing Leung Tze-Yui Ho Ping-Man Lam 《Neural computing & applications》2011,20(7):1035-1042

Vector quantization (VQ) is an effective technique applicable in a wide range of areas, such as image compression and pattern recognition. The most time-consuming procedure of VQ is codebook training, and two of the frequently used training algorithms are LBG and self-organizing map (SOM). Nowadays, desktop computers are usually equipped with programmable graphics processing units (GPUs), whose parallel data-processing ability is ideal for codebook training acceleration. Although there are some GPU algorithms for LBG training, their implementations suffer from a large amount of data transfer between CPU and GPU and a large number of rendering passes within a training iteration. This paper presents a novel GPU-based training implementation for LBG and SOM training. More specifically, we utilize the random write ability of vertex shader to reduce the overheads mentioned above. Our experimental results show that our approach can run four times faster than the previous approach. 相似文献

8.

Accelerating IP routing algorithm using graphics processing unit for high speed multimedia communication

Jia Uddin In-Kyu Jeong Myeongsu Kang Cheol-Hong Kim Jong-Myon Kim 《Multimedia Tools and Applications》2016,75(23):15365-15379

This paper presents a Graphics Processing Unit (GPU)-based implementation of a Bellman-Ford (BF) routing algorithm using NVIDIA’s Compute Unified Device Architecture (CUDA). In the proposed GPU-based approach, multiple threads run concurrently over numerous streaming processors in the GPU to dynamically update routing information. Instead of computing the individual vertex distances one-by-one, a number of threads concurrently update a larger number of vertex distances, and an individual vertex distance is represented in a single thread. This paper compares the performance of the GPU-based approach to an equivalent CPU implementation while varying the number of vertices. Experimental results show that the proposed GPU-based approach outperforms the equivalent sequential CPU implementation in terms of execution time by exploiting the massive parallelism inherent in the BF routing algorithm. In addition, the reduction in energy consumption (about 99 %) achieved by using the GPU is reflective of the overall merits of deploying GPUs across the entire landscape of IP routing for emerging multimedia communications. 相似文献

9.

基于CUDA的尺度不变特征变换快速算法 总被引：2，自引：2，他引：0

下载免费PDF全文

田文徐帆王宏远周波《计算机工程》2010,36(8):219-221

针对尺度不变特征变换(SIFT)算法耗时多限制其应用范围的缺点,提出一种基于统一计算设备架构(CUDA)的尺度不变特征变换快速算法,分析其并行特性,在图像处理单元(GPU)的线程和内存模型方面对算法进行优化。实验证明,相对于CPU,算法速度提升了30~50倍,对640×480图像的处理速度达到每秒24帧,满足实时应用的需求。相似文献

10.

Multi-scale neural texture classification using the GPU as a stream processing engine

M. Mart��nez-Zarzuela F. J. D��az-Pernas M. Ant��n-Rodr��guez J. F. D��ez-Higuera D. Gonz��lez-Ortega D. Boto-Giralda F. L��pez-Gonz��lez I. De La Torre 《Machine Vision and Applications》2011,22(6):947-966

A neural architecture for texture classification running on the Graphics Processing Unit (GPU) under a stream processing model is presented in this paper. Textural features extraction is done in three different scales, it is based on the computations that take place on the mammalian primary visual pathway and incorporates both structural and color information. Feature vectors classification is done using a fuzzy neural network which introduces pattern analysis for orientation invariant texture recognition. Performance tests are done over a varying number of textures and the entire VisTex database. The intrinsic parallelism of the neural system led us to implement the whole architecture to run on GPUs, providing a speed-up between × 16 and × 25 for classifying textures of sizes 128 × 128 and 512 × 512 px with respect to an implementation on the CPU. A comparison of classification rates obtained with other methods is included and shows the great performance of the architecture. An average classification rate of 85.2% is obtained for 167 textures of size 512 × 512 px. 相似文献

11.

基于GPU_CPU异构并行加速的人头检测方法

彭景维童基均《计算机系统应用》2017,26(11):95-100

多尺度协同的人头检测系统中,梯度方向直方图应用于高清视频监控领域时常因特征提取时的海量计算而不能满足监控视频的实时性要求,提出一种基于GPU_CPU异构并行加速的人头检测方法,GPU端负责HOG特征提取的庞大的密集型的区块的并行计算,CPU端负责检测过程中的其它模块的执行.传统的并行归约算法因其在HOG特征提取过程中的时间复杂度不够理想,提出改进的并行归约算法,通过“下扫”的并行计算方式,减少节点被计算的次数,降低了HOG特征提取时的时间复杂度.实验表明,提出的方法检测速率优于传统的CPU的检测方法,其效率提升约10倍. 相似文献

12.

Fermi架构下超声成像组织运动可视化并行算法

何兴无《计算机系统应用》2013,22(4):147-152

在临床超声实时成像系统中组织运动情况是医生想要获取的重要诊断信息, 例如心脏运动. 基于线积分卷积的二维矢量场可视化技术可以同时展现运动矢量场的强度和方向. 但这一算法在处理时涉及大量的复杂计算, 尤其是流线追踪处理部分, 使其成为临床实时成像系统中的一大性能提升瓶颈. 为此研究并提出了一种基于新兴的高性能并行计算平台Fermi架构GPU(graphics processing unit图形处理单元)的并行运动可视化算法. 数据测试结果显示, 与基于CPU的实现相比, 采用Fermi架构的GPU处理不仅可相似文献

13.

A real-time embedded architecture for SIFT

《Journal of Systems Architecture》2013,59(1):16-29

相似文献

14.

Three‐dimensional thinning algorithms on graphics processing units and multicore CPUs

J. Jimnez J. Ruiz de Miras 《Concurrency and Computation》2012,24(14):1551-1571

Three‐dimensional curve skeletons are a very compact representation of three‐dimensional objects with many uses and applications in ﬁelds such as computer graphics, computer vision, and medical imaging. An important problem is that the calculation of the skeleton is a very time‐consuming process. Thinning is a widely used technique for calculating the curve skeleton because of the properties it ensures and the ease of implementation. In this paper, we present parallel versions of a thinning algorithm for eﬃcient implementation in both graphics processing units and multicore CPUs. The parallel programming models used in our implementations are Compute Uniﬁed Device Architecture (CUDA) and Open Computing Language (OpenCL). The speedup achieved with the optimized parallel algorithms for the graphics processing unit achieves 106.24x against the CPU single‐process version and more than 19x over the CPU multithreaded version. Copyright © 2011 John Wiley & Sons, Ltd. 相似文献

15.

GPU加速的基于增量式聚类的视频拷贝检测方法 总被引：1，自引：1，他引：0

任化敏张勇东林守勋《计算机辅助设计与图形学学报》2010,22(3)

为有效地保护版权,提高大规模视频集的拷贝检测速度,提出一种完全实现在GPU上的基于增量式聚类的拷贝检测方法.对数据库中新增加的视频,首先调用GPU上的硬件解码单元对视频流解码,以实时的速度提取高维SIFT特征点;然后对特征点进行增量K-means聚类,以动态地反映数据库的变化,并根据聚类结果更新视觉关键词词典;再将每帧表示成归一化的词频向量;最后使用基于帧级别词频向量的时空顺序匹配法来判定查询视频是否为数据库中视频的拷贝.实验结果表明,该方法比原有的CPU实现方法整体提速最高达63倍. 相似文献

16.

Multiscale and local search methods for real time region tracking with particle filters: local search driven by adaptive scale estimation on GPUs 总被引：1，自引：0，他引：1

Raúl Cabido Antonio S. Montemayor Juan José Pantrigo Bryson R. Payne 《Machine Vision and Applications》2009,21(1):43-58

Tracking systems are important in computervision, with applications in surveillance, human computer interaction, etc. Consumer graphics processing units (GPUs) have experienced an extraordinary evolution in both computing performance and programmability, leading to greater use of the GPU for non-rendering applications. In this work we propose a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, presented for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real-time at 70 frames per second for 640 × 480 video resolutions on the GPU, up to 1,100% faster than the CPU version of the algorithm. 相似文献

17.

GPU-based anisotropic diffusion algorithm for video image denoising

《Microprocessors and Microsystems》2017

Image filtering is the process of removing noise which perturbs image analysis methods. In some applications like segmentation, denoising is intended to smooth homogeneous areas while preserving the contours. Real-time denoising is required in a lot of applications like image-guided surgical interventions, video analysis and visual serving. This paper presents an anisotropic diffusion method named the Oriented Speckle Reducing Anisotropic Diffusion (OSRAD) filter. The OSRAD works very well for denoising images with speckle noise. However, this filter has a powerful computational complexity and is not suitable for real time implementation. The purpose of this study is to decrease the processing time implementation of the OSRAD filter using a parallel processor through the optimization of the graphics processor unit. The results show that the suggested method is very effective for real-time video processing. This implementation yields a denoising video rate of 25 frames per second for 128 × 128 pixels. The proposed model magnifies the acceleration of the image filtering to 30 × compared to the standard implementation of central processing units (CPU). A quantitative comparison measure is given by parameters like the mean structural similarity index, the peak signal-to-noise ratio and the figure of merit. The modified filter is faster than the conventional OSRAD and keeps a high image quality compared to the bilateral filter and the wavelet transformation. 相似文献

18.

<Emphasis Type="Italic">Video Extruder</Emphasis>: a semi-dense point tracker for extracting beams of trajectories in real time

Matthieu?Garrigues Antoine?Manzanera Email author Thierry?M.?Bernard 《Journal of Real-Time Image Processing》2016,11(4):785-798

Two crucial aspects of general-purpose embedded visual point tracking are addressed in this paper. First, the algorithm should reliably track as many points as possible. Second, the computation should achieve real-time video processing, which is challenging on low power embedded platforms. We propose a new multi-scale semi-dense point tracker called Video Extruder, whose purpose is to fill the gap between short-term, dense motion estimation (optical flow) and long-term, sparse salient point tracking. This paper presents a new detector, including a new salience function with low computational complexity and a new selection strategy that allows to obtain a large number of keypoints. Its density and reliability in mobile video scenarios are compared with those of the FAST detector. Then, a multi-scale matching strategy is presented, based on hybrid regional coarse-to-fine and temporal prediction, which provides robustness to large camera and object accelerations. Filtering and merging strategies are then used to eliminate most of the wrong or useless trajectories. Thanks to its high degree of parallelism, the proposed algorithm extracts beams of trajectories from the video very efficiently. We compare it with the state-of-the-art pyramidal Lucas–Kanade point tracker and show that, in short range mobile video scenarios, it yields similar quality results, while being up to one order of magnitude faster. Three different parallel implementations of this tracker are presented, on multi-core CPU, GPU and ARM SoCs. On a commodity 2010 CPU, it can track 8,500 points in a 640 × 480 video at 150 Hz. 相似文献

19.

大数乘法的GPU加速实现

唐天泽孙玲黄新明谢星韩赛飞《计算机应用研究》2018,35(10)

大数乘法是公钥加密中最为核心的计算环节之一,快速实现大数乘法单元也是RSA、ElGamal、全同态等密码体制急需解决的问题之一。目前,基于C 的NTL GMP库函数虽然能在CPU上实现高精度的大数乘法,但其仍不能满足加密对实时性的要求。针对全同态加密应用需求,本文提出了一种基于Sch?nhage-Strassen算法的大数乘法GPU加速方法。通过比较相同实验平台下仅用CPU和GPU CPU异构方法实现的大数乘法运算,验证了本文设计方法的正确性和有效性。实验结果表明,采用本文方法实现的相同大数乘法运算所需的时间比在多核CPU平台实现所需的时间有12倍以上的加速。相似文献

20.

Accelerating bioinspired lateral interaction in accumulative computation for real-time moving object detection with graphics processing units

Sánchez José L. López María T. Pastor José Manuel Delgado Ana E. Fernández-Caballero Antonio 《Natural computing》2019,18(2):217-227

Biologically-inspired computer vision is a research area that offers prominent directions in a large variety of fields. Several processing algorithms inspired in natural vision enable detecting moving objects from video sequences so far. One example is lateral interaction in accumulative computation (LIAC), a classical bioinspired method that has been applied to numerous environments and applications. LIAC is the implementation for computer vision of two biologically-inspired methods denominated algorithmic lateral interaction and accumulative computation. The method has traditionally reached high precision but unfortunately requires high computing times. This paper introduces a proposal based on graphics processing units in order to speed up the original sequential code. This way not only excellent performance in terms of accuracy is maintained, but also real-time is obtained. A speed-up of 67× from the parallel over its sequential counterpart is achieved for several tested video sequences.

相似文献