期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

肖雄斌厉小润赵辽英《计算机应用与软件》2012,29(4):125-128,158

RX算法和核RX算法能很好地分离目标和背景,是较为广泛使用的异常检测算法,但是高光谱图像数据量大且存在冗余信息和噪声,直接进行RX及核RX异常探测运算量大且容易受噪声影响.针对此问题,提出一种基于最小噪声分离变换的高光谱图像异常检测方法,首先采用残差分析法估计噪声协方差矩阵以改进最小噪声分离变换,然后利用改进后的最小噪声分离变换来有效地降低高光谱图像数据的维数并分离出噪声,最后对低维降噪后的数据进行RX及核RX异常检测,避免了随机噪声对RX及核RX异常检测结果的影响并提高了异常检测率.对真实的AVIRIS数据测试表明,该算法优于传统的相应的RX、核RX异常检测算法. 相似文献

2.

Enhanced GPU-Based Anti-Noise Hybrid Edge Detection Method

Sa’ed Abed Mohammed H. Ali Mohammad Al-Shayeji 《计算机系统科学与工程》2020,35(1):21-37

Today, there is a growing demand for computer vision and image processing in different areas and applications such as military surveillance, and biological and medical imaging. Edge detection is a vital image processing technique used as a pre-processing step in many computer vision algorithms. However, the presence of noise makes the edge detection task more challenging; therefore, an image restoration technique is needed to tackle this obstacle by presenting an adaptive solution. As the complexity of processing is rising due to recent high-definition technologies, the expanse of data attained by the image is increasing dramatically. Thus, increased processing power is needed to speed up the completion of certain tasks. In this paper,we present a parallel implementation of hybrid algorithm-comprised edge detection and image restoration along with other processes using Computed Unified Device Architecture (CUDA) platform, exploiting a Single Instruction Multiple Thread (SIMT) execution model on a Graphical Processing Unit (GPU). The performance of the proposed method is tested and evaluated using well-known images from various applications. We evaluated the computation time in both parallel implementation on the GPU, and sequential execution in the Central Processing Unit (CPU) natively and using Hyper-Threading (HT) implementations. The gained speedup for the naïve approach of the proposed edge detection using GPU under global memory direct access is up to 37 times faster, while the speedup of the native CPU implementation when using shared memory approach is up to 25 times and 1.5 times over HT implementation. 相似文献

3.

Real-time anomaly detection in hyperspectral images using multivariate normal mixture models and GPU processing

Yuliya Tarabalka Trym Vegard Haavardsholm Ingebjørg Kåsen Torbjørn Skauli 《Journal of Real-Time Image Processing》2009,4(3):287-300

Hyperspectral imaging, which records a detailed spectrum of light arriving in each pixel, has many potential uses in remote sensing as well as other application areas. Practical applications will typically require real-time processing of large data volumes recorded by a hyperspectral imager. This paper investigates the use of graphics processing units (GPU) for such real-time processing. In particular, the paper studies a hyperspectral anomaly detection algorithm based on normal mixture modelling of the background spectral distribution, a computationally demanding task relevant to military target detection and numerous other applications. The algorithm parts are analysed with respect to complexity and potential for parallellization. The computationally dominating parts are implemented on an Nvidia GeForce 8800 GPU using the Compute Unified Device Architecture programming interface. GPU computing performance is compared to a multi-core central processing unit implementation. Overall, the GPU implementation runs significantly faster, particularly for highly data-parallelizable and arithmetically intensive algorithm parts. For the parts related to covariance computation, the speed gain is less pronounced, probably due to a smaller ratio of arithmetic to memory access. Detection results on an actual data set demonstrate that the total speedup provided by the GPU is sufficient to enable real-time anomaly detection with normal mixture models even for an airborne hyperspectral imager with high spatial and spectral resolution. 相似文献

4.

并行时空处理模型下的快速N-body算法

下载免费PDF全文

王伟曾栩鸿王福焕傅丽丽曾国荪《计算机科学与探索》2011,5(11):1006-1013

图形处理器(graphic processing unit,GPU)的最新发展已经能够以低廉的成本提供高性能的通用计算。基于GPU的CUDA(compute unified device architecture)和OpenCL(open computing language)编程模型为程序员提供了充足的类似于C语言的应用程序接口(application programming interface,API),便于程序员发挥GPU的并行计算能力。采用图形硬件进行加速计算,通过一种新的GPU处理模型——并行时间空间模型,对现有GPU上的N-body实现进行了分析,从而提出了一种新的GPU上快速仿真N-body问题的算法,并在AMD的HD Radeon 5850上进行了实现。实验结果表明,相对于CPU上的实现,获得了400倍左右的加速;相对于已有GPU上的实现,也获得了2至5倍的加速。相似文献

5.

Fast anomaly detection in hyperspectral images with RX method on heterogeneous clusters

J. M. Molero A. Paz E. M. Garzón J. A. Martínez A. Plaza I. García 《The Journal of supercomputing》2011,58(3):411-419

Remotely sensed hyperspectral sensors provide image data containing rich information in both the spatial and the spectral domain, and this information can be used to address detection tasks in many applications. One of the most widely used and successful algorithms for anomaly detection in hyperspectral images is the RX algorithm. Despite its wide acceptance and high computational complexity when applied to real hyperspectral scenes, few approaches have been developed for parallel implementation of this algorithm. In this paper, we evaluate the suitability of using a hybrid parallel implementation with a high-dimensional hyperspectral scene. A general strategy to automatically map parallel hybrid anomaly detection algorithms for hyperspectral image analysis has been developed. Parallel RX has been tested on an heterogeneous cluster using this routine. The considered approach is quantitatively evaluated using hyperspectral data collected by the NASA’s Airborne Visible Infra-Red Imaging Spectrometer system over the World Trade Center in New York, 5 days after the terrorist attacks. The numerical effectiveness of the algorithms is evaluated by means of their capacity to automatically detect the thermal hot spot of fires (anomalies). The speedups achieved show that a cluster of multi-core nodes can highly accelerate the RX algorithm. 相似文献

6.

基于噪声可见性函数的SAR图像增强快速算法

下载免费PDF全文

朱逸飞杨国王强吴文《计算机测量与控制》2018,26(2):158-160

针对SAR成像中图像模糊并伴有噪声的问题,结合噪声可见性函数,提出了一种SAR图像增强快速算法。该算法在图像分层的基础上,结合人眼视觉特性,引入噪声可见性函数,实现细节层图像的增益控制。根据GPU架构和存储结构特点,并行计算各个像素在基本层和细节层上的处理过程,完成该算法的并行优化设计与实现。实验结果表明,该算法能够有效提高图像质量,增强图像细节;同时,能够充分利用GPU的并行计算能力,有效提高SAR图像增强的实时性。相似文献

7.

RX architectures for real-time anomaly detection in hyperspectral images

A. Rossi N. Acito M. Diani G. Corsini 《Journal of Real-Time Image Processing》2014,9(3):503-517

In the field of hyperspectral image processing, anomaly detection (AD) is a deeply investigated task whose goal is to find objects in the image that are anomalous with respect to the background. In many operational scenarios, detection, classification and identification of anomalous spectral pixels have to be performed in real time to quickly furnish information for decision-making. In this framework, many studies concern the design of computationally efficient AD algorithms for hyperspectral images in order to assure real-time or nearly real-time processing. In this work, a sub-class of anomaly detection algorithms is considered, i.e., those algorithms aimed at detecting small rare objects that are anomalous with respect to their local background. Among such techniques, one of the most established is the Reed–Xiaoli (RX) algorithm, which is based on a local Gaussian assumption for background clutter and locally estimates its parameters by means of the pixels inside a window around the pixel under test (PUT). In the literature, the RX decision rule has been employed to develop computationally efficient algorithms tested in real-time systems. Initially, a recursive block-based parameter estimation procedure was adopted that makes the RX processing and the detection performance differ from those of the original RX. More recently, an update strategy has been proposed which relies on a line-by-line processing without altering the RX detection statistic. In this work, the above-mentioned RX real-time oriented techniques have been improved using a linear algebra-based strategy to efficiently update the inverse covariance matrix thus avoiding its computation and inversion for each pixel of the hyperspectral image. The proposed strategy has been deeply discussed pointing out the benefits introduced on the two analyzed architectures in terms of overall number of elementary operations required. The results show the benefits of the new strategy with respect to the original architectures. 相似文献

8.

基于CUDA的拉普拉斯边缘检测算法

下载免费PDF全文

孟小华刘坚强区业祥张庆丰《计算机工程》2012,38(18):190-193

拉普拉斯边缘检测算法常用于去除CCD天文图像中的宇宙射线噪声,但其串行算法计算复杂度较高。为此,分析拉普拉斯边缘检测算法的并行性,在统一计算设备架构(CUDA)并行编程环境下,提出一种基于CUDA的拉普拉斯边缘检测图形处理单元(GPU)并行算法。分割天文图像得到多幅子图,根据GPU的硬件配置设定Block和Grid的大小,将子图依次传输到显卡进行并行计算,传回主存后拼接得到完整的图像输出。实验结果表明,图像尺寸越大,该并行算法与串行算法相比具有的速度优势越大,可获得10倍以上的加速比。相似文献

9.

基于OpenCL的JPEG压缩算法并行化设计与实现

张敏华张剑贤裘雪红周端《计算机工程与科学》2017,39(5):860-860

随着图像数据的大量增加,传统单处理器或多处理器结构的计算设备已无法满足实时性数据处理要求。异构并行计算技术因其高效的计算效率和并行的实时性数据处理能力,正得到广泛关注和应用。利用GPU在图形图像处理方面并行性的优势,提出了基于OpenCL的JPEG压缩算法并行化设计方法。将JPEG算法功能分解为多个内核程序,内核之间通过事件信息传递进行顺序控制,并在GPU+CPU的异构平台上完成了并行算法的仿真验证。实验结果表明,与CPU串行处理方式相比,本文提出的并行化算法在保持相同图像质量情况下有效提高了算法的执行效率,大幅降低了算法的执行时间,并且随着图形尺寸的增加,算法效率获得明显的提升。相似文献

10.

Accelerating computation of Euclidean distance map using the GPU with efficient memory access

《International Journal of Parallel, Emergent and Distributed Systems》2013,28(5):383-406

Recent graphics processing units (GPUs), which have many processing units, can be used for general purpose parallel computation. To utilise the powerful computing ability, GPUs are widely used for general purpose processing. Since GPUs have very high memory bandwidth, the performance of GPUs greatly depends on memory access. The main contribution of this paper is to present a GPU implementation of computing Euclidean distance map (EDM) with efficient memory access. Given a two-dimensional (2D) binary image, EDM is a 2D array of the same size such that each element stores the Euclidean distance to the nearest black pixel. In the proposed GPU implementation, we have considered many programming issues of the GPU system such as coalesced access of global memory and shared memory bank conflicts, and so on. To be concrete, by transposing 2D arrays, which are temporal data stored in the global memory, with the shared memory, the main access from/to the global memory enables to be performed by coalesced access. In practice, we have implemented our parallel algorithm in the following three modern GPU systems: Tesla C1060, GTX 480 and GTX 580. The experimental results have shown that, for an input binary image with size of 9216 × 9216, our implementation can achieve a speedup factor of 54 over the sequential algorithm implementation. 相似文献

11.

基于图形处理器的通用计算模式* 总被引：4，自引：4，他引：0

王磊张春燕《计算机应用研究》2009,26(6):2356-2358

针对GPU图形处理的特点,分析其应用于通用计算的并行处理机制和数据映射,提出了一种GPU通用计算模式的映射机制和一般性设计方法,并针对GPU的吞吐量、数据流处理能力和基本数学运算能力等进行性能测试,为GPU通用计算的算法设计、实现和性能优化提供参考依据。相似文献

12.

Massively Parallel Hierarchical Scene Processing with Applications in Rendering

Marek Vinkler Jiří Bittner Vlastimil Havran Michal Hapala 《Computer Graphics Forum》2013,32(8):13-25

We present a novel method for massively parallel hierarchical scene processing on the GPU, which is based on sequential decomposition of the given hierarchical algorithm into small functional blocks. The computation is fully managed by the GPU using a specialized task pool which facilitates synchronization and communication of processing units. We present two applications of the proposed approach: construction of the bounding volume hierarchies and collision detection based on divide‐and‐conquer ray tracing. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. The results indicate that using our approach we achieve high utilization of the GPU even for complex hierarchical problems which pose a challenge for massive parallelization. 相似文献

13.

Discrete Wavelet Transform on Consumer-Level Graphics Hardware 总被引：1，自引：0，他引：1

Tien-Tsin Wong Chi-Sing Leung Pheng-Ann Heng Jianqing Wang 《Multimedia, IEEE Transactions on》2007,9(3):668-673

Discrete wavelet transform (DWT) has been heavily studied and developed in various scientific and engineering fields. Its multiresolution and locality nature facilitates applications requiring progressiveness and capturing high-frequency details. However, when dealing with enormous data volume, its performance may drastically reduce. On the other hand, with the recent advances in consumer-level graphics hardware, personal computers nowadays usually equip with a graphics processing unit (GPU) based graphics accelerator which offers SIMD-based parallel processing power. This paper presents a SIMD algorithm that performs the convolution-based DWT completely on a GPU, which brings us significant performance gain on a normal PC without extra cost. Although the forward and inverse wavelet transforms are mathematically different, the proposed algorithm unifies them to an almost identical process that can be efficiently implemented on GPU. Different wavelet kernels and boundary extension schemes can be easily incorporated by simply modifying input parameters. To demonstrate its applicability and performance, we apply it to wavelet-based geometric design, stylized image processing, texture-illuminance decoupling, and JPEG2000 image encoding 相似文献

14.

A fast Hough Transform algorithm for straight lines detection in an image using GPU parallel computing with CUDA-C

R. Yam-Uicab J. L. Lopez-Martinez J. A. Trejo-Sanchez H. Hidalgo-Silva S. Gonzalez-Segura 《The Journal of supercomputing》2017,73(11):4823-4842

The Hough Transform (HT) is a digital image processing method for the detection of shapes which has multiple uses today. A disadvantage of this method is its sequential computational complexity, particularly when a single processor is used. An optimized algorithm of HT for straight lines detection in an image is presented in this article. Optimization is realized by using a decomposition of the input image recently proposed via central processing unit (CPU), and the technique known as segment decomposition. Optimized algorithms improve execution times significantly. In this paper, the optimization is implemented in parallel using graphics processing unit (GPU) programming, allowing a reduction of total run time and achieving a performance more than 20 times better than the sequential method and up to 10 times better than the implementation recently proposed. Additionally, we introduce the concept of Performance Ratio, to emphasize the outperforming of the GPU over the CPUs. 相似文献

15.

CUDA架构下H.264快速去块滤波算法 总被引：1，自引：0，他引：1

刘虎孙召敏陈启美《计算机应用》2010,30(12):3252-3254

针对H.264/AVC视频编码标准中去块滤波器运算复杂度高、耗时巨大这一难题,提出了一种基于NVIDIA计算统一设备架构（CUDA）平台的H.264并行快速去块滤波算法,介绍了CUDA平台硬件结构特点与软件开发流程,根据图形处理器（GPU）的并发结构特点,对BS判定与滤波计算进行了并行优化,降低了算法复杂度,利用共享内存提高了数据访问速率,实现了去块滤波器的并行处理。实验结果表明,在图像质量基本不变的情况下,GPU算法能够明显提高运算速度,平均加速比在20倍左右,取得了良好的效果。相似文献

16.

基于GPU多流并发并行模型的NDVI提取算法

左宪禹张哲苏岳瀚刘扬葛强田军锋《计算机科学》2020,47(4):25-29

利用GPU进行加速的归一化差分植被指数(Normalized Differential Vegetation Index,NDVI)提取算法通常采用GPU多线程并行模型,存在弱相关计算之间以及CPU与GPU之间数据传输耗时较多等问题,影响了加速效果的进一步提升。针对上述问题,根据NDVI提取算法的特性,文中提出了一种基于GPU多流并发并行模型的NDVI提取算法。通过CUDA流和Hyper-Q特性,GPU多流并发并行模型可以使数据传输与弱相关计算、弱相关计算与弱相关计算之间达到重叠,从而进一步提高算法并行度及GPU资源利用率。文中首先通过GPU多线程并行模型对NDVI提取算法进行优化,并对优化后的计算过程进行分解,找出包含数据传输及弱相关性计算的部分;其次,对数据传输和弱相关计算部分进行重构,并利用GPU多流并发并行模型进行优化,使弱相关计算之间、弱相关计算和数据传输之间达到重叠的效果;最后,以高分一号卫星拍摄的遥感影像作为实验数据,对两种基于GPU实现的NDVI提取算法进行实验验证。实验结果表明,与传统基于GPU多线程并行模型的NDVI提取算法相比,所提算法在影像大于12000*12000像素时平均取得了约1.5倍的加速,与串行提取算法相比取得了约260倍的加速,具有更好的加速效果和并行性。相似文献

17.

Fermi架构下超声成像组织运动可视化并行算法

何兴无《计算机系统应用》2013,22(4):147-152

在临床超声实时成像系统中组织运动情况是医生想要获取的重要诊断信息, 例如心脏运动. 基于线积分卷积的二维矢量场可视化技术可以同时展现运动矢量场的强度和方向. 但这一算法在处理时涉及大量的复杂计算, 尤其是流线追踪处理部分, 使其成为临床实时成像系统中的一大性能提升瓶颈. 为此研究并提出了一种基于新兴的高性能并行计算平台Fermi架构GPU(graphics processing unit图形处理单元)的并行运动可视化算法. 数据测试结果显示, 与基于CPU的实现相比, 采用Fermi架构的GPU处理不仅可相似文献

18.

基于图形处理器的边缘检测算法 总被引：1，自引：0，他引：1

张楠王建立王鸣浩《计算机科学》2010,37(1):265-267

边缘检测是一种高度并行的算法,计算量较大,传统的CPU处理难以满足实时要求。针对图像边缘检测问题的计算密集性,在分析常用边缘检测算法的基础上,利用CUDA(Compute Unified Device Architecture,计算统一设备架构)软硬件体系架构,提出了图像边缘检测的GPU(Graphics Processing Unit,图形处理器)实现方案。首先介绍GPU高强度并行运算的体系结构基础,并将Roberts和Sobel这两个具有代表性的图像边缘检测算法移植到GPU,然后利用当前同等价格的CPU和GPU进行对比实验,利用多幅不同分辨率图像作为测试数据,对比CPU和GPU方案的计算效率。实验结果表明,与相同算法的CPU实现相比,其GPU实现获得了相同的处理效果,并将计算效率最高提升到了17倍以上,以此证明GPU在数字图像处理的实际应用中大有潜力。相似文献

19.

双重并行环境下最短路径的研究

下载免费PDF全文

孙玉强李银银顾玉宛《计算机测量与控制》2017,25(3):195-196, 230

并行问题和最短路径问题已成为一个热点研究课题,传统的最短路径算法已不能满足数据爆炸式增长的处理需求,尤其当网络规模很大时,所需的计算时间和存储空间也大大的增加;MapReduce模型的出现,带来了一种新的解决方法来解决最短路径;GPU具有强大的并行计算能力和存储带宽,与CPU相比具有明显的优势;通过研究MapReduce模型和GPU执行过程的分析,指出单独基于MapReduce模型的最短路径并行方法存在的问题,降低了系统的性能;论文的创新点是结合MapReduce和GPU形成双并行模型,并行预处理数据,针对最短路径中的数据传输和同步开销,增加数据动态处理器;最后实验从并行算法的性能评价指标平均加速比进行比较,结果表明,双重并行环境下的最短路径的计算,提高了加速比。相似文献

20.

基于GPU的快速Level Set图像分割 总被引：5，自引：1，他引：5

下载免费PDF全文

吴仲乐王遵亮罗立民《中国图象图形学报》2004,9(6):679-683

水平集(1evel set)图像分割方法是图像分割中的一个重要方法，但是该算法的计算量大，往往不能达到实时处理的要求。给出了利用新一代的可编程图形处理器(GPU)实现level set的加速算法。首先介绍了如何在GPU上利用片元渲染程序进行网格化的线性运算和有限差分PDE计算，把level set方法的离散化算子映射到GPU上。由于以数据流处理方式的GPU的存储访问快，具有并行运算能力，同时level set算法演化的显示不再需要把数据从CPU传到GPU，因此较大地提高了算法速度与交互显示。文中实现并测试了一个与初始化状态独立的二维level set的算子用于图像分割，并对其运算结果和性能进行了比较，结果表明该方法具有更快的速度。相似文献