
1.

傅贤超《现代计算机》2013,(11)

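Serial computation of bidirectional similarity on the CPU is inefficient and cannot meet practical requirements. To address this, the work exploits the data independence inherent in the computation and uses the CUDA programming model to implement GPU-accelerated bidirectional similarity computation for images. In experiments at a resolution of 392x300, the algorithm achieves a speedup of more than 1200 times on the GPU compared with the CPU.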
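The data independence mentioned above can be illustrated with a small sketch. It assumes that the measure is the common patch-based formulation (the mean nearest-patch distance from source to target plus the same quantity in the reverse direction), which the abstract does not confirm. The 1D signals, the patch size and all function names are hypothetical.

```python
def patches(signal, size):
    """All contiguous windows of the given size (hypothetical 1D stand-in for image patches)."""
    return [signal[i:i + size] for i in range(len(signal) - size + 1)]

def ssd(p, q):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def directed(src, dst, size):
    """Mean, over the patches of src, of the distance to the nearest patch of dst."""
    src_patches = patches(src, size)
    dst_patches = patches(dst, size)
    # Each source patch searches for its nearest neighbour independently,
    # so on a GPU every iteration of this outer loop could be its own thread.
    nearest = [min(ssd(p, q) for q in dst_patches) for p in src_patches]
    return sum(nearest) / len(nearest)

def bidirectional_distance(s, t, size=2):
    """Assumed form: source-to-target term plus target-to-source term."""
    return directed(s, t, size) + directed(t, s, size)

same = bidirectional_distance([1, 2, 3, 4], [1, 2, 3, 4])
cropped = bidirectional_distance([1, 2, 3, 4], [1, 2])
```

Because no patch search depends on the result of another, the work parallelizes without synchronization until the final averaging step.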

2.

A method for storing and restoring images of Chinese and Western text

李桂根《新浪潮．学网络》1994,(11):44-46

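This paper describes the significance of developing a domestically produced version of a Japanese automobile brake disc, the key technologies involved in its development, and a techno-economic benefit analysis following successful trial production.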

3.

An improvement to sparse matrix-vector multiplication on the GPU

马超韦刚裴颂文吴百锋《计算机系统应用》2010,19(5):116-120

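Sparse matrix-vector multiplication (SpMV) is widely used in engineering practice and scientific computing. As matrices grow, the sheer volume of computation limits the performance of the whole system, so the high compute capability of the GPU can be used to accelerate SpMV. The paper analyzes the problems of existing GPU implementations of SpMV and designs two optimizations: row partitioning and use of the float4 data type. Experiments show that these optimizations improve performance by a factor of 2 to 8.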
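For readers unfamiliar with SpMV, the following is a minimal reference sketch. It assumes the matrix is stored in CSR (compressed sparse row) format, which the abstract does not specify, and it does not reproduce the row-partitioning or float4 optimizations. The function name and the example matrix are illustrative only.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Multiply a CSR sparse matrix by a dense vector x and return the dense result."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Rows are independent of each other, which is why a GPU version
        # can assign one thread (or one group of threads) to each row.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y

# Example matrix (assumed for illustration):
# [[10, 0, 2],
#  [ 0, 3, 0],
#  [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Rows with very different numbers of nonzeros give threads unequal work, which is one plausible motivation for splitting long rows across threads.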

4.

A GPU-based parallel computing method for image features

张杰柴志雷喻津《计算机科学》2015,42(10):297-300, 324

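Feature extraction and description underpin many computer vision applications. Local feature extraction and description involve high-dimensional, pixel-level computation, which makes them computationally complex and poorly suited to real-time use and limits their adoption in practical systems. This work studies the key computational modules shared by local feature extraction and description algorithms: the image pyramid mechanism and image gradient computation. Parallel versions of these shared modules are designed and implemented on the NVIDIA GPU/CUDA architecture, and their efficiency is further improved by optimizing access to global, texture, and shared memory. Experimental results show that the GPU-based image pyramid and image gradient computations are about 30 times faster than on the CPU. When the two modules are applied to the HOG feature extraction and description algorithm, the speedup over the CPU is about 40 times. The work is of practical value for high-speed GPU-based extraction and description of local features.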
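The two shared modules named in this abstract can be sketched in a few lines. This is a CPU reference under assumptions the abstract does not state: the pyramid is built by 2x2 box averaging (not necessarily the authors' filter), and gradients use central differences with clamped borders. All names are hypothetical.

```python
def downsample2(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def build_pyramid(img, levels):
    """Return [img, img/2, img/4, ...] with the requested number of levels."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid

def gradient(img):
    """Central differences with clamped borders; returns (gx, gy)."""
    h, w = len(img), len(img[0])
    gx = [[img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)] for c in range(w)]
          for r in range(h)]
    gy = [[img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c] for c in range(w)]
          for r in range(h)]
    return gx, gy

# A 4x4 ramp image: intensity = row + 2 * column.
image = [[float(r + 2 * c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, 3)
gx, gy = gradient(image)
```

Every output pixel depends only on a small fixed neighbourhood of the input, so both steps map naturally onto one GPU thread per output pixel.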

5.

Design of an image acquisition and processing system based on FPGA+GPU

蒋俊伦丰大强徐新瑞程坤常中坤王桢《计算机测量与控制》2023,31(8):273-279

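With the rapid development of embedded image processing systems, the demands placed on front-end image acquisition modules keep rising. The speed, resolution, reliability, and degree of integration of image acquisition greatly affect the accuracy of downstream design. Based on a study of digital image acquisition systems, an image acquisition and processing system built on an FPGA and GPU architecture is designed, with emphasis on its hardware and software design processes. In this system, the GPU, with its strong compute capability, concentrates on data storage, user interaction, and subsequent image processing, while the FPGA handles image acquisition, peripheral control, and task scheduling. The GPU and FPGA communicate over a high-speed PCIe bus, for which a Linux driver and an FPGA-side PCIe program were written. Experimental results show that the system achieves a real-time image acquisition and storage rate of 437.5 Mbps, with stable real-time transmission and complete data transfer.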

6.

A high-speed FIR digital filtering algorithm implemented on the GPU

陈孝良邓仰东程晓斌李晓东田静《计算机辅助设计与图形学学报》2010,22(9):1435-1442

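Existing GPU-based FIR algorithms are slow and scale poorly. To address this, a high-speed parallel algorithm for multi-channel FIR digital filtering is proposed, accelerated by balancing the parallel computing load and by reducing memory access density. The algorithm builds a parallel filtering model on the GPU using parallel matrix multiplication, and achieves high-speed filtering of multi-channel signals by having each thread perform two signal operations within a single instruction cycle. Experimental results on a GTX260+ platform show that the algorithm reaches an average speedup of 203 and an efficiency above 40%, and that it scales better.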
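The idea of casting FIR filtering as a matrix product can be sketched as follows. This is only an illustration under stated assumptions: samples before the start of each signal are treated as zero, all channels share the same taps, and the function names are hypothetical. It does not reproduce the paper's load-balancing or memory-access techniques.

```python
def fir_direct(x, h):
    """Reference FIR: y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_as_matmul(channels, h):
    """Filter several channels by multiplying a window matrix with the tap vector."""
    taps = len(h)
    out = []
    for x in channels:
        padded = [0.0] * (taps - 1) + list(x)
        # Row n of the window matrix holds x[n], x[n-1], ..., x[n-taps+1].
        windows = [[padded[n + taps - 1 - k] for k in range(taps)]
                   for n in range(len(x))]
        # Matrix-vector product: every output sample is an independent dot product.
        out.append([sum(w * c for w, c in zip(row, h)) for row in windows])
    return out

h = [0.5, 0.5]  # two-tap moving average
channels = [[1.0, 2.0, 3.0, 4.0], [4.0, 0.0, 4.0, 0.0]]
filtered = fir_as_matmul(channels, h)
```

Stacking the window matrices of all channels yields one large matrix product, which is the kind of regular workload GPUs handle well.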

7.

Research on a GPU-based parallel algorithm for remote sensing image fusion

赵进刘昌明宋峰张丽萍《微型机与应用》2013,32(6)

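Drawing on general-purpose GPU parallel computing and the characteristics of remote sensing image data fusion, this work uses NVIDIA's CUDA programming framework to study and implement parallel versions of the BROVEY transform and YIQ transform fusion algorithms on the GPU platform. Experimental results show that as the computational complexity of the fusion algorithm and the problem size increase, the speedup advantage of GPU parallel processing also grows, and that general-purpose GPU computing has broad application prospects in remote sensing information processing.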
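As a concrete reference, the sketch below applies the Brovey transform in its commonly cited form: each multispectral band is scaled by the ratio of the panchromatic value to the sum of the three bands. Whether the paper uses exactly this variant is an assumption. The zero-sum guard and the function names are also illustrative choices.

```python
def brovey_pixel(r, g, b, pan):
    """Fuse one multispectral pixel (r, g, b) with its panchromatic intensity."""
    total = r + g + b
    if total == 0:
        # Assumed convention: a black multispectral pixel stays black.
        return (0.0, 0.0, 0.0)
    scale = pan / total
    return (r * scale, g * scale, b * scale)

def brovey_fuse(ms_pixels, pan_pixels):
    """Apply the per-pixel transform; pixels do not interact, so the loop is fully parallel."""
    return [brovey_pixel(r, g, b, p) for (r, g, b), p in zip(ms_pixels, pan_pixels)]

fused = brovey_fuse([(10.0, 20.0, 10.0), (0.0, 0.0, 0.0)], [80.0, 50.0])
```

Because the fusion is purely per-pixel, larger images simply expose more independent work, which is consistent with the reported trend of greater GPU advantage at larger problem sizes.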

8.

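An efficient sorting algorithm based on a transformed storage structure (total citations: 2; self-citations: 0; citations by others: 2)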

孟佳娜卢云宏《小型微型计算机系统》2004,25(7):1406-1408

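An efficient sorting algorithm based on a transformed storage structure is presented. Its time complexity is O(n) and is independent of the distribution of the data to be sorted. The algorithm is described and compared with other sorting algorithms in terms of both time and space complexity.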

9.

A new storage structure for binary trees

李希春《计算机学报》1996,19(7):554-557

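This paper proposes a storage structure that represents binary trees simply and efficiently. The structure (1) significantly improves the time efficiency of basic operations such as finding the parent or sibling of a given node, achieving O(1), which is better than with the traditional tree structure; (2) removes the need for traversal operations to use an auxiliary stack, whether explicitly or implicitly; (3) improves the utilization of pointer fields in the storage structure; and (4) leaves the efficiency of the other basic operations unchanged.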

10.

Fast GPU-based image copy detection

谢洪涛高科张勇东李锦涛刘毅志《计算机辅助设计与图形学学报》2010,22(9):1483-1490

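To exploit the GPU's parallel processing power for faster image copy detection, a GPU-based image copy detection method is proposed. First, a scale-invariant feature point extraction algorithm, the Harris-Hessian algorithm, is designed around the GPU architecture. It detects feature points on the low-scale image and then determines their precise position and scale across a series of scale spaces from the determinant of the Hessian matrix, which significantly reduces pixel-level computation and offers better parallelism. An image copy detection system built on this algorithm shows a marked improvement in detection speed. Experimental results show that, compared with a traditional CPU-based algorithm, the Harris-Hessian algorithm achieves a speedup of 10 to 20 times while maintaining high detection accuracy. On a library of 11,250 images, the system needs only 19.8 ms on average to check one 640×480 image, with 95% accuracy, which meets the requirements of real-time applications on large-scale data.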

11.

A combined storage structure for image processing algorithms on GPU

ZUO Xian-yu ZHANG Zhe HUANG Xiang-zhi GE Qiang ZHANG Li-tao ZANG Wen-qian 《计算机工程与科学》1990,42(2):197


12.

Parallel computation of a GPU-based community mining algorithm for complex networks

赵雅端卢罡赵英山岚《计算机应用研究》2013,30(8):2426-2428

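As complex networks keep growing, quickly and accurately uncovering hidden community structure in large-scale networks has become an active research topic. The general probabilistic framework method is one of the commonly used community mining algorithms based on the fast Newman algorithm. Taking ever-larger complex networks as its subject, this work proposes a GPGPU-based parallel algorithm for the general probabilistic framework, which effectively addresses the problem of fast and accurate community mining in large-scale complex networks. Experiments show that as the number of nodes increases, the parallel algorithm runs more efficiently without loss of accuracy, providing an efficient solution for research on community structure mining in complex networks.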

13.

Fast GPU-based rigid registration of 3D medical images (total citations: 2; self-citations: 1; citations by others: 2)

秦安徐建冯前进孟晓林陈武凡《计算机应用研究》2010,27(3):1198-1200

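Automatic 3D registration maps multiple image datasets into a common coordinate system and is widely used in medical image analysis. However, current mainstream 3D rigid registration algorithms (such as FLIRT) are slow: rigid registration of data of size 256^3 takes about 300 s, which cannot meet the needs of fast clinical applications. A fast 3D registration technique based on the CUDA (compute unified device architecture) architecture is therefore proposed, using GPU (graphics processing unit) parallel computing to carry out the coordinate transformation, linear interpolation, and similarity measure computation involved in registration. Experiments on clinical 3D medical images show that, while maintaining registration accuracy, the technique improves speed…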

14.

Parallel design for error-resilient entropy coding algorithm on GPU

Yuan Dai Yong Fang Dongjian He Bormin Huang 《Journal of Parallel and Distributed Computing》2013

The error-resilient entropy coding (EREC) algorithm is an effective method for combating error propagation at low cost in many compression methods using variable-length coding (VLC). However, the main drawback of the EREC is its high complexity. In order to overcome this disadvantage, a parallel EREC is implemented on a graphics processing unit (GPU) using the NVIDIA CUDA technology. The original EREC is a finer-grained parallel at each stage which brings additional communication overhead. To achieve high efficiency of parallel EREC, we propose partitioning the EREC (P-EREC) algorithm, which splits variable-length blocks into groups and then every group is coded using the EREC separately. Each GPU thread processes one group so as to make the EREC coarse-grained parallel. In addition, some optimization strategies are discussed in order to obtain higher performance using the GPU. In the case that the variable-length data blocks are divided into 128 groups (256 groups, resp.), experimental results show that the parallel P-EREC achieves 32× to 123× (54× to 350×, resp.) speedup over the original C code of EREC compiled with the O2 optimization option. Higher speedup can even be obtained with more groups. Compared to the EREC, the P-EREC not only achieves a good speedup performance, but it also slightly improves the resilience of the VLC bit-stream against burst or random errors.

15.

Image stylization with enhanced structure on GPU

LI Ping SUN HanQiu SHENG Bin SHEN JianBing 《中国科学:信息科学(英文版)》2012,(5):1093-1105


16.

Vectorized algorithms for astronomical image processing

Giuseppe A. De Biase Paolo Ciucci Marco Cottone 《Parallel Computing》1989,10(3):339-346

Astronomical Image Processing (AIP) needs extremely interactive software systems. It is therefore necessary to execute frequently used, time-consuming operations, such as convolution, geometrical normalization and geometrical correction, in a short time. In this paper such operations have been split into a scalar part (typical of the operation) and a vectorial part (common to the whole operation set). In this way it is possible to optimize them for a native or attached vector processor. The common vectorial part (vectorial nucleus) is studied in detail and checked on VAX 11/780+FPS 5410, CRAY X-MP/12 and IBM 3090VF processors.

17.

A parallel scheme for accelerating parameter sweep applications on a GPU

Fumihiko Ino Kentaro Shigeoka Tomohiro Okuyama Masaya Motokubota Kenichi Hagihara 《Concurrency and Computation》2014,26(2):516-531

This paper proposes a parallel scheme for accelerating parameter sweep applications on a graphics processing unit. By using hundreds of cores on the graphics processing unit, we found that our scheme simultaneously processes multiple parameters rather than a single parameter. The simultaneous sweeps exploit the similarity of computing behaviors shared by different parameters, thus allowing memory accesses to be coalesced into a single access if similar irregularities appear among the parameters’ computational tasks. In addition, our scheme reduces the amount of off‐chip memory access by unifying the data that are commonly referenced by multiple parameters and by placing the unified data in the fast on‐chip memory. In several experiments, we applied our scheme to practical applications and found that our scheme can perform up to 8.5 times faster than a naive scheme that processes a single parameter at a time. We also include a discussion on application characteristics that are required for our scheme to outperform the naive scheme. Copyright © 2013 John Wiley & Sons, Ltd.

18.

GPU-accelerated phase-field simulation of directional solidification

胡延苏高昂王志军慕德俊《计算机科学》2015,42(7):19-21, 56

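The phase-field method is a highly advantageous approach to numerical simulation of microstructure and has been widely applied to the study of how solidification microstructures evolve. However, in terms of both computational scale and microstructure evolution time, phase-field simulation is computationally very expensive and places heavy demands on computing hardware. Compared with traditional central processing unit (CPU) computing, graphics processing unit (GPU) computing is a recently developed, highly efficient alternative. A GPU-accelerated strategy for phase-field simulation of directional solidification is proposed, which accelerates the computation of interface morphology evolution during directional solidification at large scales. The results show that, on a single computer, the speedup of GPU computing over CPU computing can exceed 30 times. GPU acceleration opens new opportunities for the development and application of phase-field simulation.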

19.

A CUDA-based bicubic B-spline scaling method (total citations: 4; self-citations: 2; citations by others: 2)

桂叶晨冯前进刘磊陈武凡《计算机工程与应用》2009,45(1):183-185

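The CUDA (Compute Unified Device Architecture) technology that Nvidia introduced with the GeForce 8 series graphics cards frees general-purpose GPU computing (GPGPU) from the graphics hardware pipeline and high-level shading languages, so that developers can carry out high-performance parallel computation in single-instruction, multiple-data (SIMD) mode without mastering graphics programming. This work studies CUDA's design philosophy and programming model and improves an image scaling algorithm based on bicubic B-spline surfaces. Multiple threads are used to convert the time-consuming B-spline resampling step into SIMD form, and the whole scaling process is implemented on CUDA using both a global memory strategy and a shared memory strategy. Experimental results show that the CUDA-based parallel B-spline surface interpolation method achieves hardware acceleration. Compared with the B-spline scaling algorithm running on the CPU, it executes markedly faster, is easy to extend, and shows good real-time capability on large-scale data.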
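The resampling step can be illustrated in one dimension with the uniform cubic B-spline basis. This sketch makes several assumptions that are not taken from the paper: it evaluates the basis directly on the pixel values (no prefiltering, so it smooths rather than strictly interpolates), it clamps indices at the borders, and it maps output samples linearly onto the input range. The function names are hypothetical.

```python
def bspline_weights(t):
    """Uniform cubic B-spline weights for the 4 neighbours at fractional offset t in [0, 1)."""
    w0 = (1 - t) ** 3 / 6.0
    w1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0
    w2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0
    w3 = t ** 3 / 6.0
    return (w0, w1, w2, w3)

def resample_row(row, new_len):
    """Resample a 1D row of pixels to new_len samples."""
    n = len(row)
    out = []
    for i in range(new_len):  # each output sample is independent: one GPU thread each
        pos = i * (n - 1) / (new_len - 1) if new_len > 1 else 0.0
        base = int(pos)
        t = pos - base
        acc = 0.0
        for offset, w in zip((-1, 0, 1, 2), bspline_weights(t)):
            idx = min(max(base + offset, 0), n - 1)  # clamp at the borders
            acc += w * row[idx]
        out.append(acc)
    return out

weights = bspline_weights(0.3)
flat = resample_row([7.0, 7.0, 7.0, 7.0], 9)
```

A bicubic version applies the same four weights along rows and then along columns, using 16 neighbours per output pixel.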

20.

Accelerating ant colony optimisation for the travelling salesman problem on the GPU

Akihiro Uchida Yasuaki Ito 《International Journal of Parallel, Emergent and Distributed Systems》2014,29(4):401-420

Recent graphics processing units (GPUs) can be used for general purpose parallel computation. Ant colony optimisation (ACO) approaches have been introduced as nature-inspired heuristics to find good solutions of the travelling salesman problem (TSP). In ACO approaches, a number of ants traverse the cities of the TSP to find better solutions of the TSP. The ants randomly select the next cities to visit based on probabilities determined by the total amounts of pheromone spread on the routes. The main contribution of this paper is to present a sophisticated and efficient implementation of one of the ACO approaches on the GPU. In our implementation, we have considered many programming issues of the GPU architecture including coalesced access of global memory and shared memory bank conflicts. In particular, we present a very efficient method for random selection of next cities by a number of ants. Our new method uses iterative random trial, which can find next cities at low computational cost with high probability. This idea can be applied not only in GPU implementations but also in CPU implementations. The experimental results on NVIDIA GeForce GTX 580 show that our implementation for 1002 cities runs in 8.71 s, while the CPU implementation runs in 190.05 s. Thus, our GPU implementation attains a speed-up factor of 22.11.
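For context, the sketch below shows the standard roulette-wheel selection that ACO implementations commonly use to choose the next city. It is the baseline technique, not the paper's iterative random trial method, whose details the abstract does not give. The weights stand in for whatever pheromone-based values an implementation computes, and all names are hypothetical.

```python
import random
from itertools import accumulate

def select_next_city(weights, visited, u):
    """Pick an unvisited city j with probability weights[j] / (sum of unvisited weights).

    u is a uniform random number in [0, 1), passed in so the function is deterministic.
    """
    masked = [0.0 if visited[j] else weights[j] for j in range(len(weights))]
    prefix = list(accumulate(masked))  # prefix sums; a parallel scan on a GPU
    total = prefix[-1]
    if total <= 0.0:
        raise ValueError("no unvisited city with positive weight")
    target = u * total
    for j, cumulative in enumerate(prefix):
        if target < cumulative:
            return j
    # Guard against floating-point rounding when u is very close to 1.
    return max(j for j in range(len(masked)) if masked[j] > 0.0)

weights = [1.0, 2.0, 3.0, 4.0]
visited = [False, True, False, False]
rng = random.Random(0)
city = select_next_city(weights, visited, rng.random())
```

The prefix sum over all cities is the expensive part of this baseline, which may be why a cheaper trial-based selection is attractive on the GPU.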