Similar Articles
19 similar articles found.
1.
韩玉  闫镔  宇超群  李磊  李建新 《计算机应用》2012,32(5):1407-1410
To address the long reconstruction time of the FDK algorithm, a GPU (graphics processing unit)-based parallel acceleration of FDK is proposed. Its efficiency is improved through a well-chosen thread allocation scheme, extraction and precomputation of voxel-independent intermediate variables in the backprojection parameter calculation, and careful optimization of the number of global memory accesses. Simulation results show that, without sacrificing reconstruction quality, the fully optimized parallel FDK algorithm reconstructs a 256³ volume in 0.5 s and a 512³ volume in 2.5 s, a substantial improvement over recent published results.
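To illustrate the kind of precomputation the abstract describes, here is a minimal NumPy sketch of voxel-driven FDK-style backprojection for a circular cone-beam geometry. The geometry parameters and the hoisted per-view terms are illustrative assumptions, not the paper's GPU kernel.

```python
import numpy as np

def fdk_backproject(volume, proj, beta, sdd, sod, du, dv, vox):
    """Accumulate one (already filtered) cone-beam projection into `volume`.

    Voxel-independent terms (the view angle's sine/cosine) are computed
    once per projection, mirroring the paper's strategy of hoisting such
    quantities out of the per-voxel loop.
    """
    nz, ny, nx = volume.shape
    nv, nu = proj.shape
    cos_b, sin_b = np.cos(beta), np.sin(beta)     # per-view, voxel-independent

    # World coordinates of voxel centers (volume centered at the origin).
    xs = (np.arange(nx) - nx / 2 + 0.5) * vox
    ys = (np.arange(ny) - ny / 2 + 0.5) * vox
    zs = (np.arange(nz) - nz / 2 + 0.5) * vox
    X, Y = np.meshgrid(xs, ys)

    # Rotate into the source frame; these 2D maps are shared by all z-slices.
    t = X * cos_b + Y * sin_b
    s = -X * sin_b + Y * cos_b
    L = sod + s                                   # source-to-voxel depth
    u = (t * sdd / L) / du + nu / 2               # detector column (pixels)
    w = (sod / L) ** 2                            # FDK distance weight

    iu = np.clip(np.rint(u).astype(int), 0, nu - 1)   # nearest-neighbor sample
    for k, z in enumerate(zs):                    # only v depends on z
        v = (z * sdd / L) / dv + nv / 2
        iv = np.clip(np.rint(v).astype(int), 0, nv - 1)
        volume[k] += w * proj[iv, iu]
```

On a GPU, each (x, y) column of the volume maps naturally to one thread, so `t`, `L`, `u`, and the distance weight `w` are computed once per thread rather than once per voxel.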

2.
A coding framework for multiple arbitrarily shaped regions of interest based on ISA-DWT
徐平 《中国图象图形学报》2006,11(10):1426-1430
A coding framework for multiple arbitrarily shaped regions of interest (ROIs) is proposed. A gray-level mask image is introduced to represent the priority of each ROI. Subject to constraints on the total target bit rate and the target reconstruction quality of each priority level, the ROIs are transformed in descending priority order using an integer-to-integer shape-adaptive discrete wavelet transform, and the transformed coefficients are encoded into separate compressed bitstreams with a modified SPIHT algorithm. Experimental results show that the framework has the following advantages: (1) the target reconstruction quality of each ROI is guaranteed according to its priority; (2) it supports both lossy and lossless compression of arbitrarily shaped ROIs; (3) its estimate of ROI reconstruction quality is simple and effective.
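The priority mechanism can be mimicked with the scaling trick used by embedded coders (compare JPEG2000's maxshift): upshifting coefficient magnitudes by ROI priority makes a bitplane coder emit the high-priority bits first. A hedged NumPy sketch, standing in for the paper's shape-adaptive transform and modified SPIHT:

```python
import numpy as np

def prioritize_coeffs(coeffs, priority_mask, shift_per_level=4):
    """Scale wavelet coefficients by ROI priority before embedded coding.

    `priority_mask` holds an integer priority per coefficient (0 =
    background, larger = more important ROI). Upshifting magnitudes by
    priority makes a bitplane coder such as SPIHT emit the bits of
    high-priority regions first -- the same effect the paper achieves
    by encoding ROIs in descending priority order.
    """
    return coeffs * (2.0 ** (priority_mask * shift_per_level))

def deprioritize_coeffs(coeffs, priority_mask, shift_per_level=4):
    """Invert the scaling after decoding, before the inverse DWT."""
    return coeffs * (2.0 ** (-priority_mask * shift_per_level))

# Toy usage: an 8x8 coefficient block with one priority-2 ROI.
c = np.random.randn(8, 8)
m = np.zeros((8, 8), dtype=int)
m[2:5, 2:5] = 2
scaled = prioritize_coeffs(c, m)
restored = deprioritize_coeffs(scaled, m)
assert np.allclose(restored, c)
```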

3.
To reconstruct images from the truncated projection data that arise in region-of-interest (ROI) cone-beam CT scans, an iterative algebraic reconstruction technique (ART) is applied. Cone-beam ART suffers from heavy computation and slow reconstruction. To speed it up, a fast parallel reconstruction method for multi-core platforms is proposed. First, the 3D reconstruction volume is split into upper and lower halves, with the detector plane divided correspondingly; then virtual-detector projection data are computed by bilinear interpolation; finally, ART is parallelized with multithreading on a multi-core platform, achieving a speedup of roughly 2 while maintaining high reconstruction accuracy. On this basis, simulation experiments reconstructing different ROIs of the 3D Shepp-Logan phantom confirm that ART is feasible for ROI image reconstruction.
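A minimal dense-matrix sketch of the ART (Kaczmarz) update follows; the paper's implementation works with an implicit cone-beam system matrix and splits the volume into two halves swept by independent threads, which this sketch omits.

```python
import numpy as np

def art(A, b, n_iters=10, relax=0.5, x0=None):
    """Kaczmarz-style ART: sweep the rays, correcting the image along
    each ray so that its forward projection matches the measurement.

    A : (n_rays, n_voxels) dense system matrix
    b : (n_rays,) measured (possibly truncated) projection data
    """
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    row_norms = np.einsum('ij,ij->i', A, A)           # precomputed ||a_i||^2
    for _ in range(n_iters):
        for i in range(A.shape[0]):
            if row_norms[i] == 0:
                continue
            r = b[i] - A[i] @ x                       # ray residual
            x += relax * r / row_norms[i] * A[i]      # back-distribute
    return x
```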

4.
CleanBlue 《微型计算机》2009,(16):153-158
Clearly, this is an era with an insatiable appetite for data bandwidth: the CPU is always waiting on memory, and the GPU is always busy reading and writing video memory. PC memory has gone from single channel to dual channel and from SDR to DDR3, with bandwidth steadily increasing; graphics memory controllers have widened from 128-bit to 512-bit, and even with GDDR5 graphics memory, more bandwidth is still wanted. In 2005, 30 GB/s of data bandwidth was generally enough for a graphics card; today the GeForce GTX 285 delivers over 140 GB/s of memory bandwidth, and in the coming year, 2010, GPUs may well need more than 500 GB/s.

5.
过传卫  胡福乔 《微计算机信息》2007,23(19):292-293,276
We propose a fast filtered backprojection reconstruction algorithm for equidistant fan-beam CT. The new algorithm is an accelerated form of the standard filtered backprojection (FBP) reconstruction, achieved mainly by reducing the number of projections and then reconstructing sub-images. Experimental results show that for a 512×512 image the algorithm speeds up reconstruction by more than a factor of 40 without introducing noticeable image error.
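The reduced-projection idea can be sketched with a plain parallel-beam FBP, a stand-in for the paper's equidistant fan-beam geometry: ramp-filter each view, backproject, and decimate the view set for speed.

```python
import numpy as np

def fbp(sinogram, thetas, n):
    """Minimal parallel-beam FBP: ramp-filter each projection, then
    smear it back across the n x n image grid (linear interpolation)."""
    n_det = sinogram.shape[1]
    ramp = np.abs(np.fft.fftfreq(n_det))              # ideal ramp filter
    filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1).real

    xs = np.arange(n) - n / 2 + 0.5
    X, Y = np.meshgrid(xs, xs)
    img = np.zeros((n, n))
    for p, th in zip(filtered, thetas):
        t = X * np.cos(th) + Y * np.sin(th) + n_det / 2   # detector coordinate
        i0 = np.clip(t.astype(int), 0, n_det - 2)
        frac = np.clip(t - i0, 0.0, 1.0)
        img += (1 - frac) * p[i0] + frac * p[i0 + 1]
    return img * np.pi / len(thetas)

# Decimating the view set by k trades accuracy for a roughly k-fold
# speedup -- the spirit of the paper's reduced-projection scheme:
# reduced = fbp(sinogram[::4], thetas[::4], 512)
```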

6.
This paper studies a CUDA-MPI algorithm and implementation of the fast Fourier transform on hexagonal domains (FFTH). First, by fully exploiting CUDA's hierarchical parallelism and its library functions, we design an efficient CUDA algorithm for FFTH. For double-precision complex data of size 3×2048², our CUDA program achieves a 12x speedup over the serial CPU program, rising to 40x if host-device data transfers are excluded; its efficiency is essentially on par with the 2D square-domain FFT provided by CUFFT. On this basis, by studying transpose and reordering algorithms for distributed parallel data on GPUs, we design an optimized CUDA-MPI algorithm for FFTH. For a 3×8192² problem on 10 nodes with 6 GPUs each, our CUDA-MPI program achieves a 55x speedup over the serial CPU program, and is considerably more efficient than the MPI-parallel FFTW as well as a square-domain parallel FFT built from local CUFFT computation and FFTW parallel transposes. This study and testing of the CUDA-MPI FFTH algorithm provides a reference for exploring scalable new algorithms on large heterogeneous CPU+GPU systems.
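The row-transpose-row decomposition behind such distributed FFTs can be sketched on an ordinary square grid (the hexagonal-domain details are the paper's own):

```python
import numpy as np

def fft2_by_transpose(a):
    """2D FFT as row FFTs, a transpose, and row FFTs again.

    This decomposition is what makes distributed FFTs possible: each
    rank FFTs its local rows, and the transpose becomes an MPI
    all-to-all exchange (here it is just .T.copy())."""
    step1 = np.fft.fft(a, axis=1)        # local row FFTs
    step2 = step1.T.copy()               # the "all-to-all" transpose
    step3 = np.fft.fft(step2, axis=1)    # row FFTs of the former columns
    return step3.T                       # restore the original layout

a = np.random.randn(64, 64) + 1j * np.random.randn(64, 64)
assert np.allclose(fft2_by_transpose(a), np.fft.fft2(a))
```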

7.
Image compression based on semi-automatic region-of-interest extraction
To reduce image storage space and transmission time, an image compression method based on semi-automatic extraction of regions of interest (ROIs) is proposed. Unlike existing algorithms, ROI segmentation is neither fully manual nor fully automatic: a manual selection is refined by region growing, yielding a fairly precise ROI. Experimental results show that the method achieves a high compression ratio and high-quality reconstructed images while accurately preserving the useful information.
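A hedged sketch of the manual-seed-plus-region-growing step, with a simple running-mean homogeneity test standing in for the paper's growth criterion:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    """Grow an ROI from a manually chosen seed pixel: a neighbor joins
    the region while its intensity stays within `tol` of the region's
    running mean. A BFS flood fill implements the growth."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    total, count = float(img[seed]), 1
    q = deque([seed])
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(img[ny, nx]) - total / count) <= tol:
                    mask[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    q.append((ny, nx))
    return mask
```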

8.
A hardware-accelerated anti-aliased volume splatting algorithm
A new framework for anti-aliased volume splatting implemented entirely on graphics hardware is proposed. A distance-dependent low-pass filter kernel is realized with the vertex and pixel shaders of programmable graphics hardware, anti-aliasing the splatting-based volume rendering under perspective projection. Exploiting the regular spatial layout of structured volume data and the programmable graphics pipeline, an incremental vertex-data compression algorithm with a compression ratio of 700 is proposed so that the voxel geometry resides entirely in video memory. Experiments show that the algorithm renders 3 million anti-aliased voxels per second at a 512×512 image resolution.
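The core of any splatting renderer is footprint accumulation. A NumPy sketch with a fixed Gaussian footprint follows; the distance-dependent kernel and the shader implementation are the paper's contributions and are only noted in the comments.

```python
import numpy as np

def splat(points, values, size, sigma=1.0, radius=3):
    """Accumulate point samples into an image with a Gaussian footprint
    (the 'splat'). A perspective renderer would widen `sigma` with each
    voxel's distance -- the distance-dependent low-pass filtering that
    the paper moves onto the vertex/pixel shaders."""
    img = np.zeros((size, size))
    ax = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(ax, ax)
    kernel = np.exp(-(dx**2 + dy**2) / (2 * sigma**2))
    for (x, y), v in zip(points, values):
        ix, iy = int(round(x)), int(round(y))
        if radius <= ix < size - radius and radius <= iy < size - radius:
            img[iy - radius:iy + radius + 1,
                ix - radius:ix + radius + 1] += v * kernel
    return img
```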

9.
Low-dose computed tomography (LDCT) reduces the patient's X-ray exposure, but the lower dose introduces severe artifacts and noise into the reconstructed images, interfering greatly with clinical diagnosis. To address this, an improved maximum a posteriori (MAP) projection-domain denoising algorithm with an anisotropic weighted prior model is proposed. Observing that intuitionistic fuzzy entropy can effectively distinguish smooth regions from edge-detail regions, the algorithm combines it with the traditional anisotropic diffusion coefficient to construct a new diffusion coefficient, adaptively tuned by the local variance. This coefficient is then embedded in a MAP estimation framework with a Huber prior, so that different regions of the projection data are denoised with different strengths. The algorithm was validated on three phantoms, a digital pelvis phantom, the Shepp-Logan head phantom, and a digital thorax phantom, and compared against filtered backprojection (FBP), penalized reweighted least-squares (PRWLS), and an anisotropic weighted prior sinogram smoothing algorithm. Experimental results show that images reconstructed with the proposed algorithm exhibit markedly fewer artifacts while edges and details are well preserved. The signal-to-noise ratios for the three phantoms were 20.5020 dB, 23.2948 dB, and 21.0184 dB, with runtimes of 49.50 s, 49.60 s, and 8.59 s, respectively.
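For orientation, here is a classical Perona-Malik-style anisotropic diffusion sketch on the projection data; the paper's contribution, a coefficient built from intuitionistic fuzzy entropy and adapted by local variance inside a Huber-prior MAP framework, is replaced here by the standard exponential coefficient.

```python
import numpy as np

def anisotropic_diffusion(p, n_iters=20, kappa=30.0, lam=0.2):
    """Perona-Malik diffusion on a sinogram: the diffusion coefficient g
    falls off with the gradient magnitude, so smooth regions are denoised
    strongly while edges are preserved."""
    u = p.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # diffusion coefficient
    for _ in range(n_iters):
        # One-sided differences toward the four neighbors.
        dn = np.roll(u, -1, 0) - u
        ds = np.roll(u,  1, 0) - u
        de = np.roll(u, -1, 1) - u
        dw = np.roll(u,  1, 1) - u
        u += lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```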

10.
Research on acceleration techniques for facet-model-based surface detection
Facet-model-based sub-voxel surface detection is computationally expensive. An acceleration scheme is proposed that combines a recursive separable-filter algorithm for the 3D facet model with a region-of-interest strategy. The recursive separable-filter algorithm decomposes the 3D convolution into three 1D convolutions over a discrete orthogonal basis and executes each 1D convolution recursively, making the cost independent of the kernel size and saving substantial computation time. An incremental algorithm resolves the large memory footprint of the recursive scheme. The region-of-interest strategy segments the image and uses the piecewise bounding boxes of the extracted objects as the effective regions, greatly reducing the amount of data to process. Experiments show that the scheme achieves a good speedup while preserving the accuracy of the original algorithm.
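The separability argument is easy to see in code: a NumPy sketch applying a separable 3D kernel as three 1D passes (the recursive, kernel-size-independent variant in the paper is not reproduced here).

```python
import numpy as np

def separable_filter3d(vol, kx, ky, kz):
    """Apply a separable 3D kernel as three 1D convolutions.

    For an n^3 kernel this replaces n^3 multiply-adds per voxel with 3n;
    the recursive variant in the paper goes further and removes the
    dependence on kernel size entirely."""
    out = np.apply_along_axis(lambda v: np.convolve(v, kx, mode='same'), 2, vol)
    out = np.apply_along_axis(lambda v: np.convolve(v, ky, mode='same'), 1, out)
    out = np.apply_along_axis(lambda v: np.convolve(v, kz, mode='same'), 0, out)
    return out

# Quick check with a separable 3D Gaussian smoothing:
g = np.array([1., 4., 6., 4., 1.]); g /= g.sum()
vol = np.random.rand(32, 32, 32)
smoothed = separable_filter3d(vol, g, g, g)
```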

11.
A filtered backprojection PET (Positron Emission Tomography) image reconstruction algorithm, FBP (Filtered Back Projection), is optimized and implemented on a low-cost DSP (Digital Signal Processor) platform built around Texas Instruments' high-performance TMS320C6455. Experimental results show that, with targeted algorithm and code optimization, the system reconstructs a 512×512 PET image within 40 seconds at an image quality sufficient for the application. DSP-based PET image reconstruction shows good application prospects in PET and other medical imaging fields.

12.
Three-dimensional cone-beam CT reconstruction is computationally intensive, and a pure software (CPU-only) implementation takes a long time. To exploit the parallel processing power of the graphics processing unit (GPU) and improve data-transfer efficiency, a method is studied that uses GPU multi-texturing to accelerate the FDK reconstruction of 3D cone-beam CT. Multi-texture mapping raises backprojection speed, reduces intermediate data storage, and cuts the number of floating-point accumulations; the vertex color channels implement the distance-weighting computation; and an extension increases the number of texture units used for parallel backprojection, further raising reconstruction speed. Experiments on an ordinary PC show that for a 256³ volume, with 16-bit floating-point precision maintained, the GPU backprojection completes within 10 s, a high speedup over the CPU-only reconstruction method.

13.
The rapid advance of computer hardware and the popularity of multimedia applications enable multi-core processors with sub-word parallelism instructions to become a dominant market trend in desktop PCs as well as high-end mobile devices. This paper presents an efficient parallel implementation of a 2D convolution algorithm demanding high performance computing power in multi-core desktop PCs. It is a representative computation-intensive algorithm in image and signal processing applications, accompanied by heavy memory access, yet its computational complexity is relatively low. The purpose of this study is to explore the effectiveness of exploiting the streaming SIMD (Single Instruction Multiple Data) extension (SSE) technology and the TBB (Threading Building Block) run-time library on Intel multi-core processors. By doing so, we can take advantage of all the hardware features of a multi-core processor concurrently for data- and task-level parallelism. For the performance evaluation, we implemented a 3 × 3 kernel based convolution algorithm using SSE2 and TBB in different combinations and compared their processing speeds. The experimental results show that both technologies have a significant effect on performance, and that processing speed improves greatly when the two are used together; for example, speedups of 6.2, 6.1, and 1.4 times over implementations using either technology alone are obtained for 256 × 256, 512 × 512, and 1024 × 1024 data sets, respectively.
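A Python analog of the two-level scheme, with NumPy's shifted-slice accumulation in place of SSE (data-level parallelism) and a thread pool over row bands in place of TBB (task-level parallelism); the 3 × 3 blur kernel is an illustrative choice.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

K = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16  # 3x3 blur

def conv3x3(img):
    """Vectorized 3x3 'valid' convolution: shifted-slice accumulation
    plays the role SSE plays in the paper (data-level parallelism)."""
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for dy in range(3):
        for dx in range(3):
            out += K[dy, dx] * img[dy:dy + out.shape[0], dx:dx + out.shape[1]]
    return out

def conv3x3_tasks(img, n_workers=4):
    """Task-level parallelism over row bands, the TBB analog. Bands
    overlap by 2 rows so the 'valid' outputs tile exactly."""
    rows = np.linspace(0, img.shape[0] - 2, n_workers + 1, dtype=int)
    bands = [img[r0:r1 + 2] for r0, r1 in zip(rows[:-1], rows[1:])]
    with ThreadPoolExecutor(n_workers) as ex:
        return np.vstack(list(ex.map(conv3x3, bands)))
```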

14.
This paper proposes a machine learning based method which can detect certain events automatically and precisely in biomedical imaging. We detect one important and not well-defined event, called a flash, in fluorescence images of Escherichia coli. Given a time series of images, we first propose a scheme to transform event detection on regions of interest (ROI) in images into a classification problem. Then, with supervised human labeling data, we develop a feature selection technique and use a support vector machine (SVM) to solve this classification problem. To reduce the SVM training time, a parallel version of SVM training is implemented. On ten stacks of fluorescence images labeled by experts, each containing one hundred 512×512 images with 4906 ROIs and 72056 labeled events in total, event detection with the proposed method takes 19 seconds, while human labeling costs roughly 60 hours. With human labeling as the standard, the accuracy of our method achieves an F-value of about 0.81. The method is much faster than human detection and is expected to become more precise with larger data sets. It can also be extended to detect a series of events with similar properties, greatly improving detection efficiency.
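A hedged scikit-learn sketch of the ROI-to-classification pipeline; the features, data, and labels below are placeholders, not the paper's selected features or the E. coli data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def roi_features(roi):
    """Hand-crafted intensity features for one ROI patch; illustrative
    stand-ins for the paper's feature selection."""
    return np.array([roi.mean(), roi.std(), roi.max(),
                     np.percentile(roi, 90),
                     np.abs(np.diff(roi, axis=0)).mean()])

# X: one feature row per labeled ROI; y: expert flash/no-flash labels.
rng = np.random.default_rng(0)
rois = rng.random((500, 16, 16))             # placeholder ROI stack
y = rng.integers(0, 2, 500)                  # placeholder labels
X = np.array([roi_features(r) for r in rois])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel='rbf', C=1.0).fit(X_tr, y_tr)
print('held-out accuracy:', clf.score(X_te, y_te))
```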

15.
The SHA512 hash function is a widely used cryptographic algorithm of central importance in modern cryptography. Given the high performance and high efficiency of the mimic computer, SHA512 is analyzed in depth and a fully pipelined implementation on the mimic computer is proposed. To raise the computation rate, the additions on the critical path are optimized; combined with the full pipeline, this reduces the number of clock cycles needed to encrypt one data block and increases data throughput. Running on the mimic computer with the chip clocked at 130 MHz, the implementation reaches a throughput of 133,120 Mbit/s, a significant performance improvement, with an energy efficiency higher than that of a general-purpose server.
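For a software baseline against the hardware figures above, a short hashlib sketch that measures streaming SHA-512 throughput (it does not model the pipelined design itself):

```python
import hashlib
import os
import time

def sha512_throughput(total_mb=256, chunk_mb=4):
    """Stream random chunks through hashlib's SHA-512 and report the
    sustained software hash rate in Mbit/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    h = hashlib.sha512()
    t0 = time.perf_counter()
    for _ in range(total_mb // chunk_mb):
        h.update(chunk)
    dt = time.perf_counter() - t0
    print(f'{total_mb * 8 / dt:,.0f} Mbit/s  (digest {h.hexdigest()[:16]}...)')

sha512_throughput()
```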

16.
As today’s standard screening methods frequently fail to detect breast cancer before metastases have developed, early diagnosis is still a major challenge. With the promise of high-quality volume images, three-dimensional ultrasound computer tomography is likely to improve this situation, but has high computational needs. In this work, we investigate the acceleration of the ray-based transmission reconstruction by a GPU-based implementation of the iterative numerical optimization algorithm TVAL3. We identified the regular and transposed sparse-matrix–vector multiply as the performance limiting operations. For accelerated reconstruction we propose two different concepts and devise a hybrid scheme as optimal configuration. In addition we investigate multi-GPU scalability and derive the optimal number of devices for our two primary use-cases: a fast preview mode and a high-resolution mode. In order to achieve a fair estimation of the speedup, we compare our implementation to an optimized CPU version of the algorithm. Using our accelerated implementation we reconstructed a preview 3D volume with 24,576 unknowns, a voxel size of (8 mm)³ and approximately 200,000 equations in 0.5 s. A high-resolution volume with 1,572,864 unknowns, a voxel size of (2 mm)³ and approximately 1.6 million equations was reconstructed in 23 s. This constitutes an acceleration of over one order of magnitude in comparison to the optimized CPU version.
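The two performance-limiting kernels can be reproduced with SciPy; the matrix sizes follow the preview mode quoted above (about 200,000 equations over 24,576 unknowns), while the density is an assumption.

```python
import numpy as np
from scipy.sparse import random as sprandom

# The two kernels named above: A @ x (forward model) and A.T @ y (its
# adjoint). CSR favors A @ x; the transposed product is what typically
# needs its own data layout on a GPU.
m, n = 200_000, 24_576                     # equations x unknowns (preview scale)
A = sprandom(m, n, density=1e-4, format='csr', random_state=0)
x = np.random.rand(n)
y = np.random.rand(m)

fwd = A @ x                                # regular SpMV
adj = A.T @ y                              # transposed SpMV

# Keeping an explicit CSR copy of the transpose trades memory for speed:
At = A.T.tocsr()
assert np.allclose(adj, At @ y)
```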

17.
In this paper, we propose a high-speed vision system that can be applied to real-time face tracking at 500 fps using GPU acceleration of a boosting-based face tracking algorithm. By assuming a small image displacement between frames, which is a property of high-frame rate vision, we develop an improved boosting-based face tracking algorithm for fast face tracking by enhancing the Viola–Jones face detector. In the improved algorithm, face detection can be efficiently accelerated by reducing the number of window searches for Haar-like features, and the tracked face pattern can be localized pixel-wise even when the window is sparsely scanned for a larger face pattern by introducing skin color extraction in the boosting-based face detector. The improved boosting-based face tracking algorithm is implemented on a GPU-based high-speed vision platform, and face tracking can be executed in real time at 500 fps for an 8-bit color image of 512 × 512 pixels. In order to verify the effectiveness of the developed face tracking system, we install it on a two-axis mechanical active vision system and perform several experiments for tracking face patterns.
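A minimal sketch of the skin-color gating that cuts the number of window searches; the Cr/Cb thresholds are common heuristics, not the paper's calibration.

```python
import numpy as np

def skin_mask(rgb):
    """Crude skin segmentation by thresholding Cr/Cb (BT.601 conversion
    for 8-bit RGB)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    cr = 128 + 0.5 * r - 0.4187 * g - 0.0813 * b
    cb = 128 - 0.1687 * r - 0.3313 * g + 0.5 * b
    return (cr > 133) & (cr < 173) & (cb > 77) & (cb < 127)

def search_window(mask, pad=16):
    """Padded bounding box of skin pixels: the detector only scans Haar
    windows inside this region, which is how the number of window
    searches is reduced between consecutive 500 fps frames."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None                        # fall back to a full-frame scan
    return (max(ys.min() - pad, 0), ys.max() + pad,
            max(xs.min() - pad, 0), xs.max() + pad)
```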

18.
《Real》2001,7(2):203-217
This paper presents a VLSI architecture to implement the forward and inverse two-dimensional Discrete Wavelet Transform (DWT), to compress medical images for storage and retrieval. Lossless compression is usually required in the medical image field. The word length required for lossless compression makes the area cost of the architectures that appear in the literature too expensive, so there is a clear need for a cost-effective architecture implementing the lossless compression of medical images using the DWT. The data path word length has been selected to ensure the lossless accuracy criteria, leading to a high-speed implementation with a small chip area. The pyramid algorithm is reorganized and the algorithm's locality is improved in order to obtain an efficient hardware implementation. The result is a pipelined architecture that supports single-chip implementation in VLSI technology. The implementation employs only one multiplier and 352 memory elements to compute all scales, which results in a considerably smaller chip area (45 mm²) than former implementations. The hardware design has been captured in the VHDL language and simulated on data taken from random images. Implemented in a 0.7 μm technology, it can compute both the forward and inverse DWT at a rate of 3.5 images/s for 512×512 12-bit images, corresponding to a clock speed of 33 MHz. This chip is the core of a PCI board that will speed up DWT computation on desktop computers.
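A reversible integer-to-integer wavelet is the key requirement for lossless coding. A sketch of the standard 5/3 lifting step follows (the paper's exact filter bank and word-length analysis may differ):

```python
import numpy as np

def lift53_forward(x):
    """One level of the reversible 5/3 lifting DWT on a 1D integer
    signal of even length (periodic boundary handling)."""
    even, odd = x[0::2].copy(), x[1::2].copy()
    odd -= (even + np.roll(even, -1)) // 2          # predict step
    even += (odd + np.roll(odd, 1) + 2) // 4        # update step
    return even, odd                                # approximation, detail

def lift53_inverse(even, odd):
    """Undo the lifting steps in reverse order -- exactly, in integers."""
    even = even - (odd + np.roll(odd, 1) + 2) // 4
    odd = odd + (even + np.roll(even, -1)) // 2
    x = np.empty(even.size * 2, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd
    return x

x = np.random.randint(0, 4096, 512)                 # 12-bit samples
a, d = lift53_forward(x)
assert np.array_equal(lift53_inverse(a, d), x)      # lossless reconstruction
```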

19.
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving the memory bandwidth; (2) loop unrolling for hiding the memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also show an analytical model to understand the reconstruction performance on multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512³-voxel volume from 360 512²-pixel projections. This performance is 41% higher than the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Some detailed analyses are also presented to understand how effectively the acceleration techniques increase the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024³-voxel volume.
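The bandwidth argument can be condensed into a one-line analytical model: if backprojection is memory-bound, throughput in projections per second is the sustained bandwidth divided by the volume traffic per projection. The constants below are assumptions for illustration, not the paper's measured figures.

```python
def fdk_throughput_model(n_voxels=512**3, bytes_per_update=4,
                         bandwidth_gbs=100.0, efficiency=0.83):
    """Back-of-envelope memory-bound model: one pass over the volume per
    projection, `bytes_per_update` bytes of traffic per voxel update."""
    bytes_per_proj = n_voxels * bytes_per_update
    return efficiency * bandwidth_gbs * 1e9 / bytes_per_proj

print(f'{fdk_throughput_model():.1f} projections/s')
```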
