High-performance cone beam reconstruction using CUDA compatible GPUs
Authors:Yusuke Okitsu  Fumihiko Ino  Kenichi Hagihara
Affiliation:1. School of Information Engineering, Nanchang Hangkong University, Nanchang, Jiangxi 330063, People’s Republic of China;2. School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi 330099, People’s Republic of China;3. Wuhan Digital Engineering Research Institute, Wuhan 430074, People’s Republic of China;4. Institute for Pattern Recognition & Artificial Intelligence, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People’s Republic of China;1. Department of Electronics and Communication Engineering, National Institute of Technology (NIT), Tiruchirappalli, India;2. Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Canada;3. Department of Medical Imaging, Royal University Hospital, University of Saskatchewan, Saskatoon, Canada;4. Department of Anatomy and Cell Biology, College of Medicine, University of Saskatchewan, Saskatoon, Canada;1. Department of Computer Science and Technology, Suqian College, Suqian 223800, China;2. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China;1. Departamento de Física, FFCLRP, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil;2. Instituto de Física Gleb Wataghin, Universidade de Campinas, 13083-859 Campinas, SP, Brazil;3. Departamento Acadêmico de Física, Universidade Tecnológica Federal do Paraná, 80230-901 Curitiba, PR, Brazil;4. Faculdade de Odontologia de Ribeirão Preto, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
Abstract: Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the NVIDIA graphics processing unit (GPU). This paper presents an acceleration method for cone beam reconstruction using CUDA compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction for saving memory bandwidth; (2) loop unrolling for hiding memory latency; and (3) multithreading for exploiting multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also present an analytical model for understanding reconstruction performance in multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) for reconstruction of a 512³-voxel volume from 360 projections of 512² pixels each. This performance is 41% higher than the previous CUDA-based method and is 24 times faster than a CPU-based method optimized by vector intrinsics. Detailed analyses are also presented to show how effectively each acceleration technique increases the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024³-voxel volume.
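The throughput figures in the abstract can be turned into wall-clock estimates with a little arithmetic. The sketch below is only illustrative: the constants are the values quoted in the abstract, while the function names are hypothetical, and reading "41% higher" as a 1.41x throughput ratio and "24 times faster" as a 24x ratio is an assumption. The baseline numbers it produces are therefore implied values, not measurements reported in the abstract.

```python
# Values quoted in the abstract.
THROUGHPUT_PPS = 64.3        # proposed method, projections per second
NUM_PROJECTIONS = 360        # projections used for the 512^3-voxel volume

# Assumed interpretation of the comparative claims (not stated as ratios in the source).
RATIO_VS_PREVIOUS_CUDA = 1.41   # "41% higher" read as 1.41x throughput
RATIO_VS_CPU = 24.0             # "24 times faster" read as 24x throughput


def reconstruction_time(num_projections: int, throughput_pps: float) -> float:
    """Seconds needed to process all projections at a given throughput."""
    return num_projections / throughput_pps


def implied_baseline_throughput(throughput_pps: float, ratio: float) -> float:
    """Throughput of a baseline that is `ratio` times slower."""
    return throughput_pps / ratio


total_seconds = reconstruction_time(NUM_PROJECTIONS, THROUGHPUT_PPS)
previous_cuda_pps = implied_baseline_throughput(THROUGHPUT_PPS, RATIO_VS_PREVIOUS_CUDA)
cpu_pps = implied_baseline_throughput(THROUGHPUT_PPS, RATIO_VS_CPU)
cpu_seconds = reconstruction_time(NUM_PROJECTIONS, cpu_pps)

print(f"Proposed method: {total_seconds:.1f} s for {NUM_PROJECTIONS} projections")
print(f"Implied previous CUDA throughput: {previous_cuda_pps:.1f} pps")
print(f"Implied CPU throughput: {cpu_pps:.2f} pps ({cpu_seconds:.0f} s total)")
```

Under these assumptions, the full 360-projection reconstruction takes roughly 5.6 seconds on the proposed method, against an implied time of a little over two minutes for the CPU baseline.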
This article is indexed in ScienceDirect and other databases.