首页 | 本学科首页   官方微博 | 高级检索  
     


Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions
Authors:Gregorio Bernabé  , José   M. Garcí  a  José   Gonzá  lez
Affiliation:(1) Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071 Murcia, Spain;(2) Intel Barcelona Research Center, Intel Labs, 08034 Barcelona, Spain
Abstract:The video compression algorithms based on the 3D wavelet transform obtain excellent compression rates at the expense of huge memory requirements, that drastically affects the execution time of such applications. Its objective is to allow the real-time video compression based on the 3D fast wavelet transform. We show the hardware and software interaction for this multimedia application on a general-purpose processor. First, we mitigate the memory problem by exploiting the memory hierarchy of the processor using several techniques. As for instance, we implement and evaluate the blocking technique. We present two blocking approaches in particular: cube and rectangular, both of which differ in the way the original working set is divided. We also put forward the reuse of previous computations in order to decrease the number of memory accesses and floating point operations. Afterwards, we present several optimizations that cannot be applied by the compiler due to the characteristics of the algorithm. On the one hand, the Streaming SIMD Extensions (SSE) are used for some of the dimensions of the sequence (y and time), to reduce the number of floating point instructions, exploiting Data Level Parallelism. Then, we apply loop unrolling and data prefetching to specific parts of the code. On the other hand, the algorithm is vectorized by columns, allowing the use of SIMD instructions for the y dimension. Results show speedups of 5x in the execution time over a version compiled with the maximum optimizations of the Intel C/C++ compiler, maintaining the compression ratio and the video quality (PSNR) of the original encoder based on the 3D wavelet transform. Our experiments also show that, allowing the compiler to perform some of these optimizations (i.e. automatic code vectorization), causes performance slowdown, demonstrating the effectiveness of our optimizations.Special Issue on Media and Communication Applications on General Purpose Processors: Hardware and Software Issues/Journal of VLSI Signal Processing Systems/Dr. Eric Debes, (Lead) Guest Editor. Contact Author: Gregorio Bernabé.Gregorio Bernabé was born in Antibes (Alpes Maritimos, France) on 21 November 1974. He received the M.S. in Computer Science from the University of Murcia (Spain) in 1997. In 1998, he joined the Computer Engineering Department of the University of Murcia, where he is an Assistant Professor as well as a Ph. D. candidate. His current research interests include video compression using the Wavelet Transform, and the development of optimizations to improve the performance of the video compression algorithms based on the 3D wavelet transform.Jose M. Garcia was born in Valencia, Spain on 9 January, 1962. He received the MS and the PhD degrees in electrical engineering from the Technical University of Valencia (Valencia, Spain), in 1987 and 1991, respectively. In 1987 he joined the Computer Science Department at the University of Castilla-La Mancha at the Campus of Albacete (Spain). From 1987 to 1993, he was an Assistant Professor of Computer Architecture. In 1994 he became an Associate Professor at the University of Murcia (Spain). From 1995 to 1997 he served as Vice-Dean of the School of Computer Science. At present, he is the Director of the Computer Engineering Department, and also the Head of the Research Group on Parallel Computing and Architecture. He has developed several courses on Computer Structure, Peripheral Devices, Computer Architecture and Multicomputer Design. His current research interests include Multiprocessors Systems, Interconnection Networks, File Systems, Grid Computing and its Application in Multimedia Systems. He has published over 45 refereed papers in different Journals and Conferences in these fields. Dr. Garcia is a member of several international associations as IEEE Computer Society, ACM, USENIX, and also a member of some European associations (Euromicro and ATI).Pepe Gonzalez received the M.S. and Ph.D. degrees from the Universitat Politecnica de Catalunya (UPC). In January 2000, he joined the Computer Engineering Department of the University of Murcia, Spain, and became an Associate Professor in June 2001. In March 2002, he joined the Intel Barcelona Research Center, where he is a Senior Researcher. Currently, Pepe is working in new paradigms for the IA-32 family, in particular, Thermal-and Power-Aware clustered microarchitectures. pepe.gonzalez@intel.com
Keywords:3D wavelet transform  video compression  blocking  reuse  Streaming SIMD extensions  vectorization
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号