Merging VLIW and vector processing techniques for a simple, high-performance processor architecture
Affiliation:1. Computer Science and Information Department, Community College, Taibah University, Al-Madinah Al-Munawwarah 2898, Saudi Arabia;2. Computer and System Section, Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan 81542, Egypt;1. Council for Scientific and Industrial Research, Meiring Naudé Road, Brummeria, Pretoria 0184, South Africa;2. Carl and Emily Fuchs Institute for Microelectronics, Dept. of Electrical, Electronic and Computer Engineering, University of Pretoria, Cnr Lynnwood and University Roads, Pretoria 0002, South Africa;3. Faculty of Engineering and the Built Environment, University of Johannesburg, Auckland Park Kingsway Campus, Auckland Park 2006, South Africa;1. School of Computing and Electrical Engineering, Indian Institute of Technology (IIT) Mandi, Mandi, Himachal Pradesh 175001, India.;2. Electronics and Communication Engineering, Indian Institute of Technology (IIT) Roorkee, Roorkee, Uttarakhand, India.;1. University of Mons (UMONS), Belgium;2. IUMA, Universidad de Las Palmas de Gran Canaria (ULPGC), Spain
Abstract: This paper proposes a new processor architecture, VVSHP, for accelerating data-parallel applications, which are growing in importance and demand increased performance from hardware. VVSHP merges VLIW and vector processing techniques to obtain a simple, high-performance processor architecture. One key feature of VVSHP is that the multiple scalar instructions packed in a VLIW and vector instructions execute on unified parallel execution datapaths. Another is a two-part register file that reduces the complexity of VVSHP: (1) a shared scalar–vector part with eight read ports and four write ports, holding 64×32-bit registers (usable as 64 scalar registers or as 16×4 vector registers) for scalar/vector data, and (2) a vector part with two read ports and one write port, holding 48 vector registers that each store 4×32-bit vector data. A further key feature is support for vector lengths from 1 to 256, which reduces loop overhead. VVSHP can issue up to four scalar/vector operations per cycle, processing a set of operands in parallel and producing up to four results to be written back to the VVSHP register file. However, it can issue only one memory operation at a time, which loads/stores 128-bit scalar/vector data from/to data memory. The proposed VVSHP processor is implemented in VHDL targeting the Xilinx Virtex-5 FPGA, and its performance is evaluated.
Keywords: Data-parallel applications; VLIW; Vector processing; VHDL; Performance evaluation
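The register-file and vector-length figures quoted in the abstract can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not taken from the paper: the function names are hypothetical, the contiguous aliasing of a shared vector register onto four scalar registers is an assumption, and the count of 128-bit memory operations assumes 32-bit elements packed four per access with no alignment effects.

```python
from math import ceil

# Figures stated in the abstract.
LANES = 4                 # each vector register stores 4 x 32-bit elements
ELEMENT_BITS = 32
SHARED_SCALAR_REGS = 64   # shared part: 64 x 32-bit registers
VECTOR_PART_REGS = 48     # vector part: 48 vector registers
MAX_VECTOR_LENGTH = 256   # vector lengths range from 1 to 256
MEMORY_OP_BITS = 128      # one memory operation moves 128 bits


def shared_vector_to_scalars(v):
    """Scalar register indices aliased by shared vector register v.

    ASSUMPTION (hypothetical): vector register v overlays the four
    contiguous scalar registers 4*v .. 4*v + 3.
    """
    if not 0 <= v < SHARED_SCALAR_REGS // LANES:
        raise ValueError("shared vector register index must be in 0..15")
    return list(range(v * LANES, (v + 1) * LANES))


def memory_ops_for_vector(vector_length):
    """Number of 128-bit memory operations needed to move one vector.

    ASSUMPTION: elements are 32-bit and packed four per memory operation.
    """
    if not 1 <= vector_length <= MAX_VECTOR_LENGTH:
        raise ValueError("vector length must be in 1..256")
    elements_per_op = MEMORY_OP_BITS // ELEMENT_BITS
    return ceil(vector_length / elements_per_op)


def register_file_bits():
    """Storage of each register-file part, in bits."""
    shared = SHARED_SCALAR_REGS * ELEMENT_BITS
    vector = VECTOR_PART_REGS * LANES * ELEMENT_BITS
    return shared, vector


if __name__ == "__main__":
    print(shared_vector_to_scalars(3))
    print(memory_ops_for_vector(256))
    print(register_file_bits())
```

Under these assumptions, a full-length vector of 256 elements needs 64 memory operations, and because only one memory operation can issue at a time, memory traffic rather than the four-wide issue would bound such a transfer.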
This article is indexed in ScienceDirect and other databases.