Design and evaluation of a hierarchical decoupled architecture
Authors: Won W. Ro, Stephen P. Crago, Alvin M. Despain, Jean-Luc Gaudiot
Affiliation: (1) Department of Electrical and Computer Engineering, California State University, Northridge; (2) Information Sciences Institute-East, University of Southern California, California; (3) Department of Electrical Engineering, University of Southern California, California; (4) Department of Electrical Engineering and Computer Science, University of California, Irvine
Abstract: The speed gap between processor and main memory is the major performance bottleneck of modern computer systems. As a result, today's microprocessors suffer from frequent cache misses and lose many CPU cycles to pipeline stalls. Although traditional data prefetching methods considerably reduce the number of cache misses, most of them rely strongly on the predictability of future accesses and often fail when memory accesses exhibit little locality. To solve the long-latency problem of current memory systems, this paper presents the design and evaluation of our high-performance decoupled architecture, the HiDISC (Hierarchical Decoupled Instruction Stream Computer). The design is motivated by the traditional decoupled architecture concept and its limits. The HiDISC approach adds a prefetching processor on top of a traditional access/execute architecture. Our design aims to provide low memory access latency by separating and decoupling otherwise sequential pieces of code into three streams and executing each stream on its own dedicated processor. The three streams act in concert to mask long access latencies by providing the necessary data to the upper level on time. This is achieved by separating the access-related instructions from the main computation and running them early enough on the two dedicated processors. Detailed hardware design and performance evaluation are carried out through the development of an architectural simulator and compiler tools. Our performance results show that the proposed HiDISC model eliminates 19.7% of the cache misses and improves the overall IPC (Instructions Per Cycle) by 15.8%. With a slower memory model that assumes a memory access latency of 200 CPU cycles, HiDISC improves performance by 17.2%.
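Illustrative sketch (not from the paper): the three-stream split described in the abstract can be pictured with a toy Python model of an indirect-access loop, sum(a[idx[i]] * b[i]). Everything here is an assumption made for illustration: the function name run_decoupled, the prefetch_distance parameter, the unbounded set standing in for a cache, and the single load queue. It models only the division of work between a prefetch stream, an access stream, and a compute stream; it is not the HiDISC hardware, its simulator, or its compiler.

```python
from collections import deque


def run_decoupled(a, idx, b, prefetch_distance=4):
    """Toy model of three cooperating streams (hypothetical, for illustration).

    Returns (result, misses), where misses counts access-stream loads whose
    address was not already resident in the modelled cache.
    """
    n = len(idx)
    cache = set()         # addresses resident in the (unbounded) toy cache
    load_queue = deque()  # access stream -> compute stream
    prefetched = 0        # how far the prefetch stream has advanced
    misses = 0
    total = 0

    for i in range(n):
        # Prefetch stream: runs ahead of the access stream and only
        # touches addresses so that they become resident.
        while prefetched < min(n, i + prefetch_distance):
            cache.add(idx[prefetched])
            prefetched += 1

        # Access stream: performs the address computation and the load,
        # then hands the operands to the compute stream through a queue.
        addr = idx[i]
        if addr not in cache:
            misses += 1
            cache.add(addr)
        load_queue.append((a[addr], b[i]))

        # Compute stream: consumes operands and does arithmetic only.
        x, y = load_queue.popleft()
        total += x * y

    return total, misses


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    idx = [3, 0, 3, 1]
    b = [10, 20, 30, 40]
    print(run_decoupled(a, idx, b, prefetch_distance=0))
    print(run_decoupled(a, idx, b, prefetch_distance=4))
```

In this toy model the computed result is the same for any prefetch distance; only the miss count seen by the access stream changes, which is the sense in which the abstract describes the streams as masking latency rather than altering the computation.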
Keywords: Decoupled architectures; Memory latency hiding; Multithreading; Parallel architecture; Instruction level parallelism; Data prefetching; Speculative execution
This article is indexed in SpringerLink and other databases.