A phase adaptive cache hierarchy for SMT processors
Authors: Sonia López, Óscar Garnica, David H Albonesi, Steven Dropsho, Juan Lanchares, José I Hidalgo
Affiliation: (a) Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY, USA; (b) Computer Systems Laboratory, Cornell University, Ithaca, NY, USA; (c) Google Inc., Zurich, Switzerland; (d) Departamento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Spain
Abstract: Resizable caches can trade off capacity for access speed to dynamically match the needs of the workload. In single-threaded cores, resizable caches have demonstrated their ability to improve processor performance by adapting to the phases of the running application. In Simultaneous Multi-Threaded (SMT) cores, the caching needs can vary greatly with the number of threads and their characteristics, thus offering even more opportunities to dynamically adjust cache resources to the workload. In this paper, we demonstrate that the preferred control methodology for data cache reconfiguration in an SMT core changes as the number of running threads increases. In workloads with one or two threads, the resizable cache control algorithm should optimize for cache miss behavior because misses typically form the critical path. In contrast, with several independent threads running, we show that optimizing for cache hit behavior has more impact, since large SMT workloads have other threads to run during a cache miss. Moreover, we demonstrate that these seemingly diametrically opposed policies are closely related mathematically: the former minimizes the arithmetic mean cache access time (which we will call AMAT), while the latter minimizes its harmonic mean. We introduce an algorithm (HAMAT) that smoothly and naturally adjusts between the two strategies with the degree of multi-threading. We extend a previously proposed Globally Asynchronous, Locally Synchronous (GALS) processor core with SMT support and dynamically resizable caches. We show that the HAMAT algorithm significantly outperforms the AMAT algorithm on four-thread workloads while matching its performance on one- and two-thread workloads. Moreover, HAMAT achieves overall performance improvements of 18.7%, 10.1%, and 14.2% on one-, two-, and four-thread workloads, respectively, over the best fixed-configuration cache design.
Keywords: Adaptive caches; Reconfigurable caches; Cache memories; GALS; Simultaneous Multi-Threading
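Illustration: the abstract contrasts minimizing the arithmetic mean of cache access time with minimizing its harmonic mean. The Python sketch below shows how the two criteria can disagree on which cache configuration to prefer. It is not the HAMAT algorithm, whose details the abstract does not give. The two configurations, their latencies and miss rates, and all function names are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """A hypothetical cache configuration (all numbers are assumptions)."""
    name: str
    hit_latency: float   # cycles for an access that hits
    miss_rate: float     # fraction of accesses that miss, in [0, 1]
    miss_penalty: float  # extra cycles paid by an access that misses


def arithmetic_mean_access_time(cfg: CacheConfig) -> float:
    """Weighted arithmetic mean of per-access latency (dominated by misses)."""
    hit_rate = 1.0 - cfg.miss_rate
    miss_latency = cfg.hit_latency + cfg.miss_penalty
    return hit_rate * cfg.hit_latency + cfg.miss_rate * miss_latency


def harmonic_mean_access_time(cfg: CacheConfig) -> float:
    """Weighted harmonic mean of per-access latency (dominated by hits)."""
    hit_rate = 1.0 - cfg.miss_rate
    miss_latency = cfg.hit_latency + cfg.miss_penalty
    return 1.0 / (hit_rate / cfg.hit_latency + cfg.miss_rate / miss_latency)


def best_config(configs, metric) -> str:
    """Return the name of the configuration that minimizes the given metric."""
    return min(configs, key=metric).name


# Hypothetical trade-off: a smaller cache hits faster but misses more often.
CONFIGS = [
    CacheConfig("small-fast", hit_latency=2.0, miss_rate=0.10, miss_penalty=100.0),
    CacheConfig("large-slow", hit_latency=3.0, miss_rate=0.05, miss_penalty=100.0),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg.name,
              round(arithmetic_mean_access_time(cfg), 3),
              round(harmonic_mean_access_time(cfg), 3))
    print("arithmetic mean prefers:", best_config(CONFIGS, arithmetic_mean_access_time))
    print("harmonic mean prefers:", best_config(CONFIGS, harmonic_mean_access_time))
```

With these assumed numbers, the arithmetic mean favors the larger, slower configuration because it misses less often, while the harmonic mean favors the smaller, faster one because its hits are cheaper. This mirrors the abstract's distinction between optimizing for miss behavior and optimizing for hit behavior.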
This article is indexed in ScienceDirect and other databases.