Grouping Cores for Chip Multiprocessors Optimization

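Cite this article:	LI Guohong, WANG Dongsheng, LIU Zhenyu, LI Chongmin, LIU Genxian, GUO Sanchuan. Grouping Cores for Chip Multiprocessors Optimization[J]. Journal of Frontiers of Computer Science and Technology, 2014(4): 385-396.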

Authors:	LI Guohong, WANG Dongsheng, LIU Zhenyu, LI Chongmin, LIU Genxian, GUO Sanchuan

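Affiliations:	[1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084 [2] Tsinghua National Laboratory for Information Science and Technology, Beijing 100084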

Funding:	The National Natural Science Foundation of China under Grant No. 60833004, the National High Technology Research and Development Program of China (863 Program) under Grant No. 2012AA010905, and the Cross Discipline Foundation of Tsinghua National Laboratory for Information Science and Technology.

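Abstract:	As chip multiprocessors scale up, the average distance between a requesting core and the home node of the requested data grows, and data accesses are unevenly distributed across the banks of the distributed shared cache, which creates network hotspots. Both effects increase L1 cache miss latency. To address this problem, every four cores are placed in one group, and a neighboring data prober is designed within each group. By determining whether a miss can be served from the L1 cache of a neighboring core, the neighboring data prober exploits the inter-core locality of data accesses that parallel programs exhibit when running on a multi-core processor. The cache coherence protocol is also optimized for the new structure. Experiments show that this on-chip memory optimization improves system performance, reduces on-chip network traffic, and saves energy.
Keywords:	chip multiprocessors; cache; network on chip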
Grouping Cores for Chip Multiprocessors Optimization

LI Guohong, WANG Dongsheng, LIU Zhenyu, LI Chongmin, LIU Genxian, GUO Sanchuan. Grouping Cores for Chip Multiprocessors Optimization[J]. Journal of Frontiers of Computer Science and Technology, 2014(4): 385-396.

Authors:	LI Guohong, WANG Dongsheng, LIU Zhenyu, LI Chongmin, LIU Genxian, GUO Sanchuan

Affiliation:	1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 2. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China

Abstract:	In chip multiprocessors (CMPs), as the number of cores increases, the average distance between requestors and home nodes grows, and unbalanced accesses to the banks of the distributed shared cache create hot nodes in the network. Both effects raise the average latency of L1 cache misses. To address this problem, this paper divides the cores into groups of 2×2 nodes and introduces the neighboring data prober (NDP). By deciding whether a miss can be served by the L1 cache of a neighbor node, the NDP exploits the node-level spatial locality of data accesses in parallel programs. The paper also optimizes the coherence protocol for the new architecture. Evaluation results show that the proposed cache optimization improves performance, lowers network traffic, and saves energy.

Keywords:	chip multiprocessors; cache; network on chip
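
The 2×2 grouping and neighbor probe described in the abstract can be illustrated with a small toy model. This is only a sketch under assumptions that do not come from the paper: a 4×4 mesh with row-major core numbering, a home node chosen by block address modulo the core count, Manhattan distance as the hop count, and an exact per-core set standing in for L1 contents. All names (`group_members`, `home_node`, `ToyChip`, `serve_miss`) are hypothetical, and the model ignores the coherence protocol entirely.

```python
# Toy model of 2x2 core grouping with a neighbor probe on an L1 miss.
# Assumptions (not from the paper): 4x4 mesh, row-major core ids,
# home node = block address modulo number of cores.
MESH_W, MESH_H = 4, 4
NUM_CORES = MESH_W * MESH_H


def coords(core):
    """Return (x, y) mesh coordinates of a core id."""
    return core % MESH_W, core // MESH_W


def group_of(core):
    """Return the id of the 2x2 group that contains the core."""
    x, y = coords(core)
    return (y // 2) * (MESH_W // 2) + (x // 2)


def group_members(core):
    """Return the four core ids in the same 2x2 group as the core."""
    x, y = coords(core)
    gx, gy = (x // 2) * 2, (y // 2) * 2
    return [(gy + dy) * MESH_W + (gx + dx) for dy in (0, 1) for dx in (0, 1)]


def hops(a, b):
    """Manhattan distance between two cores on the mesh."""
    (ax, ay), (bx, by) = coords(a), coords(b)
    return abs(ax - bx) + abs(ay - by)


def home_node(block):
    """Hypothetical address-interleaved home node for a block."""
    return block % NUM_CORES


class ToyChip:
    def __init__(self):
        # One set of cached block addresses per core, standing in for L1.
        self.l1 = {core: set() for core in range(NUM_CORES)}

    def serve_miss(self, core, block):
        """Return (serving node, hop count) for an L1 miss at the core.

        A neighbor in the same group is preferred when it holds the block;
        otherwise the request goes to the block's home node.
        """
        for peer in group_members(core):
            if peer != core and block in self.l1[peer]:
                return peer, hops(core, peer)
        home = home_node(block)
        return home, hops(core, home)
```

In this model, a miss at core 0 for a block whose home is the far corner of the mesh travels six hops, while the same miss is served in one hop if core 1 already caches the block. That difference is the kind of saving the grouping aims for, although the actual prober design and its costs are described only in the full paper.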
This article is indexed in CNKI, VIP, and other databases.
