A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

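Cite this article:	Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao. A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence[J]. Journal of Computer Research and Development, 2016, 53(6): 1238-1248. DOI: 10.7544/issn1000-1239.2016.20150100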

Authors:	Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao

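Affiliations:	1. Postdoctoral Research Station, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; 2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; 3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; 4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 (zhusuxia@hrbust.edu.cn)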

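Funding:	National Natural Science Foundation of China for Young Scientists (61502123); National Natural Science Foundation of China (61173024); National Basic Research Program of China (973 Program) (2011CB302501); Heilongjiang Province Science Foundation for Youths (QC2015084); China Postdoctoral Science Foundation (2015M571429)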

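Abstract:	Memory race recording is a key technique for addressing the execution nondeterminism of multi-core programs. However, existing point-to-point memory race recording mechanisms incur high hardware overhead and are hard to apply to practical chip multiprocessor systems. With the goal of reducing the hardware overhead of point-to-point memory race recording, a point-to-point memory race recording algorithm based on a concurrent recording strategy is designed for chip multiprocessor (CMP) systems that use a snoop-based coherence protocol. The algorithm extends the point-to-point memory race relationship between pairs of threads to all threads, and uses a distributed recording method to record, for each thread, a memory race log that consists of one side of each memory race relationship. During replay it uses a simplified producer-consumer model, which ensures deterministic replay and effectively reduces hardware consumption and bandwidth overhead. Simulation results on an 8-core processor system show that the concurrent point-to-point memory race recording algorithm adds about 171 B of hardware resources per processor core, records about 2.3 B of log per thousand memory operation instructions, and adds less than 1.5% bandwidth overhead in both the recording and replay phases.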
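Keywords:	chip multiprocessor; multi-core program; deterministic replay; memory race recording; memory conflict detection; snoop-based coherence protocol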
A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence

Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao. A Concurrent Memory Race Recording Algorithm for Snoop-Based Coherence[J]. Journal of Computer Research and Development, 2016, 53(6): 1238-1248. DOI: 10.7544/issn1000-1239.2016.20150100

Authors:	Zhu Suxia, Chen Deyun, Ji Zhenzhou, Sun Guanglu, Zhang Hao

Affiliation:	1. Postdoctoral Research Station, School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; 2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080; 3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; 4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Abstract:	Memory race record-replay is an important technique for resolving the nondeterminism of multi-core programs. Because of their high hardware overhead, existing memory race recorders based on the point-to-point logging approach are difficult to apply to practical chip multiprocessors. To reduce the hardware overhead of point-to-point logging, a novel memory race recording algorithm is proposed that uses a concurrent logging strategy and targets chip multiprocessors with a snoop-based cache coherence protocol. When a memory conflict is detected, the algorithm records the current execution points of all threads concurrently. In the recording phase it extends the point-to-point memory race relationship between two threads to all threads, which reduces hardware overhead significantly. It also uses a distributed logging mechanism to record memory races, which reduces bandwidth overhead effectively without increasing the size of the memory race log. During replay, the algorithm uses a simplified producer-consumer model and introduces a counting semaphore for each processor core to ensure deterministic replay, improving replay speed and reducing coherence bandwidth overhead. Simulation results on an 8-core chip multiprocessor (CMP) system show that this concurrent recording algorithm based on point-to-point logging adds about 171 B of hardware per processor core, records about 2.3 B of log per thousand memory instructions, and adds less than 1.5% interconnection bandwidth overhead.
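
The replay idea in the abstract (a producer-consumer model with one counting semaphore per core) can be illustrated with a toy software sketch. This is not the paper's hardware design: the log layout (`post_log`, `wait_log`), the `replay` function, and the three-core example are all hypothetical names and formats chosen only for illustration. The assumption here is that, when a conflict is recorded, every other core logs its current execution point, and during replay each of those cores signals the conflicting core once it reaches that point.

```python
import threading

def replay(programs, post_log, wait_log):
    """Toy replay of per-core instruction lists (hypothetical log format).

    programs[c] : list of callables, one per memory instruction of core c
    post_log[c] : {n: [cores]}  after core c has executed n instructions,
                  signal each listed consumer core once (producer side)
    wait_log[c] : {i: k}  before executing instruction index i, core c
                  consumes k signals from its own semaphore (consumer side)
    """
    # One counting semaphore per core, initially empty.
    sems = [threading.Semaphore(0) for _ in programs]

    def run(core):
        for i, instr in enumerate(programs[core]):
            # Consumer: block until all recorded predecessors have signalled.
            for _ in range(wait_log[core].get(i, 0)):
                sems[core].acquire()
            instr()
            # Producer: this core has reached a logged execution point.
            for consumer in post_log[core].get(i + 1, []):
                sems[consumer].release()

    threads = [threading.Thread(target=run, args=(c,))
               for c in range(len(programs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Assumed recording: core 2's read conflicted with writes on cores 0 and 1,
# so both of them logged execution point 1 (one instruction completed).
mem = {"x": 0, "y": 0}
seen = []
programs = [
    [lambda: mem.update(x=1)],                    # core 0 writes x
    [lambda: mem.update(y=2)],                    # core 1 writes y
    [lambda: seen.append(mem["x"] + mem["y"])],   # core 2 reads x and y
]
post_log = [{1: [2]}, {1: [2]}, {}]
wait_log = [{}, {}, {0: 2}]

replay(programs, post_log, wait_log)
print(seen)  # [3]
```

Core 2 cannot run its read until both writers have signalled, so the replayed value is the same on every run regardless of thread scheduling. With a single counter per core the sketch does not distinguish which race a signal belongs to; how the actual algorithm handles that is not stated in the abstract.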

Keywords:	chip multiprocessor (CMP); multi-core program; deterministic replay; memory race recording; memory conflict detection; snoop-based coherence protocol
