
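JIACKPT: A Recoverable Software DSM System
Cite this article: ZHANG Long-Bing, ZHANG Fu-Xin, HU Wei-Wu, TANG Zhi-Min. JIACKPT: A recoverable software DSM system [J]. Journal of Software (软件学报), 2005, 16(2): 165-173.
Authors: ZHANG Long-Bing (章隆兵), ZHANG Fu-Xin (张福新), HU Wei-Wu (胡伟武), TANG Zhi-Min (唐志敏)
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60303016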
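Abstract: A software DSM (distributed shared memory) system builds a shared-memory programming environment on a cluster, combining the ease of programming of shared memory with the scalability of clusters, and has attracted extensive research. Because a software DSM system is a distributed system, it carries a high risk of failure, so fault-tolerance techniques are needed to make it practical. Using user-level checkpointing, we designed and implemented JIACKPT (JIAjia with ChecKPoinTing), a recoverable and highly portable software DSM system built on JIAJIA, a software DSM system that supports the scope consistency model. Because it adopts a strict global consistent state suited to software DSM systems, together with several optimizations, JIACKPT is easy to implement and performs well. Application tests on an 8-node PC cluster show that, even with one checkpoint per minute, the checkpoint overhead of most applications is below 10%. JIACKPT is also highly portable. These results indicate that JIACKPT has become a fairly practical system.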
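The abstract relies on checkpointing at a global consistent state. The Python sketch below is a hypothetical simulation, not JIACKPT's documented protocol: the `ClusterSim` and `Node` classes, the buffered-write model, and the rule of checkpointing only right after a global barrier are all assumptions made for illustration. It shows why snapshots taken after every node has passed the same barrier, with no updates in flight, form a mutually consistent set that all nodes can roll back to together.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    pages: dict = field(default_factory=dict)    # page number -> value
    pending: list = field(default_factory=list)  # local writes not yet propagated


class ClusterSim:
    """Hypothetical model of barrier-coordinated checkpointing in a DSM."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.checkpoints = []  # list of (barrier_no, per-node page snapshots)
        self.barrier_no = 0

    def write(self, node_id, page, value):
        # Relaxed consistency (assumed): the write is visible locally at once
        # but reaches other nodes only at the next synchronization point.
        node = self.nodes[node_id]
        node.pages[page] = value
        node.pending.append((page, value))

    def barrier(self, take_checkpoint=False):
        # Propagate every buffered update to all nodes; a simplified stand-in
        # for the update traffic a real DSM performs at a barrier.
        for src in self.nodes:
            for page, value in src.pending:
                for dst in self.nodes:
                    dst.pages[page] = value
            src.pending.clear()
        self.barrier_no += 1
        if take_checkpoint:
            # All nodes have completed the same barrier and nothing is in
            # flight, so the local snapshots together form a consistent state.
            assert all(not n.pending for n in self.nodes)
            snapshot = [copy.deepcopy(n.pages) for n in self.nodes]
            self.checkpoints.append((self.barrier_no, snapshot))

    def recover(self):
        # Roll every node back to the latest consistent checkpoint together.
        barrier_no, snapshot = self.checkpoints[-1]
        for node, pages in zip(self.nodes, snapshot):
            node.pages = copy.deepcopy(pages)
            node.pending.clear()
        self.barrier_no = barrier_no
        return barrier_no


# Usage: checkpoint at barrier 1, keep computing, then simulate a failure.
sim = ClusterSim(3)
sim.write(0, page=1, value="a")
sim.barrier(take_checkpoint=True)
sim.write(1, page=2, value="b")
sim.barrier()
print(sim.recover())       # → 1
print(sim.nodes[2].pages)  # → {1: 'a'}
```

Checkpointing only at a barrier trades flexibility (the application must reach a synchronization point) for simplicity: no message logging is needed, because no update can be half-delivered when the snapshot is taken.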

Keywords: software DSM system; checkpoint; global consistent state; JIAJIA
Received: July 7, 2004
Revised: July 7, 2004

JIACKPT: A Recoverable Software Distributed Shared Memory System
ZHANG Long-Bing, ZHANG Fu-Xin, HU Wei-Wu and TANG Zhi-Min. JIACKPT: A Recoverable Software Distributed Shared Memory System [J]. Journal of Software, 2005, 16(2): 165-173.
Authors: ZHANG Long-Bing, ZHANG Fu-Xin, HU Wei-Wu and TANG Zhi-Min
Abstract: A software distributed shared memory (DSM) system builds a virtual shared-memory abstraction on a cluster, combining the programmability of shared memory with the good scalability of clusters, and has therefore been widely studied. Because a software DSM system is a distributed system, it is prone to failure, and some form of fault tolerance is necessary to make it practical. JIACKPT (JIAjia with ChecKPoinTing), a recoverable and portable software DSM system based on JIAJIA, has been designed and implemented to tolerate system failures using user-level checkpointing. By maintaining a strict global consistent state and applying several optimization techniques, JIACKPT achieves high performance. Experimental results on an 8-node PC cluster show that, for most applications, the checkpoint overhead is less than 10% of the total execution time even when a checkpoint is taken once per minute. JIACKPT is also highly portable and runs on several operating systems, such as Linux and Solaris. These results show that JIACKPT is a practical recoverable software DSM system.
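To put the "below 10% at one checkpoint per minute" figure in perspective, here is a rough first-order model. It is an illustrative assumption, not an analysis from the paper: suppose each checkpoint stalls the application for about $c$ seconds, checkpoints are taken every $\tau$ seconds, and $T_{\text{base}}$ is the run time without checkpointing. The relative overhead is then roughly $c/\tau$, so the reported result corresponds to an average per-checkpoint stall of under about 6 seconds.

```latex
\text{overhead}
  = \frac{T_{\text{with}} - T_{\text{base}}}{T_{\text{base}}}
  \approx \frac{c \cdot (T_{\text{base}} / \tau)}{T_{\text{base}}}
  = \frac{c}{\tau},
\qquad
\frac{c}{\tau} < 0.10,\ \tau = 60\ \text{s}
\;\Longrightarrow\; c < 6\ \text{s}.
```

The model ignores effects such as checkpoint I/O overlapping with computation, which optimizations like those mentioned in the abstract may exploit, so it only bounds the average stall per checkpoint.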
Keywords: software distributed shared memory system; checkpoint; global consistent state; JIAJIA
Indexed in: CNKI, VIP, Wanfang Data and other databases.