
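Current Situation and Trend of Reliability Technology in High Performance Computers
Cite this article: Huang Yongqin, Jin Lifeng, Liu Yao. Current Situation and Trend of Reliability Technology in High Performance Computers[J]. Journal of Computer Research and Development, 2010, 47(4).
Authors: Huang Yongqin (黄永勤), Jin Lifeng (金利峰), Liu Yao (刘耀)
Affiliation: Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083
Funding: Supported by the National Defense Pre-research Project Foundation under grant No. 513160401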
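Abstract: As the performance of high performance computer systems keeps rising and their hardware scale keeps growing, achieving reliable system operation has become a major technical challenge in the development of high performance computers, especially petascale machines. Starting from the reliability requirements of high performance computers, this paper surveys the current state of reliability technology in their hardware design, including fault avoidance, static redundancy, dynamic redundancy, and online replacement, and analyzes in detail how each technique is applied in representative machines. It closes with an in-depth discussion of development trends in high performance computer reliability technology, including reliability design for multi-core processors, comprehensive memory protection, and blade-based redundant architectures.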

Keywords: high performance computer; reliability; fault avoidance; fault tolerance; redundancy; online replacement

Current Situation and Trend of Reliability Technology in High Performance Computers
Huang Yongqin, Jin Lifeng, Liu Yao. Current Situation and Trend of Reliability Technology in High Performance Computers[J]. Journal of Computer Research and Development, 2010, 47(4).
Authors: Huang Yongqin, Jin Lifeng, Liu Yao
Affiliation: Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083
Abstract: As the performance of high performance computers (HPC) keeps rising and their hardware scale keeps growing, achieving highly reliable system operation is a major challenge in tera-scale and peta-scale HPC research and development. Starting from the reliability requirements of HPC, the authors comprehensively survey the reliability technologies currently used in HPC hardware design, such as fault avoidance, static redundancy, dynamic redundancy, and online replace...
Keywords: high performance computer; reliability; fault avoidance; fault tolerance; redundancy; on-line replacement
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.