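A Fault Detection Mechanism Based on Companion State Tracing in Cloud Computing Systems
Cite this article: RAO Xiang, WANG Huai-Min, CHEN Zhen-Bang, ZHOU Yang-Fan, CAI Hua, ZHOU Qi, SUN Ting-Tao. A Fault Detection Mechanism Based on Companion State Tracing in Cloud Computing Systems [J]. Chinese Journal of Computers, 2012, 35(5): 856-870.
Authors: RAO Xiang  WANG Huai-Min  CHEN Zhen-Bang  ZHOU Yang-Fan  CAI Hua  ZHOU Qi  SUN Ting-Tao
Affiliations: 1. National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073
2. Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen
3. Department of Computing Platform, Alibaba Cloud Computing Company, Hangzhou 310011
Funding: National Basic Research Program of China (973 Program); National Natural Science Foundation of China; National Science Fund for Distinguished Young Scholars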
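Abstract: Detecting faults in a distributed system at runtime requires a fault feature model obtained in advance. A common way to build such a model is to inject a fault into the system and model the characteristic symptoms that follow (e.g., abnormal event logs). Existing modeling methods typically use only the symptoms that appear within a given time window after the fault occurs. However, observations of real systems show that the propagation time of different faults varies widely and that a fault's features change as it propagates. As a result, existing methods either fail to recognize symptoms that occur after the detection window or raise a large number of false alarms. To address this problem, this paper proposes a fault feature extraction method based on fault injection testing, consisting of three main steps: (1) filter out noise logs; (2) construct a fault identifier that recognizes the early features of different faults; (3) construct a finite-state tracker for each class of fault to trace its later propagation states, so that the propagation state is tracked continuously once the fault has been identified. Experiments on an enterprise-grade cloud computing system show that the proposed method achieves higher fault detection accuracy than existing methods.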

Keywords: event log  fault detection  fault injection  fault feature extraction  cloud computing system

Detecting Faults by Tracing Companion States in Cloud Computing Systems
RAO Xiang, WANG Huai-Min, CHEN Zhen-Bang, ZHOU Yang-Fan, CAI Hua, ZHOU Qi, SUN Ting-Tao. Detecting Faults by Tracing Companion States in Cloud Computing Systems [J]. Chinese Journal of Computers, 2012, 35(5): 856-870.
Authors:RAO Xiang  WANG Huai-Min  CHEN Zhen-Bang  ZHOU Yang-Fan  CAI Hua  ZHOU Qi  SUN Ting-Tao
Affiliation: 1) National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073; 2) Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen; 3) Department of Computing Platform, Alibaba Cloud Computing Company, Hangzhou 310011
Abstract: A common way to construct a fault model is to inject the fault into the system and observe the subsequent symptoms, e.g., event logs. However, fault features vary during the propagation period and present different symptoms at different stages of the fault propagation process. Existing feature extraction methods based on a detection window can identify only the early symptoms of a fault, but fail to detect the later symptoms and cause false alarms. To solve this problem, we present a fault feature extraction method called Companion State Tracer (CSTracer), which consists of 3 integrated steps: (1) pre-process the logs to remove unrelated entries; (2) construct a general identifier for the early symptoms of a fault; (3) construct a finite state machine model for the fault to trace its later symptoms. CSTracer can therefore keep monitoring a fault after it has been identified. We have validated the effectiveness of CSTracer in an enterprise cloud system. The results show that CSTracer achieves better detection accuracy than existing methods.
Keywords:event log  fault detection  fault injection  fault feature creation  cloud computing systems
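
The three steps named in the abstract (noise filtering, early-symptom identification, finite-state tracing of later symptoms) can be illustrated with a minimal Python sketch. This is not the CSTracer implementation, which the abstract does not specify. The function names, the event names, the subset-matching rule for signatures, and the count-based `window` (standing in for a time window) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def filter_noise(events, baseline_events):
    """Step 1 (assumed rule): drop event types that also occur in fault-free runs."""
    noise = set(baseline_events)
    return [e for e in events if e not in noise]


def identify_fault(early_events, signatures):
    """Step 2 (assumed rule): return the first fault whose early signature is fully present."""
    seen = set(early_events)
    for fault, signature in signatures.items():
        if signature <= seen:
            return fault
    return None


@dataclass
class StateTracer:
    """Step 3: a finite state machine that follows one fault's later symptoms."""
    transitions: dict  # (state, event) -> next state
    state: str = "identified"
    history: list = field(default_factory=list)

    def feed(self, event):
        # Events with no transition from the current state leave it unchanged.
        nxt = self.transitions.get((self.state, event))
        if nxt is not None:
            self.history.append((self.state, event, nxt))
            self.state = nxt
        return self.state


def detect(events, baseline_events, signatures, tracers, window=3):
    """Run the three steps over one event sequence."""
    clean = filter_noise(events, baseline_events)
    fault = identify_fault(clean[:window], signatures)
    if fault is None:
        return None, None
    tracer = StateTracer(transitions=tracers[fault])
    for event in clean[window:]:
        tracer.feed(event)
    return fault, tracer.state


# Hypothetical data: none of these event or fault names come from the paper.
BASELINE = ["heartbeat", "gc_pause"]
SIGNATURES = {
    "disk_failure": {"io_error", "write_retry"},
    "network_partition": {"conn_timeout", "peer_unreachable"},
}
TRACERS = {
    "disk_failure": {
        ("identified", "replica_rebuild"): "recovering",
        ("recovering", "rebuild_done"): "recovered",
    },
    "network_partition": {
        ("identified", "leader_election"): "reconfiguring",
        ("reconfiguring", "peer_rejoined"): "recovered",
    },
}

log = [
    "heartbeat", "io_error", "gc_pause", "write_retry",
    "heartbeat", "replica_rebuild", "heartbeat", "rebuild_done",
]
result = detect(log, BASELINE, SIGNATURES, TRACERS, window=2)
print(result)
```

In this sketch the identifier only looks at the first `window` filtered events, while the tracer consumes everything after them, so symptoms that appear after the window still update the fault's state instead of being missed or reported as a new alarm.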
This article is indexed in CNKI, Wanfang Data, and other databases.