
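Optimization of the log pattern extraction algorithm for large-scale system logs
Cite this article:ZHAO Yi-ning,XIAO Hai-li.Optimization of the log pattern extraction algorithm for large-scale system logs[J].Computer Engineering & Science,2017,39(5):821-828.
Authors:ZHAO Yi-ning  XIAO Hai-li
Affiliation:Computer Network Information Center, Chinese Academy of Sciences
Funding:National Key Research and Development Program of China (2016YFB0201404); Major Project of the National 863 Program, 12th Five-Year Plan (2014AA01A302)
Abstract:The LARGE framework is a log analysis system deployed in the supercomputing environment of the Chinese Academy of Sciences. It monitors and analyzes the various log files in the environment through log collection, centralized analysis, and result feedback. When monitoring system logs in the environment, system maintainers need a log pattern extraction algorithm to reduce a large volume of past system log records to a small set of log patterns. However, given the growth in log volume and the particular characteristics of messages log files, the original log pattern extraction algorithm can no longer meet the need for fast processing of large-scale logs. This paper introduces an optimization of the log pattern extraction algorithm that uses the MapReduce mechanism to accelerate log processing and pattern extraction when there are multiple input log files. Experiments show that when there are many input files, the optimization significantly increases the running speed of the vocabulary consistency algorithm and greatly reduces its running time. The running time and extraction quality of the algorithm when a vocabulary conversion function is used are also verified.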

Keywords:log processing  MapReduce mechanism  big data analysis  grid environment
Received:2017-01-05
Revised:2017-05-25

Optimization of the log pattern extraction algorithm for large-scale syslog files
ZHAO Yi-ning,XIAO Hai-li.Optimization of the log pattern extraction algorithm for large-scale syslog files[J].Computer Engineering & Science,2017,39(5):821-828.
Authors:ZHAO Yi-ning  XIAO Hai-li
Affiliation:(Computer Network Information Center,Chinese Academy of Sciences,Beijing 100190,China)
Abstract:The LARGE system is a log analysis framework deployed in the supercomputing environment of the Chinese Academy of Sciences. It monitors and analyzes various log files in the environment through log collection, centralized analysis, and result feedback. When monitoring system logs, system maintenance personnel need a log pattern extraction algorithm to reduce the large number of original log records to a small set of log patterns. However, because of the rapid growth in log size and the peculiarities of messages log files, the original log pattern extraction algorithm can no longer meet the requirement for rapid processing of large-scale logs. We propose an optimization of the log pattern extraction algorithm that introduces the MapReduce mechanism to accelerate log processing and pattern extraction when there are multiple input log files. Evaluation results show that when there are many input files, the optimization significantly improves the running speed of the vocabulary consistency algorithm and greatly reduces its running time. We also evaluate the time cost and the extraction quality of the optimized algorithm when the vocabulary conversion function is used.
Keywords:log processing  MapReduce  big data analysis  grid environment  
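
The MapReduce idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the vocabulary consistency measure, so everything here is an assumption made for illustration: consistency is taken to be the fraction of word positions that match exactly between two equal-length lines, the threshold of 0.5 is arbitrary, differing words are replaced by a `*` wildcard, and all function names are hypothetical. The vocabulary conversion function is not modeled.

```python
from functools import reduce

WILDCARD = "*"  # assumed placeholder for words that vary within a pattern


def consistency(pattern, words):
    """Assumed measure: share of positions with identical words (0 if lengths differ)."""
    if len(pattern) != len(words):
        return 0.0
    same = sum(1 for p, w in zip(pattern, words) if p == w)
    return same / len(pattern)


def merge(pattern, words):
    """Keep words that agree; replace disagreeing positions with the wildcard."""
    return tuple(p if p == w else WILDCARD for p, w in zip(pattern, words))


def add_to_patterns(patterns, words, threshold=0.5):
    """Fold one word sequence into the first sufficiently consistent pattern."""
    for i, p in enumerate(patterns):
        if consistency(p, words) >= threshold:
            patterns[i] = merge(p, words)
            return patterns
    patterns.append(tuple(words))
    return patterns


def map_file(lines):
    """Map step: extract a pattern list from a single input file."""
    patterns = []
    for line in lines:
        words = line.split()
        if words:
            add_to_patterns(patterns, words)
    return patterns


def reduce_patterns(acc, patterns):
    """Reduce step: merge one file's patterns into the accumulated set."""
    merged = list(acc)
    for p in patterns:
        add_to_patterns(merged, p)
    return merged


def extract_patterns(files):
    """Run the map step per file, then reduce the per-file results."""
    return reduce(reduce_patterns, map(map_file, files), [])


# Two tiny made-up "files", each a list of log lines.
files = [
    ["sshd session opened for user alice",
     "sshd session opened for user bob"],
    ["sshd session opened for user carol",
     "kernel device eth0 link up"],
]
result = extract_patterns(files)
```

Because each file is mapped independently, the built-in `map` could be swapped for a process pool's `map` to run the per-file extraction in parallel, which is where a speedup with many input files would come from in a sketch like this.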