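Funding:Zhejiang Provincial Ten Thousand Talents Program; National Natural Science Foundation of China; Innovative Research Group Project; Zhejiang Lab self-initiated research project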
Received:2021-08-13
Revised:2021-11-11

An application log analysis and system automation optimization framework for Lustre cluster storage
CHENG Wen,LI Yan,ZENG Ling-fang,WANG Fang,TANG Shi-cheng,YANG Li-ping,FENG Dan,ZENG Wen-jun.An application log analysis and system automation optimization framework for Lustre cluster storage[J].Computer Engineering & Science,2022,44(4):594-604.
Authors:CHENG Wen  LI Yan  ZENG Ling-fang  WANG Fang  TANG Shi-cheng  YANG Li-ping  FENG Dan  ZENG Wen-jun
Affiliation:(1.Wuhan National Laboratory for Optoelectronics,Huazhong University of Science and Technology, Key Laboratory of Information Storage System,Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China,Wuhan 430074; 2.China National GeneBank,BGI-Shenzhen,Shenzhen 518120; 3.Zhejiang Lab, Hangzhou 311121,China)
Abstract:In scientific computing, big data processing, and artificial intelligence, studying application workloads, analyzing their I/O patterns, and revealing how those workloads change over time are important for guiding performance optimization of cluster storage systems. Applications today are numerous and are updated rapidly, and this complex environment makes mining workload characteristics challenging. To address these problems, we collected application logs from five Lustre storage clusters in a production environment, covering 326 days in total, analyzed the access and load characteristics of the application workloads in depth, and verified and supplemented existing observations. Through horizontal, vertical, and multi-dimensional comparative analysis and information mining of the logs, we summarize four findings and examine how they relate to previous work. Taking the actual production environment into account, we give corresponding system optimization strategies and feasible implementation schemes, which provide references and suggestions for users, maintainers, upper-layer application developers, and multi-tier storage system designers. In addition, because real application environments are complex and system optimization is time-consuming and laborious, we design and implement a system automation optimization framework (SAOF). SAOF provides functions such as resource reservation and bandwidth limiting for specified application workloads. Preliminary tests show that SAOF can provide automated QoS guarantees for different tasks according to system resources and task load requirements.
Keywords:Lustre file system  log analysis  system optimization  quality of service (QoS)  resource management
This article is indexed in Wanfang Data and other databases.