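Distributed Task Execution System for Deep Learning
Cite this article: GAO Guo-Liang, CHEN Lei-Fang, LIU Yi-Ming. Distributed Task Execution System for Deep Learning[J]. Computer Systems & Applications, 2021, 30(7): 80-86.
Authors: GAO Guo-Liang, CHEN Lei-Fang, LIU Yi-Ming
Affiliations: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580; College of Science and Information, Qingdao Agricultural University, Qingdao 266109; School of Electrical and Electronic Engineering, North China Electric Power University (Baoding), Baoding 071003
Abstract: A full-lifecycle hosted deep learning platform provides a web-based solution for deep learning experiment tasks and accelerates the adoption of deep learning in production and daily life. To address the problem of training image recognition models on a web-based deep learning platform, this paper designs and implements a distributed task execution system for deep learning experiment tasks. The system consists of four modules: resource monitoring, task scheduling, task execution, and log management. It schedules tasks according to policies such as resource utilization, runs them in Docker containers, and collects the generated logs in real time. Testing shows that the system meets its functional requirements and reaches the expected levels of reliability and stability; integrating it into the platform reduces training time by about 20%.

Keywords: distributed; task scheduling; task execution; log; resource monitoring
Received: 2020-11-03
Revised: 2020-12-02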

Distributed Task Execution System for Deep Learning
GAO Guo-Liang, CHEN Lei-Fang, LIU Yi-Ming. Distributed Task Execution System for Deep Learning[J]. Computer Systems & Applications, 2021, 30(7): 80-86.
Authors: GAO Guo-Liang, CHEN Lei-Fang, LIU Yi-Ming
Affiliations: College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China; College of Science and Information, Qingdao Agricultural University, Qingdao 266109, China; School of Electrical and Electronic Engineering, North China Electric Power University (Baoding), Baoding 071003, China
Abstract: A full-lifecycle hosted deep learning platform offers a web-based solution for deep learning experiment tasks and speeds up the application of deep learning technology in production and daily life. To address the problem of training image recognition models on such a web-based platform, this study designs and implements a distributed task execution system for deep learning experiment tasks. The system is composed of four modules: resource monitoring, task scheduling, task execution, and log management. It schedules tasks according to policies such as resource utilization, executes them in Docker containers, and collects the generated log data in real time. Test results show that the system fulfils its functional requirements and meets the expected targets for reliability and stability, and that integrating it into the deep learning platform reduces training time by about 20%.
Keywords: distributed; task scheduling; task execution; log; resource monitoring
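
The abstract states that tasks are scheduled according to policies such as resource utilization, but it does not give the policy itself. As an illustration only, the following minimal Python sketch shows one plausible form of such a policy: pick the least-loaded node that still has headroom. The `Node` class, the `load` scoring rule, the `max_load` threshold, and the node names are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node as reported by a hypothetical resource monitor."""
    name: str
    gpu_util: float  # fraction of GPU in use, 0.0 to 1.0
    mem_util: float  # fraction of memory in use, 0.0 to 1.0
    tasks: list = field(default_factory=list)

    def load(self) -> float:
        # Assumed scoring rule: the busier of the two resources decides.
        return max(self.gpu_util, self.mem_util)


def schedule(task_id: str, nodes: list, max_load: float = 0.9):
    """Assign the task to the least-loaded node, or return None if every node is saturated."""
    candidates = [n for n in nodes if n.load() < max_load]
    if not candidates:
        return None
    target = min(candidates, key=Node.load)
    target.tasks.append(task_id)
    return target


# Illustrative utilization snapshot; the values are made up.
nodes = [
    Node("worker-a", gpu_util=0.80, mem_util=0.55),
    Node("worker-b", gpu_util=0.30, mem_util=0.40),
    Node("worker-c", gpu_util=0.95, mem_util=0.20),
]
chosen = schedule("train-001", nodes)
print(chosen.name if chosen else "no capacity")
```

In a system like the one described, the chosen node would then start the task in a Docker container and stream its logs back; those steps are omitted here because the abstract gives no detail on them.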
This article is indexed in Wanfang Data and other databases.