Key technologies and applications of distributed parallel computing for machine learning
Original title: 面向机器学习的分布式并行计算关键技术及应用
Citation: CAO Ronghui, TANG Zhuo, ZUO Zhiwei, ZHANG Xuedong. Key technologies and applications of distributed parallel computing for machine learning[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2021, 16(5): 919-930.
Authors: CAO Ronghui (曹嵘晖), TANG Zhuo (唐卓), ZUO Zhiwei (左知微), ZHANG Xuedong (张学东)
Affiliation: 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; 2. National Supercomputer Center in Changsha, Changsha 410082, China
Abstract: The computation and iteration involved in machine learning and related algorithms are becoming increasingly complex, and sufficient computing power is key to deploying artificial intelligence applications effectively. This paper first proposes a spatiotemporal task scheduling algorithm for distributed heterogeneous environments that adapts to skewed data and improves the average efficiency of tasks such as machine learning model training. Second, it proposes an efficient resource management system and an energy-saving scheduling algorithm for distributed heterogeneous environments; together these realize cross-domain migration of computing resources based on dynamic prediction, as well as dynamic voltage/frequency adjustment, and reduce the overall energy consumption of the system. Third, it constructs a distributed heterogeneous optimization environment suited to the iteration of machine learning and deep learning algorithms, and proposes basic methods for distributed parallel optimization of machine learning and graph iteration algorithms. Finally, the authors develop intelligent analysis systems for domain applications and deploy them in manufacturing, transportation, education, healthcare, and other fields, addressing performance bottlenecks that are common in efficient data collection, storage, cleaning, fusion, and intelligent analysis.

Keywords: machine learning; distributed computing; skewed data; spatiotemporal task scheduling; resource management; energy-saving scheduling; cross-domain resource migration; parallel optimization; graph iteration algorithm; intelligent analysis system