Parallel implementation and optimization of a large-scale ocean data assimilation algorithm (面向大规模海洋数据同化算法的并行实现及优化)


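Cite this article:	WAN Wei-qiang, XIAO Jun-min, HONG Xue-hai, TAN Guang-ming. Parallel implementation and optimization of a large-scale ocean data assimilation algorithm[J]. Computer Engineering & Science (计算机工程与科学), 2019, 41(5): 765-772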

Authors:	WAN Wei-qiang, XIAO Jun-min, HONG Xue-hai, TAN Guang-ming

Affiliation:	Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (all four authors)

Funding:	National Key R&D Program of China, Key Special Project (2016YFC1401706); National Natural Science Foundation of China (61802369)

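Abstract:	Ocean data assimilation is an effective means of merging ocean observations into numerical ocean models. Assimilated ocean data are closer to the true state of the ocean, which matters greatly for understanding the ocean. This paper designs a general parallel implementation of ocean data assimilation based on domain decomposition and, building on it, proposes a new parallel algorithm based on IO proxies. First, the IO proxy processes read the data in parallel. Next, they partition the data into blocks and send each block to the corresponding computation process. Once the computation processes finish their local assimilation, the IO proxy processes collect the results and write them to disk. The main advantage of this method is that the IO proxy processes handle IO, instead of having every process take part in IO as in the traditional approach (direct parallel IO). This keeps large numbers of processes from accessing the disk at the same time and avoids the waiting caused by process queuing. Tests on the Tianhe-2 cluster show that, for data assimilation at 1-degree resolution on 425 cores, the total running time of the parallel implementation is 9.1 s, a speedup of nearly 38 over the traditional serial program. For data assimilation at 0.1-degree resolution, the IO-proxy-based parallel assimilation algorithm still scales well on 10,000 cores, and its IO time can be held to at most 1/9 of the direct parallel IO time.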
Keywords:	ocean data assimilation; ensemble optimal interpolation; domain decomposition; IO proxy node
Received:	2018-10-08
Revised:	2019-05-25
Parallel implementation and optimization of a large-scale ocean data assimilation algorithm

WAN Wei-qiang, XIAO Jun-min, HONG Xue-hai, TAN Guang-ming. Parallel implementation and optimization of a large-scale ocean data assimilation algorithm[J]. Computer Engineering & Science, 2019, 41(5): 765-772

Authors:	WAN Wei-qiang, XIAO Jun-min, HONG Xue-hai, TAN Guang-ming

Affiliation:	Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Abstract:	Ocean data assimilation is an effective way to integrate ocean observation data into numerical ocean models. Assimilated ocean data are closer to the real state of the ocean, so they are of great significance for understanding and studying the ocean. We design a general parallel implementation of ocean data assimilation based on domain decomposition, and on top of it propose a new parallel algorithm based on IO proxies. First, the IO proxy processes read the data in parallel. They then split the data into blocks and send each block to its corresponding computation process. After local data assimilation completes, the IO proxy processes collect the local results from the computation processes and write them to disk. The main advantage of this method is that the IO proxy processes take charge of IO, rather than all processes participating in IO (direct parallel IO). This prevents a large number of processes from accessing the disk simultaneously and so avoids the waiting caused by process queuing. Tests on the Tianhe-2 cluster show that, for assimilation of data at 1-degree resolution on 425 cores, the total running time of the proposed parallel implementation is 9.1 s, nearly 38 times faster than the traditional serial program. In addition, for assimilation of data at 0.1-degree resolution, the IO-proxy-based parallel assimilation algorithm still scales well on 10,000 cores, and its IO time can be limited to at most 1/9 of the direct parallel IO time.

Keywords:	ocean data assimilation; ensemble optimal interpolation (EnOI); domain decomposition; IO proxy node
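
The data flow described in the abstract (proxies read, split, send, collect, write) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the row-wise decomposition, the names `split_rows`, `assign_workers`, `local_assimilation` and `run_with_io_proxies`, and the placeholder update that stands in for the real assimilation step are all hypothetical. A real version would run proxies and computation processes as separate MPI ranks.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # half-open row range [start, stop)


def split_rows(n_rows: int, n_parts: int) -> List[Block]:
    """Split n_rows into n_parts contiguous, near-equal row ranges."""
    base, extra = divmod(n_rows, n_parts)
    blocks, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def assign_workers(n_workers: int, n_proxies: int) -> Dict[int, List[int]]:
    """Map each proxy id to the contiguous group of worker ids it serves."""
    groups = split_rows(n_workers, n_proxies)
    return {p: list(range(a, b)) for p, (a, b) in enumerate(groups)}


def local_assimilation(block: List[List[float]]) -> List[List[float]]:
    """Hypothetical stand-in for the local analysis step (adds 0.5 per cell)."""
    return [[v + 0.5 for v in row] for row in block]


def run_with_io_proxies(field, n_workers: int, n_proxies: int):
    """Simulate the proxy data flow; return (result, number of disk accesses)."""
    worker_blocks = split_rows(len(field), n_workers)
    served = assign_workers(n_workers, n_proxies)
    result = [None] * len(field)
    disk_accesses = 0
    for proxy, workers in served.items():
        if not workers:
            continue  # more proxies than workers: this proxy stays idle
        lo = worker_blocks[workers[0]][0]
        hi = worker_blocks[workers[-1]][1]
        chunk = field[lo:hi]          # the proxy reads its region once
        disk_accesses += 1
        analysed = []
        for w in workers:             # "send" each block, compute, "collect"
            a, b = worker_blocks[w]
            analysed.extend(local_assimilation(chunk[a - lo:b - lo]))
        result[lo:hi] = analysed      # the proxy writes its region once
        disk_accesses += 1
    return result, disk_accesses


if __name__ == "__main__":
    field = [[float(r * 4 + c) for c in range(4)] for r in range(10)]
    out, accesses = run_with_io_proxies(field, n_workers=5, n_proxies=2)
    print(accesses, out[0])
```

In this toy model the number of disk accesses grows with the number of proxies rather than with the number of computation processes, which is the property the abstract credits for avoiding queuing at the disk.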
This article is indexed in Wanfang Data and other databases.