首页 | 本学科首页   官方微博 | 高级检索  
     

面向大规模二部图的分布式Tip分解算法
引用本文:周旭,翁同峰,杨志邦,李博仁,张吉,李肯立.面向大规模二部图的分布式Tip分解算法[J].软件学报,2022,33(3):1043-1056.
作者姓名:周旭  翁同峰  杨志邦  李博仁  张吉  李肯立
作者单位:湖南大学 信息科学与工程学院, 湖南 长沙 410082;之江实验室, 浙江 杭州 311100;湖南大学 信息科学与工程学院, 湖南 长沙 410082;国家超级计算长沙中心, 湖南 长沙 410082
基金项目:国家自然科学基金(61772182,61802032,69189338,62172146,62172157);之江实验室开放课题(2021KD0AB02);国防科技大学信息系统工程重点实验室基金
摘    要:Tip分解作为图数据管理领域的热点研究问题,已被广泛应用于文档聚类和垃圾邮件组检测等实际场景中.随着图数据规模的爆炸式增长,单机内存已无法满足其存储需求,亟需研究分布式环境下Tip分解技术.现有分布式图计算系统的通信模式无法适用于二部图,为此,首先提出一种基于中继的通信模式,以实现分布式环境下处理二部图时消息的有效传递;其次,提出分布式butterfly计数算法(DBC)和tip分解算法(DTD),特别地,为解决处理大规模二部图时DBC面临的内存溢出问题,提出了一种可控的并行顶点激活策略;最后,引入基于顶点优先级的消息剪枝策略和消息有效性剪枝策略,通过减少冗余通信和计算开销,进一步提高算法效率.实验平台部署于国家超算中心高性能分布式集群上,在多个真实数据集上的实验结果验证了所提算法的有效性和高效性.

关 键 词:二部图  butterfly计数  分布式系统  tip分解
收稿时间:2021/7/1 0:00:00
修稿时间:2021/7/31 0:00:00

Distributed Algorithm for Tip Decomposition on Large Bipartite Graphs
ZHOU Xu,WENG Tong-Feng,YANG Zhi-Bang,LI Bo-Ren,ZHANG Ji,LI Ken-Li.Distributed Algorithm for Tip Decomposition on Large Bipartite Graphs[J].Journal of Software,2022,33(3):1043-1056.
Authors:ZHOU Xu  WENG Tong-Feng  YANG Zhi-Bang  LI Bo-Ren  ZHANG Ji  LI Ken-Li
Affiliation:College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;Zhejiang Lab, Zhejiang, Hangzhou 311100, China; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China;National Supercomputing Center in Changsha, Changsha 410082, China
Abstract:Tip decomposition has a pivotal role in mining cohesive subgraph in bipartite graphs. It is a popular research topic with wide applications in document clustering, spam group detection, and analysis of affiliation networks. With the explosive growth of bipartite graph data scale in these scenarios, it is necessary to use distributed method to realize its effective storage. For this reason, this paper studies the problem of tip decomposition on a bipartite graph in the distributed environment for the first time. Firstly, a new relay based communication mode is proposed to realize effective message transmission when the given bipartite graph is decomposed in distributed environment. Secondly, the distributed butterfly counting algorithm (DBC) and tip decomposition algorithm (DTD) are designed. In particular, a controllable parallel vertex activation strategy is proposed to solve the problem of memory overflow when DBC decomposes large-scale bipartite graphs. Finally, the message pruning strategy based on vertex priority and message validity pruning strategy are introduced to further improve the efficiency of the algorithm by reducing redundant communication and computing overhead. The experiment is deployed on the high performance distributed computing platform of National Supercomputing Center. The effectiveness and efficiency of the proposed algorithms are verified by experiments on several real datasets.
Keywords:bipartite graph  butterfly counting  distributed systems  tip decomposition
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号