
Diversified Graph Ranking Method Based on a Distance Metric
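Cite this article: LI Jin, YUE Kun, CAI Jiao, ZHANG Zhi-Jian, LIU Wei-Yi. Diversified graph ranking method based on a distance metric[J]. Journal of Software (软件学报), 2018, 29(3): 599-613.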
Authors: LI Jin (李劲)  YUE Kun (岳昆)  CAI Jiao (蔡娇)  ZHANG Zhi-Jian (张志坚)  LIU Wei-Yi (刘惟一)
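Affiliations: School of Software, Yunnan University, Kunming 650091, Yunnan; Key Laboratory of Software Engineering of Yunnan Province, Kunming 650091, Yunnan, School of Information Science and Engineering, Yunnan University, Kunming 650091, Yunnan, School of Software, Yunnan University, Kunming 650091, Yunnan, School of Information Science and Engineering, Yunnan University, Kunming 650091, Yunnan, School of Information Science and Engineering, Yunnan University, Kunming 650091, Yunnan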
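Funding: National Natural Science Foundation of China (61562091, 61472345); Second Batch of the "Yunling Scholar" Training Program (C6153001); Key Project of the Applied Basic Research Program of Yunnan Province (2014FA023); General Project of the Applied Basic Research Program of Yunnan Province (2016FB110); Yunnan University Training Program for Young and Middle-aged Backbone Teachers and Yunnan University Young Talents Cultivation Program (WX173602); Yunnan University Data-Driven Software Engineering Science and Technology Innovation Team Project (2017HC012).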
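Abstract: Expansion relevance, which effectively combines query relevance and diversity, is one optimization objective for the diversified graph ranking problem. Diversified graph ranking based on expansion relevance can be modeled as a submodular function optimization problem, which a greedy submodular optimization algorithm can solve approximately. However, expansion relevance cannot directly measure the dissimilarity between nodes. In addition, submodular optimization algorithms are sequential and cannot make full use of cluster computing platforms such as Spark to improve efficiency. To address these problems, this paper proposes a distance metric that describes the dissimilarity between nodes. Based on this distance metric, the diversified graph ranking problem is modeled as a max-sum k-dispersion optimization problem on a weighted complete graph constructed over the set of query-relevant nodes. A polynomial-time 2-approximation algorithm is proposed to solve this problem. Because the distance computations for different node pairs are mutually independent, a parallel diversified graph ranking algorithm based on the MapReduce programming model is further proposed. Finally, experiments on real graph datasets verify the efficiency and effectiveness of the proposed algorithm.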
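The max-sum k-dispersion objective named in the abstract can be written out as below. The symbols are introduced here only for illustration and are not taken from the paper: V_q is the set of query-relevant nodes, d is the pairwise distance metric used as the edge weight of the complete graph on V_q, and S is the selected set of k nodes.

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq V_q,\; |S| = k} \;\; \sum_{\{u,v\} \subseteq S} d(u,v)
```

Under this notation, a 2-approximation algorithm returns some set S of size k whose total pairwise distance is at least half of the total attained by S^{*}.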

Keywords: graph data  personalized PageRank  diversified graph ranking  max-sum k-dispersion  MapReduce
Received: 2017-08-02
Revised: 2017-09-05

Distance Metric Based Diversified Ranking on Large Graphs
LI Jin, YUE Kun, CAI Jiao, ZHANG Zhi-Jian and LIU Wei-Yi. Distance Metric Based Diversified Ranking on Large Graphs[J]. Journal of Software, 2018, 29(3): 599-613.
Authors:LI Jin  YUE Kun  CAI Jiao  ZHANG Zhi-Jian and LIU Wei-Yi
Affiliation: School of Software, Yunnan University, Kunming 650091, China; Key Laboratory of Software Engineering of Yunnan Province, Kunming 650091, China, School of Information Science and Engineering, Yunnan University, Kunming 650091, China, School of Software, Yunnan University, Kunming 650091, China, School of Information Science and Engineering, Yunnan University, Kunming 650091, China and School of Information Science and Engineering, Yunnan University, Kunming 650091, China
Abstract: Expansion relevance, which combines relevance and diversity into a single function, is one optimization objective for diversified ranking on graphs. It yields a submodular optimization objective, so the problem can be solved approximately with the classic greedy algorithm for cardinality-constrained monotone submodular maximization. However, expansion relevance does not directly capture the dissimilarity between a pair of nodes. In addition, existing submodular algorithms are sequential and cannot easily exploit distributed cluster computing platforms such as Spark to improve efficiency. To address this, the paper first introduces a distance metric, defined as the sum of personalized PageRank scores over the symmetric difference of the neighbor sets of a pair of nodes, to capture pairwise dissimilarity. The problem of diversified ranking on graphs is then formulated as a max-sum k-dispersion problem with metric edge weights. A polynomial-time 2-approximation algorithm is proposed to solve the problem. Because the distances of different node pairs can be computed independently, a MapReduce algorithm is further proposed. Finally, extensive experiments on real network datasets verify the effectiveness and efficiency of the proposed algorithm.
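The distance metric and the selection step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy graph, and the toy scores are hypothetical, and the selection routine is a standard farthest-pair greedy heuristic for max-sum dispersion, which is not necessarily the algorithm used in the paper. Whether a node counts as its own neighbor is also left to the caller.

```python
from itertools import combinations


def distance(u, v, neighbors, ppr):
    """Sum of personalized PageRank scores over the symmetric
    difference of the neighbor sets of u and v."""
    return sum(ppr.get(w, 0.0) for w in neighbors[u] ^ neighbors[v])


def greedy_max_sum_dispersion(candidates, k, neighbors, ppr):
    """Select k nodes by repeatedly taking the farthest remaining pair.

    Illustrative heuristic only; not claimed to be the paper's algorithm.
    """
    # Each pairwise distance is independent of the others, so this
    # step is the part that could be distributed (one task per pair).
    dist = {
        frozenset((u, v)): distance(u, v, neighbors, ppr)
        for u, v in combinations(candidates, 2)
    }
    remaining = list(candidates)
    selected = []
    while len(selected) + 2 <= k and len(remaining) >= 2:
        u, v = max(
            combinations(remaining, 2),
            key=lambda pair: dist[frozenset(pair)],
        )
        selected += [u, v]
        remaining.remove(u)
        remaining.remove(v)
    if len(selected) < k and remaining:
        selected.append(remaining[0])  # odd k: add any leftover node
    return selected


# Hypothetical toy data: candidate nodes a-d with their neighbor sets,
# and made-up personalized PageRank scores for the neighbor nodes.
neighbors = {
    "a": {"x", "y"},
    "b": {"x", "y", "w"},
    "c": {"y", "z"},
    "d": {"w"},
}
ppr = {"x": 0.5, "y": 0.25, "z": 0.125, "w": 0.0625}

print(greedy_max_sum_dispersion(["a", "b", "c", "d"], 2, neighbors, ppr))
# prints ['a', 'd']
```

In the toy data, nodes a and b have almost identical neighborhoods, so their distance is small (0.0625), while a and d share no neighbors and form the farthest pair (0.8125). The dictionary of pairwise distances is built in a separate step to mirror the observation in the abstract that these computations do not depend on one another.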
Keywords:graph data  personalized PageRank  diversified graph ranking  max-sum k-dispersion  MapReduce