
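A microblog retrieval model combining BTM and graph theory
Cite this article: CAI Chen, LUO Ke. A microblog retrieval model combining BTM and graph theory[J]. Computer Engineering & Science, 2019, 41(8): 1512-1518.
Authors: CAI Chen, LUO Ke
Affiliations: School of Computer & Communication Engineering, Changsha University of Science and Technology, Changsha 410114, Hunan, China; Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, Hunan, China
Funding: National Natural Science Foundation of China (11671125, 71371065)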
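Abstract: Microblog data is voluminous, while each microblog text contains few characters and has sparse features. To improve retrieval precision, a microblog retrieval model combining BTM and graph theory is proposed. Lexical semantic relatedness is used to compute the relatedness of labelled features in microblog text. A bi-term topic model is then built, and the JSD distance is used to compute the relatedness of the word pairs of the short texts mapped into that model. Entities and their graph structure are extracted from CN-DBpedia, and the SimRank algorithm is used to compute the relatedness between entities in the graph structure. These three kinds of relatedness together give the final relatedness of the model. Finally, retrieval experiments are carried out on a Sina Weibo data set. The results show that, compared with a retrieval model combining latent Dirichlet allocation (LDA) with graph theory and a system model based on linked open data and graph theory, the new model performs clearly better in MAP, accuracy, and recall, which indicates that it has better retrieval performance.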
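The JSD step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each short text has already been mapped to a topic distribution by some topic model (the input vectors below are made up for illustration), and that relatedness is taken as 1 minus the Jensen-Shannon divergence. Both are assumptions made here; the abstract does not say how the paper converts the distance into a relatedness score.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2, range [0, 1])."""
    # The midpoint distribution m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def jsd_relatedness(p, q):
    """Hypothetical relatedness score: 1 - JSD (an assumption, not the paper's stated formula)."""
    return 1.0 - js_divergence(p, q)


if __name__ == "__main__":
    # Illustrative topic distributions for a query and a microblog post (invented values).
    query_topics = [0.7, 0.2, 0.1]
    post_topics = [0.6, 0.3, 0.1]
    print(jsd_relatedness(query_topics, post_topics))
```

With base-2 logarithms the divergence is bounded by 1, so the sketched relatedness stays in [0, 1], which makes it easy to combine with other scores on the same scale.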

Keywords: microblog; short text; similarity calculation; BTM; graph theory; topic model
Received: 2018-09-27
Revised: 2019-08-25

A microblog retrieval model combining BTM and graph theory
CAI Chen, LUO Ke. A microblog retrieval model combining BTM and graph theory[J]. Computer Engineering & Science, 2019, 41(8): 1512-1518.
Authors:CAI Chen  LUO Ke
Affiliation: (1. School of Computer & Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; 2. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China)
Abstract: Microblog data is voluminous, but each post contains few characters and its features are sparse. To improve retrieval precision, we propose a microblog retrieval model combining BTM and graph theory. First, lexical semantic correlation is used to calculate the correlation between labelled features in microblog text. Second, we construct a bi-term topic model and use the Jensen-Shannon divergence (JSD) to calculate the correlation of the word pairs in the short texts that are mapped to the model. Third, we extract entities and their graph structure from CN-DBpedia and use the SimRank algorithm to calculate the correlation between entities in the graph structure. Together, these three correlations form the final correlation of the model. Finally, a Sina Weibo data set is used for the retrieval experiments. Experimental results show that, compared with a retrieval model combining latent Dirichlet allocation (LDA) with graph theory and a system model based on linked open data and graph theory, the new model performs significantly better in MAP, accuracy, and recall, indicating that it has better retrieval performance.
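As a rough illustration of the SimRank step named in the abstract, the sketch below runs the basic iterative SimRank recurrence on a tiny directed graph. The node names, the decay factor `c = 0.8`, and the iteration count are assumptions chosen for the example; they do not come from the paper, and the graph is not real CN-DBpedia data.

```python
from itertools import product


def simrank(in_neighbors, c=0.8, iterations=10):
    """Basic iterative SimRank.

    in_neighbors maps each node to the list of nodes that point to it.
    Every node that appears in a neighbor list must also be a key.
    Returns a dict mapping (a, b) pairs to similarity scores.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # Nodes without in-neighbors have similarity 0 to every other node.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by c.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


if __name__ == "__main__":
    # Hypothetical entity graph: entity "A" links to both "B" and "C".
    graph = {"A": [], "B": ["A"], "C": ["A"]}
    scores = simrank(graph)
    print(scores[("B", "C")])  # 0.8: B and C share their only in-neighbor
```

The intuition is that two entities are similar when the entities pointing to them are similar. This all-pairs version costs a full pass over node pairs per iteration, so it is only suitable for small graphs.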
Keywords:microblog  short text  similarity calculation  BTM  graph theory  topic model  
This article is indexed in Wanfang Data and other databases.