
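Microblog topic discovery based on a real-time word co-occurrence network
Cite this article: LI Yaxing, WANG Zhaokai, FENG Xupeng, LIU Lijun, HUANG Qingsong. Microblog topic discovery based on a real-time word co-occurrence network[J]. Journal of Computer Applications, 2016, 36(5): 1302-1306.
Authors: LI Yaxing, WANG Zhaokai, FENG Xupeng, LIU Lijun, HUANG Qingsong
Affiliations: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming 650500; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming 650500
Funding: Supported by the National Natural Science Foundation of China (81360230).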
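Abstract: To address the real-time, sparse, and massive nature of microblog data, a topic discovery model based on a real-time word co-occurrence network is proposed. First, a set of topic words is selected from the raw corpus, and a time parameter is used to compute the relationship weights between co-occurring topic words, from which the word co-occurrence network is built; the network is then used to infer latent feature words strongly associated with a topic, which alleviates the sparsity of microblog feature words. Second, an improved Single-Pass algorithm performs incremental topic clustering. Finally, the topic words of each topic are ranked by a heat score to obtain the most representative ones. Experimental results show that, compared with the classic Single-Pass clustering algorithm, the model improves topic discovery accuracy by about 6% and the comprehensive index by 8%, demonstrating its effectiveness and accuracy.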

Keywords: topic discovery; real-time co-occurrence network; short text; Single-Pass clustering; heat calculation
Received: 2015-09-14
Revised: 2015-10-22

Micro-blog hot-spot topic discovery based on real-time word co-occurrence network
LI Yaxing, WANG Zhaokai, FENG Xupeng, LIU Lijun, HUANG Qingsong. Micro-blog hot-spot topic discovery based on real-time word co-occurrence network[J]. Journal of Computer Applications, 2016, 36(5): 1302-1306.
Authors: LI Yaxing, WANG Zhaokai, FENG Xupeng, LIU Lijun, HUANG Qingsong
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming Yunnan 650500, China; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming Yunnan 650500, China
Abstract: In view of the real-time, sparse and massive characteristics of micro-blog data, a topic discovery model based on a real-time word co-occurrence network was proposed. Firstly, a set of keywords was extracted from the raw corpus, and the relationship weights between co-occurring keywords were calculated from a time parameter to construct the word co-occurrence network. Potential feature words strongly correlated with a topic were then inferred from this network to reduce the sparsity of micro-blog features. Secondly, incremental topic clustering was achieved with an improved Single-Pass algorithm. Finally, the feature words of each topic were ranked by heat calculation to obtain the most representative keywords of the topic. The experimental results show that, compared with the classic Single-Pass clustering algorithm, the proposed model improves topic discovery accuracy by about 6% and the comprehensive index by 8%, which demonstrates its validity and accuracy.
Keywords: topic discovery; real-time co-occurrence network; short text; Single-Pass clustering; hot degree calculation
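
For orientation, the classic Single-Pass clustering baseline named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' improved variant: the cosine similarity over raw term counts, the `threshold` value of 0.3, the function names, and the sample posts are all assumptions made for the example. Each post is compared with the existing cluster centroids in arrival order; it joins the most similar cluster if the similarity reaches the threshold, and otherwise starts a new cluster.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Cluster tokenized documents in arrival order.

    Returns one cluster index per document.
    `threshold` is an illustrative value, not one taken from the paper.
    """
    centroids = []  # one Counter of summed term counts per cluster
    labels = []
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)  # no cluster is close enough: open a new one
            labels.append(len(centroids) - 1)
    return labels


# Made-up, already tokenized posts used only to exercise the sketch.
posts = [
    ["earthquake", "yunnan", "rescue"],
    ["rescue", "earthquake", "teams"],
    ["football", "final", "goal"],
    ["goal", "football", "match"],
]
print(single_pass(posts))  # [0, 0, 1, 1]
```

Because short posts share few terms, similarities computed this way are often near zero, which is the sparsity problem the abstract says the co-occurrence network is meant to ease by adding related feature words before clustering.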