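MB-SinglePass: Microblog Topic Detection Based on Combined Similarity
Cite this article: ZHOU Gang, ZOU Hong-cheng, XIONG Xiao-bing, HUANG Yong-zhong. MB-SinglePass: Microblog Topic Detection Based on Combined Similarity[J]. Computer Science, 2012, 39(10): 198-202
Authors: ZHOU Gang, ZOU Hong-cheng, XIONG Xiao-bing, HUANG Yong-zhong
Affiliations: 1. State Key Laboratory of Software Development Environment, Beijing 100191; Information Engineering Institute, Information Engineering University, Zhengzhou 450002
2. Information Engineering Institute, Information Engineering University, Zhengzhou 450002
Funding: Open project of the State Key Laboratory of Software Development Environment; National High Technology Research and Development Program of China (863 Program)
Abstract: Topic detection techniques have achieved good results in research on traditional media. This paper explores how to optimize and evaluate topic detection for short texts in new media such as microblogs. Drawing on structured information among microblog users, such as following and follower relationships, and on intrinsic links between posts, such as reposts and comments, it proposes the MB-SinglePass topic detection algorithm for microblogs. Beyond these microblog characteristics, the algorithm addresses the feature sparsity of short texts by using a synonym dictionary to expand microblog features and enrich the feature information. To overcome the shortcomings of using cosine similarity, Jaccard similarity, or semantic similarity alone, it adopts a combined similarity strategy. Compared with traditional algorithms, MB-SinglePass achieves better performance on a real dataset collected from Sina Weibo. In addition, a controlled experiment on similarity strategies shows that combined similarity outperforms any single similarity measure.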
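The abstract names two ingredients, synonym-based feature expansion and a combined similarity, without giving formulas. The following is a minimal sketch of how they might fit together, not the authors' implementation. The toy `SYNONYMS` dictionary, the function names, and the equal weights `w_cos` and `w_jac` are all assumptions made for illustration, and the semantic similarity component is left out because its definition is not given here.

```python
import math
from collections import Counter

# Hypothetical toy synonym dictionary (assumption; the paper's dictionary is not given).
SYNONYMS = {"quake": ["earthquake"], "film": ["movie"]}

def expand(tokens, synonyms=SYNONYMS):
    """Feature expansion: append the dictionary synonyms of each token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(synonyms.get(tok, []))
    return expanded

def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    """Jaccard similarity between the token sets of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def combined_similarity(a, b, w_cos=0.5, w_jac=0.5):
    """Weighted sum of cosine and Jaccard on expanded features.

    The weighted-sum form and the weights are assumptions, not taken from the paper.
    """
    ea, eb = expand(a), expand(b)
    return w_cos * cosine(ea, eb) + w_jac * jaccard(ea, eb)
```

In this sketch, two short posts that use different words for the same thing (for example `["quake", "city"]` and `["earthquake", "city"]`) score higher after expansion than on their raw tokens, which is the effect the abstract attributes to feature expansion.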

Keywords: Microblog; SinglePass; Topic detection; Text similarity; Synonym expansion

MB-SinglePass: Microblog Topic Detection Based on Combined Similarity
ZHOU Gang, ZOU Hong-cheng, XIONG Xiao-bing, HUANG Yong-zhong. MB-SinglePass: Microblog Topic Detection Based on Combined Similarity[J]. Computer Science, 2012, 39(10): 198-202
Authors: ZHOU Gang, ZOU Hong-cheng, XIONG Xiao-bing, HUANG Yong-zhong
Affiliation: 1. State Key Laboratory of Software Development Environment, Beijing 100191, China; 2. Information Engineering Institute, Information Engineering University, Zhengzhou 450002, China
Abstract: Topic detection has achieved good results in research on traditional media. This paper discusses the refinement and performance evaluation of topic detection techniques for new media such as microblogs. It proposes the MB-SinglePass topic detection algorithm, which builds on structured information such as the follow and fan relationships between contacts, and on the inner connections between posts such as forwarding and commenting. Besides considering these microblog characteristics, MB-SinglePass introduces a feature extension technique to enrich feature information. The paper also uses a combined similarity to address the shortcomings of using the Jaccard similarity coefficient, cosine similarity, or semantic similarity alone. Compared with traditional algorithms, MB-SinglePass shows better performance on a real dataset from Sina microblog. Additionally, an experiment on similarity strategies shows that combined similarity gives better results than any single similarity.
Keywords:Microblog  SinglePass  Topic detection  Text similarity  Synonyms extension
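The "SinglePass" in the algorithm's name refers to single-pass incremental clustering: each post is compared with the topics found so far and either joins the most similar one or starts a new topic. The sketch below shows that generic scheme only, under stated assumptions. The `threshold` value, the max-over-members comparison, and the plain Jaccard measure are illustrative choices; the follow/fan relationships and forwarding/comment links that MB-SinglePass also uses are not modeled here.

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def single_pass(posts, threshold=0.3, similarity=jaccard):
    """Assign each post (a token list) to a topic index in one pass.

    threshold is a hypothetical value; comparing a post with a topic via the
    maximum similarity to any member is also an assumption of this sketch.
    """
    topics = []  # each topic is a list of the token lists assigned to it
    labels = []  # topic index per post, in input order
    for tokens in posts:
        best_idx, best_sim = -1, 0.0
        for idx, members in enumerate(topics):
            sim = max(similarity(tokens, m) for m in members)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            topics[best_idx].append(tokens)   # join the closest existing topic
            labels.append(best_idx)
        else:
            topics.append([tokens])           # start a new topic
            labels.append(len(topics) - 1)
    return labels

posts = [
    ["earthquake", "city"],
    ["earthquake", "rescue", "city"],
    ["movie", "premiere"],
    ["movie", "review"],
]
labels = single_pass(posts)
```

Because `similarity` is a parameter, a combined measure could be passed in place of `jaccard` without changing the clustering loop.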
This article is indexed in CNKI, Wanfang Data, and other databases.