
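News Topic Recognition in Chinese Microblogs Based on a Word Co-occurrence Graph
Cite this article: ZHAO Wenqing, HOU Xiaoke. News topic recognition of Chinese microblog based on word co-occurrence graph[J]. CAAI Transactions on Intelligent Systems, 2012, 7(5): 444-449.
Authors: ZHAO Wenqing  HOU Xiaoke
Affiliation: School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, Hebei, China
Funding: National Natural Science Foundation of China (70671039); Fundamental Research Funds for the Central Universities (12MS121)
Abstract: Traditional topic detection algorithms are designed mainly for long texts such as news web pages and blogs and cannot effectively handle sparse microblog data. To address this, a method based on a word co-occurrence graph is presented for recognizing news topics in microblogs. After the microblog data are preprocessed, the method first extracts topic words by combining two factors: relative word frequency and the rate of increase in word frequency. It then builds a word co-occurrence graph from the co-occurrence degree between topic words, treats each disconnected cluster in the graph as one news topic, and represents each topic by the few most informative topic words in its cluster. Finally, experiments on a microblog data set show that the method recognizes news topics in microblogs, which verifies its effectiveness.

Keywords: microblog  news topic  news topic recognition  topic word  word co-occurrence graph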

News topic recognition of Chinese microblog based on word co-occurrence graph
ZHAO Wenqing, HOU Xiaoke. News topic recognition of Chinese microblog based on word co-occurrence graph[J]. CAAI Transactions on Intelligent Systems, 2012, 7(5): 444-449.
Authors: ZHAO Wenqing  HOU Xiaoke
Affiliation: School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, China
Abstract: Traditional topic detection algorithms are designed for longer texts such as news web pages and blogs, so they struggle to handle sparse microblog data effectively. In this paper, a method based on a word co-occurrence graph is proposed to detect news topics in microblogs. First, after preprocessing, the relative word frequency and the rate of increase in word frequency are combined to extract keywords from the microblog text. Second, a word co-occurrence graph is built from the co-occurrence degrees of the keywords; each disconnected cluster in the graph is taken as a news topic and is represented by the several keywords in that cluster that carry the most information. Finally, experiments on a microblog data set show that the approach recognizes news topics in microblogs and verify its effectiveness.
Keywords: microblog  news topics  topic recognition  keywords  word co-occurrence graph
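
The pipeline summarized in the abstract (keyword extraction, co-occurrence graph, one topic per disconnected cluster) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names, the scoring formula that combines relative frequency with frequency increase, the co-occurrence degree (shared posts divided by the smaller post count of the two words), and the `min_degree` threshold. Posts are assumed to be already segmented into word lists, and `previous_posts` stands for an earlier time window used to measure the increase.

```python
from collections import Counter
from itertools import combinations


def extract_topic_words(current_posts, previous_posts, top_n=6):
    """Score words by relative frequency and frequency increase (assumed formula)."""
    cur = Counter(w for post in current_posts for w in post)
    prev = Counter(w for post in previous_posts for w in post)
    cur_total = sum(cur.values()) or 1
    prev_total = sum(prev.values()) or 1
    scores = {}
    for word, count in cur.items():
        rel = count / cur_total
        prev_rel = prev[word] / prev_total
        # Assumed growth measure: share of the current relative frequency that is new.
        growth = max(rel - prev_rel, 0.0) / rel
        # Assumed combination of the two factors.
        scores[word] = rel * (1.0 + growth)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return {w: scores[w] for w in ranked[:top_n]}


def build_cooccurrence_graph(posts, topic_words, min_degree=0.5):
    """Link two topic words when their co-occurrence degree reaches min_degree."""
    vocabulary = set(topic_words)
    doc_freq = Counter()
    pair_freq = Counter()
    for post in posts:
        present = sorted(set(post) & vocabulary)
        doc_freq.update(present)
        pair_freq.update(combinations(present, 2))
    graph = {w: set() for w in vocabulary}
    for (a, b), shared in pair_freq.items():
        # Assumed degree: shared posts over the smaller of the two post counts.
        if shared / min(doc_freq[a], doc_freq[b]) >= min_degree:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the disconnected clusters of the graph via depth-first search."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(cluster)
    return clusters


def detect_topics(current_posts, previous_posts, top_n=6,
                  min_degree=0.5, words_per_topic=3):
    """Treat each cluster as a topic, labeled by its highest-scoring words."""
    scores = extract_topic_words(current_posts, previous_posts, top_n)
    graph = build_cooccurrence_graph(current_posts, scores, min_degree)
    topics = []
    for cluster in connected_components(graph):
        ranked = sorted(cluster, key=lambda w: (-scores[w], w))
        topics.append(ranked[:words_per_topic])
    return topics


# Toy data: two earlier posts and six current posts about two events.
previous_posts = [["weather", "today"], ["lunch", "today"]]
current_posts = [
    ["earthquake", "rescue", "today"],
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue"],
    ["election", "vote", "today"],
    ["election", "vote", "result"],
    ["election", "vote"],
]
print(detect_topics(current_posts, previous_posts, top_n=4))
```

In this toy run the everyday word "today" scores low because it was already frequent in the earlier window, so only the four event words enter the graph and they split into two clusters. A real system would need Chinese word segmentation and stop-word removal before this step, and the thresholds would have to be tuned on actual data.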
This article is indexed in CNKI and other databases.