
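A News Topic Detection Method for Chinese Microblogs
Cite this article: ZHENG Fei-ran, MIAO Duo-qian, ZHANG Zhi-fei, GAO Can. A News Topic Detection Method for Chinese Microblogs[J]. Computer Science, 2012, 39(1): 138-141.
Authors: ZHENG Fei-ran  MIAO Duo-qian  ZHANG Zhi-fei  GAO Can
Affiliation: Department of Computer Science and Technology, Tongji University, Shanghai 201804; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804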
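Abstract: The rapid growth of microblogging has given rise to another form of social news media. This paper proposes a method for mining news topics from microblogs: keywords that emerge in bursts across microblog messages are detected online and then clustered to identify news topics. To extract news keywords, a compound weight is constructed from each word's frequency in short texts and its growth rate, quantifying how likely the word is to be news vocabulary. For topic construction, a contextual relevance model supports an incremental clustering algorithm; compared with a semantic similarity model, it is better suited to the characteristics of this problem. Experiments on real microblog data show that the method can effectively detect news topics from a large volume of messages.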

Keywords: Microblog  News  Topic detection  Clustering

News Topic Detection Approach on Chinese Microblog
ZHENG Fei-ran, MIAO Duo-qian, ZHANG Zhi-fei, GAO Can. News Topic Detection Approach on Chinese Microblog[J]. Computer Science, 2012, 39(1): 138-141.
Authors: ZHENG Fei-ran  MIAO Duo-qian  ZHANG Zhi-fei  GAO Can
Affiliation: Department of Computer Science and Technology, Tongji University, Shanghai 201804, China; The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China
Abstract: The popularity of microblogging has brought about another form of social news media. This paper proposes an approach to mining news topics from microblogs. News topics are formed by finding keywords that emerge in large numbers and clustering them. To extract news keywords, a compound weight combining word frequency and growth is introduced to measure the likelihood that a word is a news keyword. To construct topics, a contextual relevance model is used to support incremental clustering, which suits the problem better than semantic similarity. Experiments on real-world microblog data show the effectiveness of the approach in detecting news topics from massive messages.
Keywords:Microblog  News  Topic detection  Clustering
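
The abstract names two components, a compound weight built from word frequency and growth, and incremental clustering driven by contextual relevance, but gives no formulas. The following is a minimal Python sketch of how such a pipeline might look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the linear weighting with `alpha`, the normalized growth term, co-occurrence (Jaccard overlap of message sets) as the relevance measure, both thresholds, and the toy English tokens standing in for segmented Chinese text.

```python
from collections import Counter


def compound_weights(prev_window, curr_window, alpha=0.5):
    """Score words in the current window by frequency and growth.

    Each window is a list of tokenized messages (lists of words).
    The formula is an illustrative assumption, not the paper's.
    """
    prev = Counter(w for msg in prev_window for w in sorted(set(msg)))
    curr = Counter(w for msg in curr_window for w in sorted(set(msg)))
    n = max(len(curr_window), 1)
    weights = {}
    for word, count in curr.items():
        freq = count / n  # share of current messages containing the word
        # Normalized growth in [-1, 1]; declining words are clipped to 0.
        growth = (count - prev[word]) / (count + prev[word])
        weights[word] = alpha * freq + (1 - alpha) * max(growth, 0.0)
    return weights


def contextual_relevance(a, b, messages):
    """Co-occurrence overlap: messages with both words / messages with either."""
    has_a = {i for i, m in enumerate(messages) if a in m}
    has_b = {i for i, m in enumerate(messages) if b in m}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0


def incremental_cluster(keywords, messages, threshold=0.3):
    """Assign each keyword to the most relevant existing cluster, or open a new one."""
    clusters = []
    for word in keywords:
        best, best_score = None, threshold
        for cluster in clusters:
            score = sum(
                contextual_relevance(word, other, messages) for other in cluster
            ) / len(cluster)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([word])
        else:
            best.append(word)
    return clusters


# Toy data: two consecutive time windows of already-tokenized messages.
prev_window = [["weather", "nice"], ["lunch", "time"], ["weather", "rain"]]
curr_window = [
    ["earthquake", "japan"],
    ["earthquake", "japan", "tsunami"],
    ["lunch", "time"],
    ["concert", "tickets"],
    ["concert", "tickets", "sale"],
    ["earthquake", "tsunami"],
]

weights = compound_weights(prev_window, curr_window)
# Keep words above a hypothetical cutoff, highest weight first (ties alphabetical).
keywords = sorted(
    (w for w, s in weights.items() if s >= 0.6),
    key=lambda w: (-weights[w], w),
)
clusters = incremental_cluster(keywords, curr_window)
print(clusters)
```

On this toy input, steady everyday words such as "lunch" fall below the cutoff because they show no growth, and the remaining keywords group into one cluster per event. A real system would also need Chinese word segmentation and a sliding time window, neither of which is shown here.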
This article is indexed in CNKI, Wanfang Data, and other databases.