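A Subtopic Discovery Method Based on Absorbing Markov Chains
Cite this article: WEI Mingchuan, ZHU Junjie, ZHANG Jin, ZHANG Kai, CHENG Xueqi, REN Yan. A Subtopic Discovery Method Based on Absorbing Markov Chains[J]. Journal of Chinese Information Processing, 2014, 28(1): 41-46
Authors: WEI Mingchuan  ZHU Junjie  ZHANG Jin  ZHANG Kai  CHENG Xueqi  REN Yan
Affiliation: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China;
3. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100190, China
Funding: National Natural Science Foundation of China (60903139); National 242 Special Project (2011F45, 2011A001, 2012G129)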
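Abstract (translated from the Chinese): Text on the Internet covers diverse topic content that evolves over time, which makes it hard for traditional topic detection models to choose a suitable subtopic granularity and to guarantee detection quality. To address this problem, the paper proposes a subtopic partition algorithm based on an absorbing Markov chain. The algorithm combines topic keywords produced by web page clustering to generate candidate subtopics, applies absorption on an absorbing Markov chain to these candidates, and re-ranks them to produce the final subtopics. Experimental results show that the algorithm ensures both the importance and the diversity of the generated subtopics.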

Keywords (translated from the Chinese): subtopic partition   topic keywords   absorbing Markov chain

An Algorithm for Subtopic Detecting Based on Absorbing Markov Chain
WEI Mingchuan,ZHU Junjie,ZHANG Jin,ZHANG Kai,CHENG Xueqi,REN Yan. An Algorithm for Subtopic Detecting Based on Absorbing Markov Chain[J]. Journal of Chinese Information Processing, 2014, 28(1): 41-46
Authors:WEI Mingchuan  ZHU Junjie  ZHANG Jin  ZHANG Kai  CHENG Xueqi  REN Yan
Affiliation:1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. Graduate University of Chinese Academy of Sciences, Beijing 100190, China;
3. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100190, China
Abstract: Web text is diverse in content and evolves dynamically, so traditional topic detection and tracking models have difficulty producing high-quality subtopics. To address this issue, an algorithm for subtopic partition based on an absorbing Markov chain is proposed. The algorithm first combines the topic keywords obtained from web page clustering to generate subtopics, then derives the final subtopics using the absorbing Markov chain. Experimental results show that the algorithm performs well in terms of both significance and diversity.
Keywords:subtopic partition   topic keywords   absorbing Markov chain  
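
The abstract does not say how the chain is built or how absorption drives the re-ranking, so the following is only a minimal sketch of one common way an absorbing Markov chain can balance importance and diversity (a random-walk re-ranking in the spirit of GRASSHOPPER-style methods), not the authors' procedure. The similarity matrix, the `damping` parameter, and the names `transition_matrix`, `stationary`, `expected_visits`, and `rank_subtopics` are all assumptions made for illustration. The first pick is the candidate with the highest stationary probability; each chosen candidate is then turned into an absorbing state, and the next pick is the remaining candidate with the most expected visits before absorption, which penalizes candidates close to those already chosen.

```python
def transition_matrix(similarity, damping=0.9):
    """Row-normalize similarities and mix in uniform teleportation (assumed design)."""
    n = len(similarity)
    matrix = []
    for row in similarity:
        total = sum(row)
        matrix.append([damping * w / total + (1.0 - damping) / n for w in row])
    return matrix


def stationary(P, iters=1000, tol=1e-12):
    """Power iteration for the stationary distribution of an ergodic chain."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_visits(P, absorbed, iters=10000, tol=1e-12):
    """Expected visits to each transient state before absorption.

    Starts from a uniform distribution over transient states and sums the
    series start * (I + Q + Q^2 + ...), where Q is the transient block of P.
    """
    transient = [i for i in range(len(P)) if i not in absorbed]
    current = {i: 1.0 / len(transient) for i in transient}
    visits = dict(current)
    for _ in range(iters):
        nxt = {j: sum(current[i] * P[i][j] for i in transient) for j in transient}
        for j in transient:
            visits[j] += nxt[j]
        if sum(nxt.values()) < tol:  # remaining probability mass is negligible
            break
        current = nxt
    return visits


def rank_subtopics(similarity, k, damping=0.9):
    """Pick k candidates: most central first, then most-visited before absorption."""
    P = transition_matrix(similarity, damping)
    pi = stationary(P)
    ranked = [max(range(len(P)), key=lambda i: pi[i])]
    while len(ranked) < min(k, len(P)):
        visits = expected_visits(P, set(ranked))
        ranked.append(max(visits, key=visits.get))
    return ranked


# Hypothetical candidates: 0, 1, 2 are near-duplicates; 3 and 4 form a second group.
SIMILARITY = [
    [1.00, 0.90, 0.80, 0.10, 0.05],
    [0.90, 1.00, 0.70, 0.10, 0.05],
    [0.80, 0.70, 1.00, 0.10, 0.05],
    [0.10, 0.10, 0.10, 1.00, 0.70],
    [0.05, 0.05, 0.05, 0.70, 1.00],
]

if __name__ == "__main__":
    print(rank_subtopics(SIMILARITY, 2))
```

In this toy input the first pick comes from the larger group of near-duplicates, and the second pick comes from the other group rather than from a near-duplicate of the first, which is the importance-plus-diversity behavior the abstract describes.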
This article is indexed in CNKI and other databases.