基于多策略优化的分治多层聚类算法的话题发现研究
Cite this article: 骆卫华, 于满泉, 许洪波, 王斌, 程学旗. 基于多策略优化的分治多层聚类算法的话题发现研究[J]. 中文信息学报, 2006, 20(1): 31-38.
Authors: 骆卫华, 于满泉, 许洪波, 王斌, 程学旗
Affiliations: 1. 中国科学院计算技术研究所; 2. 中国科学院研究生院
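Abstract: Topic detection and tracking (TDT) is an evaluation-driven research area that aims to organize and exploit streams of language text according to events. Since it was proposed in 1996, it has attracted increasingly broad attention. Building on a study of existing mature algorithms, this paper proposes a topic detection algorithm based on divide-and-conquer multi-level clustering. Its core idea is to split the full data set into groups whose members are related to some degree, cluster each group separately to obtain the topics inside that group (micro-clusters), and then cluster all the micro-clusters again to obtain the final topics. Several optimization strategies are applied during clustering to safeguard clustering quality. A system based on the algorithm was tested on the TDT4 Chinese corpus, and the results indicate that it is among the best-performing algorithms currently available.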

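Keywords: computer applications; Chinese information processing; topic detection and tracking; divide-and-conquer multi-level clustering; hierarchical clustering
Article ID: 1003-0077(2006)01-0029-08
Received: 2005-05-20
Revised: 2005-10-17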

The Study of Topic Detection Based on Algorithm of Division and Multi-level Clustering with Multi-strategy Optimization
LUO Wei-hua, YU Man-quan, XU Hong-bo, WANG Bin, CHENG Xue-qi. The Study of Topic Detection Based on Algorithm of Division and Multi-level Clustering with Multi-strategy Optimization[J]. Journal of Chinese Information Processing, 2006, 20(1): 31-38.
Authors: LUO Wei-hua, YU Man-quan, XU Hong-bo, WANG Bin, CHENG Xue-qi
Affiliations: 1. Institute of Computing Technology, Chinese Academy of Sciences; 2. Graduate School of Chinese Academy of Sciences
Abstract: Topic Detection and Tracking (TDT) is an evaluation-driven research area that aims to organize and exploit streams of text according to events. Since it was introduced in 1996, it has attracted growing attention. Building on a study of existing mature algorithms, this paper proposes a division and multi-level clustering algorithm with multi-strategy optimization. The core idea is to divide all the data into groups with intrinsic relevance, cluster within each group to produce micro-clusters, and then cluster all the micro-clusters to obtain the final topics. Throughout the process, various strategies are employed to improve clustering quality. A system implementing the algorithm was tested on the TDT4 corpus, and the results indicate that it is among the best current algorithms.
Keywords: computer application; Chinese information processing; topic detection and tracking; division and multi-level clustering; hierarchical clustering
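
The two-level scheme described in the abstract (group the data, cluster each group into micro-clusters, then cluster the micro-clusters into topics) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify how groups are formed, which similarity measure is used, or what the optimization strategies are, so everything below is an assumption made for illustration: the `group_of` callback (here, grouping by day), bag-of-words term counts with cosine similarity, a greedy centroid-based agglomerative merge, and both threshold values. The multi-strategy optimization is omitted entirely.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(clusters, threshold):
    """Greedily merge the most similar pair of clusters until no pair
    reaches `threshold`. Each cluster is (summed term vector, member ids);
    the summed vector stands in for the centroid, since cosine similarity
    ignores vector length."""
    clusters = list(clusters)
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][0], clusters[j][0])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair]
        clusters.append(merged)
    return clusters


def detect_topics(docs, group_of, micro_threshold=0.5, topic_threshold=0.3):
    """Divide documents into groups, cluster each group into micro-clusters,
    then cluster all micro-clusters into final topics (hypothetical
    parameters; thresholds are arbitrary illustration values)."""
    groups = {}
    for doc_id, text in docs.items():
        vector = Counter(text.lower().split())
        groups.setdefault(group_of(doc_id), []).append((vector, [doc_id]))

    micro_clusters = []
    for members in groups.values():
        micro_clusters.extend(agglomerate(members, micro_threshold))

    topics = agglomerate(micro_clusters, topic_threshold)
    return [sorted(ids) for _, ids in topics]


# Toy data: two days of stories about two events (invented for this sketch).
docs = {
    "d1": "earthquake hits city rescue teams",
    "d2": "earthquake rescue teams search city",
    "d3": "election vote results announced",
    "d4": "earthquake city rescue continues",
    "d5": "election results vote recount",
}
day = {"d1": 1, "d2": 1, "d3": 1, "d4": 2, "d5": 2}
topics = detect_topics(docs, group_of=day.get)
print(sorted(topics))
```

In this sketch the first level only compares documents inside the same group, so the quadratic pairwise search runs on small sets; the second level then compares a much smaller number of micro-clusters across groups, which is one plausible motivation for a divide-and-conquer design.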
This article is indexed in CNKI, 维普, 万方数据, and other databases.