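基于主题区域发现的中文自动文摘研究 (A Study of Chinese Text Summarization Based on Thematic Area Discovery)
Cite this article: 胡珀, 何婷婷, 姬东鸿. 基于主题区域发现的中文自动文摘研究[J]. 计算机科学 (Computer Science), 2005, 32(1): 177-181.
Authors: 胡珀 (HU Po), 何婷婷 (HE Ting-Ting), 姬东鸿 (JI Dong-Hong)
Affiliations: 1. Department of Computer Science and Technology, Central China Normal University, Wuhan 430079
2. Institute for Infocomm Research, Singapore 119613
Funding: "Tenth Five-Year Plan" applied research project fund of the State Language Commission of China (ZDI105-43B); Natural Science Foundation of Hubei Province (2001ABB012)
Abstract: Automatic summarization is an important research topic in natural language processing. This paper proposes a method for Chinese automatic summarization based on thematic area discovery. The distinguishing feature of the method is that the generated summary covers as many of the document's themes as possible while markedly reducing its own redundancy, thereby balancing these two competing goals. Using the K-medoids clustering algorithm together with a cluster analysis method built on a new self-defined objective function, the method discovers the latent thematic areas of a text through adaptive clustering of its paragraphs and applies them to automatic summarization. In addition, a new evaluation factor based on representation entropy is used to assess the redundancy of a summary. Experimental results confirm the feasibility and effectiveness of the method, which is a meaningful exploration in Chinese automatic summarization research.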

Keywords: thematic area discovery; Chinese automatic summarization; clustering analysis; representation entropy; text retrieval
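
The paragraph-clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their self-defined objective function or how the number of clusters is adapted, so the number of clusters `k` is simply passed in here. The bag-of-words representation, the cosine distance, the naive initialisation, and all function names are assumptions made for illustration, and the paragraphs are assumed to be already segmented into words.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def assign(dist, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Assumption: the first k points serve as initial medoids.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to all members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical, pre-segmented paragraphs covering two themes.
paragraphs = [
    ["stock", "market", "price", "rise"],
    ["football", "match", "goal", "team"],
    ["market", "price", "fall", "stock"],
    ["team", "goal", "win", "football"],
]
vectors = [Counter(p) for p in paragraphs]
dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Each resulting cluster plays the role of one thematic area. Drawing summary content from every cluster is what lets a summary cover several themes without repeating one of them.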

A Study of Chinese Text Summarization Based on Thematic Area Discovery
HU Po, HE Ting-Ting, JI Dong-Hong. A Study of Chinese Text Summarization Based on Thematic Area Discovery[J]. Computer Science, 2005, 32(1): 177-181.
Authors: HU Po, HE Ting-Ting, JI Dong-Hong
Affiliation: Department of Computer Science and Technology, Central China Normal University, Wuhan 430079; Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
Abstract: Automatic summarization is an important problem in natural language processing. This paper proposes a method that creates a text summary by discovering thematic areas in Chinese text. The distinguishing feature of the method is that the resulting summary covers as many different themes as possible while markedly reducing its own redundancy. Latent thematic areas are discovered through adaptive clustering of passages, which is realized by combining the k-medoids clustering method with a novel clustering analysis method based on a self-defined objective function. In addition, a novel parameter, representation entropy, is used to evaluate summary redundancy. Experimental results indicate that the method is effective and efficient for automatic summarization.
Keywords: Automatic summarization; Thematic area discovery; Clustering analysis; Representation entropy
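
The abstract names representation entropy as the redundancy measure but does not define it. One definition used in the feature-selection literature is given below as an assumption and is not confirmed by this record. It takes the eigenvalues $\lambda_1, \dots, \lambda_d$ of the $d \times d$ covariance matrix of a set of feature vectors (here, hypothetically, term vectors of the summary's sentences), normalizes them, and computes their entropy, with the convention $0 \log 0 = 0$:

```latex
\tilde{\lambda}_j = \frac{\lambda_j}{\sum_{i=1}^{d} \lambda_i},
\qquad
H_R = -\sum_{j=1}^{d} \tilde{\lambda}_j \log \tilde{\lambda}_j
```

Under this definition, $H_R = 0$ when a single eigenvalue carries all the variance, meaning the information lies along one direction and is highly redundant. $H_R$ reaches its maximum, $\log d$, when all eigenvalues are equal. A larger value would therefore indicate a less redundant summary.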
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.