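基于联合权重的多文档关键词抽取技术 (Multi-Document Keyword Extraction Based on Joint Weights)
Cite as: 杨洁, 季铎, 蔡东风, 林晓庆, 白宇. 基于联合权重的多文档关键词抽取技术[J]. 中文信息学报, 2008, 22(6): 75-79
Authors: 杨洁 (YANG Jie)  季铎 (JI Duo)  蔡东风 (CAI Dong-feng)  林晓庆 (LIN Xiao-qing)  白宇 (BAI Yu)
Affiliations: 1. Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034; 2. School of Information Technology, Eastern Liaoning University, Dandong, Liaoning 118003
Funding: Key Project of Science and Technology Research, Ministry of Education
Abstract: This paper proposes a multi-document keyword extraction method. The method computes word weights with ATF×PDF (Average Term Frequency × Proportional Document Frequency) and then, based on the semantic similarity between candidate keywords, recomputes the candidates' weights with a joint-weight method to select the keywords. It combines word frequency, part of speech, and the semantic similarity between words. Experiments show that the method effectively extracts keywords from multiple documents: compared with a keyword-based cluster-labeling method, precision improves by 3%, recall by 7%, and F-measure by 4.4%.

Keywords: computer application  Chinese information processing  ATF×PDF  joint weight  multi-document  semantic similarity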

Keyword Extraction in Multi-Document Based on Joint Weight
YANG Jie, JI Duo, CAI Dong-feng, LIN Xiao-qing, BAI Yu. Keyword Extraction in Multi-Document Based on Joint Weight[J]. Journal of Chinese Information Processing, 2008, 22(6): 75-79
Authors: YANG Jie  JI Duo  CAI Dong-feng  LIN Xiao-qing  BAI Yu
Affiliations: 1. Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China; 2. Institute of Information Technology, Eastern Liaoning University, Dandong, Liaoning 118003, China
Abstract: This paper presents a multi-document keyword extraction method that first computes word weights with ATF×PDF (Average Term Frequency × Proportional Document Frequency) and then selects keywords using a joint weight that accounts for the semantic similarity between words. The method considers word frequency, part of speech, and semantic relations together. Experiments show that it effectively extracts keywords covering the topics of multiple documents, improving precision, recall, and F-measure by 3%, 7%, and 4.4% respectively over a keyword-based cluster-labeling algorithm.
Keywords: computer application  Chinese information processing  ATF×PDF  joint weight  multi-document  semantic similarity
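
The abstract names the two stages (ATF×PDF weighting, then joint re-weighting by semantic similarity) but does not give their formulas. The sketch below is only one plausible reading, not the authors' definition. It assumes that ATF is a term's average count over the documents that contain it, that PDF is the fraction of documents containing the term, and that the joint weight adds a similarity-weighted share of the other candidates' weights. The function names `atf_pdf`, `joint_weights`, and `top_keywords`, the mixing parameter `alpha`, and the toy similarity function are all hypothetical; part-of-speech filtering, which the abstract also mentions, is omitted.

```python
from collections import Counter


def atf_pdf(docs):
    """Assumed ATF x PDF weight for every term in a list of tokenized documents.

    ATF (assumption): total count of the term / number of documents containing it.
    PDF (assumption): number of documents containing the term / number of documents.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return {}
    total_count = Counter()  # occurrences of each term across all documents
    doc_freq = Counter()     # number of documents containing each term
    for tokens in docs:
        counts = Counter(tokens)
        for term, count in counts.items():
            total_count[term] += count
            doc_freq[term] += 1
    return {
        term: (total_count[term] / doc_freq[term]) * (doc_freq[term] / n_docs)
        for term in total_count
    }


def joint_weights(weights, similarity, alpha=0.5):
    """Assumed joint weight: own weight plus alpha times the
    similarity-weighted sum of the other candidates' weights."""
    joint = {}
    for term, weight in weights.items():
        support = sum(
            similarity(term, other) * other_weight
            for other, other_weight in weights.items()
            if other != term
        )
        joint[term] = weight + alpha * support
    return joint


def top_keywords(weights, k):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy corpus of three tokenized documents (illustrative data only).
    docs = [["a", "b", "a"], ["a", "c"], ["b"]]

    # Hypothetical similarity: only "b" and "c" are treated as related.
    def toy_similarity(t, u):
        return 1.0 if {t, u} == {"b", "c"} else 0.0

    base = atf_pdf(docs)
    joint = joint_weights(base, toy_similarity, alpha=0.5)
    print(top_keywords(joint, 2))
```

In this reading, semantically related candidates reinforce one another, so a topic expressed through several near-synonyms can outrank a single frequent but isolated word. A real system would replace `toy_similarity` with a lexicon-based or corpus-based similarity measure.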
This article is indexed in 维普 (VIP), 万方数据 (Wanfang Data), and other databases.