Keyword Extraction in Multi-Document Based on Joint Weight


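Cite this article:	YANG Jie, JI Duo, CAI Dong-feng, LIN Xiao-qing, BAI Yu. Keyword Extraction in Multi-Document Based on Joint Weight[J]. Journal of Chinese Information Processing, 2008, 22(6): 75-79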

Authors:	YANG Jie, JI Duo, CAI Dong-feng, LIN Xiao-qing, BAI Yu

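Affiliations:	1. Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China; 2. Institute of Information Technology, Eastern Liaoning University, Dandong, Liaoning 118003, China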

Funding:	Key Project of Scientific and Technological Research of the Ministry of Education

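Abstract:	This paper proposes a method for extracting keywords from multiple documents. The method introduces ATF×PDF (Average Term Frequency × Proportional Document Frequency) to compute word weights. It then recomputes the weights of the candidate keywords with a joint-weight scheme based on the semantic similarity between candidates, and selects the keywords from the re-weighted candidates. The method combines word frequency, part of speech, and the semantic similarity between words. Experiments show that it extracts keywords from multiple documents effectively: compared with a keyword-based cluster-labeling method, it improves precision by 3%, recall by 7%, and F-measure by 4.4%.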
Keywords:	computer application; Chinese information processing; ATF×PDF; joint weight; multi-document; semantic similarity
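The abstract names ATF×PDF but does not give its formula, so the following is only one plausible reading and not the authors' definition. It assumes that ATF(w) is the total frequency of term w divided by the number of documents that contain it. It also assumes that PDF(w) = exp(n_w / N), where n_w of the N documents contain w, so terms spread across many documents receive a boost. The function name `atf_pdf` is hypothetical. The part-of-speech filtering that the abstract mentions is assumed to happen before this step, so the input tokens are treated as already-filtered candidates.

```python
import math
from collections import Counter


def atf_pdf(docs: list[list[str]]) -> dict[str, float]:
    """Score each term in a document collection with an ATF x PDF-style weight.

    Hypothetical definitions (assumptions, not taken from the paper):
      ATF(w) = total occurrences of w / number of documents containing w
      PDF(w) = exp(n_w / N), where n_w of the N documents contain w
    """
    n_docs = len(docs)
    total_freq = Counter()  # occurrences of each term across all documents
    doc_freq = Counter()    # number of documents in which each term appears
    for tokens in docs:
        total_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each term at most once per document

    weights = {}
    for term, freq in total_freq.items():
        atf = freq / doc_freq[term]                  # average frequency per containing document
        pdf = math.exp(doc_freq[term] / n_docs)      # grows as the term spreads over more documents
        weights[term] = atf * pdf
    return weights


if __name__ == "__main__":
    scores = atf_pdf([["a"], ["a", "b"], ["a"]])
    print({t: round(w, 3) for t, w in scores.items()})  # {'a': 2.718, 'b': 1.396}
```

In this toy input, "a" appears once in every document, so it scores e (about 2.718). "b" appears in only one document, so it scores exp(1/3) (about 1.396). The exponential PDF factor therefore favors terms that are shared across the document set, which suits multi-document keyword extraction.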
Keyword Extraction in Multi-Document Based on Joint Weight

YANG Jie, JI Duo, CAI Dong-feng, LIN Xiao-qing, BAI Yu. Keyword Extraction in Multi-Document Based on Joint Weight[J]. Journal of Chinese Information Processing, 2008, 22(6): 75-79

Authors:	YANG Jie, JI Duo, CAI Dong-feng, LIN Xiao-qing, BAI Yu

Affiliations:	1. Knowledge Engineering Research Center, Shenyang Institute of Aeronautical Engineering, Shenyang, Liaoning 110034, China; 2. Institute of Information Technology, Eastern Liaoning University, Dandong, Liaoning 118003, China

Abstract:	This paper presents a keyword extraction method that first calculates word weights with ATF×PDF (Average Term Frequency × Proportional Document Frequency). It then determines the keywords with a joint weight that accounts for the semantic similarity between words. The method takes word frequency, part of speech, and semantic relations into account simultaneously. Experimental results show that it efficiently extracts keywords covering the topics of multiple documents. Compared with a keyword-based cluster-labeling algorithm, it improves precision, recall, and F-measure by 3%, 7%, and 4.4%, respectively.

Keywords:	computer application; Chinese information processing; ATF×PDF; joint weight; multi-document; semantic similarity
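The joint-weight step can be sketched as follows. The assumptions here are ours, not the paper's. Each candidate's weight is raised by the weights of the other candidates it is semantically similar to, with each contribution scaled by the similarity. Only pairs whose similarity reaches a threshold count. The caller supplies the similarity function because the abstract does not say which semantic resource is used. The names `joint_weights` and `top_keywords`, and the 0.5 threshold, are hypothetical.

```python
from typing import Callable


def joint_weights(
    weights: dict[str, float],
    sim: Callable[[str, str], float],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Re-weight candidates so that semantically related candidates reinforce each other.

    Hypothetical rule (not the paper's published formula):
      joint(w) = weight(w) + sum of sim(w, v) * weight(v)
                 over other candidates v with sim(w, v) >= threshold
    """
    joint = {}
    for term, base in weights.items():
        boost = 0.0
        for other, other_weight in weights.items():
            if other == term:
                continue  # a candidate does not reinforce itself
            s = sim(term, other)
            if s >= threshold:
                boost += s * other_weight
        joint[term] = base + boost
    return joint


def top_keywords(
    weights: dict[str, float],
    sim: Callable[[str, str], float],
    k: int = 3,
    threshold: float = 0.5,
) -> list[str]:
    """Return the k candidates with the highest joint weight (ties broken alphabetically)."""
    joint = joint_weights(weights, sim, threshold)
    return sorted(joint, key=lambda t: (-joint[t], t))[:k]


if __name__ == "__main__":
    base = {"car": 2.0, "automobile": 1.5, "banana": 1.8}
    # Toy similarity table standing in for a real semantic resource (hypothetical values).
    pairs = {frozenset(("car", "automobile")): 0.9}

    def toy_sim(a: str, b: str) -> float:
        return pairs.get(frozenset((a, b)), 0.0)

    print(top_keywords(base, toy_sim, k=2))  # ['car', 'automobile']
```

With raw weights alone, the top two candidates would be "car" and "banana". After joint weighting, "car" rises to 3.35 and "automobile" to 3.3 because the two reinforce each other, while the unrelated "banana" stays at 1.8. This illustrates how a similarity-based re-weighting can favor groups of related words that express a shared topic.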