
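A Keyword Selection Method Based on Lexical Chains
Cite this article:SUO Hong-guang, LIU Yu-shu, CAO Shu-ying. A Keyword Selection Method Based on Lexical Chains[J]. Journal of Chinese Information Processing, 2006, 20(6): 27-32.
Authors:SUO Hong-guang  LIU Yu-shu  CAO Shu-ying
Affiliation:1. School of Computer Science and Technology, Beijing Institute of Technology; 2. School of Computer and Communication Engineering, China University of Petroleum
Abstract:Keywords play an important role in literature retrieval, automatic summarization, and text clustering/classification. A lexical chain is a sequence of semantically related words and was originally used to analyze text structure. This paper proposes a method for automatic keyword indexing of Chinese text using lexical chains, and gives an algorithm for constructing lexical chains with HowNet as the knowledge base. Lexical chains are first built by computing word-sense similarity; keywords are then selected by combining term frequency with region features. Because the method takes the semantic information between words into account, it can improve keyword indexing performance. Experimental results show that, compared with a method using only term frequency and region features, recall improves by 7.78% and precision by 9.33%.

Keywords:computer application  Chinese information processing  keyword indexing  keyword extraction  lexical chains  word-sense similarity  HowNet
Article ID:1003-0077(2006)06-0025-06
Received:2005-11-21
Revised:2005-11-21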

A Keyword Selection Method Based on Lexical Chains
SUO Hong-guang,LIU Yu-shu,CAO Shu-ying.A Keyword Selection Method Based on Lexical Chains[J].Journal of Chinese Information Processing,2006,20(6):27-32.
Authors:SUO Hong-guang  LIU Yu-shu  CAO Shu-ying
Affiliation:1. School of Computer Science and Technology, Beijing Institute of Technology; 2. School of Computer and Communication Engineering, China University of Petroleum
Abstract:Keywords are useful for information retrieval, automatic summarization, and text clustering/classification. A lexical chain is a sequence of semantically related words, originally used to analyze text structure. This paper proposes a lexical-chain-based keyword indexing method for Chinese texts and gives an algorithm for constructing lexical chains using HowNet as the knowledge base. Lexical chains are first constructed by computing the semantic similarity between terms; keywords are then selected by taking term frequency and area (region) features into account. The experimental results show that considering the semantic relationships between terms notably improves performance: compared with a method using only term frequency and area, precision improves by 9.33% and recall by 7.78%.
Keywords:computer application  Chinese information processing  keyword indexing  keyword extraction  lexical chains  word similarity  HowNet
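
The two-stage pipeline described in the abstract (build lexical chains from word similarity, then rank words by frequency and region) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract gives no formulas, so the greedy chaining rule, the `threshold` value, the `title_boost` region weight, and the scoring formula are all assumptions, and `TOY_SIMILARITY` is a hypothetical hand-written table standing in for a HowNet-based similarity measure.

```python
from collections import Counter

# Hypothetical stand-in for a HowNet-based word similarity (assumption:
# the real method computes similarity from HowNet sememe descriptions).
TOY_SIMILARITY = {
    frozenset(("car", "vehicle")): 0.9,
    frozenset(("car", "engine")): 0.7,
    frozenset(("vehicle", "engine")): 0.6,
    frozenset(("bank", "loan")): 0.8,
}


def similarity(a, b):
    """Return a similarity in [0, 1]; unknown pairs count as unrelated."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def build_chains(words, threshold=0.5):
    """Greedily attach each distinct word to its most similar chain.

    A word starts a new chain when no existing chain contains a member
    whose similarity to it reaches `threshold` (an assumed cut-off).
    """
    chains = []
    for word in dict.fromkeys(words):  # distinct words, first-seen order
        best_chain, best_sim = None, threshold
        for chain in chains:
            sim = max(similarity(word, member) for member in chain)
            if sim >= best_sim:
                best_chain, best_sim = chain, sim
        if best_chain is None:
            chains.append([word])
        else:
            best_chain.append(word)
    return chains


def rank_keywords(words, title_words, chains, title_boost=2.0):
    """Rank words by (own weight) x (total weight of their chain).

    A word's weight is its frequency, multiplied by `title_boost` when it
    appears in the title; this region weighting is an assumption.
    """
    freq = Counter(words)

    def weight(word):
        return freq[word] * (title_boost if word in title_words else 1.0)

    scores = {}
    for chain in chains:
        chain_strength = sum(weight(member) for member in chain)
        for word in chain:
            scores[word] = weight(word) * chain_strength
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))


body = ["car", "engine", "bank", "car", "vehicle", "loan", "weather"]
chains = build_chains(body)
ranking = rank_keywords(body, title_words={"car"}, chains=chains)
print(chains)
print(ranking[:3])
```

In this sketch a word that belongs to a long, frequently mentioned chain outranks an isolated word with the same frequency, which is one simple way to let semantic relatedness influence selection.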