
Research on an Automatic Keyword Extraction Method Based on a Lexical Co-occurrence Model

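Cite this article:	XIAO Hong, XU Shao-hua. Research on an automatic keyword extraction method based on a lexical co-occurrence model[J]. Transactions of Shenyang Ligong University, 2009, 28(5): 38-41.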

Authors:	XIAO Hong, XU Shao-hua

Affiliation:	School of Computer and Information Technology, Daqing Petroleum Institute, Daqing 163318, Heilongjiang, China

Funding:	Supported by the National Natural Science Foundation of China and the Natural Science Foundation of Heilongjiang Province

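Abstract:	Keyword extraction is a key step in Chinese information processing. This paper proposes an effective method for automatic keyword extraction. First, an ordinary dictionary is extended: starting from the ordinary dictionary, a large number of training samples are used to train it, yielding an optimized dictionary that carries TF×IDF values and mutual information. Next, the text is segmented into words paragraph by paragraph using this dictionary, and the segmentation results are ranked by word frequency, weight, co-occurrence relationship, and mutual information to select candidate keywords. Finally, candidate words are merged according to their hypernyms and hyponyms, a threshold is set, and n words are taken as the keywords of the article. Extraction experiments on a small test sample set indicate that the method can improve keyword extraction accuracy to some extent and gives fairly satisfactory results.
Keywords:	automatic keyword extraction; co-occurrence relationship; mutual information; TF×IDF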
A Method of Automatic Keyword Extraction based on Co-occurrence Model

XIAO Hong,XU Shao-hua.A Method of Automatic Keyword Extraction based on Co-occurrence Model[J].Transactions of Shenyang Ligong University,2009,28(5):38-41.

Authors:	XIAO Hong, XU Shao-hua

Affiliation:	(Dept. of Computer & Information Technology, Daqing Petroleum Institute, Daqing 163318, China)

Abstract:	Keyword extraction is a key problem in Chinese language processing. This paper proposes an effective method for automatic keyword extraction. First, the ordinary dictionary is extended and, by training on a large set of sample data, an optimized dictionary is constructed whose entries carry TF × IDF values and mutual information (MI). Second, the text is segmented using the optimized dictionary, and the segmented word items are sorted by word frequency, weight, co-occurrence relationship, and MI to select candidate words. The candidate words are then merged according to their hypernyms and hyponyms. Finally, a threshold that limits the number of keywords is set and the final keywords of the document are obtained. Experimental results on a small data set show that the method can improve the accuracy of automatic keyword extraction to a certain extent and gives fairly satisfactory results.

Keywords:	automatic keyword extraction; co-occurrence relationship; mutual information; TF × IDF
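
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how word frequency, weight, co-occurrence, and mutual information are combined, so the scoring rule below (TF × IDF multiplied by one plus the mean positive pointwise mutual information with co-occurring words) is an assumption made only for illustration. The function names, the smoothed IDF formula, and the `doc_freq` table standing in for the trained dictionary are likewise hypothetical. Word segmentation is assumed to have been done already, and the hypernym/hyponym merging step is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(paragraphs, doc_freq, n_docs):
    """TF x IDF per word; doc_freq stands in for the trained dictionary (assumption)."""
    tf = Counter(tok for para in paragraphs for tok in para)
    total = sum(tf.values())
    return {
        w: (c / total) * math.log((1 + n_docs) / (1 + doc_freq.get(w, 0)))
        for w, c in tf.items()
    }


def paragraph_pmi(paragraphs):
    """Pointwise mutual information of word pairs that share a paragraph."""
    n = len(paragraphs)
    word_count = Counter()
    pair_count = Counter()
    for para in paragraphs:
        vocab = sorted(set(para))
        word_count.update(vocab)
        pair_count.update(combinations(vocab, 2))
    # PMI = log( p(a, b) / (p(a) * p(b)) ), with probabilities over paragraphs.
    return {
        (a, b): math.log((c * n) / (word_count[a] * word_count[b]))
        for (a, b), c in pair_count.items()
    }


def extract_keywords(paragraphs, doc_freq, n_docs, top_n=3):
    """Rank words by an assumed combination of TF x IDF and co-occurrence PMI."""
    weights = tf_idf(paragraphs, doc_freq, n_docs)
    assoc = {w: [] for w in weights}
    for (a, b), value in paragraph_pmi(paragraphs).items():
        positive = max(value, 0.0)  # ignore negative associations
        assoc[a].append(positive)
        assoc[b].append(positive)
    scores = {}
    for w, weight in weights.items():
        mean_assoc = sum(assoc[w]) / len(assoc[w]) if assoc[w] else 0.0
        scores[w] = weight * (1.0 + mean_assoc)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


# Toy input: one document, already segmented, one token list per paragraph.
paragraphs = [
    ["keyword", "extraction", "dictionary"],
    ["keyword", "extraction", "co-occurrence"],
    ["dictionary", "training"],
]
# Hypothetical document frequencies from a 100-document training corpus.
doc_freq = {"keyword": 2, "extraction": 2, "dictionary": 50,
            "training": 60, "co-occurrence": 1}
top = extract_keywords(paragraphs, doc_freq, n_docs=100, top_n=3)
print(top)
```

Here `top_n` plays the role of the threshold that limits the number of keywords. Words that are frequent in the document, rare in the training corpus, and strongly associated with their paragraph neighbours rank highest under this assumed rule.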
This article is indexed in databases including VIP and Wanfang Data.
