
Automatic Keyword Extraction Based on Character Co-occurrence Frequency
Citation: DU Yun-cheng, ZHOU Wei, HAN Yan-hua, Lü Xue-qiang. Automatic keyword extraction based on character co-occurrence frequency [J]. Journal of Beijing Institute of Machinery (北京机械工业学院学报), 2011(6): 35-38.
Authors: DU Yun-cheng, ZHOU Wei, HAN Yan-hua, Lü Xue-qiang
Affiliation: Chinese Information Processing Research Center, Beijing Information Science and Technology University, Beijing 100101, China
Funding: National Natural Science Foundation of China (60872133); Beijing Natural Science Foundation (4092015); Science and Technology Development Program of the Beijing Municipal Education Commission (KM201110772021); National Key Technology R&D Program of China (2011BAH11B03)
Abstract: To improve the accuracy of automatic keyword extraction, a keyword extraction algorithm based on character co-occurrence frequency is proposed. The TF/IDF algorithm is improved using word position and text length; the information content of each word is calculated from character co-occurrence frequency; feature weighting is then used to compute each word's weight, and the words with the highest weights are selected as keywords. The extraction procedure is described, and a comparative experiment is designed to verify the algorithm's effectiveness. The results show that the algorithm has an advantage in precision and recall.

Keywords: automatic keyword extraction; character co-occurrence; TF/IDF; information content
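
The abstract names the pipeline stages but gives no formulas, so the following Python sketch is only an illustration of how such stages could fit together. Everything specific in it is an assumption rather than the authors' method: the function names, the title boost standing in for the position feature, dividing term counts by document length as the length adjustment, average pointwise mutual information of adjacent characters as the character co-occurrence measure, and the `alpha`/`beta` mixing weights. Input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def position_length_tfidf(doc, title, corpus, title_boost=2.0):
    """TF-IDF with two assumed adjustments: term frequency is divided by
    document length, and words that also occur in the title are boosted."""
    counts = Counter(doc)
    title_set = set(title)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc)                      # length-normalized frequency
        df = sum(1 for other in corpus if word in other)
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1.0   # smoothed IDF
        boost = title_boost if word in title_set else 1.0    # position feature
        scores[word] = tf * idf * boost
    return scores


def char_cooccurrence_info(word, text):
    """Assumed stand-in for 'word information': mean pointwise mutual
    information of adjacent character pairs, estimated from `text`."""
    if len(word) < 2:
        return 0.0
    chars = Counter(text)
    pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total_chars = sum(chars.values())
    total_pairs = sum(pairs.values())
    pmis = []
    for i in range(len(word) - 1):
        a, b = word[i], word[i + 1]
        pair_count = pairs[a + b]
        if pair_count == 0:                        # pair never seen in text
            return 0.0
        p_pair = pair_count / total_pairs
        p_a = chars[a] / total_chars
        p_b = chars[b] / total_chars
        pmis.append(math.log(p_pair / (p_a * p_b)))
    return max(0.0, sum(pmis) / len(pmis))


def extract_keywords(doc, title, corpus, top_k=3, alpha=0.7, beta=0.3):
    """Combine both features with assumed weights and return the top_k words."""
    if not doc:
        return []
    text = "".join(doc)
    tfidf = position_length_tfidf(doc, title, corpus)
    info = {w: char_cooccurrence_info(w, text) for w in tfidf}
    max_tfidf = max(tfidf.values())
    max_info = max(info.values()) or 1.0           # avoid division by zero
    weights = {
        w: alpha * tfidf[w] / max_tfidf + beta * info[w] / max_info
        for w in tfidf
    }
    # Highest weight first; ties are broken alphabetically for determinism.
    return sorted(weights, key=lambda w: (-weights[w], w))[:top_k]
```

Calling `extract_keywords(doc, title, corpus)` with `doc` and `title` as lists of segmented words and `corpus` as a list of such documents returns the highest-weighted words. Both features are scaled to the range 0 to 1 before mixing so that neither dominates purely because of its units.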
This article is indexed in VIP (维普) and other databases.