
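Project keyword lexicon and keyword semantic network based on word co-occurrence matrix
Cite this article: WANG Qing, CHEN Zeya, GUO Jing, CHEN Xi, WANG Jinghua. Project keyword lexicon and keyword semantic network based on word co-occurrence matrix[J]. Journal of Computer Applications (计算机应用), 2015, 35(6): 1649-1653.
Authors: WANG Qing  CHEN Zeya  GUO Jing  CHEN Xi  WANG Jinghua
Affiliations: 1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027; 2. Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou, Jiangsu 215123; 3. State Grid Information and Telecommunication Branch, Beijing 100761
Abstract: To address keyword extraction and project lexicon construction for science and technology projects in specialized domains, a method is proposed that builds a project keyword lexicon from semantic relations using a co-occurrence matrix. Building on conventional keyword extraction based on the co-occurrence matrix, the method improves the traditional algorithm by jointly considering a keyword's position in the article, its part of speech, and its Inverse Document Frequency (IDF). In addition, a method is given that uses the co-occurrence matrix to build a keyword association network and identifies hot keywords by computing their similarity to a semantic base vector. Simulation experiments on 882 electric power project records show that the improved method extracts keywords from science and technology projects effectively, builds the keyword association network, and clearly outperforms the Chinese text keyword extraction method based on multi-feature fusion in precision, recall, and balanced F-score (F1-score).

Keywords: keyword extraction    co-occurrence matrix    keyword lexicon    keyword semantic network    power project
Received: 2015-01-13
Revised: 2015-03-26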
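
The abstract names the ingredients of the improved extraction (a word co-occurrence matrix, plus position, part of speech, and IDF) but not the exact formula. The following is a minimal sketch of how such a score could be combined, not the authors' algorithm: the weight tables `POSITION_WEIGHT` and `POS_WEIGHT`, the smoothed IDF, the multiplicative combination, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical weights: the abstract names these factors but gives no values.
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}
POS_WEIGHT = {"n": 1.0, "v": 0.5}  # nouns favoured over verbs (assumption)


def cooccurrence(sentences):
    """Count how often two distinct words appear in the same sentence."""
    matrix = defaultdict(Counter)
    for words in sentences:
        for a, b in combinations(sorted(set(words)), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def idf(word, corpus):
    """Smoothed inverse document frequency over a list of token sets."""
    df = sum(1 for doc in corpus if word in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def score_keywords(doc, corpus):
    """Rank words of one document.

    doc is a list of (position, [(word, pos_tag), ...]) sentences.
    A word's score is its co-occurrence degree times its best
    position/part-of-speech weight times its IDF.
    """
    sentences = [[word for word, _ in tokens] for _, tokens in doc]
    matrix = cooccurrence(sentences)
    scores = defaultdict(float)
    for position, tokens in doc:
        for word, tag in tokens:
            degree = sum(matrix[word].values())
            weight = POSITION_WEIGHT.get(position, 1.0) * POS_WEIGHT.get(tag, 0.0)
            scores[word] = max(scores[word], degree * weight * idf(word, corpus))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


doc = [
    ("title", [("transformer", "n"), ("fault", "n"), ("diagnosis", "n")]),
    ("body", [("transformer", "n"), ("fault", "n"), ("use", "v")]),
]
corpus = [{"transformer", "fault", "diagnosis", "use"}, {"use", "grid"}, {"grid", "load"}]
ranked = score_keywords(doc, corpus)
```

The same `cooccurrence` matrix can be read as a weighted graph, with words as nodes and counts as edge weights, which is one plausible reading of the keyword association network mentioned in the abstract.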

Project keyword lexicon and keyword semantic network based on word co-occurrence matrix
WANG Qing, CHEN Zeya, GUO Jing, CHEN Xi, WANG Jinghua. Project keyword lexicon and keyword semantic network based on word co-occurrence matrix[J]. Journal of Computer Applications, 2015, 35(6): 1649-1653.
Authors:WANG Qing  CHEN Zeya  GUO Jing  CHEN Xi  WANG Jinghua
Affiliation:1. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230027, China;
2. Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou Jiangsu 215123, China;
3. State Grid Information and Telecommunication Branch, Beijing 100761, China
Abstract: To address keyword extraction and keyword lexicon construction for technological projects in professional fields, an algorithm for building the lexicon based on semantic relations and a co-occurrence matrix was proposed. Building on conventional keyword extraction research based on the co-occurrence matrix, the algorithm improves the traditional approach by also considering each keyword's position in the text, its part of speech, and its Inverse Document Frequency (IDF). In addition, a method was given for establishing a keyword semantic network from the co-occurrence matrix and for identifying hot keywords by computing their similarity to a semantic base vector. Finally, 882 project documents from the power field were used in a simulation experiment. The results show that the proposed algorithm can effectively extract keywords for technological projects and establish the keyword correlation network, and that it performs better in precision, recall, and F1-score than the keyword extraction algorithm for Chinese text based on multi-feature fusion.
Keywords:keyword extraction  co-occurrence matrix  keyword lexicon  keyword semantic network  power project
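
The evaluation above is reported in precision, recall, and F1-score. As a reminder of how these are computed for keyword extraction, here is a minimal sketch that compares an extracted keyword set with a reference set; the function name and the sample keywords are hypothetical and are not taken from the paper's data.

```python
def precision_recall_f1(extracted, reference):
    """Return (precision, recall, F1) for two collections of keywords."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)  # keywords found in both sets
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    # F1 is the harmonic mean of precision and recall; 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 4 extracted keywords, 3 reference keywords, 2 in common.
p, r, f1 = precision_recall_f1(
    ["transformer", "fault", "grid", "load"],
    ["transformer", "fault", "diagnosis"],
)
```

In this example precision is 2/4, recall is 2/3, and F1 is 4/7.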
This article is indexed in Wanfang Data (万方数据) and other databases.