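Keyword Extraction Method Combining Topological Potential and the TextRank Algorithm
Cite this article: Luo Wanli, Zhang Lei. Keyword Extraction Method Combining Topological Potential and the TextRank Algorithm[J]. Computer Applications and Software, 2022, 39(1): 334-338. DOI: 10.3969/j.issn.1000-386x.2022.01.051
Authors: Luo Wanli, Zhang Lei
Affiliations: College of Information and Engineering, Sichuan Tourism University, Chengdu 610199, Sichuan; Sichuan Rural Credit Union, Chengdu 610041, Sichuan
Funding: Sichuan Provincial Department of Education project (14ZB0315).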
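Abstract: When the traditional TextRank algorithm extracts keywords, it weights the edges between words by splitting the weight evenly, without considering the semantic information of the words. To address this, a keyword extraction method combining topological potential and the TextRank algorithm is proposed. The method weights each word by its frequency and its distribution in the text, and treats this weight as the word's global influence; it uses the idea of topological potential together with the global influence to compute the transition probabilities between words, which serve as the words' local influence; and it applies the resulting transition probability matrix to the traditional TextRank algorithm. Experiments show that considering semantic information such as the global and local importance of words can effectively improve the accuracy and recall of the TextRank algorithm.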

Keywords: TextRank algorithm; keyword extraction; semantic information; topological potential

KEYWORDS EXTRACTION METHOD COMBINING TOPOLOGICAL POTENTIAL AND TEXTRANK ALGORITHM
Luo Wanli, Zhang Lei. KEYWORDS EXTRACTION METHOD COMBINING TOPOLOGICAL POTENTIAL AND TEXTRANK ALGORITHM[J]. Computer Applications and Software, 2022, 39(1): 334-338. DOI: 10.3969/j.issn.1000-386x.2022.01.051
Authors: Luo Wanli, Zhang Lei
Affiliation: College of Information and Engineering, Sichuan Tourism University, Chengdu 610199, Sichuan, China; Sichuan Rural Credit Union, Chengdu 610041, Sichuan, China
Abstract: When the traditional TextRank algorithm extracts keywords, the edges between words are weighted by equipartition, without considering the semantic information of the words. To address this, this paper proposes a keyword extraction method combining topological potential and the TextRank algorithm. The method uses word frequency and the distribution of words in the text to weight each word as its global influence; it combines topological potential with the global influence to calculate the transition probability between words as their local influence; and it applies the resulting transition probability matrix to the traditional TextRank algorithm. Experiments show that considering semantic information such as the global and local importance of words can effectively improve the accuracy and recall of the TextRank algorithm.
Keywords:TextRank algorithm  Keywords extraction  Semantic information  Topological potential
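
The abstract names three steps but gives no formulas, so the following Python sketch is only one possible reading of them, not the authors' implementation. Everything specific in it is an assumption made for illustration: the global weight (relative frequency scaled by how widely a word's occurrences are spread), the distance between two words (the reciprocal of their co-occurrence count), the Gaussian form of the topological potential m_j * exp(-(d_ij / sigma)^2), the window size, and all function names.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=2):
    """Count how often two distinct words appear within `window` positions."""
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                counts[w][v] += 1
                counts[v][w] += 1
    return counts


def global_weights(tokens):
    """Assumed global influence: relative frequency times positional spread."""
    n = len(tokens)
    tf = Counter(tokens)
    first, last = {}, {}
    for pos, w in enumerate(tokens):
        first.setdefault(w, pos)
        last[w] = pos
    return {w: (tf[w] / n) * (1.0 + (last[w] - first[w]) / n) for w in tf}


def transition_probs(counts, mass, sigma=1.0):
    """Assumed local influence: each neighbour v pulls on w with strength
    mass[v] * exp(-(d / sigma)^2), where d = 1 / co-occurrence count.
    Each row is normalised so it sums to 1."""
    probs = {}
    for w, nbrs in counts.items():
        pull = {v: mass[v] * math.exp(-((1.0 / c) / sigma) ** 2)
                for v, c in nbrs.items()}
        total = sum(pull.values())
        probs[w] = {v: p / total for v, p in pull.items()}
    return probs


def rank(probs, damping=0.85, iterations=50):
    """PageRank-style power iteration using the weighted transition rows
    in place of the uniform split of the traditional TextRank."""
    words = list(probs)
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        nxt = {w: (1.0 - damping) / len(words) for w in words}
        for w in words:
            for v, p in probs[w].items():
                nxt[v] += damping * score[w] * p
        score = nxt
    return score


def extract_keywords(tokens, top_k=3):
    """Return the `top_k` highest-scoring words of a tokenised text."""
    counts = cooccurrence_counts(tokens)
    probs = transition_probs(counts, global_weights(tokens))
    scores = rank(probs)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `extract_keywords(tokens)` on a list of already segmented, stop-word-filtered words returns the highest-ranked ones. Replacing each row of `transition_probs` with equal values of 1 / (number of neighbours) would give the evenly split baseline that the abstract contrasts against.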
This article is indexed in databases including VIP and Wanfang Data.