
Research on Chinese Key Phrase Extraction Algorithm Based on TAKE
Cite this article: LIU Chenhui, ZHANG Desheng, HU Gang. Research on Chinese Key Phrase Extraction Algorithm Based on TAKE[J]. Computer Engineering and Applications (计算机工程与应用), 2020, 56(10): 115-121.
Authors: LIU Chenhui  ZHANG Desheng  HU Gang
Affiliation: Faculty of Sciences, Xi'an University of Technology, Xi'an 710054, China
Funding: Natural Science Basic Research Plan of Shaanxi Province; National Natural Science Foundation of China
Abstract: Traditional Chinese key phrase extraction algorithms produce key phrases with low accuracy, strong ambiguity, and limited information coverage. Inspired by the English key phrase extraction algorithm TAKE (Totally Automated Keyword Extraction), this paper adds a new-word recognition technique based on multi-domain specificity and improves the original algorithm's text segmentation, word filtering, and feature calculation methods. The resulting improved TAKE algorithm is applied to key phrase mining in Chinese text. Comparative experiments against several traditional key phrase extraction algorithms show that the proposed algorithm achieves a fairly clear improvement in precision, recall, and F1 score.

Keywords: text mining  word segmentation  word filtering  feature calculation  key phrase extraction
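
The abstract reports results in terms of precision, recall, and F1 but does not state how an extracted phrase is matched against a reference phrase. The following is a minimal sketch of how these three metrics are commonly computed for key phrase extraction, assuming exact string matching between extracted and reference phrases; the function name `evaluate_key_phrases` and the sample phrases are hypothetical and are not taken from the paper.

```python
def evaluate_key_phrases(extracted, reference):
    """Return (precision, recall, f1) for one document.

    Assumption: a phrase counts as correct only if it exactly matches
    a reference phrase. The paper's actual matching rule is not given
    in the abstract.
    """
    extracted_set = set(extracted)   # duplicates are counted once
    reference_set = set(reference)
    matched = len(extracted_set & reference_set)

    # Guard against empty inputs to avoid division by zero.
    precision = matched / len(extracted_set) if extracted_set else 0.0
    recall = matched / len(reference_set) if reference_set else 0.0

    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    extracted = ["text mining", "word segmentation",
                 "neural network", "key phrase extraction"]
    reference = ["text mining", "word segmentation", "word filtering",
                 "feature calculation", "key phrase extraction"]
    print(evaluate_key_phrases(extracted, reference))
```

In this illustrative case three of the four extracted phrases appear among the five reference phrases, so precision is 3/4 and recall is 3/5. Corpus-level figures would typically average these per-document values or pool the counts; which of the two the paper uses is not stated in the abstract.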
This article is indexed in Wanfang Data (万方数据) and other databases.