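Topical Keyphrase Extraction for Chinese Text
Cite this article: YANG Yue, ZHANG De-sheng. Topical Keyphrase Extraction for Chinese Text[J]. Computer Science, 2017, 44(Z11): 432-436
Authors: YANG Yue, ZHANG De-sheng
Affiliation: School of Science, Xi'an University of Technology, Xi'an 710054 (both authors)
Abstract: In the era of big data the volume of information has exploded. Text is the kind of information people encounter most, and countless texts are uploaded to or downloaded from the Internet every day. Keyword extraction is one of the main ways to grasp the content of these texts quickly. Traditional keyword extraction algorithms, however, usually overlook two important aspects: the length of the extracted terms and the topic of the text. To address both, a technique for extracting topical keyphrases from Chinese text is proposed. The LDA topic model is combined with a frequent phrase discovery algorithm to generate frequent candidate phrases of different lengths; the candidates are then filtered and ranked with the proposed completeness filter and ranking function; finally, the topical keyphrases are selected according to the ranking.

Keywords: Keyword extraction  LDA topic model  Frequent phrases  Completeness filter  Ranking function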

Technology of Extracting Topical Keyphrases from Chinese Corpora
YANG Yue and ZHANG De-sheng. Technology of Extracting Topical Keyphrases from Chinese Corpora[J]. Computer Science, 2017, 44(Z11): 432-436
Authors:YANG Yue and ZHANG De-sheng
Abstract: In the big data era, the volume of information is growing explosively. Text is the form of information people encounter most often, and countless texts are uploaded to or downloaded from the Internet every day. Extracting keywords is an important way to grasp the content of these texts quickly. However, traditional keyword extraction from text corpora ignores two problems: the length of the keywords and the topic of the corpus. This paper proposes a new algorithm that takes both aspects into account. It combines the LDA topic model with a frequent phrase discovery algorithm to generate frequent candidate phrases of different lengths, and it proposes a completeness filter and a rank function to filter and rank the candidates. Finally, the keyphrases are chosen according to the ranked list.
Keywords: Keyword extraction  LDA topic model  Frequent phrases  Completeness filter  Rank function
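
The pipeline named in the abstract (candidate generation, completeness filtering, ranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not define the completeness criterion or the rank function, so `completeness_filter` and `rank_phrases` below are hypothetical stand-ins, and the LDA step is taken as already done, with `docs` holding pre-segmented tokens that belong to a single topic.

```python
from collections import Counter


def mine_frequent_phrases(docs, min_support=2, max_len=4):
    """Count contiguous n-grams (n = 1..max_len) and keep the frequent ones."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def is_subphrase(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def completeness_filter(freq, ratio=0.8):
    """Hypothetical criterion: drop a phrase when a longer frequent phrase
    that contains it occurs at least `ratio` times as often."""
    kept = {}
    for p, c in freq.items():
        absorbed = any(
            len(q) > len(p) and is_subphrase(p, q) and freq[q] >= ratio * c
            for q in freq
        )
        if not absorbed:
            kept[p] = c
    return kept


def rank_phrases(freq, length_bonus=0.5):
    """Hypothetical score: count * (1 + length_bonus * (length - 1))."""
    scored = {p: c * (1 + length_bonus * (len(p) - 1)) for p, c in freq.items()}
    # Highest score first; ties broken by the phrase itself for determinism.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


def extract_keyphrases(docs, min_support=2, max_len=4, top_k=5):
    """Chain the three stages and return the top_k (phrase, score) pairs."""
    freq = mine_frequent_phrases(docs, min_support, max_len)
    return rank_phrases(completeness_filter(freq))[:top_k]


if __name__ == "__main__":
    # Toy, already-segmented documents assumed to share one LDA topic.
    docs = [
        ["topic", "model", "extracts", "keyphrases"],
        ["topic", "model", "ranks", "keyphrases"],
        ["frequent", "phrases", "topic", "model"],
    ]
    for phrase, score in extract_keyphrases(docs):
        print(" ".join(phrase), score)
```

In this toy input the single words "topic" and "model" are absorbed by the equally frequent phrase "topic model", which is the kind of behavior a completeness filter is meant to provide. For real Chinese text the tokens would come from a word segmenter and the per-topic grouping from a fitted LDA model.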