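Multi-feature Fusion for Short Text Similarity Calculation Based on LDA (基于LDA的多特征融合的短文本相似度计算)
Cite this article: ZHANG Xiao-chuan (张小川), YU Lin-feng (余林峰), ZHANG Yi-hao (张宜浩). Multi-feature Fusion for Short Text Similarity Calculation Based on LDA[J]. Computer Science (计算机科学), 2018, 45(9): 266-270.
Authors: ZHANG Xiao-chuan (张小川)  YU Lin-feng (余林峰)  ZHANG Yi-hao (张宜浩)
Affiliation: College of Computer Science and Engineering, Chongqing University of Technology (重庆理工大学计算机科学与工程学院), Chongqing 401320 (all three authors)
Funding: Supported by the National Natural Science Foundation of China (60443004), the Chongqing Major Science and Technology Project (cstc2013jcsf-jcssX0020), and the Chongqing Basic Science and Frontier Technology Research Program (cstc2015jcyjA40041).
Abstract: In recent years, the LDA (Latent Dirichlet Allocation) topic model, which represents text by mining its latent semantic topics, has offered a new approach to computing short text similarity. Because short texts have sparse features, applying the LDA topic model directly tends to produce inaccurate similarity results. To address this problem, a short text similarity algorithm based on LDA with multi-feature fusion is proposed. The method fuses a topic similarity factor ST (Similarity Topic) and a word co-occurrence factor CW (Co-occurrence Words) into a joint similarity model that specifies how CW constrains or supplements ST in different ST intervals, and ultimately yields a more accurate similarity result. Text clustering experiments with the improved algorithm show a certain improvement in F-measure.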

Keywords: LDA  Topic model  Short text similarity  Topic similarity  Word co-occurrence
Received: 2017/7/11
Revised: 2017/10/8

Multi-feature Fusion for Short Text Similarity Calculation Based on LDA
ZHANG Xiao-chuan, YU Lin-feng and ZHANG Yi-hao. Multi-feature Fusion for Short Text Similarity Calculation Based on LDA[J]. Computer Science, 2018, 45(9): 266-270.
Authors: ZHANG Xiao-chuan  YU Lin-feng  ZHANG Yi-hao
Affiliation: College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 401320, China (all three authors)
Abstract: In recent years, the latent Dirichlet allocation (LDA) topic model has provided a new approach to short text similarity calculation by mining the latent semantic topics of text. Because short texts have sparse features, applying the LDA topic model alone can easily lead to inaccurate similarity results. This paper presents a calculation method based on the LDA model that combines a topic similarity factor, ST, and a word co-occurrence factor, CW, into a joint similarity model. The model specifies how CW constrains or supplements ST within different ST intervals, which yields a more accurate text similarity. A text clustering experiment was used to verify the method, and the results show that the proposed method achieves a certain improvement in F-measure.
Keywords: LDA  Topic model  Short text similarity  Similarity topics  Co-occurrence words
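
The abstract describes a joint model in which CW constrains or supplements ST depending on the ST interval, but it does not give the formulas or interval boundaries. The following is only a minimal sketch of that idea under stated assumptions: ST is taken as one minus the Jensen-Shannon divergence between two LDA topic distributions, CW as the Jaccard overlap of the two word sets, and the thresholds `low` and `high` and the weight `beta` are hypothetical values chosen for illustration. None of these choices is taken from the paper.

```python
import math
from typing import Sequence


def _kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """ST (assumed form): 1 - Jensen-Shannon divergence of two topic distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1.0 - 0.5 * (_kl(p, m) + _kl(q, m))


def cooccurrence(a: Sequence[str], b: Sequence[str]) -> float:
    """CW (assumed form): Jaccard overlap of the two texts' word sets."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def joint_similarity(st: float, cw: float,
                     low: float = 0.3, high: float = 0.7,
                     beta: float = 0.5) -> float:
    """Piecewise fusion of ST and CW; thresholds and beta are hypothetical."""
    if st >= high:
        # Constraint: a high topic similarity is discounted when few words are shared.
        return st * (1 - beta * (1 - cw))
    if st <= low:
        # Supplement: shared words raise a low topic similarity.
        return st + beta * cw * (1 - st)
    # Middle interval: the topic similarity is used unchanged.
    return st


if __name__ == "__main__":
    doc_a = ["lda", "topic", "model", "text"]
    doc_b = ["lda", "topic", "similarity", "text"]
    st = topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
    cw = cooccurrence(doc_a, doc_b)
    print(f"ST={st:.3f} CW={cw:.3f} joint={joint_similarity(st, cw):.3f}")
```

In this sketch the topic distributions would come from a trained LDA model, and the piecewise rule keeps the result within [0, 1] for inputs in that range. The actual interval boundaries and combination rule should be taken from the full paper.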