Text Clustering Retrieval Based on the LDA Model

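Cite this article:	LI Xiao-ye, LI Chun-sheng, LI Long, ZHANG Ke-jia. Text Clustering Retrieval Based on LDA Model[J]. Computer and Modernization (计算机与现代化), 2018, 0(6): 7.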

Authors:	LI Xiao-ye, LI Chun-sheng, LI Long, ZHANG Ke-jia

Funding:	Major Project of Heilongjiang Provincial Education Planning (GJ20170006)

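Abstract:	Traditional methods for judging the similarity of two documents do not consider the semantic relations underlying the text, so the results returned by a retrieval system differ greatly from the user's query needs. This paper proposes a text clustering method based on the LDA topic model. It first introduces the application principle of the LDA topic model and describes the basic methods of text mining. It then constructs the LDA topic model and uses Gibbs sampling for inference to obtain the probability distribution of the feature words. Finally, it clusters the test data set with the K-means++ method, which optimizes the selection of the initial cluster centers, and compares the designed LDA-Gibbs model with the traditional TF-IDF model in a clustering evaluation. The experimental results show that the model improves retrieval performance and is suitable for wider application.
Keywords:	topic model; text clustering; latent Dirichlet allocation model; cluster evaluation; information retrieval
Received:	2018-07-05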
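
The Gibbs sampling step named in the abstract can be illustrated with a minimal sketch of collapsed Gibbs sampling for LDA. The abstract does not give the authors' derivation, hyperparameters, or data, so the function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, the iteration count, and the toy corpus below are all assumptions made for illustration and not the paper's implementation.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids.
    alpha and beta are assumed symmetric Dirichlet priors.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * num_topics for _ in docs]
    nkw = [[0] * vocab_size for _ in range(num_topics)]
    nk = [0] * num_topics

    # Assign a random initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed estimates from the final sample.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical toy corpus: word ids 0-2 and 3-5 rarely co-occur.
docs = [[0, 1, 0, 1, 2], [1, 0, 2, 0], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=6)
```

Each row of `theta` is a document-topic distribution that could serve as the document's feature vector for clustering, and each row of `phi` is one topic's probability distribution over the vocabulary.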
Text Clustering Retrieval Based on LDA Model

LI Xiao-ye, LI Chun-sheng, LI Long, ZHANG Ke-jia. Text Clustering Retrieval Based on LDA Model[J]. Computer and Modernization, 2018, 0(6): 7.

Authors:	LI Xiao-ye, LI Chun-sheng, LI Long, ZHANG Ke-jia

Abstract:	The traditional method of judging the similarity of two documents does not take into account the semantic relations behind the texts, resulting in a large difference between the results returned by the retrieval system and the user's query requirements. This paper presents a text clustering method based on the LDA topic model. First, the application principle of the LDA topic model is introduced and the basic methods of text mining are described. The LDA topic model is then constructed, and Gibbs sampling is used to derive the probability distribution of the feature words. Finally, the test data set is clustered with the K-means++ method, which optimizes the selection of the initial cluster centers, and the designed LDA-Gibbs model is compared with the traditional TF-IDF model in a clustering evaluation. Experimental results show that this model can improve retrieval performance and is suitable for wider application.

Keywords:	topic model; text clustering; latent Dirichlet allocation (LDA); cluster evaluation; information retrieval (IR)
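
The clustering step in the abstract relies on K-means++, which chooses each new initial center with probability proportional to its squared distance from the nearest center already chosen. The sketch below shows that seeding rule followed by ordinary Lloyd iterations. The function names, the iteration cap, and the two-dimensional topic-proportion vectors are hypothetical; the abstract does not describe the authors' features, number of clusters, or settings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: sample each new center with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2)[0])
    return centers


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm started from K-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical document-topic vectors for six documents and two topics.
doc_topics = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
              [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
labels, centers = kmeans(doc_topics, k=2)
```

Spreading the initial centers apart in this way reduces the chance that plain K-means starts with several centers inside the same group of documents.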
