
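An Improved LDA Algorithm Based on Graph Mining (一种基于图挖掘的LDA改进算法)
Cite this article: Li Shan, Chen Miaomiao, Zheng Chen. An Improved LDA Algorithm Based on Graph Mining (一种基于图挖掘的LDA改进算法)[J]. 计算机与现代化 (Computer and Modernization), 2022, 0(7): 61-66.
Authors: Li Shan (李珊), Chen Miaomiao (陈妙苗), Zheng Chen (郑晨)
Funding: Fundamental Research Funds for the Central Universities (NJ2019023)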
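Abstract: LDA is one of the most widely used models in text topic recognition, but its bag-of-words assumption simplistically assigns every word the same weight. As a result, topic distributions tend to skew toward high-frequency words, which harms the semantic coherence of the identified topics. To address this problem, this paper proposes GoW-LDA, an improved LDA algorithm based on graph mining. It first builds a semantic graph model from the order in which feature word pairs co-occur in the text. It then uses the weighted degree of the nodes, one of the network's statistical features, to incorporate the semantic structure and relatedness of the text into LDA topic modeling in the form of a weight correction. Experimental results show that, compared with traditional LDA and TF-IDF-based LDA, GoW-LDA substantially reduces the perplexity of the topic model, improves the mutual information index of the identified topics, and effectively shortens model training time, offering a new approach to text topic recognition.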

Keywords: text topic recognition; graph mining; Latent Dirichlet Allocation
Received: 2022-07-25
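
The abstract gives only the outline of the graph step: a graph built from the order in which word pairs co-occur, then a weighted degree per node used as a term weight. The sketch below is one possible reading of that outline, not the authors' implementation. The sliding-window size, the mean-1 normalization, and the idea of feeding weighted pseudo-counts to LDA are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import defaultdict


def build_graph_of_words(docs, window=2):
    """Directed co-occurrence graph (assumed construction).

    The edge (u, v) counts how often v appears after u within
    `window` tokens of the same document.
    """
    edges = defaultdict(float)
    for tokens in docs:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    edges[(u, v)] += 1.0
    return edges


def weighted_degree(edges):
    """Weighted degree of a node: sum of the weights of all edges touching it."""
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[u] += w
        degree[v] += w
    return dict(degree)


def term_weights(degree):
    """Rescale weighted degrees so that the average weight is 1 (assumption)."""
    mean = sum(degree.values()) / len(degree)
    return {term: d / mean for term, d in degree.items()}


def weighted_counts(tokens, weights):
    """Replace raw term counts with weight-corrected pseudo-counts (assumption)."""
    counts = defaultdict(float)
    for t in tokens:
        counts[t] += weights.get(t, 0.0)
    return dict(counts)


# Toy corpus, already tokenized and filtered to feature words.
docs = [
    ["graph", "mining", "topic", "model"],
    ["topic", "model", "graph"],
]
edges = build_graph_of_words(docs, window=2)
degree = weighted_degree(edges)
weights = term_weights(degree)
pseudo_counts = weighted_counts(docs[1], weights)
```

In this toy corpus "topic" ends up with the largest weight because it touches the most co-occurrence edges, not because it is the most frequent token. That is the kind of correction away from raw frequency that the abstract describes.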

An Improved LDA Algorithm Based on Graph Mining
Abstract: LDA is one of the most widely used models in text topic recognition, but its bag-of-words assumption simplistically assigns the same weight to every word. This skews the topic distribution toward high-frequency words and harms the semantic coherence of the recognized topics. This paper proposes an improved LDA algorithm based on graph mining, named GoW-LDA. It first builds a semantic graph model from the order in which feature word pairs co-occur in the text, then uses the weighted degree of the nodes, a statistical feature of the network, to integrate the semantic structure and relatedness of the text into LDA topic modeling in the form of a weight correction. Experimental results show that, compared with traditional LDA and TF-IDF-based LDA, GoW-LDA greatly reduces the perplexity of the topic model, improves the PMI of the recognized topics, and effectively reduces training time, providing a new approach to text topic recognition.
Keywords: text topic recognition; graph mining; LDA (Latent Dirichlet Allocation)
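
The two evaluation measures named in the abstract, perplexity and PMI, can be illustrated with their standard textbook definitions. The sketch below is only that: the paper's exact evaluation protocol is not given here, so the document-level co-occurrence estimate, the `eps` smoothing term, and the function names are assumptions.

```python
import math
from itertools import combinations


def perplexity(total_log_likelihood, total_tokens):
    """Standard perplexity: exp of the negative per-token log-likelihood."""
    return math.exp(-total_log_likelihood / total_tokens)


def pmi(word_a, word_b, docs, eps=1e-12):
    """Pointwise mutual information estimated from document-level co-occurrence.

    `eps` (an assumed smoothing choice) keeps the logarithm finite
    when the two words never appear in the same document.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    p_a = sum(word_a in d for d in doc_sets) / n
    p_b = sum(word_b in d for d in doc_sets) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both words must occur in the reference corpus")
    p_ab = sum(word_a in d and word_b in d for d in doc_sets) / n
    return math.log((p_ab + eps) / (p_a * p_b))


def topic_pmi(top_words, docs):
    """Average PMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(pmi(a, b, docs) for a, b in pairs) / len(pairs)


# Toy reference corpus (hypothetical data).
docs = [["graph", "mining"], ["graph", "mining"], ["topic"], ["graph"]]
score = topic_pmi(["graph", "mining"], docs)
ppl = perplexity(total_log_likelihood=-10 * math.log(4), total_tokens=10)
```

Lower perplexity means the model assigns higher probability to the evaluated text, while higher average PMI means a topic's top words tend to appear together, which is the sense in which the abstract reports both improving.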