
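Disease Text Clustering Algorithm Fusing LDA and GloVe Models
Cite this article: WU Di, ZHAO Yufeng. Disease Text Clustering Algorithm Fusing LDA and GloVe Models[J]. Journal of Hebei University of Engineering, 2022, 39(1): 92-98.
Authors: WU Di, ZHAO Yufeng
Affiliation: School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
Funding: Natural Science Foundation of Hebei Province (F2020402003, F2019402428)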
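Abstract: To address the problem that the latent Dirichlet allocation (LDA) model ignores semantic information during feature extraction, a disease text clustering algorithm, LG&K-Medoide, is proposed that fuses LDA with the Global Vectors for Word Representation (GloVe) model. First, LDA is used to model the disease text data, and the JS (Jensen-Shannon) distance is used to compute text similarity. Second, GloVe is used to model the disease text data to obtain word vectors, and according to the part-of-speech contribution of disease terms...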

Keywords: disease text; LDA; GloVe; similarity combination; clustering
Received: 2021/6/21

Disease Text Clustering Algorithm Based on LDA and GloVe Model
Authors:WU Di and ZHAO Yufeng
Affiliation: School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
Abstract: To address the loss of semantic information during feature extraction with the LDA model, a disease text clustering algorithm, LG&K-Medoide, that combines the LDA and GloVe models is proposed. First, LDA is used to model the disease text data, and the Jensen-Shannon (JS) distance is used to compute text similarity. Second, GloVe is used to model the disease text data to obtain word vectors; each word vector is weighted according to the contribution of its part of speech in disease text, and the cosine distance is used to compute a weighted text similarity. Finally, the two similarities are combined into an improved distance formula, which is used for K-Medoide clustering. Experimental results show that LG&K-Medoide achieves higher accuracy than clustering algorithms based on LDA, LDA+TF-IDF, and LDA+Word2Vec models.
Keywords: disease text; LDA; GloVe; similarity combination; clustering
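
The pipeline outlined in the abstract (JS distance on LDA topic distributions, cosine distance on weighted GloVe document vectors, a combined distance, then medoid-based clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the combination formula or the part-of-speech weights, so the linear mix with a hypothetical parameter `lam`, the weighted-average document vector, and the farthest-point medoid initialization are all assumptions made for illustration. Topic distributions and word vectors are passed in as plain arrays; in practice they would come from a trained LDA model and GloVe embeddings.

```python
import numpy as np

def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return float(np.sqrt(max(0.5 * kl_pm + 0.5 * kl_qm, 0.0)))

def weighted_doc_vector(word_vecs, weights):
    """Weighted average of word vectors (weights stand in for POS weights)."""
    return np.average(np.asarray(word_vecs, dtype=float), axis=0, weights=weights)

def cosine_distance(u, v, eps=1e-12):
    """One minus the cosine similarity of two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

def combined_distance_matrix(thetas, doc_vecs, lam=0.5):
    """Assumed combination: lam * JS distance + (1 - lam) * cosine distance."""
    n = len(thetas)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = (lam * js_distance(thetas[i], thetas[j])
                 + (1.0 - lam) * cosine_distance(doc_vecs[i], doc_vecs[j]))
            dist[i, j] = dist[j, i] = d
    return dist

def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    # Deterministic farthest-point initialization (an illustrative choice).
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid: member with the smallest total in-cluster distance.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data: four documents, two topics, two 2-d "word vectors" per document.
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
doc_vecs = [
    weighted_doc_vector([[1.0, 0.0], [0.9, 0.1]], [1.0, 0.5]),
    weighted_doc_vector([[0.8, 0.2], [1.0, 0.0]], [1.0, 0.5]),
    weighted_doc_vector([[0.0, 1.0], [0.1, 0.9]], [1.0, 0.5]),
    weighted_doc_vector([[0.2, 0.8], [0.0, 1.0]], [1.0, 0.5]),
]
dist = combined_distance_matrix(thetas, doc_vecs, lam=0.5)
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Working from a precomputed distance matrix is what makes a medoid-based method a natural fit here: cluster centers are actual documents, so any combined dissimilarity can be used without having to average topic distributions and embeddings into a synthetic centroid.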
This article is indexed in Wanfang Data and other databases.