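Microblog Hot Topic Discovery Based on Text Dual Representation Model
Cite this article: LIU Meng-ying, WANG Yong. Microblog Hot Topic Discovery Based on Text Dual Representation Model[J]. Computer and Modernization, 2021, 0(12): 110-115.
Authors: LIU Meng-ying  WANG Yong
Affiliation: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China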
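Abstract: Microblogs are an important platform for information dissemination in contemporary life, and mining hot topics from them has become an important research direction. When applied to microblog text, traditional hot topic discovery methods suffer from text representations that lack semantic information and from poor topic mining performance. To address these problems, this paper proposes a text dual representation model based on frequent word sets and BERT semantics (FWS-BERT). The model is used to compute a weighted text similarity, on which spectral clustering of the microblog texts is performed; microblog topics are then mined with an affinity propagation (AP) clustering algorithm that uses an improved similarity measure. Finally, a topic popularity evaluation method is proposed by introducing the H-index from bibliometrics. Experiments show that the proposed method achieves higher silhouette coefficient and Calinski-Harabasz (CH) index values than both the single text representation method based on frequent word sets and the K-means method, and that it can accurately represent topics and evaluate their popularity on microblog data.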

Keywords: microblog  frequent word sets  BERT  clustering  hot topics
Received: 2021-12-24

Microblog Hot Topic Discovery Based on Text Dual Representation Model
LIU Meng-ying, WANG Yong. Microblog Hot Topic Discovery Based on Text Dual Representation Model[J]. Computer and Modernization, 2021, 0(12): 110-115.
Authors:LIU Meng-ying  WANG Yong
Abstract: Microblogs are an important platform for information dissemination in contemporary life, and mining hot topics from them has become an important research direction. When applied to microblog text, traditional hot topic discovery methods suffer from text representations that lack semantic information and from poor topic mining performance. To address these problems, this paper proposes a text dual representation model based on frequent word sets and BERT semantics (FWS-BERT). The model computes a weighted text similarity that is used to perform spectral clustering on microblog text; microblog topics are then mined with an affinity propagation (AP) clustering algorithm that uses an improved similarity measure. Finally, a topic popularity evaluation method is proposed by introducing the H-index from bibliometrics. Experiments show that the proposed method achieves higher silhouette coefficient and Calinski-Harabasz (CH) index values than both the single text representation method based on frequent word sets and the K-means method, and that it can accurately represent topics and evaluate their popularity on microblog data.
Keywords:microblog  frequent word sets  BERT  clustering  hot topics  
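Illustrative note: the abstract does not say how the H-index is applied to topics, so the following is only a minimal sketch of the standard bibliometric H-index under one assumed reading, namely that a topic's popularity is the largest h such that h of its posts each received at least h interactions. The function name `h_index`, the topic names, and the interaction counts are all hypothetical and are not taken from the paper.

```python
def h_index(counts):
    """Return the largest h such that at least h items have a count >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-post interaction counts (e.g. reposts plus comments)
# for two discovered topics; these numbers are made up for illustration.
topic_interactions = {
    "topic_a": [50, 18, 7, 4, 3, 1],
    "topic_b": [9, 2, 1],
}

# Assumed popularity score: the H-index of each topic's interaction counts.
topic_heat = {name: h_index(counts) for name, counts in topic_interactions.items()}
print(topic_heat)
```

Under this reading, a topic scores highly only when many of its posts are each widely engaged with, so a single viral post cannot dominate the ranking on its own.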
This article is indexed in Wanfang Data and other databases.