
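Microblog Bursty Event Detection Method Based on Multiple Word Features
Cite this article: ZHANG Yang-sen, DUAN Yu-xiang, WANG Jian, WU Yun-fang. Microblog Bursty Event Detection Method Based on Multiple Word Features[J]. Acta Electronica Sinica, 2019, 47(9): 1919-1928. DOI: 10.3969/j.issn.0372-2112.2019.09.015
Authors: ZHANG Yang-sen  DUAN Yu-xiang  WANG Jian  WU Yun-fang
Affiliations: Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101; Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044; Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101; Institute of Computational Linguistics, Peking University, Beijing 100871
Funding: National Natural Science Foundation of China; Science and Technology Innovation Service Capacity Building, Research Base Construction, Beijing Laboratory, project of the Beijing Laboratory of National Economic Security Early-warning Engineering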
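Abstract: In recent years, bursty events of many kinds have occurred frequently across many domains, with a measurable impact on social stability and development. This paper proposes a microblog bursty event detection model based on multiple word features. The model detects bursty events in massive microblog data, which helps decision-makers monitor microblogs and guide public opinion, and so reduces as far as possible the harm such events cause to society. First, the microblog data are sliced by timestamp, and within each time window three features are computed for every word: a word frequency feature, a topic tag feature, and a word frequency growth rate feature. Next, D-S evidence theory and the analytic hierarchy process are used to determine the weight of each feature; the weighted features are fused into a bursty feature value for each word, the words with large bursty feature values are selected to form the bursty feature word set, and a coupling degree matrix over this word set is built from co-occurrence degree and binding tightness. Finally, the coupling degree matrix is fed to an agglomerative hierarchical clustering algorithm, which produces a binary tree whose leaf nodes are the bursty words; a binary tree pruning algorithm based on internal similarity then partitions the clustering result, which yields the bursty events of the corresponding time window. Experiments show that the bursty-word-based event detection model performs best when the intra-cluster similarity threshold equals 1.1, with a correct rate (precision) of 0.8462, a recall of 0.8684, and an F value of 0.8571, which demonstrates the effectiveness of the proposed method.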

Keywords: microblog  bursty events  bursty feature words  D-S evidence theory  agglomerative hierarchical clustering
Received: 2018-08-13
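The first two stages of the method (time slicing, then per-word features fused into a bursty feature value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the feature formulas, so the max-scaled frequency, the hashtag count, and the `(current - previous) / (previous + 1)` growth rate are all hypothetical choices, as are the function names. The weights `(0.3, 0.3, 0.4)` are placeholders; in the paper they are derived with D-S evidence theory and the analytic hierarchy process.

```python
from collections import Counter, defaultdict

def slice_by_window(posts, window_seconds=3600):
    """Group (unix_time, tokens, hashtag_tokens) posts into fixed-length windows."""
    windows = defaultdict(list)
    for ts, tokens, tags in posts:
        windows[int(ts // window_seconds)].append((tokens, tags))
    return [windows[k] for k in sorted(windows)]  # windows with no posts are skipped

def word_features(window, prev_counts):
    """Return ({word: (freq, tag, growth)}, raw_counts); features scaled to [0, 1].

    All three feature definitions are assumptions made for this sketch.
    """
    counts, tag_counts = Counter(), Counter()
    for tokens, tags in window:
        counts.update(tokens)
        tag_counts.update(tags)
    # Growth relative to the previous window; decreases are clipped to 0.
    growth = {w: max(0.0, (c - prev_counts.get(w, 0)) / (prev_counts.get(w, 0) + 1))
              for w, c in counts.items()}
    max_c = max(counts.values(), default=1)
    max_t = max(tag_counts.values(), default=1)
    max_g = max(growth.values(), default=0.0) or 1.0
    feats = {w: (counts[w] / max_c, tag_counts[w] / max_t, growth[w] / max_g)
             for w in counts}
    return feats, counts

def bursty_words(feats, weights=(0.3, 0.3, 0.4), top_k=3):
    """Weighted fusion of the three features; keep the top_k highest-scoring words."""
    scores = {w: sum(wt * f for wt, f in zip(weights, fs)) for w, fs in feats.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data: (unix_time, tokens, hashtag_tokens), already word-segmented.
posts = [
    (0,    ["weather", "nice"], []),
    (100,  ["weather", "lunch"], []),
    (3700, ["earthquake", "sichuan", "weather"], ["earthquake"]),
    (3800, ["earthquake", "sichuan", "rescue"], ["earthquake", "sichuan"]),
    (3900, ["lunch", "weather"], []),
]
previous, current = slice_by_window(posts)
_, prev_counts = word_features(previous, {})
feats, _ = word_features(current, prev_counts)
top = bursty_words(feats, top_k=2)
print(top)  # ['earthquake', 'sichuan']
```

In the toy data, "weather" is as frequent as "earthquake" in the second window, but it has no hashtag support and no growth, so the fused score ranks it well below the two event words.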

Microblog Bursty Events Detection Method Based on Multiple Word Features
ZHANG Yang-sen,DUAN Yu-xiang,WANG Jian,WU Yun-fang. Microblog Bursty Events Detection Method Based on Multiple Word Features[J]. Acta Electronica Sinica, 2019, 47(9): 1919-1928. DOI: 10.3969/j.issn.0372-2112.2019.09.015
Authors:ZHANG Yang-sen  DUAN Yu-xiang  WANG Jian  WU Yun-fang
Affiliation:1. Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China; 2. Institute of Computational Linguistics, Peking University, Beijing 100871, China; 3. Beijing Laboratory of National Economic Security Early-warning Engineering, Beijing 100044, China
Abstract: In recent years, a wide variety of bursty events have occurred frequently in many fields, affecting both the stability and the development of society. This paper proposes an event detection model based on multiple word features, intended to detect bursty events in massive microblog data. The model helps decision-makers monitor microblogs and guide public opinion, and aims to minimize the negative effect of bursty events on society. First, the model slices the microblog data according to its time information. In each time window, the word frequency feature, the topic tag feature, and the word frequency growth rate feature of each word are calculated separately. Then, D-S evidence theory and the analytic hierarchy process are used to determine the weight of each word feature, and the weighted features are merged to obtain the bursty feature value of the word. Words with large bursty feature values are selected to form the bursty feature word set, and a coupling degree matrix of this word set is constructed based on co-occurrence degree and tightness. Finally, the coupling degree matrix is used as the input of the hierarchical agglomerative clustering algorithm to generate a binary tree with bursty words as leaf nodes, and a binary tree pruning algorithm based on internal similarity is used to divide the clustering results. In this way, the bursty events of the corresponding time window are detected. The experimental results show that the event detection model based on bursty words performs best when the intra-cluster similarity threshold is 1.1: the correct rate is 0.8462, the recall rate is 0.8684, and the F value is 0.8571, indicating the effectiveness of the proposed method.
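As a quick consistency check on the reported figures, assume the F value is the balanced F-measure, that is, the harmonic mean of the correct rate (read here as precision P) and the recall R. The abstract does not state the formula, so this is an assumption. The rounded values then give:

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.8462 \times 0.8684}{0.8462 + 0.8684}
  = \frac{1.4697}{1.7146}
  \approx 0.857
```

This agrees with the reported 0.8571 to within the rounding of P and R.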
Keywords:microblog  bursty events  bursty feature words  D-S evidence theory  hierarchical agglomerative clustering  
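The clustering stage described in the abstract can be sketched in a similar spirit. The code below is an illustration under assumptions, not the paper's algorithm: the coupling degree (co-occurrence degree plus tightness) is replaced by a plain Jaccard overlap of the posts containing each word, the linkage is average linkage, and instead of building the full binary tree and pruning it, merging simply stops once no pair of clusters reaches the threshold. Because the similarity scale differs from the paper's, the threshold used here is not comparable to the reported value of 1.1. All function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_similarity(words, posts):
    """Jaccard overlap of the sets of posts containing each word.

    A simple stand-in for the paper's coupling degree, whose formula
    is not given in the abstract.
    """
    containing = {w: {i for i, p in enumerate(posts) if w in p} for w in words}
    sim = {}
    for a, b in combinations(words, 2):
        union = containing[a] | containing[b]
        shared = containing[a] & containing[b]
        sim[frozenset((a, b))] = len(shared) / len(union) if union else 0.0
    return sim

def cluster_similarity(c1, c2, sim):
    """Average pairwise similarity between two clusters (average linkage)."""
    total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(words, sim, threshold):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best pair falls below the threshold, which stands in
    for pruning the full cluster tree.
    """
    clusters = [(w,) for w in words]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), cluster_similarity(clusters[i], clusters[j], sim))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Toy window: each post is the set of bursty words it contains.
words = ["earthquake", "sichuan", "rescue", "concert", "tickets"]
posts = [
    {"earthquake", "sichuan", "rescue"},
    {"earthquake", "sichuan"},
    {"earthquake", "rescue"},
    {"concert", "tickets"},
    {"concert", "tickets", "sichuan"},
]
sim = cooccurrence_similarity(words, posts)
events = agglomerate(words, sim, threshold=0.3)
print(events)
```

With the toy data, the words separate into two clusters, one for the earthquake posts and one for the concert posts; each cluster of bursty words is then read as one detected event for that time window.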
This article is indexed in Wanfang Data and other databases.