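Word Embedding Based Chinese News Event Detection and Representation
Cite this article: ZHANG Bin, HU Linmei, HOU Lei, LI Juanzi. Word Embedding Based Chinese News Event Detection and Representation[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(3): 275-282.
Authors: ZHANG Bin  HU Linmei  HOU Lei  LI Juanzi
Affiliation: 1. Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: Supported by the National Basic Research Program of China (973 Program) (No.2014CB340504), the Key Program of the National Natural Science Foundation of China (No.61533018, 61661146007), the Online Education Research Fund of the Online Education Research Center of the Ministry of Education (No.2016ZD102), and a project of the Tsinghua University-National University of Singapore Joint Research Center for Next-Generation Search
Abstract: Existing event detection methods rely mainly on term frequency-inverse document frequency (TF-IDF) document representations, which are high-dimensional and semantically sparse. Their efficiency and accuracy are therefore low, and they are not suitable for large-scale online news event detection. This paper proposes a document representation method based on word embedding that reduces the dimension of the document representation, alleviates the semantic sparsity problem, and improves the efficiency and accuracy of document similarity calculation. Building on this representation, a dynamic online news clustering method is proposed for online news event detection, improving both the accuracy and the recall of event detection. Experiments on the standard dataset TDT4 and a real-world dataset show that, compared with commonly used baseline methods, the proposed method achieves significant improvements in both time efficiency and event quality.

Keywords: Word Embedding  Event Detection  Dynamic Online Clustering
Received: 2017-09-26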

Word Embedding Based Chinese News Event Detection and Representation
ZHANG Bin,HU Linmei,HOU Lei,LI Juanzi.Word Embedding Based Chinese News Event Detection and Representation[J].Pattern Recognition and Artificial Intelligence,2018,31(3):275-282.
Authors:ZHANG Bin  HU Linmei  HOU Lei  LI Juanzi
Affiliation:1.Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Abstract:Existing event detection methods rely mainly on traditional TF-IDF document representations, which are high-dimensional and semantically sparse. As a result they are inefficient and inaccurate, and they are not suitable for large-scale online news event detection. This paper proposes a document representation method based on word embedding. The method reduces the dimension of the document representation, alleviates the semantic sparsity problem, and improves the efficiency and accuracy of document similarity calculation. Building on this representation, a dynamic online clustering method is proposed for online news event detection, improving both the accuracy and the recall of event detection. Experiments on the standard dataset TDT4 and a real-world dataset show that the proposed method significantly improves both the efficiency and the accuracy of event detection compared with state-of-the-art methods.
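
The abstract does not say how word vectors are combined into a document vector or how clusters are formed and updated, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a document is represented by the average of its word vectors, that similarity is cosine similarity, and that clustering is a single pass with a fixed similarity threshold. The names `doc_vector`, `cosine`, `OnlineClusterer`, the threshold value, and the toy embedding table are all hypothetical and are not taken from the paper. In particular, the paper describes its clustering as dynamic, which this fixed-threshold sketch does not attempt to reproduce.

```python
import math


def doc_vector(tokens, embeddings, dim):
    """Average the embeddings of the known tokens (assumed representation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    """Single-pass clustering: each document is seen once, in arrival order."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical fixed similarity cut-off
        self.centroids = []         # one running-mean vector per event
        self.sizes = []             # number of documents per event

    def add(self, vec):
        """Assign vec to the most similar event, or open a new event."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            n = self.sizes[best]
            # Update the centroid as the running mean of its members.
            self.centroids[best] = [
                (c * n + v) / (n + 1)
                for c, v in zip(self.centroids[best], vec)
            ]
            self.sizes[best] = n + 1
            return best
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1


# Toy 3-dimensional embeddings, invented purely for illustration.
toy_embeddings = {
    "earthquake": [1.0, 0.0, 0.0],
    "quake": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.0],
    "election": [0.0, 1.0, 0.0],
    "vote": [0.1, 0.9, 0.0],
}
docs = [["earthquake", "rescue"], ["quake", "rescue"], ["election", "vote"]]

clusterer = OnlineClusterer(threshold=0.8)
labels = [clusterer.add(doc_vector(d, toy_embeddings, 3)) for d in docs]
print(labels)  # [0, 0, 1]
```

In this toy run the two earthquake documents share no identical first token, yet their averaged vectors are nearly parallel, so they fall into the same event. A sparse TF-IDF representation would score them only on the overlapping term "rescue". Each similarity computation also runs over the embedding dimension rather than the vocabulary size, which is the kind of dimensionality reduction the abstract refers to.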

Keywords:Word Embedding  Event Detection  Dynamic Online Clustering  