
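Hail Information Extraction Based on Sina Weibo
Cite this article:WANG Ping, WANG He-ying. Hail Information Extraction Based on Sina Weibo[J]. Computer and Modernization, 2016, 0(3): 24-2886. DOI: 10.3969/j.issn.1006-2475.2016.03.006
Authors:WANG Ping  WANG He-ying
Funding:Supported by the Natural Science Foundation of Tianjin (14JCYBJC21800)
Abstract:To obtain accurate hail observations from Sina Weibo data more conveniently and quickly, a three-level identification system is designed and implemented: a first identification of microblog posts containing "冰雹" (hail) via Web crawler technology, a second identification of hail events based on classifiers, and a third identification of hail observation elements based on rules. To improve hail event identification, a new assessment function for extracting sample features is added, a method that uses multiple assessment functions jointly to determine the feature vector is proposed, and a combined classification scheme based on three classifiers is given. Test results show that the method extracts 89.5% of the hail events implicit in Sina Weibo posts, with misidentified information below 13.4%; for individual hail observation elements within hail events, the extraction rate exceeds 96.0%, with misidentified information below 8.6%.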

Keywords:microblog  hail observations  feature extraction  text classification  text element recognition  Web crawler
Received:2016-03-17
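
The three identification levels described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the classifiers or list the rules, so the three voter functions are hypothetical keyword heuristics standing in for trained classifiers, majority voting is one assumed way to combine them, and the two regular expressions are hypothetical extraction rules for hail size and clock time.

```python
import re

HAIL_KEYWORD = "冰雹"  # "hail"; the keyword named in the abstract


def mentions_hail(post: str) -> bool:
    """Level 1: keyword filter, standing in for the crawler's keyword query."""
    return HAIL_KEYWORD in post


# Level 2: three toy voters. Each is a hypothetical placeholder for a trained
# classifier; the abstract does not say which classifiers the authors combined.
def vote_no_negation(post: str) -> bool:
    # Votes "event" unless the post denies hail, e.g. "没有冰雹" (no hail).
    return re.search(r"(没有|没下|不会下)冰雹", post) is None


def vote_falling_verb(post: str) -> bool:
    # Votes "event" when a falling verb precedes the keyword, e.g. "下冰雹".
    return re.search(r"(下|降|砸)(起了|了)?冰雹", post) is not None


def vote_time_cue(post: str) -> bool:
    # Votes "event" on a "happening now" cue, e.g. "刚才" (just now).
    return re.search(r"刚才|刚刚|现在|正在", post) is not None


VOTERS = (vote_no_negation, vote_falling_verb, vote_time_cue)


def is_hail_event(post: str) -> bool:
    """Level 2: majority vote, one assumed way to combine three classifiers."""
    return sum(voter(post) for voter in VOTERS) >= 2


# Level 3: hypothetical rules for two elements, hail size and clock time.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(毫米|厘米|mm|cm)")
TIME_RE = re.compile(r"\d{1,2}:\d{2}")


def extract_elements(post: str) -> dict:
    """Level 3: rule-based extraction; a missing element is reported as None."""
    size = SIZE_RE.search(post)
    time = TIME_RE.search(post)
    return {
        "size": (float(size.group(1)), size.group(2)) if size else None,
        "time": time.group(0) if time else None,
    }


def run_pipeline(posts):
    """Apply the three levels in order and return one record per hail event."""
    return [
        extract_elements(post)
        for post in posts
        if mentions_hail(post) and is_hail_event(post)
    ]
```

Calling `run_pipeline` on a list of post strings returns one dictionary per post that passes both the keyword filter and the vote. A real system would replace the voters with classifiers trained on labeled posts and would need a far richer rule set.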

Hail Information Extraction Based on Sina Weibo
WANG Ping,WANG He-ying. Hail Information Extraction Based on Sina Weibo[J]. Computer and Modernization, 2016, 0(3): 24-2886. DOI: 10.3969/j.issn.1006-2475.2016.03.006
Authors:WANG Ping  WANG He-ying
Abstract:To obtain accurate hail information from Sina Weibo more easily and quickly, a three-level identification system is designed and implemented. It consists of a first identification of microblog posts containing “hail” through Web crawler technology, a second identification of hail events based on classifiers, and a third identification of hail element information based on rules. To improve the identification of hail events, an assessment function for extracting features is added, and a method that uses multiple assessment functions jointly to determine the feature vectors is proposed. A scheme based on a combination of three classifiers is then given. The test results show that the presented method extracts 89.5% of hail events with a mistaken identification rate below 13.4%, and extracts more than 96.0% of individual hail elements with a mistaken identification rate below 8.6%.
Keywords:microblog  hail information  feature extraction  text classification  text elements recognition  Web crawler  
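
The idea of letting several assessment functions jointly determine the feature vector can be sketched as follows. The abstract does not name the assessment functions or say how their results are merged, so this sketch assumes two common text classification scores (document frequency and the 2x2 chi-square statistic) and assumes that the feature set is the union of the top-k terms under each score. The function names are hypothetical.

```python
from collections import Counter


def doc_frequency(docs):
    """Assessment function 1 (assumed): documents containing each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def chi_square(term, pos_docs, neg_docs):
    """Assessment function 2 (assumed): 2x2 chi-square of term vs. class."""
    a = sum(term in doc for doc in pos_docs)  # positive docs with the term
    b = sum(term in doc for doc in neg_docs)  # negative docs with the term
    c = len(pos_docs) - a                     # positive docs without it
    d = len(neg_docs) - b                     # negative docs without it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero denominator means the term (or a class) carries no contrast.
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_k(scores, k):
    # Sort by descending score, then by term, so ties break deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def select_features(pos_docs, neg_docs, k):
    """Union of the top-k terms under each assessment function (assumed rule)."""
    df = doc_frequency(pos_docs + neg_docs)
    chi = {term: chi_square(term, pos_docs, neg_docs) for term in df}
    return sorted(set(top_k(df, k)) | set(top_k(chi, k)))
```

Each document is passed as a collection of tokens, with `pos_docs` holding hail event posts and `neg_docs` holding the rest. The two scores can disagree: a term that appears in every post has the highest document frequency but a chi-square of zero, which is one reason to consult more than one function before fixing the feature vector.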