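Big Data Analysis Method for Foodborne Disease Outbreak Detection
Citation: Feng Yawei, Li Jianhui. Big Data Analysis Method for Foodborne Disease Outbreak Detection[J]. Frontiers of Data & Computing, 2016, 7(5): 52-58. DOI: 10.11871/j.issn.1674-9480.2016.05.007
Authors: Feng Yawei, Li Jianhui
Affiliations: 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049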
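Abstract: Foodborne diseases are diseases caused by pathogenic agents that enter the human body through ingestion, such as toxic or harmful substances (including biological pathogens). They are generally infectious or toxic in nature. The goal of outbreak detection is to use collected hospital case data to determine whether an outbreak of the same (homogeneous) foodborne disease is under way. This paper detects clustered foodborne disease outbreaks with an MPI-parallel [2] DBSCAN algorithm built on the union-find (disjoint-set) data structure [1]. It completes clustered-outbreak detection on a dataset of 250,000 cases within 1 minute, 100 times faster than the original single-machine DBSCAN algorithm. The paper also designs the S-K-CPS algorithm (the K-CPS algorithm on Spark [3]) to detect sporadic outbreaks in foodborne case data. S-K-CPS handles datasets of 1 million cases and runs roughly 10 times faster than K-CPS.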
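The following minimal single-process Python sketch illustrates the union-find idea behind the clustered-outbreak detector. It assumes that case records have already been reduced to numeric feature vectors compared by Euclidean distance. It is not the authors' MPI implementation and does not model partitioning or communication. In disjoint-set formulations of parallel DBSCAN, each process typically builds such a forest over its own partition, and the forests are then merged by further union operations across partition boundaries. The names `DisjointSet` and `dbscan_union_find` and the parameters `eps` and `min_pts` are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class DisjointSet:
    """Union-find with path halving and union by rank (illustrative helper)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point every visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def dbscan_union_find(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Hypothetical sketch: DBSCAN labels per point, -1 marks noise."""
    n = len(points)
    # Brute-force O(n^2) neighbourhood search; each neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    # Core points within eps of each other end up in the same tree.
    ds = DisjointSet(n)
    for i in range(n):
        if is_core[i]:
            for j in neighbours[i]:
                if is_core[j]:
                    ds.union(i, j)

    # Border points are attached to one neighbouring core's tree, never unioned,
    # so a border point lying between two clusters cannot fuse them.
    root_of = [-1] * n
    for i in range(n):
        if is_core[i]:
            root_of[i] = ds.find(i)
        else:
            for j in neighbours[i]:
                if is_core[j]:
                    root_of[i] = ds.find(j)
                    break

    # Renumber tree roots to 0..k-1 in order of first appearance.
    ids: Dict[int, int] = {}
    return [-1 if r == -1 else ids.setdefault(r, len(ids)) for r in root_of]


if __name__ == "__main__":
    cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan_union_find(cases, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Using union-find here means cluster membership never has to be relabelled when two groups of core points turn out to be connected. A single `union` call links their trees. This property is what makes the structure attractive for merging partial results from separate processes.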

Keywords: DBSCAN; K-CPS; Spark; parallelization; foodborne disease; MPI
