
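A Framework and Implementation of Big Data Interactive Mining
Cite this article: Wang Ruijun, Li Jianhui. A Framework and Implementation of Big Data Interactive Mining[J]. Frontiers of Data & Computing, 2016, 7(6): 61-68. DOI: 10.11871/j.issn.1674-9480.2016.06.007
Authors: Wang Ruijun  Li Jianhui
Affiliations: 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
Abstract: In the traditional data mining process, users preprocess the data according to their domain knowledge, set parameters for a model and build it, then judge from evaluation metrics whether the model is feasible. The inconvenience of this process is that the model is built as a black box: its intermediate steps are invisible to the user, and the results it produces are hard to understand. With massive data, the traditional process is inconvenient both for locating abnormal data during preprocessing and for expressing knowledge after the model has been generated. To address these problems, this paper proposes an interactive data mining framework for big data environments. The framework makes interaction run through the entire data mining process, so that users can easily locate abnormal input source data, take part in model training, and trace the results generated by the model back to their source. The paper also implements the framework on Spark and verifies its feasibility in a foodborne disease outbreak prediction scenario.

Keywords: interactive mining   big data   Spark
Received: 2016-07-10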

A Framework and Implementation of Big Data Interactive Mining
Wang Ruijun, Li Jianhui. A Framework and Implementation of Big Data Interactive Mining[J]. Frontiers of Data & Computing, 2016, 7(6): 61-68. DOI: 10.11871/j.issn.1674-9480.2016.06.007
Authors: Wang Ruijun  Li Jianhui
Affiliation: 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
Abstract: In the traditional data mining process, the data are first pre-processed based on professional knowledge, then the algorithm parameters are set and verified. Finally, the feasibility of the model is determined according to an evaluation of its output. However, this process is inconvenient. For example, the model is built as a black box, the intermediate results cannot be seen, and the final results are not easy to interpret. To solve these problems in the traditional data mining process, this paper presents an interactive data mining framework for big data environments. The framework supports interaction throughout the entire data mining process, so that users can easily locate abnormal input data, participate in the model training process, and trace the final results. This paper also implements the framework based on Spark and validates it in a foodborne disease outbreak prediction scenario.
Keywords: interactive mining   big data   Spark