
Reservoir Data Mining and Analysis Based on Spark

Cite this article:	WU Zhi-Jun, XIA Sheng-Yu, WANG Peng. Reservoir Data Mining and Analysis Based on Spark[J]. Computer Systems & Applications, 2017, 26(8): 9-15.

Authors:	武志军 (WU Zhi-Jun), 夏盛瑜 (XIA Sheng-Yu), 王鹏 (WANG Peng)

Affiliation:	College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China (all three authors)

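Abstract:	To facilitate the analysis of reservoir data characteristics and the petroleum exploration and development process, this paper uses the Spark parallel computing framework to analyze reservoir data, applies data mining algorithms to uncover latent relationships among reservoir attributes, and classifies and predicts the different intervals of a reservoir. The main work is as follows. The paper builds a Spark distributed cluster together with a data processing and analysis platform; Spark is a popular big-data parallel computing framework that, compared with some traditional analysis methods and tools, can carry out data mining tasks quickly and accurately. It establishes a multidimensional outlier detection function based on the characteristics of reservoir data and adds a new discriminant attribute, the permeability-to-porosity ratio Pr. For imbalanced data, it proposes a cross-recall training model with an optimized cost function for logistic regression classification, and proposes KR-SMOTE to oversample minority-class samples for decision tree classification. Both methods effectively handle the data imbalance problem and improve classification accuracy.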
Keywords:	Spark; data mining; outlier detection; imbalanced data; classification
Received:	2016/12/9
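The abstract names two data-preparation steps without defining them: a derived discriminant attribute Pr (the permeability-to-porosity ratio, 渗孔比) and a multidimensional outlier detection function. The Python sketch below is illustrative only. It assumes Pr is simply permeability divided by porosity, and it substitutes a generic per-attribute z-score rule for the authors' detection function: a record is flagged when any chosen attribute lies more than `k` standard deviations from that attribute's mean. The field names `perm` and `poro`, the helper names, and the default threshold are hypothetical.

```python
from statistics import mean, pstdev


def add_pr(records):
    """Return copies of the records with a derived 'pr' field.

    Assumes (hypothetically) Pr = permeability / porosity, read from the
    fields 'perm' and 'poro'; records with non-positive porosity get None.
    """
    out = []
    for r in records:
        r2 = dict(r)
        r2["pr"] = r["perm"] / r["poro"] if r["poro"] > 0 else None
        out.append(r2)
    return out


def zscore_outliers(records, fields, k=3.0):
    """Return indices of records in which any listed field lies more than
    k population standard deviations from that field's mean.

    Generic multidimensional outlier rule used as a stand-in; it is not
    the detection function defined in the paper.
    """
    stats = {}
    for f in fields:
        vals = [r[f] for r in records]
        stats[f] = (mean(vals), pstdev(vals))
    flagged = []
    for i, r in enumerate(records):
        for f in fields:
            mu, sd = stats[f]
            # sd == 0 means the attribute is constant, so nothing deviates
            if sd > 0 and abs(r[f] - mu) > k * sd:
                flagged.append(i)
                break  # one deviating attribute is enough to flag the record
    return flagged
```

Records whose `pr` is `None` should be filtered out before `"pr"` is passed as a field to `zscore_outliers`. With the population standard deviation, a single extreme value among n records can reach a z-score of at most sqrt(n - 1), so small samples need a lower `k`.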
Reservoir Data Mining and Analysis Based on Spark

WU Zhi-Jun, XIA Sheng-Yu and WANG Peng. Reservoir Data Mining and Analysis Based on Spark[J]. Computer Systems & Applications, 2017, 26(8): 9-15.

Authors:	WU Zhi-Jun, XIA Sheng-Yu and WANG Peng

Affiliation:	College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China (all three authors)

Abstract:	To support the analysis of reservoir properties and the oil exploration and development process, this paper uses the Spark parallel computing framework and data mining algorithms to analyze reservoir data, discover relationships between reservoir properties, and classify and predict different reservoir segments. The main work includes: building a Spark distributed cluster and a data processing and analysis platform (Spark is a popular big-data parallel computing framework that can carry out data mining tasks faster and more accurately than some traditional analysis methods and tools); establishing a multidimensional outlier detection function according to the characteristics of reservoir data and adding a new discriminant attribute, the permeability-to-porosity ratio Pr; proposing a cross-recall training model and an optimized cost function for logistic regression classification of imbalanced data; and using KR-SMOTE to oversample minority-class samples for decision tree classification. Both methods improve classification precision.

Keywords:	Spark; data mining; outlier detection; imbalanced data; classification
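The abstract names two imbalanced-data techniques, a cross-recall training model with an optimized cost function for logistic regression and KR-SMOTE oversampling for decision trees, but does not define them, so neither is reproduced here. For orientation, the Python sketch below shows the standard techniques closest to them: a class-weighted (cost-sensitive) logistic-regression loss, in which errors on one class are multiplied by a larger weight, and plain SMOTE, which creates a synthetic minority sample on the line segment between a minority point and one of its k nearest minority neighbours. All function and parameter names are hypothetical, and this is not the authors' method.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(w, b, X, y, pos_weight=1.0, neg_weight=1.0):
    """Mean class-weighted binary cross-entropy of a logistic model.

    w: weights, b: bias, X: list of feature vectors, y: labels in {0, 1}.
    pos_weight / neg_weight (hypothetical names) scale the cost of each class;
    pos_weight > neg_weight penalises mistakes on a rare positive class more.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = min(max(_sigmoid(z), 1e-12), 1.0 - 1e-12)  # clip so log() is finite
        if yi == 1:
            total -= pos_weight * math.log(p)
        else:
            total -= neg_weight * math.log(1.0 - p)
    return total / len(X)


def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: return n_new synthetic minority samples.

    Each sample lies on the segment between a randomly chosen minority point
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (c - a) for a, c in zip(base, nb)])
    return synthetic
```

In Spark MLlib, per-row weights of this kind can be supplied to `LogisticRegression` through its `weightCol` parameter instead of writing the loss by hand; the pure-Python version above only makes the weighting explicit.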

