Distributed rough set attribute reduction algorithm under Spark
Cite this article: Xiajie ZHANG, Jinghua ZHU, Yang CHEN. Distributed rough set attribute reduction algorithm under Spark[J]. Journal of Computer Applications, 2020, 40(2): 518-523.
Authors: Xiajie ZHANG  Jinghua ZHU  Yang CHEN
Affiliation: School of Computer Science and Technology, Heilongjiang University, Harbin Heilongjiang 150080, China
Key Laboratory of Database and Parallel Computing of Heilongjiang Province, Harbin Heilongjiang 150080, China
Funding: General Program of the Natural Science Foundation of Heilongjiang Province (F2018028)
Abstract: Attribute reduction (feature selection) is an important part of data preprocessing, and most attribute reduction methods use attribute dependence as the criterion for screening attribute subsets. A Fast Dependence Calculation (FDC) method was designed that calculates the dependence degree by directly searching for the objects that belong to the relative positive region, without computing the relative positive region in advance, which makes it markedly faster than traditional methods. In addition, the Whale Optimization Algorithm (WOA) was improved so that it can be applied effectively to rough set attribute reduction. Combining these two methods, a Spark-based distributed rough set attribute reduction algorithm named SP-WOFRST was proposed and compared with another Spark-based rough set attribute reduction algorithm, SP-RST, on two large synthetic data sets. Experimental results show that the proposed SP-WOFRST algorithm outperforms SP-RST in both accuracy and speed.
Keywords: rough set; Apache Spark; Whale Optimization Algorithm (WOA); feature selection; attribute reduction
Received: 2019-08-30
Revised: 2019-09-26
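
The dependence criterion mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the details of FDC, so the code below is not the authors' method: it is the textbook rough set dependence degree, gamma_B(D) = |POS_B(D)| / |U|, computed in one pass by counting the objects whose condition-attribute class maps to a single decision value, without building the positive region as an explicit set. The function name `dependency_degree` and the toy decision table are hypothetical, chosen only for illustration.

```python
def dependency_degree(rows, cond_idx, dec_idx):
    """Return gamma_B(D) = |POS_B(D)| / |U| for the condition attributes cond_idx.

    rows     -- list of tuples, one per object (hypothetical toy format)
    cond_idx -- column indices of the condition attribute subset B
    dec_idx  -- column index of the decision attribute D
    """
    if not rows:
        return 0.0
    # condition values -> [object count, first decision seen, still consistent?]
    classes = {}
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        dec = row[dec_idx]
        entry = classes.get(key)
        if entry is None:
            classes[key] = [1, dec, True]
        else:
            entry[0] += 1
            if entry[1] != dec:
                # Two objects agree on B but differ on D: the class is inconsistent.
                entry[2] = False
    # Objects in consistent classes are exactly the positive-region objects.
    positive = sum(count for count, _, ok in classes.values() if ok)
    return positive / len(rows)


# Hypothetical decision table: columns 0-2 are conditions, column 3 is the decision.
table = [
    (1, 0, 1, "yes"),
    (1, 0, 0, "yes"),
    (0, 1, 1, "no"),
    (0, 1, 0, "yes"),
    (0, 0, 1, "no"),
    (1, 1, 1, "no"),
]

full = dependency_degree(table, [0, 1, 2], 3)   # all three condition attributes
reduced = dependency_degree(table, [0, 1], 3)   # candidate subset without column 2
print(full, reduced)
```

In an attribute reduction search, a candidate subset is usually accepted as a reduct only if its dependence degree equals that of the full attribute set. In this toy table, dropping column 2 lowers the dependence degree, so columns 0 and 1 alone would not qualify.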
This article is indexed in databases including VIP and Wanfang Data.