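Multi-source data parallel preprocessing method based on similar connection (基于相似连接的多源数据并行预处理方法)
Citation: GUO Fangfang, CHAO Luomeng, ZHU Jianwen. Multi-source data parallel preprocessing method based on similar connection[J]. Journal of Computer Applications, 2019, 39(1): 57-60.
Authors: GUO Fangfang, CHAO Luomeng, ZHU Jianwen
Affiliation (all authors): School of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
Funding: National Science and Technology Major Project (2016ZX03001023-005); National Industry-University-Research Cooperation Project (2016ZTE01-03-06); Fundamental Research Funds for the Central Universities (HEUCF100601).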
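Abstract: The growth of large-scale network environments and big data technologies poses new challenges for traditional data fusion analysis. To address the poor flexibility and low processing efficiency of current multi-source data fusion analysis, a parallel preprocessing method for multi-source data based on similarity join is proposed, built on the ideas of divide-and-conquer and parallel processing. First, flexibility is improved by a preprocessing step that unifies similar semantics across the data sources while retaining each source's individual semantics. Second, an improved parallel MapReduce framework is proposed to make the similarity join more efficient. Experimental results show that the proposed method reduces the total data volume by 32% while preserving data integrity. Compared with the traditional MapReduce framework, the improved framework reduces time consumption by 43.91%; the method can therefore effectively improve the efficiency of multi-source data fusion analysis.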
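The abstract does not spell out how the similarity join is parallelized. As a point of reference, the following sketch shows one common textbook approach, a prefix-filtering set-similarity join expressed as map, shuffle, and reduce phases and simulated in a single Python process. It is not the authors' improved MapReduce framework: the Jaccard similarity measure, the threshold t, the rarest-tokens-first ordering, the function names (jaccard, prefix_length, similarity_join), and the sample log records are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity (intersection size / union size) as a Fraction."""
    return Fraction(len(a & b), len(a | b))


def prefix_length(size, t):
    """Prefix length for Jaccard threshold t: size - ceil(t * size) + 1."""
    return size - math.ceil(t * size) + 1


def similarity_join(records, t):
    """Return sorted id pairs whose token sets have Jaccard >= t.

    records: dict mapping a record id (str) to a set of tokens.
    t: similarity threshold as a Fraction in (0, 1].
    Illustrative sketch only; not the method from the paper.
    """
    # Global token order shared by all mappers: rarest tokens first,
    # ties broken by the token itself, so prefixes stay selective.
    freq = Counter(tok for toks in records.values() for tok in toks)

    def rank(tok):
        return (freq[tok], tok)

    # Map phase: each record emits (token, id) for its prefix tokens only.
    mapped = []
    for rid, toks in records.items():
        ranked = sorted(toks, key=rank)
        for tok in ranked[:prefix_length(len(ranked), t)]:
            mapped.append((tok, rid))

    # Shuffle phase: group record ids by the emitted token.
    groups = defaultdict(list)
    for tok, rid in mapped:
        groups[tok].append(rid)

    # Reduce phase: verify candidate pairs within each group; the result set
    # drops duplicates for pairs that meet in more than one group.
    result = set()
    for rids in groups.values():
        for a, b in combinations(sorted(rids), 2):
            if jaccard(records[a], records[b]) >= t:
                result.add((a, b))
    return sorted(result)


# Hypothetical records from two sources (a firewall and an IDS), assumed to be
# already normalized into a shared token vocabulary.
records = {
    "fw-1": {"src:10.0.0.5", "dst:10.0.0.9", "port:22", "deny"},
    "ids-7": {"src:10.0.0.5", "dst:10.0.0.9", "port:22", "alert"},
    "fw-2": {"src:10.0.0.8", "dst:10.0.0.1", "port:80", "allow"},
}
print(similarity_join(records, Fraction(1, 2)))  # → [('fw-1', 'ids-7')]
```

The prefix filter is what keeps the map output small: if two records ranked under the same global token order have Jaccard similarity of at least t, their prefixes of length |r| - ⌈t·|r|⌉ + 1 share at least one token, so pairs that never meet in a reduce group cannot qualify and are never compared. On a real cluster the three phases would run as distributed map and reduce tasks rather than as local loops.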

Keywords: network security; multi-source data; data preprocessing; similarity join; MapReduce
Received: 2018-07-19
Revised: 2018-09-25

This article is indexed in VIP, Wanfang Data, and other databases.