
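Research on Data Quality Assurance Technology Based on Metamodel Control in a Big Data Environment
Cite this article: YANG Dong-ju, XU Chen-yang. Research on data quality assurance technology based on metamodel control in a big data environment [J]. Computer Engineering & Science (计算机工程与科学), 2019, 41(2): 197-206.
Authors: YANG Dong-ju (杨冬菊), XU Chen-yang (徐晨阳)
Affiliation: (1. Beijing Key Laboratory on Integration and Analysis of Large Scale Stream Data, Beijing 100144; 2. Research Center for Cloud Computing, North China University of Technology, Beijing 100144)
Funding: Key Program of the National Natural Science Foundation of China (61832004)
Abstract: In data integration, increasingly rich heterogeneous source data poses new challenges for improving the quality of the integrated data. Traditional ETL models leave a range of quality problems after integration: redundant, invalid, duplicate, missing, and inconsistent data, erroneous values, and format errors. To address these, the paper proposes an ETL integration model controlled by a metadata model and defines in detail the mapping rules used during integration. Combining the metamodels of the extraction, transformation, and loading stages with the mapping mechanism effectively safeguards the quality of the integrated data. The proposed metamodel has been applied to data integration for science and technology resource management. An analysis of integration examples from that domain verifies that the scheme can effectively support data warehouse construction in a big data environment and improve the quality of the integrated data.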

Keywords: big data; data warehouse; ETL; metadata model; mapping; data integration
Received: 2018-08-10
Revised: 2019-02-25

Data quality assurance based on metamodel control in big data environment
YANG Dong-ju, XU Chen-yang. Data quality assurance based on metamodel control in big data environment [J]. Computer Engineering & Science, 2019, 41(2): 197-206.
Authors: YANG Dong-ju, XU Chen-yang
Affiliation: (1. Beijing Key Laboratory on Integration and Analysis of Large Scale Stream Data, Beijing 100144; 2. Research Center for Cloud Computing, North China University of Technology, Beijing 100144, China)
Abstract: In the data integration process, a growing number of heterogeneous data sources makes it harder to improve data quality after integration. Traditional ETL models leave data quality problems after integration, including redundant, invalid, duplicate, missing, and inconsistent data, erroneous values, and format errors. To address them, we propose an ETL integration model based on metadata model control and define its mapping rules in detail. By combining the metamodel with the mapping mechanism in the extraction, transformation, and loading phases, the model can effectively guarantee the quality of the integrated data. The proposed metamodel has been applied to data integration for scientific and technological resource management. An analysis of data integration examples from that domain shows that the solution can effectively support the construction of data warehouses in a big data environment and improve data quality after integration.
Keywords:big data  data warehouse  ETL  metadata model  mapping  data integration  
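
The abstract names the approach (mapping rules held as metadata that drive the extraction, transformation, and loading steps) but does not reproduce the concrete metamodel. As a rough illustration only, the Python sketch below shows one way metadata-driven rules might catch a few of the listed problems: missing values, unconvertible values, format errors, and duplicates. `FieldRule`, `apply_rules`, the field names, and the sample records are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical metadata entry for one source-to-target field mapping."""
    source: str                            # field name in the source record
    target: str                            # field name in the warehouse row
    convert: Callable[[Any], Any] = str    # type or format conversion
    required: bool = True                  # reject the record if the value is missing
    pattern: Optional[str] = None          # optional regex the converted value must match


def apply_rules(records, rules, key):
    """Transform records under the rules; return (clean_rows, rejected).

    `key` must be the target name of one of the rules; it is used to
    detect duplicates among the rows accepted so far.
    """
    clean, rejected, seen = [], [], set()
    for rec in records:
        row, problem = {}, None
        for rule in rules:
            raw = rec.get(rule.source)
            if raw is None or str(raw).strip() == "":
                if rule.required:
                    problem = f"missing {rule.source}"
                    break
                row[rule.target] = None
                continue
            try:
                value = rule.convert(raw)
            except (TypeError, ValueError):
                problem = f"bad value in {rule.source}"
                break
            if rule.pattern and not re.fullmatch(rule.pattern, str(value)):
                problem = f"bad format in {rule.source}"
                break
            row[rule.target] = value
        if problem is None and row[key] in seen:
            problem = f"duplicate {key}"
        if problem:
            rejected.append((rec, problem))
        else:
            seen.add(row[key])
            clean.append(row)
    return clean, rejected


# Assumed sample metadata and records, for illustration only.
rules = [
    FieldRule("id", "resource_id", convert=lambda v: str(v).strip()),
    FieldRule("yr", "year", convert=int),
    FieldRule("mail", "email", required=False, pattern=r"[^@\s]+@[^@\s]+"),
]
records = [
    {"id": " R1 ", "yr": "2018", "mail": "a@example.org"},
    {"id": "R1", "yr": "2019"},                           # duplicate key after trimming
    {"id": "R2", "yr": "n/a"},                            # value cannot be converted
    {"id": "R3", "yr": "2017", "mail": "not-an-email"},   # format error
    {"id": "", "yr": "2016"},                             # missing required field
]
clean, rejected = apply_rules(records, rules, key="resource_id")
```

Keeping the rules as data rather than as hard-coded transformation logic means a new source can be onboarded by adding rule entries, and every rejected record carries a reason that can be reported back to the source owner.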