
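Association Relationships Study of Multi-Dimensional Data Quality (数据质量多种性质的关联关系研究)
Cite this article: DING Xiao-Ou, WANG Hong-Zhi, ZHANG Xiao-Ying, LI Jian-Zhong, GAO Hong. Association Relationships Study of Multi-Dimensional Data Quality[J]. Journal of Software (软件学报), 2016, 27(7): 1626-1644.
Authors: DING Xiao-Ou  WANG Hong-Zhi  ZHANG Xiao-Ying  LI Jian-Zhong  GAO Hong
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000 (all five authors)
Funding: National Basic Research Program of China (973 Program) (No. 2012CB316200); National Natural Science Foundation of China (Nos. 61472099, 61133002)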
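Abstract: As data volumes grow massively in the information age, users need multiple indicators to evaluate and improve data quality along different properties. In current data quality management, however, the important factors that affect data usability are not fully isolated: they interact with one another in evaluation mechanisms and when guiding data cleaning rules. This paper studies a comprehensive data quality assessment method applicable to real information systems. It summarizes, by definition and property, the data quality property indicators proposed in the literature and commonly used in real information systems, and proposes a property-based comprehensive data quality assessment framework. For the four important properties that affect data usability (accuracy, completeness, consistency, and currency), it then organizes the operation methods on data sets, defines the violation pattern of each property in turn, and proves the specific relationships among them. On this basis it determines an assessment strategy for the multi-dimensional association relationships of data quality, and experiments verify the strategy's effectiveness.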

Keywords: data quality  data quality properties  relationships among multiple properties  data cleaning  data management
Received: 2015/10/10
Revised: 2016/1/12

Association Relationships Study of Multi-Dimensional Data Quality
DING Xiao-Ou, WANG Hong-Zhi, ZHANG Xiao-Ying, LI Jian-Zhong and GAO Hong. Association Relationships Study of Multi-Dimensional Data Quality[J]. Journal of Software, 2016, 27(7): 1626-1644.
Authors: DING Xiao-Ou  WANG Hong-Zhi  ZHANG Xiao-Ying  LI Jian-Zhong and GAO Hong
Institution: School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (all five authors)
Abstract: With the rapid growth of data volumes, users rely on a variety of indicators to evaluate and improve data quality along different dimensions. In data quality management, many of the important factors that influence data availability are not completely isolated: in evaluation mechanisms and in guiding data cleaning rules, these dimensions can be associated with each other. This paper surveys data quality dimensions studied in the literature and used in real information systems, and summarizes their definitions and properties. A multi-dimensional data quality assessment framework is then proposed. For four important properties of data availability (accuracy, completeness, consistency, and currency), the operation methods on a data set and the relationships among the properties are discussed, and from these a multi-dimensional data quality assessment strategy is derived. Experiments verify the effectiveness of the proposed strategy.
Keywords:data quality  data quality dimensions  relationship among dimensions  data cleaning  data management