
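Discovering Consistency Rules for Associated Data in Heterogeneous Schemas
Cite this article:Du Yuefeng, Li Xiaoguang, Song Baoyan. Discovering Consistency Constraints for Associated Data on Heterogeneous Schemas[J]. Journal of Computer Research and Development, 2020, 57(9): 1939-1948. DOI: 10.7544/issn1000-1239.2020.20190570
Authors:Du Yuefeng  Li Xiaoguang  Song Baoyan
Affiliation:1.(Information College, Liaoning University, Shenyang 110136) (duyuefeng@lnu.edu.cn)
Funding:National Natural Science Foundation of China; special project of the Liaoning Engineering Laboratory of Big Data Systems for Public Opinion and Network Security; Natural Science Foundation of Liaoning Province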
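Abstract:Data consistency is a core concern of data quality management. Rule constraints, as an abstract and formal technique for expressing data relationships, can manage data consistency effectively. However, when the consistency of multi-source data is managed, data from different sources belong to different relational schemas, which poses a challenge for fusing consistency rules. In addition, data are interrelated whether they come from the same source or from different sources, and these relationships can be used to strengthen the semantics expressed by rule constraints and to discover potential errors in the data. Specifically, conditional inclusion dependencies (CINDs) and content-related conditional functional dependencies (CCFDs) can be used for attribute matching across heterogeneous schemas and for consistency maintenance of content-associated data, respectively. On this basis, the problem of discovering consistency rules for associated data in heterogeneous relational schemas is studied. First, the basic problems of discovering CCFDs in heterogeneous schemas using CINDs are analyzed, and the satisfiability, implication, and validation problems of rule discovery are explained; they are NP-complete, coNP-complete, and in PTIME, respectively. Second, to discover all CCFDs in the rule space, a 2-level lattice search structure is proposed, partitioned by the conditional attributes and the variable attributes of CCFDs. Third, a method based on CINDs and CCFDs is designed for discovering consistency rules over heterogeneous associated data: it uses CINDs to fuse rule forms and then finds consistency rules incrementally. Finally, experiments on two real datasets verify the effectiveness and efficiency of the method.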

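Keywords:heterogeneous relational schemas  associated data  conditional inclusion dependencies  content-related conditional functional dependencies  rule discovery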

Discovering Consistency Constraints for Associated Data on Heterogeneous Schemas
Du Yuefeng, Li Xiaoguang, Song Baoyan. Discovering Consistency Constraints for Associated Data on Heterogeneous Schemas[J]. Journal of Computer Research and Development, 2020, 57(9): 1939-1948. DOI: 10.7544/issn1000-1239.2020.20190570
Authors:Du Yuefeng  Li Xiaoguang  Song Baoyan
Affiliation:1.(Information College, Liaoning University, Shenyang 110136)
Abstract:Data consistency is a central issue in data quality management. Because constraints express data relationships abstractly and formally, they are an effective technique for managing data consistency. However, data from multiple sources follow different relational schemas, which makes consistency management difficult, especially the fusion of constraints. Moreover, data are interrelated whether they come from a single source or from multiple sources, and these relationships can be used to strengthen the semantics that constraints express, which helps to detect potential data errors. Specifically, CINDs (conditional inclusion dependencies) and CCFDs (content-related conditional functional dependencies) are effective for attribute matching across heterogeneous schemas and for consistency maintenance of content-related data, respectively. Based on this, we study how to discover consistency constraints for associated data on heterogeneous schemas. We first investigate three fundamental problems of discovering CCFDs with CINDs, and show that the satisfiability, implication, and validation problems are NP-complete, coNP-complete, and in PTIME, respectively. To search the entire space of CCFDs, we present a 2-level lattice that is partitioned by the conditional attribute set and the variable attribute set of CCFDs. We then propose an incremental method for discovering fused constraints over CINDs and CCFDs, which combines CCFDs on heterogeneous schemas via CINDs. Finally, experiments on two real-life datasets verify the effectiveness and scalability of our method.
Keywords:heterogeneous schemas  associated data  CINDs (conditional inclusion dependencies)  CCFDs (content-related conditional functional dependencies)  constraints discovery
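
As a rough illustration of why validating a rule against data is tractable, the following minimal Python sketch checks a plain conditional functional dependency (not the CCFDs of the paper, whose definition is not given in this abstract) in a single pass over the tuples. The function name `cfd_violations`, the row layout, and the sample attributes are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Return LHS value groups that violate the rule (condition, lhs -> rhs).

    rows      : list of dicts, one dict per tuple (hypothetical layout)
    condition : dict of attribute -> constant that selects the tuples
                the rule applies to (the conditional attributes)
    lhs       : list of variable attributes on the left-hand side
    rhs       : the single attribute that lhs should determine
    """
    groups = defaultdict(set)
    for row in rows:
        # Only tuples matching every constant in the condition are constrained.
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            groups[key].add(row[rhs])
    # A group with more than one RHS value breaks the dependency.
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Illustrative data: within country = "UK", zip should determine street.
rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "EH4", "street": "Main"},
    {"country": "UK", "zip": "W1B", "street": "Regent"},
]
violations = cfd_violations(rows, {"country": "UK"}, ["zip"], "street")
print(violations)
```

The check groups the matching tuples by their left-hand-side values and flags any group with conflicting right-hand-side values, so its cost grows roughly linearly with the number of tuples.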
This article is indexed in Wanfang Data and other databases.