Representing and processing lineages over uncertain data based on the Bayesian network
Affiliation: 1. Department of Electrical & Electronics Engineering, Faculty of Engineering, Erciyes University, 38039 Kayseri, Turkey; 2. Department of Electrical & Electronics Engineering, Faculty of Engineering, Bartin University, 74100 Bartin, Turkey; 1. Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, PR China; 2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China; 3. School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, PR China; 1. Department of Mechanical Engineering, University of Texas at Austin, USA; 2. Department of Mechatronics & Control Engineering, University of Engineering & Technology, Lahore, Pakistan; 1. Ankara University, Faculty of Science, Statistics Department, Tandoğan, Ankara, Turkey; 2. Muğla Sıtkı Koçman University, Faculty of Science, Statistics Department, Kötekli, Muğla, Turkey
Abstract: Processing lineages (also called provenance) over uncertain data means tracing the origin of uncertainty through the process of data production and evolution. In this paper, we focus on representing and processing lineages over uncertain data, adopting the Bayesian network (BN), a popular and important probabilistic graphical model (PGM), as the framework for uncertainty representation and inference. Starting from lineages expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform a lineage expression into an equivalent directed acyclic graph (DAG). Specifically, we discuss the corresponding probabilistic semantics and properties, which guarantee theoretically that the graphical model supports effective probabilistic inference in lineage processing. We then propose a function-based method to compute the conditional probability table (CPT) of each node in the DAG. The resulting BN for representing lineage expressions over uncertain data, called the lineage BN (LBN), can be constructed for both safe and unsafe query plans. Next, we give a variable-elimination-based algorithm for exact inference on the LBN to obtain the probabilities of query results, called LBN-based query processing. We then focus on obtaining the probabilities of input or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give a Gibbs-sampling-based algorithm for approximate inference on the LBN. Experimental results show the efficiency and effectiveness of our methods.
Keywords: Uncertain data; Lineage; Inference query; Probabilistic graphical model; Bayesian network; Approximate inference
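
To make the two kinds of queries described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. The lineage formula `(t1 AND t2) OR t3`, the tuple names, and the probabilities are all hypothetical, and the input tuples are assumed independent. The AND and OR nodes act as deterministic CPTs, and brute-force enumeration stands in for the variable-elimination and Gibbs-sampling algorithms the paper proposes, which is only feasible for tiny formulae.

```python
from itertools import product

# Hypothetical marginal probabilities of three independent input tuples.
priors = {"t1": 0.6, "t2": 0.5, "t3": 0.3}


def lineage(t1, t2, t3):
    """Illustrative lineage of one result tuple: (t1 AND t2) OR t3."""
    joined = t1 and t2   # intermediate tuple produced by a join (AND node)
    return joined or t3  # result tuple after projection (OR node)


def weight(assign):
    """Probability of one truth assignment to the independent inputs."""
    w = 1.0
    for name, present in assign.items():
        w *= priors[name] if present else 1.0 - priors[name]
    return w


def assignments():
    """Enumerate every truth assignment to the input tuples."""
    for values in product([False, True], repeat=len(priors)):
        yield dict(zip(priors, values))


def result_probability():
    """Query processing: probability that the result tuple exists."""
    return sum(weight(a) for a in assignments() if lineage(**a))


def posterior_given_result(name):
    """Inference query: P(input tuple `name` exists | result tuple exists)."""
    joint = sum(weight(a) for a in assignments() if lineage(**a) and a[name])
    return joint / result_probability()


if __name__ == "__main__":
    print("P(result) =", result_probability())
    for tuple_name in priors:
        print(f"P({tuple_name} | result) =", posterior_given_result(tuple_name))
```

Enumeration costs time exponential in the number of input tuples, which is why a graphical-model representation with exact or sampling-based inference matters for larger lineages.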
This article is indexed in ScienceDirect and other databases.