首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.
Processing lineages (also called provenances) over uncertain data consists in tracing the origin of uncertainty based on the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, where we adopt Bayesian network (BN), one of the popular and important probabilistic graphical models (PGMs), as the framework of uncertainty representation and inferences. Starting from the lineage expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform the lineage expression into directed acyclic graphs (DAGs) equivalently. Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee that the graphical model can support effective probabilistic inferences in lineage processing theoretically. Then, we propose the function-based method to compute the conditional probability table (CPT) for each node in the DAG. The BN for representing lineage expressions over uncertain data, called lineage BN and abbreviated as LBN, can be constructed while generally suitable for both safe and unsafe query plans. Therefore, we give the variable-elimination-based algorithm for LBN's exact inferences to obtain the probabilities of query results, called LBN-based query processing. Then, we focus on obtaining the probabilities of inputs or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give the Gibbs-sampling-based algorithm for LBN's approximate inferences. Experimental results show the efficiency and effectiveness of our methods.  相似文献   

2.
Databases with uncertainty and lineage   总被引:2,自引:0,他引:2  
This paper introduces uldbs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, however many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the uldb representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of uldb minimality—data-minimal and lineage-minimal—and study minimization of uldb representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. We also show how uldbs enable a new approach to query processing in probabilistic databases. Finally, we describe the current state of the Trio system, our implementation of uldbs under development at Stanford. This work was supported by the National Science Foundation under grants IIS-0324431, IIS-1098447, and IIS-9985114, by DARPA Contract #03-000225, and by a grant from the Boeing Corporation.  相似文献   

3.
概率图模型推理方法的研究进展   总被引:1,自引:0,他引:1  
近年来概率图模型已成为不确定性推理的研究热点,在人工智能、机器学习与计算机视觉等领域有广阔的应用前景.根据网络结构与查询问题类型的不同,系统地综述了概率图模型的推理算法.首先讨论了贝叶斯网络与马尔可夫网络中解决概率查询问题的精确推理算法与近似推理算法,其中主要介绍精确推理中的VE算法、递归约束算法和团树算法,以及近似推理中的变分近似推理和抽样近似推理算法,并给出了解决MAP查询问题的常用推理算法;然后分别针对混合网络的连续与混合情况阐述其推理算法,并分析了暂态网络的精确推理、近似推理以及混合情况下的推理;最后指出了概率图模型推理方法未来的研究方向.  相似文献   

4.
概率图模型学习技术研究进展   总被引:10,自引:5,他引:5  
概率图模型能有效处理不确定性推理,从样本数据中准确高效地学习概率图模型是其在实际应用中的关键问题.概率图模型的表示由参数和结构两部分组成,其学习算法也相应分为参数学习与结构学习.本文详细介绍了基于概率图模型网络的参数学习与结构学习算法,并根据数据集是否完备而分别讨论各种情况下的参数学习算法,还针对结构学习算法特点的不同把结构学习算法归纳为基于约束的学习、基于评分搜索的学习、混合学习、动态规划结构学习、模型平均结构学习和不完备数据集的结构学习.并总结了马尔科夫网络的参数学习与结构学习算法.最后指出了概率图模型学习的开放性问题以及进一步的研究方向.  相似文献   

5.
陈亚端  廖士中 《计算机科学》2010,37(10):207-210,245
Ising图模型概率推理的主要工作是通过变量求和来计算配分函数和边缘概率分布。传统计算复杂性理论证明Ising图模型精确概率推理是NP难的,并且Ising图模型近似概率推理是NP难的。研究了Ising图模型精确概率推理和Ising均值场近似概率推理的参数化复杂性。首先证明了不同参数的Ising图模型概率推理的参数化复杂性定理,指出基于变量个数或图模型树宽的参数化概率推理问题是固定参数可处理的。然后证明了Ising均值场的参数化复杂性定理,指出基于自由分布树宽、迭代次数和变量个数的参数化Icing均值场是固定参数可处理的;进一步,当Ising图模型参数满足Ising均值场迭代式压缩条件时,基于自由分布树宽和迭代次数的参数化Ising均值场是固定参数可处理的。  相似文献   

6.
Preface          下载免费PDF全文
Database and Artificial Intelligence (AI) can benefit from each other. On the one hand, AI can make database more intelligent (AI4DB) by exploiting learning-based techniques. On the other hand, database techniques can optimize AI models (DB4AI), such as reducing the complexity of using AI models and accelerating the deployment of AI algorithms. In this special section, we discuss 1) how to exploit AI or machine learning techniques for index design, performance tuning, query processing in database systems, and 2) how to utilize database and data management techniques to make AI models more reusable and more tolerant to dirty data.  相似文献   

7.
Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.  相似文献   

8.
Currently, a good portion of datasets on Internet are accessed through data services, where user’s queries are answered as a composition of multiple data services. Defining the semantics of data services is the first step towards automating their composition. An interesting approach to define the semantics of data services is by describing them as semantic views over a domain ontology. However, defining such semantic views cannot always be done with certainty, especially when the service’s returned data are too complex. In such case, a data service is associated with several possible semantic views. In addition, complex correlations may be present among these possible semantic views, mainly when data services encapsulate the same data sources. In this paper, we propose a probabilistic approach to model the semantic uncertainty of data services. Services along with their possible semantic views are represented in probabilistic service registry. The correlations among service semantics are modeled through a directed probabilistic graphical model (Bayesian network). Based on our modeling, we study the problem of compositing correlated data services to answer a user query, and propose an efficient method to compute the different possible compositions and their probabilities.  相似文献   

9.
张宏毅  王立威  陈瑜希 《软件学报》2013,24(11):2476-2497
概率图模型作为一类有力的工具,能够简洁地表示复杂的概率分布,有效地(近似)计算边缘分布和条件分布,方便地学习概率模型中的参数和超参数.因此,它作为一种处理不确定性的形式化方法,被广泛应用于需要进行自动的概率推理的场合,例如计算机视觉、自然语言处理.回顾了有关概率图模型的表示、推理和学习的基本概念和主要结果,并详细介绍了这些方法在两种重要的概率模型中的应用.还回顾了在加速经典近似推理算法方面的新进展.最后讨论了相关方向的研究前景.  相似文献   

10.
We review in this paper some recent yet fundamental results on evaluating queries over probabilistic databases. While one can see this problem as a special instance of general purpose probabilistic inference, we describe in this paper two key database specific techniques that significantly reduce the complexity of query evaluation on probabilistic databases. The first is the separation of the query and the data: we show here that by doing so, one can identify queries whose data complexity is #P-hard, and queries whose data complexity is in PTIME. The second is the aggressive use of previously computed query results (materialized views): in particular, by rewriting a query in terms of views, one can reduce its complexity from #P-complete to PTIME. We describe a notion of a partial representation for views, and show that, once computed and stored, this partial representation can be used to answer subsequent queries on the probabilistic databases. evaluation.  相似文献   

11.
12.
An Introduction to Variational Methods for Graphical Models   总被引:20,自引:0,他引:20  
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simpified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.  相似文献   

13.
CONSTRUCTION OF BELIEF AND DECISION NETWORKS   总被引:2,自引:0,他引:2  
We describe a representation and set of inference techniques for the dynamic construction of probabilistic and decision-theoretic models expressed as networks. In contrast to probabilistic reasoning schemes that rely on fixed models, we develop a representation that implicitly encodes a large number of possible model structures. Based on a particular query and state of information, the system constructs a customized belief net for that particular situation. We develop an interpretation of the network construction process in terms of the implicit networks encoded in the database. A companion method for constructing belief networks with decisions and values (decision networks) is also developed that uses sensitivity analysis to focus the model building process. Finally, we discuss some issues of control of model construction and describe examples of constructing networks.  相似文献   

14.
专家发现是实体检索领域的一个研究热点,针对经典专家发现模型存在索引术语独立性假设与检索性能低的缺陷,提出一种基于贝叶斯网络模型的专家发现方法。该方法模型采用四层网络结构,能够实现图形化的概率推理,同时运用词向量技术能够实现查询术语的语义扩展。实验结果显示该模型在多个评价指标上均优于经典专家发现模型,能够有效实现查询术语语义扩展,提高专家检索性能。  相似文献   

15.
Trends in databases leading to complex objects present opportunities for representing imprecision and uncertainty that were difficult to integrate cohesively in simpler database models. In fact, one can begin at the conceptual level with a model that allows uncertainty assumptions and then transform those assumptions into a logical model having the necessary semantic foundations upon which to base a meaningful query language. Here we provide such a constructive approach beginning with the ExIFO model for expression of the conceptual design then show how the conceptual design is transformed into the logical design (for which we utilize the extended NF2 logical database model). The steps are straightforward, unambiguous, and preserve the relevant information, including information concerning uncertainty  相似文献   

16.
We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected body-parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pair-wise statistical distributions, that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from “bottom-up” visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present quantitative evaluation using the HumanEva dataset.  相似文献   

17.
陈亚瑞 《计算机科学》2013,40(2):253-256,288
图模型概率推理的主要任务是通过对联合概率分布进行变量求和来计算配分函数、变量边缘概率分布、条件 概率分布等。图模型概率推理计算复杂性及近似概率推理的计算复杂性是一重要的理论问题,也是设计概率推理算 法和近似概率推理算法的理论基础。研究了Ising图模型概率推理的计算复杂性,包括概率推理的难解性及不可近似 性。具体地,通过构建#2 SA"I'问题到Icing图模型概率推理问题的多项式时间计数归约,证明在一般 Ising图模型上 计算配分函数、变量边缘概率分布、条件概率分布的概率推理问题是#P难的,同时证明Icing图模型近似概率推理问 题是NP难的,即一般Icing图模型上的概率推理问题是难解且不可近似的。  相似文献   

18.
《Information Systems》2001,26(6):445-475
The rapid increase in end-user computing calls into question the suitability of existing database query languages (DBQLs). Because the typical DB end-user is not a DB specialist, it is essential that DBQLs use concepts that are as close as possible to those in the end-users’ cognitive mental model and adopt interface techniques that are suited to end-users’ abilities. Concept-based query languages are well suited for this. This realization has motivated further research in conceptual, or semantic, query approaches. However, the primary focus in this field has been on semantic query optimization, not on query formulation. In this study, we address ourselves to the problem of formulation of queries using concepts. We propose a concept-based query language, called the conceptual query language (CQL), which allows for the conceptual abstraction of database queries and exploits the rich semantics of data models to ease and facilitate query formulation.The CQL approach uses the relationship semantics of semantic data models to render transparent the technical complexities of existing DB query languages. Association semantics are also used to automatically construct query graphs and pseudo-natural language explanations of queries, and to generate SQL codes. A set theoretic formalism for conceptual queries is developed and used. This paper discusses the design of CQL, its expressive power, its implementation, and the strategies for CQL query processing. The implementation of a CQL prototype is briefly discussed in this paper. User experiments were carried out extensively and showed the advantage of CQL over alternative languages such as SQL.  相似文献   

19.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.  相似文献   

20.
不确定海量数据存储与记录的广泛应用及其在XML上的扩展,使XML的关联事件概率的数据模型研究成为研究热点,以描述复杂事件的概率数据模型为目标,在当前已有概率模型的基础上,提出了多维不确定概率模型空间的概念,基于多个概率模型进行统一建模,并把单维XML概率节点引申到多维空间,进而定义了统一的空间查询方式,为复杂概率数据建模和查询优化提供了一种新颖的理论方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号