首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
PrDB: managing and exploiting rich correlations in probabilistic databases   总被引:2,自引:0,他引:2  
Due to numerous applications producing noisy data, e.g., sensor data, experimental data, data from uncurated sources, information extraction, etc., there has been a surge of interest in the development of probabilistic databases. Most probabilistic database models proposed to date, however, fail to meet the challenges of real-world applications on two counts: (1) they often restrict the kinds of uncertainty that the user can represent; and (2) the query processing algorithms often cannot scale up to the needs of the application. In this work, we define a probabilistic database model, PrDB, that uses graphical models, a state-of-the-art probabilistic modeling technique developed within the statistics and machine learning community, to model uncertain data. We show how this results in a rich, complex yet compact probabilistic database model, which can capture the commonly occurring uncertainty models (tuple uncertainty, attribute uncertainty), more complex models (correlated tuples and attributes) and allows compact representation (shared and schema-level correlations). In addition, we show how query evaluation in PrDB translates into inference in an appropriately augmented graphical model. This allows us to easily use any of a myriad of exact and approximate inference algorithms developed within the graphical modeling community. While probabilistic inference provides a generic approach to solving queries, we show how the use of shared correlations, together with a novel inference algorithm that we developed based on bisimulation, can speed query processing significantly. We present a comprehensive experimental evaluation of the proposed techniques and show that even with a few shared correlations, significant speedups are possible.  相似文献   

2.
Databases with uncertainty and lineage   总被引:2,自引:0,他引:2  
This paper introduces uldbs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation, however many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the uldb representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of uldb minimality—data-minimal and lineage-minimal—and study minimization of uldb representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. We also show how uldbs enable a new approach to query processing in probabilistic databases. Finally, we describe the current state of the Trio system, our implementation of uldbs under development at Stanford. This work was supported by the National Science Foundation under grants IIS-0324431, IIS-1098447, and IIS-9985114, by DARPA Contract #03-000225, and by a grant from the Boeing Corporation.  相似文献   

3.
Reverse nearest neighbor (RNN) search is very crucial in many real applications. In particular, given a database and a query object, an RNN query retrieves all the data objects in the database that have the query object as their nearest neighbors. Often, due to limitation of measurement devices, environmental disturbance, or characteristics of applications (for example, monitoring moving objects), data obtained from the real world are uncertain (imprecise). Therefore, previous approaches proposed for answering an RNN query over exact (precise) database cannot be directly applied to the uncertain scenario. In this paper, we re-define the RNN query in the context of uncertain databases, namely probabilistic reverse nearest neighbor (PRNN) query, which obtains data objects with probabilities of being RNNs greater than or equal to a user-specified threshold. Since the retrieval of a PRNN query requires accessing all the objects in the database, which is quite costly, we also propose an effective pruning method, called geometric pruning (GP), that significantly reduces the PRNN search space yet without introducing any false dismissals. Furthermore, we present an efficient PRNN query procedure that seamlessly integrates our pruning method. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed GP-based PRNN query processing approach, under various experimental settings.  相似文献   

4.
信息物理融合系统CPS获得广泛应用需要解决的一个关键问题是软件中的信息处理部分,而复杂事件处理是CPS中信息处理的核心任务之一。CPS环境下的事件具有异构、分散、海量和不确定性等特征。在CPS实际应用中,因噪声、传感器误差、通讯技术等原因而造成的事件不确定性急需解决。为了解决CPS系统中存在的海量不确定事件流问题,提出一种处理不确定事件流的复杂事件处理方法USCEP,该方法不仅可以实时有效地处理海量不确定事件流,还可以有效计算复杂事件的概率。USCEP对现有RFID复杂事件监测方法 RCEDA进行了改进,提供了历史概率事件查询处理的支持,提出一种事件概率模型进行概率计算,并通过关联查询表来提高效率。实验表明,在处理不确定事件流时,该方法比传统方法具有更好的性能。  相似文献   

5.
Distance-based range search is crucial in many real applications. In particular, given a database and a query issuer, a distance-based range search retrieves all the objects in the database whose distances from the query issuer are less than or equal to a given threshold. Often, due to the accuracy of positioning devices, updating protocols or characteristics of applications (for example, location privacy protection), data obtained from real world are imprecise or uncertain. Therefore, existing approaches over exact databases cannot be directly applied to the uncertain scenario. In this paper, we redefine the distance-based range query in the context of uncertain databases, namely the probabilistic uncertain distance-based range (PUDR) queries, which obtain objects with confidence guarantees. We categorize the topological relationships between uncertain objects and uncertain search ranges into six cases and present the probability evaluation in each case. It is verified by experiments that our approach outperform Monte-Carlo method utilized in most existing work in precision and time cost for uniform uncertainty distribution. This approach approximates the probabilities of objects following other practical uncertainty distribution, such as Gaussian distribution with acceptable errors. Since the retrieval of a PUDR query requires accessing all the objects in the databases, which is quite costly, we propose spatial pruning and probabilistic pruning techniques to reduce the search space. Two metrics, false positive rate and false negative rate are introduced to measure the qualities of query results. An extensive empirical study has been conducted to demonstrate the efficiency and effectiveness of our proposed algorithms under various experimental settings.  相似文献   

6.
A survey of queries over uncertain data   总被引:1,自引:1,他引:0  
Uncertain data have already widely existed in many practical applications recently, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data as an important aspect of uncertain data management has received increasing attention in the field of database. Uncertain query processing poses inherent challenges and demands non-traditional techniques, due to the data uncertainty. This paper surveys this interesting and still evolving research area in current database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize the main features of uncertain queries. Particularly, we present and analyze several typical uncertain queries in detail, such as skyline queries, top- $k$ queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.  相似文献   

7.
Query processing in the uncertain database has become increasingly important due to the wide existence of uncertain data in many real applications. Different from handling precise data, the uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely probabilistic inverse ranking (PIR) query, which retrieves possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces, which reduces high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG? PIRs. Moreover, we also study an important query type, PIR with uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.  相似文献   

8.
Currently, a good portion of datasets on Internet are accessed through data services, where user’s queries are answered as a composition of multiple data services. Defining the semantics of data services is the first step towards automating their composition. An interesting approach to define the semantics of data services is by describing them as semantic views over a domain ontology. However, defining such semantic views cannot always be done with certainty, especially when the service’s returned data are too complex. In such case, a data service is associated with several possible semantic views. In addition, complex correlations may be present among these possible semantic views, mainly when data services encapsulate the same data sources. In this paper, we propose a probabilistic approach to model the semantic uncertainty of data services. Services along with their possible semantic views are represented in probabilistic service registry. The correlations among service semantics are modeled through a directed probabilistic graphical model (Bayesian network). Based on our modeling, we study the problem of compositing correlated data services to answer a user query, and propose an efficient method to compute the different possible compositions and their probabilities.  相似文献   

9.
A probabilistic database is defined in a previous article [R. Cavallo and M. Pittarelli, Proc. 13th Int. Conf. on Very Large Databases (VLDB); 1987; see Ref. 9] as a collection of probability distributions over Cartesian products of finite variable domains. the concept is extended here to accommodate interval-valued probabilities. Algebraic operations for both real- and interval-valued probabilities databases analogous to those for relational databases are defined. Techniques for making inferences regarding joint distributions on subsets of the variables over which a probabilistic database is defined are developed. These are illustrated through application to a problem of decision analysis under partial uncertainty. Connections between the probabilistic database formalism and other forms of data representation are discussed.  相似文献   

10.
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.  相似文献   

11.
This paper deals with decision making in a real time optimization context under uncertain data by linking Bayesian networks (BN) techniques (for uncertainties modeling) and linear programming (LP, for optimization scheme) into a single framework. It is supposed that some external events sensed in real time are susceptible to give relevant information about data. BN consists in graphical representation of probabilistic relationship between variables of a knowledge system and so permit to take into account uncertainty in an expert system by bringing together the classical artificial intelligence (AI) approach and statistics approach. They will be used to estimate numerical values of parameters subjected to the influence of random events for a linear programming program that perform optimization process in order to select optimal values of decision variables of a certain real time decision-making system.  相似文献   

12.
Uncertain data are data with uncertainty information,which exist widely in database applications.In recent years,uncertainty in data has brought challenges in almost all database management areas such as data modeling,query representation,query processing,and data mining.There is no doubt that uncertain data management has become a hot research topic in the field of data management.In this study,we explore problems in managing uncertain data,present state-of-the-art solutions,and provide future research directions in this area.The discussed uncertain data management techniques include data modeling,query processing,and data mining in uncertain data in the forms of relational,XML,graph,and stream.  相似文献   

13.
Due to the inherent existence of uncertainty in many real-world applications, in this paper, we investigate an important query in uncertain databases, namely probabilistic least influenced set (PLIS) query, which retrieves all the uncertain objects in an uncertain database such that they are the least affected by a given query object with high probabilities. Such a PLIS query is useful in applications such as business planning. We propose and tackle both monochromatic and bichromatic versions (i.e. M-PLIS and B-PLIS, respectively) of the PLIS query. In order to efficiently answer PLIS queries, we present three pruning methods, MINMAX, Regional, and Candidate pruning, which can effectively reduce the PLIS search space. The proposed pruning methods can be seamlessly integrated into efficient query procedures. Moreover, we also study important variants of PLIS query with uncertain query object (i.e. UQ-PLIS). Furthermore, we formulate and tackle the PLIS problem on uncertain moving objects (i.e. UMOD-PLIS). Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches under various settings.  相似文献   

14.
Gerardo  Bice   《Data & Knowledge Engineering》2009,68(11):1187-1205
The paper proposes a novel approach for on-line max and min query auditing, in which a Bayesian network addresses disclosures based on probabilistic inferences that can be drawn from released data. In the literature, on-line max and min auditing has been addressed with some restrictive assumptions, primarily that sensitive values must be all distinct and the sensitive field has a uniform distribution. We remove these limitations and propose a model able to: provide a graphical representation of user knowledge; deal with the implicit delivery of information that derives from denying the answer to a query; and capture user background knowledge. Finally, we discuss the results of experiments aimed at assessing the scalability of the approach, in terms of response time and size of the conditional probability table, and the usefulness of the auditor system, in terms of probability to deny.  相似文献   

15.
16.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.  相似文献   

17.
概率XML数据管理技术研究进展   总被引:2,自引:0,他引:2  
随着网络应用的快速发展,XML数据已大量存在于当前的信息社会,使得XML类型的数据成为当前主流的数据形式,并已经成为Internet中进行数据交换和表示事实上的标准.由于客观世界的复杂性,不确定性是数据常见的内在属性,因此不确定的信息是普遍存在的.通常不确定信息以概率值的形式在XML文件(称为概率XML文件)中表示,因此,研究表示和处理概率XML数据将成为一个新的研究领域.自2001年以来,概率XML数据管理技术取得了一系列研究成果.从概率XML数据模型、PXML代数、查询、原型系统等几个方面综述了概率XML数据管理的研究进展,讨论了目前存在的主要问题和需要进一步研究的方向.  相似文献   

18.
针对移动对象轨迹的内在不确定性及其导致的查询精度较低的问题,将已有移动对象的空间轨迹不确定性模型应用于网络移动对象的轨迹表示中。在此基础上,将概率查询方法运用到受限网络移动对象的不确定轨迹查询中,提出移动对象轨迹的不确定性点查询和概率范围查询方法。实验结果表明,该方法具有较高的查询效率。  相似文献   

19.
Approaches for the processing of location-dependent queries usually assume that the location data are expressed precisely, usually using GPS locations. However, this is unrealistic because positioning methods do not have a perfect accuracy (e.g., the positioning approach used in cellular networks handles only the cell where mobile users are located). Besides, users may need to express queries based on concepts of locations other than traditional GPS locations, which we call location granules.In this paper, we focus on location granule-based query processing (i.e., processing of queries with location granules) in situations where the location data available is imprecise, which we have called probabilistic location-dependent queries. For that purpose, we exploit the concept of uncertainty location granule, which represents the location uncertainty of an object. In particular, we tackle the problem of processing probabilistic inside (range) constraints. We analyze in detail how those constraints can be processed, taking into account both the existence of location uncertainty affecting the relevant objects and the location granularity specified. An extensive experimental evaluation shows the feasibility of the proposed probabilistic query processing approach and analyzes the advantages of using index structures to speed up the query processing.  相似文献   

20.
组最近邻查询是空间对象查询领域的一类重要查询,通过该查询可找到距离给定查询点集最近的空间对象.由于图像分辨率或解析度的限制等因素,空间对象的存在不确定性广泛存在于某些涉及图像处理的查询应用中.这些对象位置数据的存在不确定性会对组最近邻查询结果产生影响.本文给出面向存在不确定对象的概率阈值组最近邻查询定义,设计了高效的查询处理机制,通过剪枝优化等手段提高概率阈值组最近邻查询效率,并进一步提出了高效概率阈值组最近邻查询算法.采用多个真实数据集对概率阈值组最近邻算法进行了实验验证,结果表明所提算法具有良好的查询效率.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号