Similar literature
1.
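Zhou Fan, Li Shuquan, Xiao Chunjing, Wu Yue. Journal of Computer Applications (《计算机应用》), 2010, 30(10): 2605-2609
The widespread use of technologies such as sensor networks has produced large volumes of uncertain data. In recent years, processing and querying uncertain data has become a research hotspot in the database and data mining communities, and one focus is how to extend the top-k and ranking queries of traditional relational databases to uncertain data. This paper studies the top-k and ranking query algorithms proposed in recent years for uncertain databases, summarizes and compares the semantics and application scenarios that each algorithm suits, and analyzes in detail the execution efficiency and complexity of each algorithm. It also summarizes the challenges facing top-k and ranking queries over uncertain data and possible research directions.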

2.
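In many applications, data uncertainty in various forms is widespread because of limits on measuring-instrument precision, update delays, network bandwidth, and similar factors. Querying information in uncertain data has drawn the attention of database researchers, and finding efficient analysis methods for uncertain data has become a popular topic. For the problem of probabilistic Skyline queries over uncertain moving objects based on Manhattan distance, this paper proposes a Manhattan-distance-based probabilistic Skyline model that computes the probability that an uncertain moving object is a Skyline object at a given time, and yields a p-t-Skyline result set containing all moving objects whose Skyline probability at time t is at least p. In practice, computing the Skyline probabilities of a large number of uncertain moving objects is tedious and costly. To improve the efficiency of probabilistic Skyline query processing, the paper proposes a solution with four steps: sampling, bounding, pruning, and refining. To further reduce the cost of Skyline computation, it uses a multidimensional index structure, the VCI tree, to speed up data retrieval. Experimental results show that the solution is efficient on datasets of different sizes and dimensionalities.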

3.
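Reverse skyline queries play an important role in making effective market decisions. As the streaming nature and uncertainty of data become increasingly evident, probabilistic reverse skyline queries over uncertain data streams have become a new research topic. To solve this problem efficiently, the paper first analyzes practical application requirements and defines the probabilistic reverse skyline query over uncertain data streams, and then, based on the related concepts, proposes an index for probabilistic reverse skyline queries over uncertain data streams...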

4.
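Because of its wide use in fields such as economics and the military, query processing over uncertain data has become a research hotspot in the database field in recent years. Probabilistic top-k queries rank data along two dimensions, the scoring function and the probability, and therefore admit multiple query semantics. As I/O-intensive queries, probabilistic top-k queries need reasonably general-purpose indexing techniques to improve query efficiency. Starting from an analysis of the properties that probabilistic top-k queries satisfy, this paper proposes two layered indexes, based respectively on the concepts of skyline and dominance frequency. Theoretical analysis and experiments show that every probabilistic top-k query satisfying the specified properties can use these two indexes to improve I/O efficiency, and that the dominance-frequency-based index is the more robust of the two.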

5.
PrDB: managing and exploiting rich correlations in probabilistic databases   Cited by: 2 (self-citations: 0, citations by others: 2)
Due to numerous applications producing noisy data, e.g., sensor data, experimental data, data from uncurated sources, information extraction, etc., there has been a surge of interest in the development of probabilistic databases. Most probabilistic database models proposed to date, however, fail to meet the challenges of real-world applications on two counts: (1) they often restrict the kinds of uncertainty that the user can represent; and (2) the query processing algorithms often cannot scale up to the needs of the application. In this work, we define a probabilistic database model, PrDB, that uses graphical models, a state-of-the-art probabilistic modeling technique developed within the statistics and machine learning community, to model uncertain data. We show how this results in a rich, complex yet compact probabilistic database model, which can capture the commonly occurring uncertainty models (tuple uncertainty, attribute uncertainty), more complex models (correlated tuples and attributes) and allows compact representation (shared and schema-level correlations). In addition, we show how query evaluation in PrDB translates into inference in an appropriately augmented graphical model. This allows us to easily use any of a myriad of exact and approximate inference algorithms developed within the graphical modeling community. While probabilistic inference provides a generic approach to solving queries, we show how the use of shared correlations, together with a novel inference algorithm that we developed based on bisimulation, can speed query processing significantly. We present a comprehensive experimental evaluation of the proposed techniques and show that even with a few shared correlations, significant speedups are possible.

6.
Query processing in uncertain databases has become increasingly important due to the wide existence of uncertain data in many real applications. Unlike the handling of precise data, uncertain query processing needs to consider the data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely the probabilistic inverse ranking (PIR) query, which retrieves the possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. Moreover, we tackle the problem of PIR query processing in high dimensional spaces by reducing high dimensional uncertain data to a lower dimensional space. Furthermore, we study three interesting and useful aggregate PIR queries, that is, MAX, top-m, and AVG PIRs. We also study an important query type, PIR with an uncertain query object (namely UQ-PIR), and design specific rules to facilitate the pruning. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets, under various experimental settings.
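
The PIR query in the abstract above can be illustrated with a small sketch. It is not the paper's pruning-based procedure: it assumes, purely for illustration, that the probability with which each database object outranks the query object is already known and that objects are independent. The function names and the `beat_probs` input are hypothetical.

```python
def rank_distribution(beat_probs):
    """Return dist where dist[k] = Pr(exactly k objects outrank the query object).

    beat_probs[i] is the (assumed known) probability that object i outranks
    the query object; objects are assumed independent, so the count follows
    a Poisson binomial distribution, built here by dynamic programming.
    """
    dist = [1.0]  # with no objects considered, zero objects outrank the query
    for p in beat_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this object does not outrank the query
            nxt[k + 1] += mass * p       # this object outranks the query
        dist = nxt
    return dist


def probabilistic_inverse_ranking(beat_probs, threshold):
    """Return (rank, probability) pairs whose probability exceeds the threshold.

    Rank 1 means that no object outranks the query object.
    """
    dist = rank_distribution(beat_probs)
    return [(k + 1, pr) for k, pr in enumerate(dist) if pr > threshold]
```

For two objects that each outrank the query with probability 0.5, the query object takes rank 2 with probability 0.5 and ranks 1 and 3 with probability 0.25 each, so a threshold of 0.3 keeps only rank 2. The dynamic program costs quadratic time in the number of objects, which is the kind of work the pruning methods in the abstract aim to avoid.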

7.
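Probabilistic Skyline computation over existentially uncertain data   Cited by: 1 (self-citations: 0, citations by others: 1)
Probabilistic Skyline computation finds, in a set of uncertain objects, the objects whose Skyline probability exceeds a given threshold; it is valuable in multi-objective decision-making applications. Existing probabilistic Skyline algorithms for existentially uncertain data all require an index to be built in advance. When the data volume is very large, the dimensionality is very high, or the data are updated frequently, building an index is often infeasible or brings no performance gain, so a general index-free algorithm is needed. The paper proposes, over existentially uncertain data, ...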
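
The abstract above is truncated, so the authors' index-free algorithm is not shown. As a point of reference only, the following is a naive quadratic baseline for the threshold query it describes. It assumes that each object is a point with an independent existence probability and that smaller values are preferred in every dimension; all names are hypothetical.

```python
from math import prod


def dominates(a, b):
    """True if point a dominates point b (smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(objects):
    """objects: list of (point, existence_probability) pairs.

    Assuming independent existence, an object is in the Skyline exactly when
    it exists and none of the objects that dominate it exists.
    """
    result = []
    for i, (pt, p) in enumerate(objects):
        no_dominator = prod(
            1.0 - q
            for j, (other, q) in enumerate(objects)
            if j != i and dominates(other, pt)
        )
        result.append(p * no_dominator)
    return result


def probabilistic_skyline(objects, threshold):
    """Return (point, skyline_probability) for objects above the threshold."""
    probs = skyline_probabilities(objects)
    return [(pt, pr) for (pt, _), pr in zip(objects, probs) if pr > threshold]
```

This baseline needs no index but compares every pair of objects, so it only illustrates the query semantics, not an efficient solution.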

8.
A survey of queries over uncertain data   Cited by: 1 (self-citations: 1, citations by others: 0)
Uncertain data are now widespread in many practical applications, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data, as an important aspect of uncertain data management, has received increasing attention in the database field. Uncertain query processing poses inherent challenges and demands non-traditional techniques, due to the data uncertainty. This paper surveys this interesting and still evolving research area in the current database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize the main features of uncertain queries. In particular, we present and analyze several typical uncertain queries in detail, such as skyline queries, top-$k$ queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.

9.
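Integrity constraints are an important means of ensuring that data in a relational database are certain. In practice, a great deal of data is uncertain and violates integrity constraints yet remains useful. Drawing on probabilistic database theory, this paper proposes a new query strategy for inconsistent databases. It repairs inconsistent data using constraint methods such as union, intersection, difference, selection, projection, and join, while a quadruple-based probability computation method and a probabilistic query rewriting technique compensate for the shortcomings of querying inconsistent databases and reduce the likelihood of data conflicts.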

10.
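Skyline queries over uncertain data streams are gradually attracting researchers' attention. Traditional centralized stream-processing algorithms struggle to meet the query demands of massive data, while the vast computing resources and effective storage management offered by cloud computing provide ample conditions for studying parallel Skyline query techniques. Based on these facts, the paper proposes a parallel Skyline query algorithm over uncertain data streams (parallel Skyline over uncertain data streams, PSUDS). By cross-partitioning the sliding window, the algorithm turns centralized stream querying into parallel processing and uses parallel execution to overcome the insufficient performance of centralized algorithms. Extensive experiments show that the algorithm has good parallel scalability.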

11.
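Locally correlated spatial uncertain data are receiving growing attention in many practical applications. The paper proposes a novel probabilistic frequent nearest-neighbor query defined over multiple snapshots of an uncertain database, which aims to find the objects that, with a certain probability, are frequently the nearest neighbor of the query point across the snapshots. Applying existing nearest-neighbor query algorithms for traditional data or for uncertain data directly to this query incurs high cost. To address the problem, a general processing framework is proposed, ...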

12.
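Top-k query processing algorithms for uncertain data in P2P environments   Cited by: 1 (self-citations: 0, citations by others: 1)
With the steady development of P2P technology in recent years, Top-k query processing in P2P environments has matured. Meanwhile, uncertain data have received wide attention across database fields, sparking academic and industrial interest in new uncertain data management techniques, so Top-k query processing over uncertain data in P2P environments has become a new challenge. This paper studies Top-k query processing over uncertain data in P2P environments. It first defines the Top-k query over uncertain datasets. Then, taking the Chord topology as an example, it describes a Top-k query processing algorithm for uncertain data in P2P environments and, on top of order-preserving hashing, proposes an upper-bound-based pruning strategy and an improved routing pruning strategy. Finally, extensive experiments verify the performance of the proposed algorithm.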

13.
As an important type of multidimensional preference query, the skyline query can find a superset of optimal results when there is no given linear function to combine values for all attributes of interest. Its processing has been extensively investigated in the past. While most skyline query processing algorithms are designed based on the assumption that query processing is done for all attributes in a static dataset with deterministic attribute values, some advanced work has been done recently to remove part of such a strong assumption in order to process skyline queries for real-life applications, namely, to deal with data with multi-valued attributes (known as data uncertainty), to support skyline queries in a subspace which is a subset of attributes selected by the user, and to support continuous queries on streaming data. Naturally, there are many application scenarios where these three complex issues must be considered together. In this paper, we tackle the problem of probabilistic subspace skyline query processing over sliding windows on uncertain data streams. That is, to retrieve all objects from the most recent window of streaming data in a user-selected subspace with a skyline probability no smaller than a given threshold. Based on the subtle relationship between the full space and an arbitrary subspace, a novel approach using a regular grid indexing structure is developed for this problem. An extensive empirical study under various settings is conducted to show the effectiveness and efficiency of our PSS algorithm.

14.
A Survey of Uncertain Data Algorithms and Applications   Cited by: 8 (self-citations: 0, citations by others: 8)
In recent years, a number of indirect data collection methodologies have led to the proliferation of uncertain data. Such data points are often represented in the form of a probabilistic function, since the corresponding deterministic value is not known. This increases the challenge of mining and managing uncertain data, since the precise behavior of the underlying data is no longer known. In this paper, we provide a survey of uncertain data mining and management applications. In the field of uncertain data management, we will examine traditional methods such as join processing, query processing, selectivity estimation, OLAP queries, and indexing. In the field of uncertain data mining, we will examine traditional mining problems such as classification and clustering. We will also examine a general transform based technique for mining uncertain data. We discuss the models for uncertain data, and how they can be leveraged in a variety of applications. We discuss different methodologies to process and mine uncertain data in a variety of forms.

15.
Most recently, uncertain graph data have begun attracting significant interest from the database research community, because uncertainty is an intrinsic property of the real world and data are more suitably modeled as graphs in a number of applications, e.g. social network analysis, PPI networks in biology, and road network monitoring. Meanwhile, as one of the basic query operators, the aggregate nearest neighbor (ANN) query retrieves a data entity whose aggregate distance, e.g. sum or max, to the given query data entities is smaller than those of other data entities in a database. ANN queries on both certain graph data and high dimensional data have been well studied in previous work. However, existing ANN query processing approaches cannot handle uncertain graphs, because the topological structure of an uncertain graph may vary across possible worlds. Motivated by this, we propose the aggregate nearest neighbor query in uncertain graphs (UG-ANN) in this paper. First of all, we give the formal definition of the UG-ANN query and the basic UG-ANN query algorithm. After that, to improve the efficiency of UG-ANN query processing, we develop two kinds of pruning approaches, i.e. structural pruning and instance pruning. Structural pruning takes advantage of the monotonicity of the aggregate distance to derive upper and lower bounds of the aggregate distance for reducing the graph size, whereas instance pruning decreases the number of possible worlds to be checked in the search tree. Comprehensive experimental results on real-world data sets demonstrate that the proposed method significantly improves the efficiency of UG-ANN query processing.

16.
Recently, many new applications, such as sensor data monitoring and mobile device tracking, have raised the issue of uncertain data management. Compared to "certain" data, the data in an uncertain database are not exact points but instead often reside within a region. In this paper, we study ranked queries over uncertain data. Ranked queries have been studied extensively in traditional database literature due to their popularity in many applications, such as decision making, recommendation raising, and data mining tasks. Many proposals have been made in order to improve the efficiency in answering ranked queries. However, the existing approaches are all based on the assumption that the underlying data are exact (or certain). Due to the intrinsic differences between uncertain and certain data, these methods are designed only for ranked queries in certain databases and cannot be applied to the uncertain case directly. Motivated by this, we propose novel solutions to speed up the probabilistic ranked query (PRank) with monotonic preference functions over the uncertain database. Specifically, we introduce two effective pruning methods, spatial and probabilistic pruning, to help reduce the PRank search space. A special case of PRank with linear preference functions is also studied. Then, we seamlessly integrate these pruning heuristics into the PRank query procedure. Furthermore, we propose and tackle PRank query processing over the join of two distinct uncertain databases. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approaches in answering PRank queries, in terms of both wall clock time and the number of candidates to be refined.

17.
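Uncertainty is an intrinsic feature of data, and research on uncertain data is attracting attention in more and more fields. After summarizing current methods for handling uncertainty in historical data, and to address the lack of a semantic framework for processing uncertain historical data, the paper builds a general mathematical model for processing uncertain historical data on the Neo4j graph database. The model builds on the bitemporal model, probabilistic models, and others, and integrates three aspects of historical data: time, uncertainty, and lineage. A storage system with basic CRUD operations is implemented in Python; it can dynamically add relationships between nodes, store and retrieve historical data, and perform filtered and fuzzy queries on uncertain data. Comparative experiments on the storage methods of relational and graph databases and on the query efficiency of the storage system show that the proposed model is more extensible and the implemented system answers queries more efficiently, with a more pronounced advantage in storing and retrieving large-scale uncertain data.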

18.
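A comparison of rough relational queries and classical relational queries   Cited by: 1 (self-citations: 0, citations by others: 1)
In database querying, classical relational databases query definite, precise information. Rough set theory, when handling uncertainty, requires no prior information beyond the data to be processed, and is therefore widely used in relations that handle uncertain information.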

19.
Processing lineages (also called provenances) over uncertain data consists in tracing the origin of uncertainty based on the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, where we adopt the Bayesian network (BN), one of the popular and important probabilistic graphical models (PGMs), as the framework of uncertainty representation and inference. Starting from the lineage expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform the lineage expression into equivalent directed acyclic graphs (DAGs). Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee that the graphical model can theoretically support effective probabilistic inference in lineage processing. Then, we propose a function-based method to compute the conditional probability table (CPT) for each node in the DAG. The BN for representing lineage expressions over uncertain data, called the lineage BN and abbreviated as LBN, can then be constructed and is generally suitable for both safe and unsafe query plans. We then give a variable-elimination-based algorithm for the LBN's exact inference to obtain the probabilities of query results, called LBN-based query processing. Next, we focus on obtaining the probabilities of inputs or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give a Gibbs-sampling-based algorithm for the LBN's approximate inference. Experimental results show the efficiency and effectiveness of our methods.
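
To make the notion of a Boolean lineage formula concrete, the sketch below computes the probability of a query result by brute-force enumeration of possible worlds. This is deliberately not the LBN construction or the variable-elimination algorithm from the abstract; it assumes independent input tuples, and the function name, the `tuple_probs` mapping, and the example lineage are hypothetical.

```python
from itertools import product


def lineage_probability(tuple_probs, lineage):
    """Probability that a Boolean lineage formula holds.

    tuple_probs: dict mapping a tuple name to its existence probability
                 (tuples are assumed independent).
    lineage:     callable taking a dict {name: bool} and returning bool.
    Enumerates all 2^n possible worlds, so it is practical only for small n.
    """
    names = list(tuple_probs)
    total = 0.0
    for world in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, world))
        if lineage(assignment):
            weight = 1.0
            for name, present in assignment.items():
                p = tuple_probs[name]
                weight *= p if present else 1.0 - p
            total += weight
    return total


# Hypothetical lineage of one result tuple: (t1 AND t2) OR t3.
example = lineage_probability(
    {"t1": 0.5, "t2": 0.5, "t3": 0.5},
    lambda w: (w["t1"] and w["t2"]) or w["t3"],
)
```

The exponential cost of this enumeration is one reason to compile the lineage into a graphical model and run exact or approximate inference instead, as the abstract describes.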

20.
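Zhao Faxin, Jin Yifu. Computer Science (《计算机科学》), 2015, 42(8): 236-239, 248
Skyline query processing has been a popular research direction in the database field in recent years. Because the real world contains a great deal of imprecise and uncertain information, Skyline queries have also become an important part of fuzzy data processing. Building on existing work, this paper discusses Skyline queries based on the Vague relational data model, which compute the degree to which an arbitrary tuple in a given Vague relation is definitely not dominated by any other tuple in that relation, and gives the corresponding formulas and query algorithm. The algorithm operates directly on the Vague relational database without scanning each of the possible states corresponding to it, and is therefore efficient. On this basis, the paper further discusses how to compute Skyline queries with preselection conditions.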

