首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.  相似文献   

2.
With the increasing demands for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by foreign events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing the common subexpressions may produce many common intermediate results when they are activated within a small interval, but may produce only disjoint data when activated at completely different timings.This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching to achieve a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for the query is stored into the cache area if it is expected to be reused by other queries. When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data. Efficiency of the proposed scheme is validated by intensive experimental evaluations.  相似文献   

3.
陈鹏 《计算机科学》2012,39(105):265-270
随着信息与通讯技术的快速发展,数据管理正面临着越来越多的挑战,其中之一就是数据的不确定性。提出一种基于元组存在性的概率数据模型相似文献   

4.
The continuous partial match query is a partial match query whose result remains consistently in the client’s memory. Conventional cache invalidation methods for mobile clients are record ID-based. However, since the partial match query uses content-based retrieval, the conventional ID-based approaches cannot efficiently manage the cache consistency of mobile clients. In this paper, we propose a predicate-based cache invalidation scheme for continuous partial match queries in mobile computing environments. We represent the cache state of a mobile client as a predicate, and also construct a cache invalidation report (CIR), which the server broadcasts to clients for cache management, with predicates. In order to reduce the amount of information that is needed for cache management, we propose a set of methods for CIR construction (in the server) and identification of invalidated data (in the client). Through experiments, we show that the predicate-based approach is very effective for the cache management of mobile clients.  相似文献   

5.
We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data stream. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then, formally prove correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications.  相似文献   

6.
RFID middleware collects and filters RFID streaming data to process applications' requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Comparing with existing query indexes, the performance of proposed index outperforms the others on various datasets.  相似文献   

7.
探讨双布鲁姆过滤器查询法查询集合并集、交集、补集、差集或对称差成员的性能问题。理论分析和实验结果表明,双布鲁姆过滤器查询法能够较好地支持集合并集、交集、补集、差集及对称差的成员查询问题,其中双布鲁姆过滤器并集及交集查询不会产生假阴性,仅有少量假阳性的存在,而双布鲁姆过滤器补集、差集及对称差查询则除存在少量假阳性外,还存在少量假阴性。  相似文献   

8.
根据数据流连续达到、大小无界和实时性强的特点,引出数据流多连续查询的基本概念.针对多连续查询的特点和用户的需求,将多连续查询优化技术分为单流多查询和多流多查询.详细论述了单流过滤型多连续查询优化技术和基于共享的多流多连续查询优化技术,通过全面系统地分析每种优化算法的基本思想,得出每种查询技术的优缺点及适用场合.  相似文献   

9.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous researches optimize the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy so that the order is re-optimized dynamically in run-time due to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is relatively no longer efficient. A series of experiments are comparatively analyzed to evaluate the performance of the proposed method in various stream environments.  相似文献   

10.
Location-based services have attracted the attention of important research in the field of mobile computing. Specifically, different mechanisms have been proposed to process location-dependent queries. In the above mentioned context, it is usually assumed that the location data are expressed at a fine geographic precision. However, a different granularity may be more appropriate in certain situations. Thus, a location resolution higher than required may even be inconvenient or not understandable by the user (for example, if the user expects a city name as an answer and instead the system provides the latitude/longitude coordinates). Moreover, if the locations presented to the user need to be refreshed automatically as the objects move, it is obvious that maintaining up-to-date GPS-like geographic coordinates would be more expensive in terms of processing and communication. Unfortunately, the existing approaches assume queries whose locations are always given with maximum precision (i.e., GPS locations).In this paper, a distributed query processing approach that adapts itself to the level of the location resolution required is presented. Thus, it supports continuous location-dependent queries based on the required terminology for the locations, depending on the granularity used (e.g., GPS, cities, states, provinces, or any other predefined geographic area). For the above mentioned purpose, location granules can be defined to specify the semantics appropriate for the queries and/or the way the results should be presented. A prototype showing the functionality and benefits of the approach has been implemented and used in an extensive experimental evaluation. The proposal not only increases the flexibility and expressive power of the queries considerably but also performs efficiently.  相似文献   

11.
In location-based services, a density query returns the regions with high concentrations of moving objects (MOs). The use of density queries can help users identify crowded regions so as to avoid congestion. Most of the existing methods try very hard to improve the accuracy of query results, but ignore query efficiency. However, response time is also an important concern in query processing and may have an impact on user experience. In order to address this issue, we present a new definition of continuous density queries. Our approach for processing continuous density queries is based on the new notion of a safe interval, using which the states of both dense and sparse regions are dynamically maintained. Two indexing structures are also used to index candidate regions for accelerating query processing and improving the quality of results. The efficiency and accuracy of our approach are shown through an experimental comparison with snapshot density queries.  相似文献   

12.
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries are discussed here, and the results of initial experiments evaluating the system are reported. Work performed while a Ph.D. student at the University of Kentucky.  相似文献   

13.
XML data broadcast is an efficient way to disseminate XML data to a large number of mobile clients in mobile wireless networks. Recently, several indexing methods have been proposed to improve the performance of XML query processing in terms of access time and tuning time over XML streams. However, existing indexing methods cannot process twig pattern XML queries. In this paper, we propose a novel structure for streaming XML data called PS+Pre/Post by integrating the path summary technique and the pre/post labeling scheme. Our proposed XML stream structure exploits the benefits of the path summary technique and the pre/post labeling scheme to efficiently process different types of XML queries over the broadcast stream. Experimental results show that our proposed XML stream structure improves the performance of access time and tuning time in processing different types of XML queries.  相似文献   

14.
A common approach to improve the reliability of query results based on error-prone sensors is to introduce redundant sensors. However, using multiple sensors to generate the value for a data item can be expensive, especially in wireless environments where continuous queries are executed. Moreover, some sensors may not be working properly and their readings need to be discarded. In this paper, we propose a statistical approach to decide which sensor nodes to be used to answer a query. In particular, we propose to solve the problem with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensor nodes, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for several common aggregate queries. Our statistics-based sensor node selection algorithm is demonstrated in a number of simulation experiments, which shows that a small number of sensor nodes can provide accurate and robust query results.  相似文献   

15.
Skyline查询是基于位置服务工13S的一项重要操作,其目的是发现数据集中不被其它点支配的点的集合。对道路网络环境下移动对象的连续概率Skyline查询进行了研究。在对道路网络和移动对象建模的基础上,定义了基于道路网络的数据间支配概率和Skyline概率的表示方式,提出了两类可能引起p-Skyline集合变动的event事件,并提出4条剪枝方案进行优化。在此基础上,设计了对网络受限的不确定移动对象进行连续概率Skyline查询的动态增量算法U-CPSQRN。该算法通过对event的跟踪计算实现了对p-Skyline的连续更新操作,减少了算法的查找和计算开销。实验结果显示了算法的有效性。  相似文献   

16.
在分布式数据流管理系统中,需要将查询操作放置到不同的处理结点执行。因此,如何放置查询操作成为分布式数据流管理研究的核心问题。Peter等人提出一种基于时延空间和弹簧张弛技术的查询操作放置算法,但是该算法假设查询操作之间数据流的流速不变,没有考虑数据流的流速与数据流查询操作之间的相关性。为此,通过分析不同的数据流查询操作与其输出的数据流的流速之间的关系,对Peter等人提出的算法加以改进,实验结果表明,改进后的算法可以有效地应用于分布式数据流管理系统。  相似文献   

17.
Recent research has focused on Continuous K Nearest Neighbor (CKNN) queries in road networks, where the queries and the data objects are moving. Most existing approaches assume the fixed velocity of moving objects. The release of fixed moving velocity makes the query process slowly due to the significant repetitive query cost. In this paper, we study CKNN queries over moving objects with uncertain velocity in road networks. A Distance Interval Model (DIM) is designed to calculate the minimal and maximal road network distances between moving objects and query point. Furthermore, we propose a novel Possibility-based Vague KNN (PVKNN) algorithm to process the query efficiently, which determines the CKNN query results with possibility within each division time subinterval of given time interval by applying the vague set theory. In the PVKNN algorithm, the query efficiency can be improved significantly with the pruning, distilling and possibility-ranking phases. With these phases, the objects candidates are scaled down and the given time interval is divided into subintervals to reduce the repetitive query cost. In addition, an index structure TPRuv-Tree is designed to efficiently index moving objects with uncertain velocity in road network by involving edge connection and moving objects information. Experiments with simulation and comparison show that significant improvement in the performance of efficiency can be achieved with our proposed algorithms.  相似文献   

18.
Processing lineages (also called provenances) over uncertain data consists in tracing the origin of uncertainty based on the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, where we adopt Bayesian network (BN), one of the popular and important probabilistic graphical models (PGMs), as the framework of uncertainty representation and inferences. Starting from the lineage expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform the lineage expression into directed acyclic graphs (DAGs) equivalently. Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee that the graphical model can support effective probabilistic inferences in lineage processing theoretically. Then, we propose the function-based method to compute the conditional probability table (CPT) for each node in the DAG. The BN for representing lineage expressions over uncertain data, called lineage BN and abbreviated as LBN, can be constructed while generally suitable for both safe and unsafe query plans. Therefore, we give the variable-elimination-based algorithm for LBN's exact inferences to obtain the probabilities of query results, called LBN-based query processing. Then, we focus on obtaining the probabilities of inputs or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give the Gibbs-sampling-based algorithm for LBN's approximate inferences. Experimental results show the efficiency and effectiveness of our methods.  相似文献   

19.
随着移动互联网的快速发展以及信息技术的普遍应用,在许多应用中都产生了海量、不确定性数据,包括金融、军事、位置服务、医疗以及气象等。然而,传统的确定性数据管理方法很难管理不确定数据,亟需开发新型数据管理方法。可能世界模型被广泛用于为不确定数据建模,通过该模型可以衍生出诸多确定性的可能世界实例。不确定性数据流是指高速到达的海量不确定元组序列,因而不确定数据流管理比不确定性静态数据管理更具挑战性。面向于不确定数据流的ER-Topk查询是一个典型问题,但是处理复杂度高。提出一种近似算法来处理该查询,具有较小的空间复杂度;同时,还通过搜索策略优化来进一步提升查询处理效率。实验结果验证了所提方法的有效性和高效性。  相似文献   

20.
RRSi: indexing XML data for proximity twig queries   总被引:2,自引:2,他引:0  
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Vincent T. Y. NgEmail:
  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号