共查询到20条相似文献,搜索用时 0 毫秒
1.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches. 相似文献
2.
With the increasing demands for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by foreign events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing the common subexpressions may produce many common intermediate results when they are activated within a small interval, but may produce only disjoint data when activated at completely different timings.This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching to achieve a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for the query is stored into the cache area if it is expected to be reused by other queries. When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data. Efficiency of the proposed scheme is validated by intensive experimental evaluations. 相似文献
3.
A cache invalidation scheme for continuous partial match queries in mobile computing environments 总被引:1,自引:0,他引:1
Yon Dohn Chung 《Distributed and Parallel Databases》2008,23(3):207-234
The continuous partial match query is a partial match query whose result remains consistently in the client’s memory. Conventional
cache invalidation methods for mobile clients are record ID-based. However, since the partial match query uses content-based
retrieval, the conventional ID-based approaches cannot efficiently manage the cache consistency of mobile clients. In this
paper, we propose a predicate-based cache invalidation scheme for continuous partial match queries in mobile computing environments.
We represent the cache state of a mobile client as a predicate, and also construct a cache invalidation report (CIR), which
the server broadcasts to clients for cache management, with predicates. In order to reduce the amount of information that
is needed for cache management, we propose a set of methods for CIR construction (in the server) and identification of invalidated
data (in the client). Through experiments, we show that the predicate-based approach is very effective for the cache management
of mobile clients. 相似文献
4.
We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data stream. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then, formally prove correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications. 相似文献
5.
RFID middleware collects and filters RFID streaming data to process applications' requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Comparing with existing query indexes, the performance of proposed index outperforms the others on various datasets. 相似文献
6.
7.
Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous researches optimize the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy so that the order is re-optimized dynamically in run-time due to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is relatively no longer efficient. A series of experiments are comparatively analyzed to evaluate the performance of the proposed method in various stream environments. 相似文献
8.
Sergio IlarriAuthor Vitae Carlos Bobed Author VitaeEduardo Mena Author Vitae 《Journal of Systems and Software》2011,84(8):1327-1350
Location-based services have attracted the attention of important research in the field of mobile computing. Specifically, different mechanisms have been proposed to process location-dependent queries. In the above mentioned context, it is usually assumed that the location data are expressed at a fine geographic precision. However, a different granularity may be more appropriate in certain situations. Thus, a location resolution higher than required may even be inconvenient or not understandable by the user (for example, if the user expects a city name as an answer and instead the system provides the latitude/longitude coordinates). Moreover, if the locations presented to the user need to be refreshed automatically as the objects move, it is obvious that maintaining up-to-date GPS-like geographic coordinates would be more expensive in terms of processing and communication. Unfortunately, the existing approaches assume queries whose locations are always given with maximum precision (i.e., GPS locations).In this paper, a distributed query processing approach that adapts itself to the level of the location resolution required is presented. Thus, it supports continuous location-dependent queries based on the required terminology for the locations, depending on the granularity used (e.g., GPS, cities, states, provinces, or any other predefined geographic area). For the above mentioned purpose, location granules can be defined to specify the semantics appropriate for the queries and/or the way the results should be presented. A prototype showing the functionality and benefits of the approach has been implemented and used in an extensive experimental evaluation. The proposal not only increases the flexibility and expressive power of the queries considerably but also performs efficiently. 相似文献
9.
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating
diverse probability distributions of discrete random variables with finite domains, and associated information. A formal Semistructured
Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra
supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization.
Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of
large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of
SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of
a relational DBMS. The translation of SP-algebra queries into relational queries are discussed here, and the results of initial
experiments evaluating the system are reported.
Work performed while a Ph.D. student at the University of Kentucky. 相似文献
10.
A common approach to improve the reliability of query results based on error-prone sensors is to introduce redundant sensors. However, using multiple sensors to generate the value for a data item can be expensive, especially in wireless environments where continuous queries are executed. Moreover, some sensors may not be working properly and their readings need to be discarded. In this paper, we propose a statistical approach to decide which sensor nodes to be used to answer a query. In particular, we propose to solve the problem with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensor nodes, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for several common aggregate queries. Our statistics-based sensor node selection algorithm is demonstrated in a number of simulation experiments, which shows that a small number of sensor nodes can provide accurate and robust query results. 相似文献
11.
Processing lineages (also called provenances) over uncertain data consists in tracing the origin of uncertainty based on the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, where we adopt Bayesian network (BN), one of the popular and important probabilistic graphical models (PGMs), as the framework of uncertainty representation and inferences. Starting from the lineage expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform the lineage expression into directed acyclic graphs (DAGs) equivalently. Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee that the graphical model can support effective probabilistic inferences in lineage processing theoretically. Then, we propose the function-based method to compute the conditional probability table (CPT) for each node in the DAG. The BN for representing lineage expressions over uncertain data, called lineage BN and abbreviated as LBN, can be constructed while generally suitable for both safe and unsafe query plans. Therefore, we give the variable-elimination-based algorithm for LBN's exact inferences to obtain the probabilities of query results, called LBN-based query processing. Then, we focus on obtaining the probabilities of inputs or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give the Gibbs-sampling-based algorithm for LBN's approximate inferences. Experimental results show the efficiency and effectiveness of our methods. 相似文献
12.
Vague continuous K-nearest neighbor queries over moving objects with uncertain velocity in road networks 总被引:1,自引:0,他引:1
Recent research has focused on Continuous K Nearest Neighbor (CKNN) queries in road networks, where the queries and the data objects are moving. Most existing approaches assume the fixed velocity of moving objects. The release of fixed moving velocity makes the query process slowly due to the significant repetitive query cost. In this paper, we study CKNN queries over moving objects with uncertain velocity in road networks. A Distance Interval Model (DIM) is designed to calculate the minimal and maximal road network distances between moving objects and query point. Furthermore, we propose a novel Possibility-based Vague KNN (PVKNN) algorithm to process the query efficiently, which determines the CKNN query results with possibility within each division time subinterval of given time interval by applying the vague set theory. In the PVKNN algorithm, the query efficiency can be improved significantly with the pruning, distilling and possibility-ranking phases. With these phases, the objects candidates are scaled down and the given time interval is divided into subintervals to reduce the repetitive query cost. In addition, an index structure TPRuv-Tree is designed to efficiently index moving objects with uncertain velocity in road network by involving edge connection and moving objects information. Experiments with simulation and comparison show that significant improvement in the performance of efficiency can be achieved with our proposed algorithms. 相似文献
13.
PRL: A probabilistic relational language 总被引:1,自引:0,他引:1
In this paper, we describe the syntax and semantics for a probabilistic relational language (PRL). PRL is a recasting of recent
work in Probabilistic Relational Models (PRMs) into a logic programming framework. We show how to represent varying degrees
of complexity in the semantics including attribute uncertainty, structural uncertainty and identity uncertainty. Our approach
is similar in spirit to the work in Bayesian Logic Programs (BLPs), and Logical Bayesian Networks (LBNs). However, surprisingly,
there are still some important differences in the resulting formalism; for example, we introduce a general notion of aggregates
based on the PRM approaches. One of our contributions is that we show how to support richer forms of structural uncertainty
in a probabilistic logical language than have been previously described. Our goal in this work is to present a unifying framework
that supports all of the types of relational uncertainty yet is based on logic programming formalisms. We also believe that
it facilitates understanding the relationship between the frame-based approaches and alternate logic programming approaches,
and allows greater transfer of ideas between them.
Editors: Hendrik Blockeel, David Jensen and Stefan Kramer
An erratum to this article is available at . 相似文献
14.
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing
is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are
heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore
some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this
paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting
proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show
good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural
indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show
that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
相似文献
Vincent T. Y. NgEmail: |
15.
In a moving-object database system that supports continuous queries (CQ), an important problem is to keep the location data consistent with the actual locations of the entities being monitored, in order to produce correct query results. This goal is often difficult to achieve due to limited network resources. However, if an object is not required by any query, its value need not be refreshed. Based on this observation, we redefine the notion of temporal consistency of data items with respect to the query result, where only data items that are relevant to the CQs need to be fresh. To exploit this correctness definition, we develop an adaptive time-based update technique called query-result update (QRU). The advantage of this technique is that it identifies objects with different levels of significance to the correctness of query results. Locations of objects that have more impact to the query result are acquired more frequently than the ones that do not. 相似文献
16.
A data model and algebra for probabilistic complex values 总被引:1,自引:0,他引:1
Thomas Eiter Thomas Lukasiewicz Michael Walter 《Annals of Mathematics and Artificial Intelligence》2001,33(2-4):205-252
We present a probabilistic data model for complex values. More precisely, we introduce probabilistic complex value relations, which combine the concept of probabilistic relations with the idea of complex values in a uniform framework. We elaborate a model-theoretic definition of probabilistic combination strategies, which has a rigorous foundation on probability theory. We then define an algebra for querying database instances, which comprises the operations of selection, projection, renaming, join, Cartesian product, union, intersection, and difference. We prove that our data model and algebra for probabilistic complex values generalizes the classical relational data model and algebra. Moreover, we show that under certain assumptions, all our algebraic operations are tractable. We finally show that most of the query equivalences of classical relational algebra carry over to our algebra on probabilistic complex value relations. Hence, query optimization techniques for classical relational algebra can easily be applied to optimize queries on probabilistic complex value relations. 相似文献
17.
18.
Due to the existence of many probabilistic lossy links in Wireless Sensor Networks (WSNs) (Liu et al., 2010) [25], it is not practical to study the network capacity issue under the Deterministic Network Model (DNM). A more realistic one is actually the Probabilistic Network Model (PNM). Therefore, we study the Snapshot Data Aggregation (SDA) problem, the Continuous Data Aggregation (CDA) problem, and their achievable capacities for probabilistic WSNs under both the independent and identically distributed (i.i.d.) node distribution model and the Poisson point distribution model in this paper. First, we partition a network into cells and use two vectors to further partition these cells into equivalent color classes. Subsequently, based on the partitioned cells and equivalent color classes, we propose a Cell-based Aggregation Scheduling (CAS) algorithm for the SDA problem in probabilistic WSNs. Theoretical analysis of CAS and the upper bound capacity of the SDA problem show that the achievable capacities of CAS are all order optimal in the worst case, the average case, and the best case. For the CDA problem in probabilistic WSNs, we propose a Level-based Aggregation Scheduling (LAS) algorithm. LAS gathers the aggregation values of continuous snapshots by forming a data aggregation/transmission pipeline on the segments and scheduling all the cell-levels in a cell-level class concurrently. By theoretical analysis of LAS and the upper bound capacity of the CDA problem, we prove that LAS also successfully achieves order optimal capacities in all the cases. The extensive simulation results further validate the effectiveness of CAS and LAS. 相似文献
19.
John H. M. De Vet 《Software》1989,19(5):491-504
This paper describes an algorithm for evaluating database queries represented as expressions in a logical language. Such a database query expression can be evaluated efficiently by focusing on the variable dependencies. The algorithm recursively computes the values of subexpressions to evaluate the input expression, but it avoids re-evaluation of those subexpressions whose values are not affected by new variable assignments. The input expression is internally structured as a directed acyclic graph. Two additional techniques to improve efficiency of the evaluation are discussed: transformations of the input expression and special primitive database operations. Finally, its implementation in the natural language question-answering system SPICOS is described. 相似文献
20.
The objective of this study is to develop a knowledge-base framework for generatingcooperative answers to indirect queries. Anindirect query can be considered as a nonstandard database query in which a user did not specify explicitly the information request. In a cooperative query answering system, a user's indirect query should be answered with an informative response, either anaffirmative response or anegative response, which is generated on the basis of the inference of the user's information request and the reformulation of the users' indirect query.This paper presents methods for inferring users' intended actions, determining users' information requirements, and for automatically reformulating indirect queries into direct queries. The inference process is carried out on the basis of a user model, calluser action model, as well as the query context. Two kinds ofinformative responses, i.e.affirmative responses andnegative responses can be generated by arule-based approach. 相似文献