期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient processing of probabilistic group subspace skyline queries in uncertain databases

Xiang Lian Lei Chen 《Information Systems》2013

Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches. 相似文献

2.

Query result caching for multiple event-driven continuous queries

Yousuke Watanabe Hiroyuki Kitagawa 《Information Systems》2010,35(1):94-110

With the increasing demands for advanced use of streaming data, efficient execution of continuous queries is an important research issue. This paper focuses on event-driven continuous queries that are activated by foreign events such as data arrival and the progression of time. Existing approaches to multiple continuous query optimization decide the optimal query plan by extracting common subexpressions from the given queries. Event-driven queries containing the common subexpressions may produce many common intermediate results when they are activated within a small interval, but may produce only disjoint data when activated at completely different timings.This paper proposes an efficient data stream processing scheme for multiple event-driven continuous queries. In the proposed approach, we introduce query result caching to achieve a flexible way to share common operators among queries activated by unpredictable events. When a query is activated, an intermediate result generated for the query is stored into the cache area if it is expected to be reused by other queries. When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data. Efficiency of the proposed scheme is validated by intensive experimental evaluations. 相似文献

3.

A cache invalidation scheme for continuous partial match queries in mobile computing environments 总被引：1，自引：0，他引：1

Yon Dohn Chung 《Distributed and Parallel Databases》2008,23(3):207-234

The continuous partial match query is a partial match query whose result remains consistently in the client’s memory. Conventional cache invalidation methods for mobile clients are record ID-based. However, since the partial match query uses content-based retrieval, the conventional ID-based approaches cannot efficiently manage the cache consistency of mobile clients. In this paper, we propose a predicate-based cache invalidation scheme for continuous partial match queries in mobile computing environments. We represent the cache state of a mobile client as a predicate, and also construct a cache invalidation report (CIR), which the server broadcasts to clients for cache management, with predicates. In order to reduce the amount of information that is needed for cache management, we propose a set of methods for CIR construction (in the server) and identification of invalidated data (in the client). Through experiments, we show that the predicate-based approach is very effective for the cache management of mobile clients. 相似文献

4.

Hyo-Sang Lim Yang-Sae Moon 《Information Sciences》2008,178(6):1461-1478

We propose a new similar sequence matching method that efficiently supports variable-length and variable-tolerance continuous query sequences on time-series data stream. Earlier methods do not support variable lengths or variable tolerances adequately for continuous query sequences if there are too many query sequences registered to handle in main memory. To support variable-length query sequences, we use the window construction mechanism that divides long sequences into smaller windows for indexing and searching the sequences. To support variable-tolerance query sequences, we present a new notion of intervaled sequences whose individual entries are an interval of real numbers rather than a real number itself. We also propose a new similar sequence matching method based on these notions, and then, formally prove correctness of the method. In addition, we show that our method has the prematching characteristic, which finds future candidates of similar sequences in advance. Experimental results show that our method outperforms the naive one by 2.6-102.1 times and the existing methods in the literature by 1.4-9.8 times over the entire ranges of parameters tested when the query selectivities are low (<32%), which are practically useful in large database applications. 相似文献

5.

A query index for continuous queries on RFID streaming data

Jaekwan Park Bonghee Hong Chaehoon Ban 《中国科学F辑(英文版)》2008,51(12):2047-2061

RFID middleware collects and filters RFID streaming data to process applications＇ requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification （ECSpec） model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Comparing with existing query indexes, the performance of proposed index outperforms the others on various datasets. 相似文献

6.

数据流多连续查询优化技术

赵宗敏王洋吴海涛《计算机应用》2009,29(Z2)

根据数据流连续达到、大小无界和实时性强的特点,引出数据流多连续查询的基本概念.针对多连续查询的特点和用户的需求,将多连续查询优化技术分为单流多查询和多流多查询.详细论述了单流过滤型多连续查询优化技术和基于共享的多流多连续查询优化技术,通过全面系统地分析每种优化算法的基本思想,得出每种查询技术的优缺点及适用场合. 相似文献

7.

Adaptive optimization for multiple continuous queries

Hong Kyu Park Author VitaeWon Suk LeeAuthor Vitae 《Data & Knowledge Engineering》2012,71(1):29-46

Because it operates under a strict time constraint, query processing for data streams should be continuous and rapid. To guarantee this constraint, most previous researches optimize the evaluation order of multiple join operations in a set of continuous queries using a greedy optimization strategy so that the order is re-optimized dynamically in run-time due to the time-varying characteristics of data streams. However, this method often results in a sub-optimal plan because the greedy strategy traces only the first promising plan. This paper proposes a new multiple query optimization approach, Adaptive Sharing-based Extended Greedy Optimization Approach (A-SEGO), that traces multiple promising partial plans simultaneously. A-SEGO presents a novel method for sharing the results of common sub-expressions in a set of queries cost-effectively. The number of partial plans can be flexibly controlled according to the query processing workload. In addition, to avoid invoking the optimization process too frequently, optimization is performed only when the current execution plan is relatively no longer efficient. A series of experiments are comparatively analyzed to evaluate the performance of the proposed method in various stream environments. 相似文献

8.

An approach to process continuous location-dependent queries on moving objects with support for location granules

Sergio Ilarri^{Author Vitae} Carlos Bobed Author VitaeEduardo Mena Author Vitae 《Journal of Systems and Software》2011,84(8):1327-1350

Location-based services have attracted the attention of important research in the field of mobile computing. Specifically, different mechanisms have been proposed to process location-dependent queries. In the above mentioned context, it is usually assumed that the location data are expressed at a fine geographic precision. However, a different granularity may be more appropriate in certain situations. Thus, a location resolution higher than required may even be inconvenient or not understandable by the user (for example, if the user expects a city name as an answer and instead the system provides the latitude/longitude coordinates). Moreover, if the locations presented to the user need to be refreshed automatically as the objects move, it is obvious that maintaining up-to-date GPS-like geographic coordinates would be more expensive in terms of processing and communication. Unfortunately, the existing approaches assume queries whose locations are always given with maximum precision (i.e., GPS locations).In this paper, a distributed query processing approach that adapts itself to the level of the location resolution required is presented. Thus, it supports continuous location-dependent queries based on the required terminology for the locations, depending on the granularity used (e.g., GPS, cities, states, provinces, or any other predefined geographic area). For the above mentioned purpose, location granules can be defined to specify the semantics appropriate for the queries and/or the way the results should be presented. A prototype showing the functionality and benefits of the approach has been implemented and used in an extensive experimental evaluation. The proposal not only increases the flexibility and expressive power of the queries considerably but also performs efficiently. 相似文献

9.

A Framework for Management of Semistructured Probabilistic Data

Wenzhong?Zhao Alex?Dekhtyar Email author Judy?Goldsmith 《Journal of Intelligent Information Systems》2005,25(3):293-332

This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries are discussed here, and the results of initial experiments evaluating the system are reported. Work performed while a Ph.D. student at the University of Kentucky. 相似文献

10.

A statistics-based sensor selection scheme for continuous probabilistic queries in sensor networks

Song Han Edward Chan Reynold Cheng Kam-Yiu Lam 《Real-Time Systems》2007,35(1):33-58

A common approach to improve the reliability of query results based on error-prone sensors is to introduce redundant sensors. However, using multiple sensors to generate the value for a data item can be expensive, especially in wireless environments where continuous queries are executed. Moreover, some sensors may not be working properly and their readings need to be discarded. In this paper, we propose a statistical approach to decide which sensor nodes to be used to answer a query. In particular, we propose to solve the problem with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensor nodes, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for several common aggregate queries. Our statistics-based sensor node selection algorithm is demonstrated in a number of simulation experiments, which shows that a small number of sensor nodes can provide accurate and robust query results. 相似文献

11.

Representing and processing lineages over uncertain data based on the Bayesian network

《Applied Soft Computing》2015

Processing lineages (also called provenances) over uncertain data consists in tracing the origin of uncertainty based on the process of data production and evolution. In this paper, we focus on the representation and processing of lineages over uncertain data, where we adopt Bayesian network (BN), one of the popular and important probabilistic graphical models (PGMs), as the framework of uncertainty representation and inferences. Starting from the lineage expressed as Boolean formulae for SPJ (Selection–Projection–Join) queries over uncertain data, we propose a method to transform the lineage expression into directed acyclic graphs (DAGs) equivalently. Specifically, we discuss the corresponding probabilistic semantics and properties to guarantee that the graphical model can support effective probabilistic inferences in lineage processing theoretically. Then, we propose the function-based method to compute the conditional probability table (CPT) for each node in the DAG. The BN for representing lineage expressions over uncertain data, called lineage BN and abbreviated as LBN, can be constructed while generally suitable for both safe and unsafe query plans. Therefore, we give the variable-elimination-based algorithm for LBN's exact inferences to obtain the probabilities of query results, called LBN-based query processing. Then, we focus on obtaining the probabilities of inputs or intermediate tuples conditioned on query results, called LBN-based inference query processing, and give the Gibbs-sampling-based algorithm for LBN's approximate inferences. Experimental results show the efficiency and effectiveness of our methods. 相似文献

12.

Vague continuous K-nearest neighbor queries over moving objects with uncertain velocity in road networks 总被引：1，自引：0，他引：1

Ping Fan Guohui LiLing Yuan Yanhong Li 《Information Systems》2012,37(1):13-32

Recent research has focused on Continuous K Nearest Neighbor (CKNN) queries in road networks, where the queries and the data objects are moving. Most existing approaches assume the fixed velocity of moving objects. The release of fixed moving velocity makes the query process slowly due to the significant repetitive query cost. In this paper, we study CKNN queries over moving objects with uncertain velocity in road networks. A Distance Interval Model (DIM) is designed to calculate the minimal and maximal road network distances between moving objects and query point. Furthermore, we propose a novel Possibility-based Vague KNN (PVKNN) algorithm to process the query efficiently, which determines the CKNN query results with possibility within each division time subinterval of given time interval by applying the vague set theory. In the PVKNN algorithm, the query efficiency can be improved significantly with the pruning, distilling and possibility-ranking phases. With these phases, the objects candidates are scaled down and the given time interval is divided into subintervals to reduce the repetitive query cost. In addition, an index structure TPR^uv-Tree is designed to efficiently index moving objects with uncertain velocity in road network by involving edge connection and moving objects information. Experiments with simulation and comparison show that significant improvement in the performance of efficiency can be achieved with our proposed algorithms. 相似文献

13.

PRL: A probabilistic relational language 总被引：1，自引：0，他引：1

Lise Getoor John Grant 《Machine Learning》2006,62(1-2):7-31

In this paper, we describe the syntax and semantics for a probabilistic relational language (PRL). PRL is a recasting of recent work in Probabilistic Relational Models (PRMs) into a logic programming framework. We show how to represent varying degrees of complexity in the semantics including attribute uncertainty, structural uncertainty and identity uncertainty. Our approach is similar in spirit to the work in Bayesian Logic Programs (BLPs), and Logical Bayesian Networks (LBNs). However, surprisingly, there are still some important differences in the resulting formalism; for example, we introduce a general notion of aggregates based on the PRM approaches. One of our contributions is that we show how to support richer forms of structural uncertainty in a probabilistic logical language than have been previously described. Our goal in this work is to present a unifying framework that supports all of the types of relational uncertainty yet is based on logic programming formalisms. We also believe that it facilitates understanding the relationship between the frame-based approaches and alternate logic programming approaches, and allows greater transfer of ideas between them. Editors: Hendrik Blockeel, David Jensen and Stefan Kramer An erratum to this article is available at . 相似文献

14.

RRSi: indexing XML data for proximity twig queries

Patrick K. L. Ng Vincent T. Y. Ng 《Knowledge and Information Systems》2008,17(2):193-216

Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.

Vincent T. Y. NgEmail:

相似文献

15.

An efficient location update mechanism for continuous queries over moving objects

Reynold Cheng Kam-Yiu Lam Sunil Prabhakar Biyu Liang 《Information Systems》2007

In a moving-object database system that supports continuous queries (CQ), an important problem is to keep the location data consistent with the actual locations of the entities being monitored, in order to produce correct query results. This goal is often difficult to achieve due to limited network resources. However, if an object is not required by any query, its value need not be refreshed. Based on this observation, we redefine the notion of temporal consistency of data items with respect to the query result, where only data items that are relevant to the CQs need to be fresh. To exploit this correctness definition, we develop an adaptive time-based update technique called query-result update (QRU). The advantage of this technique is that it identifies objects with different levels of significance to the correctness of query results. Locations of objects that have more impact to the query result are acquired more frequently than the ones that do not. 相似文献

16.

A data model and algebra for probabilistic complex values 总被引：1，自引：0，他引：1

Thomas Eiter Thomas Lukasiewicz Michael Walter 《Annals of Mathematics and Artificial Intelligence》2001,33(2-4):205-252

We present a probabilistic data model for complex values. More precisely, we introduce probabilistic complex value relations, which combine the concept of probabilistic relations with the idea of complex values in a uniform framework. We elaborate a model-theoretic definition of probabilistic combination strategies, which has a rigorous foundation on probability theory. We then define an algebra for querying database instances, which comprises the operations of selection, projection, renaming, join, Cartesian product, union, intersection, and difference. We prove that our data model and algebra for probabilistic complex values generalizes the classical relational data model and algebra. Moreover, we show that under certain assumptions, all our algebraic operations are tractable. We finally show that most of the query equivalences of classical relational algebra carry over to our algebra on probabilistic complex value relations. Hence, query optimization techniques for classical relational algebra can easily be applied to optimize queries on probabilistic complex value relations. 相似文献

17.

一个新的概率数据模型 总被引：1，自引：0，他引：1

袁鼎荣严小卫陈宏朝《计算机应用研究》2003,20(10):65-67,73

为扩展传统关系数据模型,处理概率方面的不确定性数据,以描述动态的世界,建立了一个新型的数据模型——PRM模型。在该模型中引进概率属性,并提出对象属性、静态属性、动态属性、概率属性联合构成元组,形成二维关系表中的一个动态数据理论;为克服当前在概率数据模型研究方面的缺陷,定义了一个新的PRM模型体系及操作,使其具有处理概率数据的能力。相似文献

18.

Continuous data aggregation and capacity in probabilistic wireless sensor networks

Shouling Ji Jing He Yi Pan Yingshu Li 《Journal of Parallel and Distributed Computing》2013

Due to the existence of many probabilistic lossy links in Wireless Sensor Networks (WSNs) (Liu et al., 2010) [25], it is not practical to study the network capacity issue under the Deterministic Network Model (DNM). A more realistic one is actually the Probabilistic Network Model (PNM). Therefore, we study the Snapshot Data Aggregation (SDA) problem, the Continuous Data Aggregation (CDA) problem, and their achievable capacities for probabilistic WSNs under both the independent and identically distributed (i.i.d.) node distribution model and the Poisson point distribution model in this paper. First, we partition a network into cells and use two vectors to further partition these cells into equivalent color classes. Subsequently, based on the partitioned cells and equivalent color classes, we propose a Cell-based Aggregation Scheduling (CAS) algorithm for the SDA problem in probabilistic WSNs. Theoretical analysis of CAS and the upper bound capacity of the SDA problem show that the achievable capacities of CAS are all order optimal in the worst case, the average case, and the best case. For the CDA problem in probabilistic WSNs, we propose a Level-based Aggregation Scheduling (LAS) algorithm. LAS gathers the aggregation values of continuous snapshots by forming a data aggregation/transmission pipeline on the segments and scheduling all the cell-levels in a cell-level class concurrently. By theoretical analysis of LAS and the upper bound capacity of the CDA problem, we prove that LAS also successfully achieves order optimal capacities in all the cases. The extensive simulation results further validate the effectiveness of CAS and LAS. 相似文献

19.

A practical algorithm for evaluating database queries

John H. M. De Vet 《Software》1989,19(5):491-504

This paper describes an algorithm for evaluating database queries represented as expressions in a logical language. Such a database query expression can be evaluated efficiently by focusing on the variable dependencies. The algorithm recursively computes the values of subexpressions to evaluate the input expression, but it avoids re-evaluation of those subexpressions whose values are not affected by new variable assignments. The input expression is internally structured as a directed acyclic graph. Two additional techniques to improve efficiency of the evaluation are discussed: transformations of the input expression and special primitive database operations. Finally, its implementation in the natural language question-answering system SPICOS is described. 相似文献

20.

A knowledge-based system for generating informative responses to indirect database queries

Xu Wu Nick Cercone Tadao Ichikawa 《Journal of Intelligent Information Systems》1995,5(1):5-23

The objective of this study is to develop a knowledge-base framework for generatingcooperative answers to indirect queries. Anindirect query can be considered as a nonstandard database query in which a user did not specify explicitly the information request. In a cooperative query answering system, a user's indirect query should be answered with an informative response, either anaffirmative response or anegative response, which is generated on the basis of the inference of the user's information request and the reformulation of the users' indirect query.This paper presents methods for inferring users' intended actions, determining users' information requirements, and for automatically reformulating indirect queries into direct queries. The inference process is carried out on the basis of a user model, calluser action model, as well as the query context. Two kinds ofinformative responses, i.e.affirmative responses andnegative responses can be generated by arule-based approach. 相似文献