首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such possibleanswers and gauge their relevance by accessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework QPIAD that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possibleanswers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possibleanswers with high precision, high recall, and manageable cost.  相似文献   

2.
The global skyline, as an important variant of the skyline, has been widely applied in multi-criteria decision making, business planning and data mining. In this paper, we extend our early work and propose the maintenance methods to process the subspace global skyline (SGS) queries in dynamic databases. In the previous work, we proposed the index structure RB-tree, which can effectively manage the data to accelerate the subspace global skyline calculation. Also, the basic single SGS algorithm based on RB-tree (SSRB) and the optimized single SGS algorithm (OSSRB) were proposed to process a single SGS query. In addition, the multiple SGS algorithm (MSRB) was proposed to calculate multiple SGS queries by sharing the scan spaces of different queries. In this paper, we design some data structures and propose the maintenance approaches of SSRB, OSSRB and MSRB to cope with updates that happen to data sets. Thus our extended algorithms can be adopted for dynamic data sets. Finally, the experimental results show that the proposed algorithms OSSRB and MSRB have good performance to process SGS queries and they can be easily maintained with dynamic datasets.  相似文献   

3.
This paper presents a query processing strategy for the content-based video query language named CVQL. By CVQL, users can flexibly specify query predicates by the spatial and temporal relationships of the content objects. The query processing strategy evaluates the predicates and returns qualified videos or frames as results. Before the evaluation of the predicates, a preprocessing is performed to avoid unnecessary accessing of videos which are impossible to be the answers. The preprocessing checks the existence of the content objects specified in the predicates to eliminate unqualified videos. For the evaluation of the predicates, an M-index is designed based on the analysis of the behaviors of the content objects. The M-index is employed to avoid frame-by-frame evaluation of the predicates. Experimental results are presented to illustrate the performance of this approach  相似文献   

4.
In our earlier work, we proposed an architecture for a Web-based video database management system (VDBMS) providing an integrated support for spatiotemporal and semantic queries. In this paper, we focus on the task of spatiotemporal query processing and also propose an SQL-like video query language that has the capability to handle a broad range of spatiotemporal queries. The language is rule-based in that it allows users to express spatial conditions in terms of Prolog-type predicates. Spatiotemporal query processing is carried out in three main stages: query recognition, query decomposition, and query execution.Received: 11 October 2001, Accepted: 3 October 2003, Published online: 12 December 2003Edited by: A. Buchmann Correspondence to: Özgür UlusoyThis work is supported by the Scientific and Research Council of Turkey (TÜBITAK) under Project Code 199E025. This work was done while the first author was at Bilkent University.  相似文献   

5.
Lee  D.L. Leung  Y.Y. 《Software, IEEE》1993,10(6):66-74
A special-purpose algorithm, that analyzes the structure of a recursion and exploits its properties in query processing in a deductive database is presented. This method is applied to linear rules, a large and common class of recursion. The structural approach to rule processing (SARP) prototype system that implements the algorithm is described  相似文献   

6.
The skyline-join operator, as an important variant of skylines, plays an important role in multi-criteria decision making problems. However, as the data scale increases, previous methods of skyline-join queries cannot be applied to new applications. Therefore, in this paper, it is the first attempt to propose a scalable method to process skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according a partition function, which can guarantee that the tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan can ensure that calculations are equally assigned to all the data nodes, and the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.  相似文献   

7.
This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most of the works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions which leads the database to a consistent state (called repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and query answer differ from the original ones. In the presence of functional dependencies, tuple deletions are the only operations that are performed in order to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. In order to cope with this problem, we adopt a notion of repair, based on tuple updates, which allows us to better preserve information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers which can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, that is query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP #P -complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.  相似文献   

8.
RPE query processing and optimization techniques for XML databases   总被引:5,自引:1,他引:4       下载免费PDF全文
An extent join to compute path expressions containing parent-children and ancestor-descendent operations and two path expression optimization rules, path-shortening and path-complementing, are presented in this paper. Path-shortening reduces the number of joins by shortening the path while path-complementing optimizes the path execution by using an equivalent complementary path expression to compute the original one. Experimental results show that the algorithms proposed are more efficient than traditional algorithms.  相似文献   

9.
Object-based directional query processing in spatial databases   总被引:4,自引:0,他引:4  
Direction-based spatial relationships are critical in many domains, including geographic information systems (GIS) and image interpretation. They are also frequently used as selection conditions in spatial queries. In this paper, we explore the processing of object-based direction queries and propose a new open shape-based strategy (OSS). OSS models the direction region as an open shape and converts the processing of the direction predicates into the processing of topological operations between open shapes and closed geometry objects. The proposed strategy OSS makes it unnecessary to know the boundary of the embedding world and also eliminates the computation related to the world boundary. OSS reduces both I/O and CPU costs by greatly improving the filtering effectiveness. Our experimental evaluation shows that OSS consistently outperforms classical range query strategies (RQS) while the degree of performance improvement varies by several parameters. Experimental results also demonstrate that OSS is more scalable than RQS for large data sets.  相似文献   

10.
XML is an ordered data model and XQuery expressions return results that have a well-defined order. However, little work on how order is supported in XML query processing has been done to date. In this paper we study the issues related to handling order in the XML context, namely challenges imposed by the XML data model, the variety of order requirements of the XQuery language, and the need to maintain order in the presence of updates to the XML data. We propose an efficient solution that addresses all these issues. Our solution is based on a key encoding for XML nodes that serves as node identity and at the same time encodes order. We design rules for encoding order of processed XML nodes based on the XML algebraic query execution model and the node key encoding. These rules do not require any actual sorting for intermediate results during execution. Our approach enables efficient order-sensitive incremental view maintenance as it makes most XML algebra operators distributive with respect to bag union. We prove the correctness of our order encoding approach. Our approach is implemented and integrated with Rainbow, an XML data management system developed at WPI. We have tested the efficiency of our approach using queries that have different order requirements. We have also measured the relative cost of different components related to our order solution in different types of queries. In general the overhead of maintaining order in our approach is very small relative to the query processing time.  相似文献   

11.
Spatial range query is one of the most common queries in spatial databases, where a user invokes a query to find all the surrounding interest objects. Most studies in range search consider Euclidean distances to retrieve the result in low cost, but with poor accuracy (i.e., Euclidean distance less than or equal network distance). Thus, researchers show that range search in network distance retrieves the results with high accuracy but with a vast amount of network distance computations. However, both of these techniques retrieve all objects in a given radius with a high number of false hits. Yet, in many situations, retrieving all objects is not necessary, especially when there are already enough objects closer to the query point. Also, when the radius of the search increases, a demotion in the performance will occur. Hence, approximate results are valuable just as the exact result, and approximate results can be obtained much faster than the exact result and are less costly. In this paper, we propose two approximate range search methods in spatial road network, namely approximate range Euclidean restriction and approximate range network expansion, to reduce the number of false hits and the number of network distance computations in a considerable manner. After the verification, these two methods are shown to be robust and accurate.  相似文献   

12.
K.  Wen-Syan  M.   《Data & Knowledge Engineering》2000,35(3):259-298
Since media-based evaluation yields similarity values, results to a multimedia database query, Q(Y1,…,Yn), is defined as an ordered list SQ of n-tuples of the form X1,…,Xn. The query Q itself is composed of a set of fuzzy and crisp predicates, constants, variables, and conjunction, disjunction, and negation operators. Since many multimedia applications require partial matches, SQ includes results which do not satisfy all predicates. Due to the ranking and partial match requirements, traditional query processing techniques do not apply to multimedia databases. In this paper, we first focus on the problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, providing the user with a meaningful final ranking”. More specifically, we study the problem of merging similarity values in queries with multiple fuzzy predicates. We describe the essential multimedia retrieval semantics, compare these with the known approaches, and propose a semantics which captures the requirements of multimedia retrieval problem. We then build on these results in answering the related problem of “given a multimedia query which consists of multiple fuzzy and crisp predicates, finding an efficient way to process the query.” We develop an algorithm to efficiently process queries with unordered fuzzy predicates (sub-queries). Although this algorithm can work with different fuzzy semantics, it benefits from the statistical properties of the semantics proposed in this paper. We also present experimental results for evaluating the proposed algorithm in terms of quality of results and search space reduction.  相似文献   

13.
Nowadays, the road network has gained more and more attention in the research area of databases. Existing works mainly focus on standalone queries, such as k-nearest neighbor queries over a single type of objects (e.g., facility like restaurant or hotel). In this paper, we propose a k-multi-preference (kMP) query over road networks, involving complex query predicates and multiple facilities. In particular, given a query graph, a kMP query retrieves of the top-k groups of vertices (of k facility types) satisfying the label constraints and their aggregate distances are the smallest. A naïve solution to this problem is to enumerate all combinations of vertices with k possible facility types and then select the one with the minimum sum distance. This method, however, incurs rather high computation cost due to exponential possible combinations. In addition, the existing solutions to other standalone queries are for a single type of facilities and cannot be directly used to answer kMP queries. Therefore, in this paper, we propose an efficient approach to process a kMP query, which utilizes an index with bounded space and reduces the computation cost of the shortest path queries. We also design effective pruning techniques to filter out false alarms. Through our extensive experiments, we demonstrate the efficiency and effectiveness of our proposed solutions.  相似文献   

14.
This paper presents a query processing algorithm, formulated and developed in support of the prototype architecture of the Distributed Access View Integrated Database (DAVID) which is a heterogeneous distributed database management system. The objective of the proposed query processing algorithm is to produce an inexpensive strategy for a given query. The inexpensive query strategy is obtained primarily by computing the most profitable semi-joins and by determining the best sequence of join operations per processing site. The latter is obtained by applying a zero-one integer linear program that uses a non-parametric statistical estimation technique to compute the sizes of the temporary clusters. A cluster is a subset of the cartesian product of a list of atomic and non-atomic domains and is the structure that can represent in a uniform way data stored in relational, hierarchical and network databases.Following some background information on the development of the DAVID prototype, this paper introduces the schema architecture. The schema architecture describes the mechanism by which the component heterogeneous database schemata are mapped into the uniform global schema. This is followed by the formulation of the query processing algorithm, its implementation and an illustration of its use in the context of NASA's Astrophysics Data System.Recommended by: Y. Breitbart  相似文献   

15.
《Intelligent Data Analysis》1998,2(1-4):139-160
Current methods to learn Bayesian Networks from incomplete databases share the common assumption that the unreported data are missing at random. This paper describes a method—called Bound and Collapse (BC)—to learn Bayesian Networks from incomplete databases which allows the analyst to efficiently integrate information provided by the observed data and exogenous knowledge about the pattern of missing data. BC starts by bounding the set of estimates consistent with the available information and then collapses the resulting set to a point estimate via a convex combination of the extreme points, with weights depending on the assumed pattern of missing data. Experiments comparing BC to Gibbs Sampling are provided.  相似文献   

16.
Continuous visible nearest neighbor query processing in spatial databases   总被引:1,自引:0,他引:1  
In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms.  相似文献   

17.
Two types of parallel processing and optimization algorithms for processing object-oriented databases are the hybrid-hash pointer-based (HHP) algorithms and multi-wavefront (MWF) algorithms. We analyze these two algorithms and develop analytical formulas to capture their main performance features. We study their performance in three application environments, characterized by large databases having many object classes, each of which, respectively, (1) contains a large number of instances; (2) contains a relatively small number of instances; and (3) is of varying size. A horizontal data partitioning strategy is used in (1). A class-per-node assignment strategy is used in (2). In (3), object classes are partitioned horizontally and assigned to a varying number of processors depending on their different sizes. The MWF algorithm has three distinguishing features which contribute to its better performance: (a) a two-phase processing strategy, (b) vertical partitioning of horizontal segments, and (c) dynamic determination of the collision point in MWF propagations, which results in an optimized query execution plan. If these features are adopted by an HHP algorithm, its performance is comparable with that of the MWF algorithm because the difference in CPU time between them is negligible. The computing environment is a network of workstations having a shared-nothing architecture. The schema and some queries selected from the OO7 benchmark are used in the performance analyses and comparisons. The queries are modified slightly in different data environments in order to reflect the features of diverse database applications  相似文献   

18.
Modern spatial database applications built on top of distributed and heterogeneous spatial information sources such as conventional spatial databases underlying Geographical Information Systems (GIS), spatial data files and spatial information acquired or inferred from the Web, suffer from data integration and topological consistency problems. This more-and-more conveys in incomplete information, which makes answering range queries over incomplete spatial databases a leading research challenge in spatial database systems research. A significant instance of this setting is represented by the application scenario in which the geometrical information on a sub-set of spatial database objects is incomplete whereas the spatial database still stores topological relations among these objects (e.g., containment relations). Focusing on the spatial database application scenario above, in this paper we propose and experimentally assess a novel technique for efficiently answering range queries over incomplete spatial databases via integrating geometrical information and topological reasoning. We also propose I-SQE (Spatial Query Engine for Incomplete Information), an innovative query engine implementing this technique. Our proposed technique results not only effective but also efficient against both synthetic and real-life spatial data sets, and it finally allows us to enhance the quality and the expressive power of retrieved answers by meaningfully taking advantages from the amenity of representing spatial database objects via both the geometrical and the topological level.  相似文献   

19.
We present a new access method, called the path dictionary index (PDI) method, for supporting nested queries on object-oriented databases. PDI supports object traversal and associative search, respectively, with a path dictionary and a set of attribute indexes built on top of the path dictionary. We discuss issues on indexing and query processing in object-oriented databases; describe the operations of the new mechanism; develop cost models for its storage overhead and query and update costs; and compare the new mechanism to the path index method. The result shows that the path dictionary index method is significantly better than the path index method over a wide range of parameters in terms of retrieval and update costs and that the storage overhead grows slowly with the number of indexed attributes  相似文献   

20.
Vertical partitioning is a design technique for reducing the number of disk accesses to execute a given set of queries by minimizing the number of irrelevant instance variables accessed. This is accomplished by grouping the frequently accessed instance variables as vertical class fragments. The complexity of object-oriented database models due to subclass hierarchy and class composition hierarchy complicates the definition and representation of vertical partitioning of the classes, which makes the problem of vertical partitioning in OODBs very challenging. In this paper, we develop a comprehensive analytical cost model for processing of queries on vertically partitioned OODB classes. A set of analytical evaluation results is presented to show the effect of vertical partitioning, and to study the trade-off between the projection ratio versus selectivity factor vis-a-vis sequential versus index access. Furthermore, an empirical experimental prototype supporting vertical class partitioning has been implemented on a commercial OODB tool kit to validate our analytical cost model.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号