Found 20 similar documents; search took 15 ms
1.
The collective processing of multiple queries in a database system has recently received renewed attention due to its capability of improving the overall performance of a database system and its applicability to the design of knowledge-based expert systems and extensible database systems. A new multiple query processing strategy is presented which utilizes semantic knowledge on data integrity and information on predicate conditions of the access paths (plans) of queries. The processing of multiple queries is accomplished by the utilization of subset relationships between intermediate results of query executions, which are inferred employing both semantic and logical information. Given a set of fixed order access plans, the A* algorithm is used to find the set of reformulated access plans which is optimal for a given collection of semantic knowledge.
2.
In spatial networks, clustering adjacent data to disk pages is highly likely to reduce the number of disk page accesses made by the aggregate network operations during query processing. For this purpose, different techniques based on the clustering graph model are proposed in the literature. In this work, we show that the state-of-the-art clustering graph model is not able to correctly capture the disk access costs of aggregate network operations. Moreover, we propose a novel clustering hypergraph model that correctly captures the disk access costs of these operations. The proposed model aims to minimize the total number of disk page accesses in aggregate network operations. Based on this model, we further propose two adaptive recursive bipartitioning schemes to reduce the number of allocated disk pages while trying to minimize the number of disk page accesses. We evaluate our clustering hypergraph model and recursive bipartitioning schemes on a wide range of road network datasets. The results of the conducted experiments show that the proposed model is quite effective in reducing the number of disk accesses incurred by the network operations.
3.
Semijoin has traditionally been relied upon to reduce the cost of data transmission for distributed query processing. However, judiciously applying join operations as reducers can lead to further reduction in the amount of data transmission required. In view of this fact, we explore the approach of using join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed to that of finding a specific type of set of cuts to the corresponding query graph, where a cut to a graph is a partition of nodes in that graph. Then, in light of this concept, we prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, thus justifying the necessity of applying heuristic approaches to solve this problem. By mapping the problem of determining a sequence of join reducers into the one of finding a set of cuts, we develop (for tree and general query graphs, respectively) efficient heuristic algorithms to determine a join reducer sequence for distributed query processing. The algorithms developed are based on the concept of divide and conquer and are of polynomial time complexity. Simulation is performed to evaluate these algorithms.
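As a rough illustration of the semijoin baseline mentioned in the first sentence of this abstract (not the join-reducer heuristics the paper develops), the sketch below reduces a relation before shipping it to another site. The two-site setup, the relations R and S, and the helper names semijoin and join are all hypothetical and chosen only for illustration.

```python
# Hypothetical setup: relation R is stored at site A, relation S at site B.
# Instead of shipping all of R to site B, site B first sends only its join
# keys to site A, and site A ships back only the rows that can match.

def semijoin(r_rows, s_keys, key):
    """Return the rows of R whose join key appears among the keys of S."""
    return [row for row in r_rows if row[key] in s_keys]

def join(r_rows, s_rows, key):
    """Plain hash join on a single key, used here to check the result."""
    index = {}
    for s in s_rows:
        index.setdefault(s[key], []).append(s)
    return [{**r, **s} for r in r_rows for s in index.get(r[key], [])]

R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
S = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]

s_keys = {s["id"] for s in S}           # small message: site B -> site A
reduced_R = semijoin(R, s_keys, "id")   # only these rows travel: site A -> site B
result = join(reduced_R, S, "id")       # same answer as joining the full R
```

Here two of the three rows of R are shipped instead of all three, and the final join result is unchanged.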
4.
Yu Cao Ramadhana Bramandia Chee-Yong Chan Kian-Lee Tan 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(3):411-436
Many database applications require sorting a table (or relation) over multiple sort orders. Some examples include creation of multiple indices on a relation, generation of multiple reports from a table, evaluation of a complex query that involves multiple instances of a relation, and batch processing of a set of queries. In this paper, we study how to optimize multiple sortings of a table. We investigate the correlation between sort orders and exploit sort-sharing techniques of reusing the (partial) work done to sort a table on a particular order for another order. Specifically, we introduce a novel and powerful evaluation technique, called cooperative sorting, that enables sort sharing between seemingly non-related sort orders. Subsequently, given a specific set of sort orders, we determine the best combination of various sort-sharing techniques so as to minimize the total processing cost. We also develop techniques to make a traditional query optimizer extensible so that it will not miss the truly cheapest execution plan with the sort-sharing (post-) optimization turned on. We demonstrate the efficiency of our ideas with a prototype implementation in PostgreSQL and evaluate the performance using both TPC-DS benchmark and synthetic data. Our experimental results show significant performance improvement over the traditional evaluation scheme.
5.
This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) dealing with a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) do not consider workload patterns nor do they perform well when subjected to non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner. The main contribution of this paper is an approach that combines proprietary cloud-based load balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of rigorous performance evaluation experiments using the Amazon EC2 infrastructure. The results prove that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to the overall system’s performance (i.e. query latency and database service throughput).
6.
《Knowledge and Data Engineering, IEEE Transactions on》2002,14(5):955-978
This paper describes VISUAL, a graphical icon-based query language with a user-friendly graphical user interface for scientific databases and its query processing techniques. VISUAL is suitable for domains where visualization of the relationships is important for the domain scientist to express queries. In VISUAL, graphical objects are not tied to the underlying formalism; instead, they represent the relationships of the application domain. VISUAL supports relational, nested, and object-oriented models naturally and has a formal basis. For ease of understanding and for efficiency reasons, two VISUAL semantics are introduced, namely, the interpretation and execution semantics. Translations from VISUAL to the Object Query Language (for portability considerations) and to an object algebra (for query processing purposes) are presented. Concepts of external and internal queries are developed as modularization tools.
7.
Adaptive query processing generally involves a feedback loop comprising monitoring, assessment and response. So far, individual proposals have tended to group together an approach to monitoring, a means of assessment, and a form of response. However, there are many benefits in decoupling these three phases, and in constructing generic frameworks for each of them. To this end, this paper discusses monitoring of query plan execution as a topic in its own right, and advocates an approach based on self-monitoring algebraic operators. This approach is shown to be generic and independent of any specific adaptation mechanism, easily implementable and portable, sufficiently comprehensive, appropriate for heterogeneous distributed environments, and more importantly, capable of driving on-the-fly adaptations of query plan execution. An experimental evaluation of the overheads and of the quality of the results obtained by monitoring is also presented.
8.
In this research, we address the query clustering problem, which involves determining globally optimal execution strategies for a set of queries. The need to process a set of queries together often arises in deductive database systems, scientific database systems, large bibliographic retrieval systems and several other database applications. We address the optimization problem from the perspective of overlaps in data requirements, and model the batched operations using a set-partitioning approach. In this model, we first consider the case of m queries each involving a two-way join operation. We develop a recursive methodology to determine all the processing strategies in this case. Next, we establish certain dominance properties among the strategies, and develop exact as well as heuristic algorithms for selecting an appropriate strategy. We extend this analysis to a clustering approach, and outline a framework for optimizing multiway joins. The results show that the proposed approach is viable and efficient, and can easily be incorporated into the query processing component of most database systems.
9.
Thomas Bernecker Tobias Emrich Hans-Peter Kriegel Nikos Mamoulis Matthias Renz Shiming Zhang Andreas Züfle 《GeoInformatica》2013,17(3):449-487
Traditional spatial queries return, for a given query object q, all database objects that satisfy a given predicate, such as epsilon range and k-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries. Furthermore, we show how to relax the definition of inverse queries in order to ensure non-empty result sets. Our experiments show that our framework is significantly more efficient than naive approaches.
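The definition of an inverse query in this abstract can be made concrete with a small brute-force sketch for the epsilon-range predicate. This is only a naive reading of the definition, assuming points in the plane and Euclidean distance; it is not the filter-and-refinement framework the paper proposes, and the function name inverse_eps_range and the sample data are hypothetical.

```python
import math

def inverse_eps_range(database, Q, eps):
    """Return every object o such that an eps-range query issued at o
    would contain all objects of Q in its result."""
    return [o for o in database
            if all(math.dist(o, q) <= eps for q in Q)]

# Hypothetical 2D data; Q is a subset of the database objects.
database = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5), (5.0, 5.0)]
Q = [(1.0, 0.0), (0.5, 0.5)]

result = inverse_eps_range(database, Q, eps=1.2)
```

With these values, the three points near the origin each have both members of Q within distance 1.2, while (5.0, 5.0) does not, so it is excluded from the result.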
10.
11.
This paper addresses some of the issues that arise in representing temporal information in the database context. It deals not only with the explicit representation of temporal information but with mechanisms for reasoning with it as well. It addresses the issue of processing natural language queries with explicit temporal references. The three issues of knowledge representation, natural language processing and query processing are addressed using the axiomatic framework based on equational logic.
12.
13.
James Cheng Yiping Ke Ada Wai-Chee Fu Jeffrey Xu Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(4):521-539
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and perform a filtering-and-verification process, which still requires many subgraph isomorphism tests. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost.
14.
Shiyuan Wang Quang Hieu Vu Beng Chin Ooi Anthony K. H. Tung Lizhen Xu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(1):345-362
This paper looks at the processing of skyline queries on peer-to-peer (P2P) networks. We propose Skyframe, a framework for efficient skyline query processing in P2P systems, which addresses the challenges of quick response time, low network communication cost and query load balancing among peers. Skyframe consists of two querying methods: one is optimized for network communication while the other focuses on query response time. These methods are different in the way in which the query search space is defined. In particular, the first method uses a high dominating point that has a large dominating region to prune the search space to achieve a low cost in network communication. On the other hand, the second method relaxes the search space in order to allow parallel query processing to speed up query response. Skyframe achieves query load balancing by both query load conscious data space splitting/merging during the join/departure of nodes and dynamic load migration. We further show how to apply Skyframe to both the P2P systems supporting multi-dimensional indexing and the P2P systems supporting single-dimensional indexing. Finally, we have conducted extensive experiments on both real and synthetic data sets over two existing P2P systems: CAN (Ratnasamy in A scalable content-addressable network. In: Proceedings of SIGCOMM Conference, pp. 161–172, 2001) and BATON (Jagadish et al. in A balanced tree structure for peer-to-peer networks. In: Proceedings of VLDB Conference, pp. 661–672, 2005) to evaluate the effectiveness and scalability of Skyframe.
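For readers unfamiliar with skyline queries, the sketch below shows the standard dominance test and a brute-force, single-machine skyline computation. It assumes that smaller values are better in every dimension, and the names dominates and skyline are hypothetical. It does not model Skyframe's P2P search-space definition or load balancing; it only illustrates why a point with a large dominating region is useful for pruning.

```python
def dominates(p, q):
    """p dominates q if p is no worse than q in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Brute-force skyline: keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical 2D data set.
points = [(1, 9), (3, 3), (2, 8), (4, 4), (9, 1), (5, 5)]
sky = skyline(points)
```

In this data set the point (3, 3) dominates both (4, 4) and (5, 5), so a single well-placed point removes a sizeable part of the candidate set before any further comparison is needed.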
15.
3D-List: a data structure for efficient video query processing (total citations: 1; self-citations: 0; citations by others: 1)
A video query model based on the content of video and iconic indexing is proposed. We extend the notion of two-dimensional strings to three-dimensional strings (3D-Strings) for representing the spatial and temporal relationships among the symbols in both a video and a video query. The problem of video query processing is then transformed into a problem of three-dimensional pattern matching. To efficiently match the 3D-Strings, a data structure, called 3D-List, and its related algorithms are proposed. In this approach, the symbols of a video in the video database are retrieved from the video index and organized as a 3D-List according to the 3D-String of the video query. The related algorithms are then applied on the 3D-List to determine whether this video is an answer to the video query. Based on this approach, we have started a project called Vega. In this project, we have implemented a user-friendly interface for specifying video queries, a video index tool for constructing the video index, and a video query processor based on the notion of 3D-List. Some experiments are also performed to show the efficiency and effectiveness of the proposed algorithms.
16.
Time-based operators for relational algebra query languages (total citations: 3; self-citations: 0; citations by others: 3)
M. A. Bassiouni M. J. Llewellyn A. Mukherjee 《Computer Languages, Systems and Structures》1993,19(4):261-276
We present a new approach for historical relational algebra languages based upon generalized logic for Boolean and comparison operators and a temporal modification of the standard relational algebra operators. Historical versions of standard (snapshot) relational algebra operators based upon this generalized logic are presented. The temporal modification employs a logic that operates on sets of value/time-interval pairs and which can be applied to snapshot as well as historical databases. Our emphasis is that the generalized operators can be used to enrich existing historical query languages and to provide an easier and more natural time-based interface. Using the generalized operators, users can express their queries more naturally, succinctly and elegantly. Examples are presented which illustrate that the modified operators offer a good degree of flexibility in expressing different temporal requirements.
17.
Semantic caching and query processing (total citations: 2; self-citations: 0; citations by others: 2)
Qun Ren Dunham M.H. Kumar V. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(1):192-210
Semantic caching is very attractive for use in distributed systems due to the reduced network traffic and the improved response time. It is particularly efficient for a mobile computing environment, where the bandwidth of wireless links is a major performance bottleneck. Previous work either does not provide a formal semantic caching model, or lacks efficient query processing strategies. This paper extends the existing research in three ways: formal definitions associated with semantic caching are presented, query processing strategies are investigated and, finally, the performance of the semantic cache model is examined through a detailed simulation study.
18.
Data summarization has recently received considerable attention in the knowledge systems community. This paper discusses the design of a data summarization query system. Based on an initial analysis of requirement representations in data summarization, the study develops a generic organization of ontology for a data summarization query system. Furthermore, this paper proposes a framework for an ontology-based query language for data summarization, built on the proposed ontology structure. A prototype project of ontology-based Query by Examples (QBE) for data summarization, applied to summarizing data incompleteness, demonstrates the effectiveness of the proposed framework.
19.
Jagadish H.V. Ooi B.C. Shen H.T. Tan K.-L. 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(3):350-362
In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components. The first component is a 2D vector that reflects a distance range (minimum and maximum values) of the f features with respect to a reference point (the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: the first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.
20.
Ozsoyoglu G. Guruswamy S. Kaizheng Du Wen-Chi Hou 《Knowledge and Data Engineering, IEEE Transactions on》1995,7(6):865-884
CASE-DB is a real-time, single-user, relational prototype DBMS that permits the specification of strict time constraints for relational algebra queries. Given a time constrained nonaggregate relational algebra query and a “fragment chain” for each relation involved in the query, CASE-DB initially obtains a response to a modified version of the query and then uses an “iterative query evaluation” technique to successively improve and evaluate the modified version of the query. CASE-DB controls the risk of overspending the time quota at each step using a “risk control technique”.