首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 27 毫秒
1.
We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.  相似文献   

2.
It is foreseen that more and more music objects in symbolic format and multimedia objects, such as audio, video, or lyrics, integrated with symbolic music representation (SMR) will be published and broadcasted via the Internet. The SMRs of the flowing songs or multimedia objects will form a music stream. Many interesting applications based on music streams, such as interactive music tutorials, distance music education, and similar theme searching, make the research of content-based retrieval over music streams much important. We consider multiple queries with error tolerances over music streams and address the issue of approximate matching in this environment. We propose a novel approach to continuously process multiple queries over the music streams for finding all the music segments that are similar to the queries. Our approach is based on the concept of n-grams, and two mechanisms are designed to reduce the heavy computation of approximate matching. One mechanism uses the clustering of query n-grams to prune the query n-grams that are irrelevant to the incoming data n-gram. The other mechanism records the data n-gram that matches a query n-gram as a partial answer and incrementally merges the partial answers of the same query. We implement a prototype system for experiments in which songs in the MIDI format are continuously broadcasted, and the user can specify musical segments as queries to monitor the music streams. Experiment results show the effectiveness and efficiency of the proposed approach.  相似文献   

3.
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues pose by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model in evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with a level and multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000  相似文献   

4.
We study here fundamental issues involved in top-k query evaluation in probabilistic databases. We consider simple probabilistic databases in which probabilities are associated with individual tuples, and general probabilistic databases in which, additionally, exclusivity relationships between tuples can be represented. In contrast to other recent research in this area, we do not limit ourselves to injective scoring functions. We formulate three intuitive postulates for the semantics of top-k queries in probabilistic databases, and introduce a new semantics, Global-Topk, that satisfies those postulates to a large degree. We also show how to evaluate queries under the Global-Topk semantics. For simple databases we design dynamic-programming based algorithms. For general databases we show polynomial-time reductions to the simple cases, and provide effective heuristics to speed up the computation in practice. For example, we demonstrate that for a fixed k the time complexity of top-k query evaluation is as low as linear, under the assumption that probabilistic databases are simple and scoring functions are injective. Research partially supported by NSF grant IIS-0307434. An earlier version of some of the results in this paper was presented in [1].  相似文献   

5.
In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with the previous works. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective to reduce the response time.  相似文献   

6.
We consider the problem of indexing a set of objects moving in d-dimensional spaces along linear trajectories. A simple external-memory indexing scheme is proposed to efficiently answer general range queries. The following are examples of the queries that can be answered by the proposed method: report all moving objects that will (i) pass between two given points within a specified time interval; (ii) become within a given distance from some or all of a given set of other moving objects. Our scheme is based on mapping the objects to a dual space, where queries about moving objects are transformed into polyhedral queries concerning their speeds and initial locations. We then present a simple method for answering such polyhedral queries, based on partitioning the space into disjoint regions and using a B+-tree to index the points in each region. By appropriately selecting the boundaries of each region, we guarantee an average search time that matches a known lower bound for the problem. Specifically, for a fixed d, if the coordinates of a given set of N points are statistically independent, the proposed technique answers polyhedral queries, on the average, in O((N/B)1−1/d⋅(log B N)1/d+K/B) I/O's using O(N/B) space, where B is the block size, and K is the number of reported points. Our approach is novel in that, while it provides a theoretical upper bound on the average query time, it avoids the use of complicated data structures, making it an effective candidate for practical applications. The proposed index is also dynamic in the sense that it allows object insertion and deletion in an amortized update cost of log B(N) I/O's. Experimental results are presented to show the superiority of the proposed index over other methods based on R-trees. recommend Ahmed Elmagarmid  相似文献   

7.
Unlike a twig query, an Xtwig query contains some selection predicates with reverse axes which are either ancestor or parent. To evaluate such queries in the stream-based context, some rewriting rules have been proposed to transform the paths with reverse axes into equivalent reverse-axis-free ones. However, the transformation method is expensive due to multiple scanning input streams and the generation of unnecessary intermediate results. To solve these problems, a holistic stream-based algorithm XtwigStack is proposed for Xtwig queries. Experiments show that XtwigStack is much more efficient than the transformation method.  相似文献   

8.
9.
This paper investigates the optimization problem when executing a join in a distributed database environment. The minimization of the communication cost for sending data through links has been adopted as an optimization criterion. We explore in this paper the approach of judiciously using join operations as reducers in distributed query processing. In general, this problem is computationally intractable. A restriction of the execution of a join in a pre-defined combinatorial order leads to a possible solution in polynomial time. An algorithm for a chain query computation has been proposed in [21]. The time complexity of the algorithm isO(m 2 n 2+m 3 n), wheren is the number of sites in the network, andm is the number of relations (fragments) involved in the join. In this paper, we firstly present a proof of the intuitively well understood fact—that the eigenorder of a chain join will be the best pre-defined combinatorial order to implement the algorithm in [21]. Secondly, we show a sufficient and necessary condition for a chain query with the eigenordering to be a simple query. For the process of the class of simple queries, we show a significant reduction of the time complexity fromO(m 2 n 2+m 3 n) toO(mn+m 2). It is encouraging that, in practice, the most frequent queries belong to the category of simple queries. Editor: Peter Apers  相似文献   

10.
11.
We propose a new approach to the estimation of query result sizes for join queries. The technique, which we have called systematic sampling—SYSSMP, is a novel variant of the sampling-based approach. A key novelty of the systematic sampling is that it exploits the sortedness of data; the result of this is that the sample relation obtained well represents the underlying frequency distribution of the join attribute in the original relation.We first develop a theoretical foundation for systematic sampling which suggests that the method gives a more representative sample than the traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the quality of sample relations yielded by systematic sampling is higher than those produced by the traditional simple random sampling.To ensure that sample relations produced by systematic sampling indeed assist in computing more accurate query result sizes, we compare systematic sampling with the most efficient simple random sampling called t_cross using a variety of relation configurations. The results obtained validate that systematic sampling uses the same amount of sampling but still provides more accurate query result sizes than t_cross. Furthermore, the extra sampling cost incurred by the use of systematic sampling pays off in a cheaper query execution cost at run-time.  相似文献   

12.
Top-k monitoring queries are useful in many wireless sensor network applications. A query of this type continuously returns a list of k ordered nodes with the highest (or lowest) sensor readings. To process these queries, a well-known approach is to install a filter at each sensor node to avoid unnecessary transmissions of sensor readings. In this paper, we propose a new top-k monitoring method, named Distributed Adaptive Filter-based Monitoring. In this method, we first propose a new query reevaluation algorithm that works distributedly in the network to reduce the communication cost of sending probe messages. Then, we present an adaptive filter updating algorithm which is based on predicted benefits to lower down the transmission cost of sending updated filters to the sensor nodes. Experimental results on real data traces show that our proposed method performs much better than the other existing methods in terms of both network lifetime and average energy consumption.  相似文献   

13.
Approximate query processing using wavelets   总被引:7,自引:0,他引:7  
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data. Received: 7 August 2000 / Accepted: 1 April 2001 Published online: 7 June 2001  相似文献   

14.
In recent years, the KDD process has been advocated to be an iterative and interactive process. It is seldom the case that a user is able to answer immediately all his questions on date with a single query. On the contrary, the work-flow of the typical user consists of several steps in which he/she iteratively refines the extracted knowledge by inspecting previous results and posing new queries. Given this view of the KDD process, in order to reduce the computational effort, it becomes crucial to have KDD systems that are able to exploit past results. This is especially true in environments in which the system knowledge base is the result of many discoveries on data made separately by the collaborative effort of different users.

In this paper, we consider the problem of mining frequent association rules from database relations. We first model a general, constraint-based, mining language for this task. Then, we propose an algorithm that answers such queries reusing past results. In particular, this solution is effective for a new class of constraints, called context dependent, which are more difficult than the traditionally studied item dependent constraints. Nevertheless, we show that some typical queries of important application domains, such as market stock trading, analysis of web log, and gene microarrays in bioinformatics, have context-dependent constraints. We show with a set of experiments in these application domains that the proposed solution with an incremental approach is both effective and viable.  相似文献   

15.
Bottom-up query-answering procedures tend to explore a much larger search space than what is strictly needed. Top-down processing methods use the query to perform a more focused search that can result in more efficient query answering. Given a disjunctive deductive database, DB, and a query, Q, we establish a strong connection between model generation and clause derivability in two different representations of DB and Q. This allows us to use a bottom-up procedure for evaluating Q against DB in a top-down fashion. The approach requires no extensive rewriting of the input theory and introduces no new predicates. Rather, it is based on a certain duality principle for interpreting logical connectives. The duality transformation is achieved by reversing the direction of implication arrows in the clauses representing both the theory and the negation of the query. The application of a generic bottom-up procedure to the transformed clause set results in top-down query answering. Under favorable conditions efficiency gains are substantial, as shown by our preliminary testing. We give the logical meaning of the duality transformation and point to the conditions and sources of improved efficiency. We show how the duality approach can be used for refined query answering by specifying the minimal conditions (weakest updates) to DB under which Q becomes derivable. This is shown to be useful for view updates in disjunctive deductive databases as well as for other interesting applications.  相似文献   

16.
In our earlier work, we proposed an architecture for a Web-based video database management system (VDBMS) providing an integrated support for spatiotemporal and semantic queries. In this paper, we focus on the task of spatiotemporal query processing and also propose an SQL-like video query language that has the capability to handle a broad range of spatiotemporal queries. The language is rule-based in that it allows users to express spatial conditions in terms of Prolog-type predicates. Spatiotemporal query processing is carried out in three main stages: query recognition, query decomposition, and query execution.Received: 11 October 2001, Accepted: 3 October 2003, Published online: 12 December 2003Edited by: A. Buchmann Correspondence to: Özgür UlusoyThis work is supported by the Scientific and Research Council of Turkey (TÜBITAK) under Project Code 199E025. This work was done while the first author was at Bilkent University.  相似文献   

17.
Information is valuable to users when it is available not only at the right time but also at the right place. To support efficient location-based data access in wireless data broadcast systems, a distributed spatial index (called DSI) is presented in this paper. DSI is highly efficient because it has a linear yet fully distributed structure that naturally shares links in different search paths. DSI is very resilient to the error-prone wireless communication environment because interrupted search operations based on DSI can be resumed easily. It supports search algorithms for classical location-based queries such as window queries and kNN queries in both of the snapshot and continuous query modes. In-depth analysis and simulation-based evaluation have been conducted. The results show that DSI significantly out-performs a variant of R-trees tailored for wireless data broadcast environments.  相似文献   

18.
This paper introduces a new approach to realize video databases. The approach consists of a VideoText data model based on free text annotations associated with logical video segments and a corresponding query language. Traditional database techniques are inadequate for exploiting queries on unstructured data such as video, supporting temporal queries, and ranking query results according to their relevance to the query. In this paper, we propose to use information retrieval techniques to provide such features and to extend the query language to accommodate interval queries that are particularly suited to video data. Algorithms are provided to show how user queries are evaluated. Finally, a generic and modular video database architecture which is based on VideoText data model is described.  相似文献   

19.
In this paper, we introduce variability of syntactic phrases and propose a new retrieval approach reflecting the variability of syntactic phrase representation. With variability measure of a phrase, we can estimate how likely a phrase in a given query would appear in relevant documents and control the impact of syntactic phrases in a retrieval model. Various experimental results over different types of queries and document collections show that our retrieval model based on variability of syntactic phrases is very effective in terms of retrieval performance, especially for long natural language queries.  相似文献   

20.
In large-scale Internet-based distributed systems, participants (consumers and providers) are typically autonomous, i.e. they may have special interests towards queries and other participants. In this context, a way to avoid a participant to voluntarily leave the system is satisfying its interests when allocating queries. However, participants satisfaction may also be negatively affected by the failures of other participants. Query replication is a solution to deal with providers failures, but, it is challenging because of autonomy: it cannot only quickly overload the system, but also it can dissatisfy participants with uninteresting queries. Thus, a natural question arises: should queries be replicated? If so, which ones? and how many times? In this paper, we answer these questions by revisiting query replication from a satisfaction and probabilistic point of view. We propose a new algorithm, called S b QR, that decides on-the-fly whether a query should be replicated and at which rate. As replicating a large number of queries might overload the system, we propose a variant of our algorithm, called S b QR+. The idea is to voluntarily fail to allocate as many replicas as required by consumers for low critical queries so as to keep resources for high critical queries during query-intensive periods. Our experimental results demonstrate that our algorithms significantly outperform the baseline algorithms from both the performance and satisfaction points of view. We also show that our algorithms automatically adapt to the criticality of queries and different rates of participant failures.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号