Similar literature
1.
Ranked queries return the top objects of a database according to a preference function. We present and evaluate (experimentally and theoretically) a core algorithm that answers ranked queries in an efficient pipelined manner using materialized ranked views. We use and extend the core algorithm in the described PREFER and MERGE systems. PREFER precomputes a set of materialized views that provide guaranteed query performance. We present an algorithm that selects a near-optimal set of views under space constraints. We also describe multiple optimizations and implementation aspects of the downloadable version of PREFER. Then we discuss MERGE, which operates at a metabroker and answers ranked queries by retrieving a minimal number of objects from sources that offer ranked queries. A speculative version of the pipelining algorithm is described. Received: 10 June 2002. Accepted: 11 June 2002. Published online: 30 September 2003. Edited by: A. Mendelzon. Work supported by NSF Grant No. 9734548.
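
To make the notion of a ranked query concrete, here is a minimal sketch that scores each object with a preference function and returns the top k. It assumes the preference function is a weighted sum of attribute values, which the abstract does not state, and it scans every object; it is a brute-force baseline, not the view-based pipelined algorithm of PREFER. The names `top_k`, `objects`, and `weights` are illustrative.

```python
import heapq

def top_k(objects, weights, k):
    """Return the k objects with the highest weighted-sum score.

    objects: list of attribute tuples; weights: one weight per attribute.
    The weighted-sum preference function is an assumption of this sketch.
    """
    def score(obj):
        return sum(w * a for w, a in zip(weights, obj))

    # nlargest keeps only k candidates in memory while scanning all objects.
    return heapq.nlargest(k, objects, key=score)

# Hypothetical data: (quality, cheapness) per hotel, both in [0, 1].
hotels = [(0.9, 0.2), (0.5, 0.5), (0.1, 0.95)]
best_two = top_k(hotels, weights=(0.7, 0.3), k=2)
```

A materialized ranked view would avoid the full scan by storing the objects pre-sorted under some other weight vector and reading only a prefix of that order.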

2.
This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of acyclicity; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.

3.
Optimization and evaluation of shortest path queries   Total citations: 1, self-citations: 0, citations by others: 1
We investigate the problem of how to efficiently evaluate a collection of shortest path queries on massive graphs that are too big to fit in main memory. To evaluate a shortest path query efficiently, we introduce two pruning algorithms. These algorithms differ in the extent to which shortest path costs are materialized and in how the search space is pruned. By grouping shortest path queries properly, batch processing improves the performance of shortest path query evaluation. We also study extensively fragment sizes, cache sizes and query types, and show that they affect the performance of a disk-based shortest path algorithm. The performance and scalability of the proposed techniques are evaluated with large road systems in the Eastern United States. To demonstrate that the proposed disk-based algorithms are viable, we show that their search times are significantly better than that of main-memory Dijkstra's algorithm.
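
For reference, the main-memory baseline named in the abstract, Dijkstra's algorithm, can be sketched as follows. This is a textbook version over an adjacency-list dictionary, with early termination at the target; it is not one of the paper's two disk-based pruning algorithms. The graph representation and the name `dijkstra` are assumptions of this sketch.

```python
import heapq

def dijkstra(graph, source, target):
    """Return the shortest-path cost from source to target.

    graph: dict mapping node -> list of (neighbor, non-negative weight).
    Returns float('inf') when target is unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d  # first time the target is popped, d is final
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was found earlier
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# Hypothetical three-node road fragment.
roads = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
cost = dijkstra(roads, "a", "c")
```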

4.
Many applications often require finding sets of entities of interest that meet certain constraints. Such set-based queries (SQs) can be broadly classified into two types: optimization SQs that involve some optimization constraint and enumerative SQs that do not have any optimization constraint. While there has been much research on the evaluation of optimization SQs, there is very little work on the evaluation of enumerative SQs, which represent the most fundamental fragment of set-based queries. In this paper, we address the problem of evaluating enumerative SQs using RDBMS. While enumerative SQs can be expressed using SQL, existing relational engines, unfortunately, are not able to efficiently evaluate such queries due to their complexity. In this paper, we propose a novel evaluation approach for enumerative SQs. Our experimental results on PostgreSQL demonstrate that our proposed approach outperforms the conventional approach by up to three orders of magnitude.

5.
Genetic algorithms for approximate similarity queries   Total citations: 1, self-citations: 0, citations by others: 1
Algorithms that query large sets of simple data (composed of numbers and small character strings) are constructed to retrieve every relevant element, so the answer is said to be exact. Similarity searching over complex data is much more expensive than searching over simple data. Moreover, comparison operations over complex data usually consider features extracted from each element, instead of the elements themselves. Thus, even if an algorithm retrieves an exact answer, it is ‘exact’ regarding the extracted features, not regarding the original elements themselves. Therefore, trading exactness of the answer for query response time can be worthwhile. In this work we developed two search strategies based on genetic algorithms that allow retrieving approximate data indexed by Metric Access Methods (MAM) within a limited, user-defined amount of time. These strategies allow implementing algorithms to answer both range and k-nearest neighbor queries, and also allow estimating the precision obtained for the approximate answer. Experimental evaluation shows that very good results (corresponding to what the user would expect) can be obtained in a fraction of the time required to obtain the exact answer.

6.
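Research on multi-threaded processing of multi-user continuous k-nearest-neighbor queries   Total citations: 2, self-citations: 0, citations by others: 2
For multi-user continuous k-nearest-neighbor query processing over sets of moving objects, a multi-threading-based framework for processing multiple continuous queries (MPMCQ) is proposed. The framework adopts a pipelined processing strategy that decomposes continuous query processing into three stages that can run concurrently (query preprocessing, query execution, and query result dispatching) and uses multi-threading to improve the parallelism of multi-user continuous query processing. Based on the MPMCQ framework and an in-memory grid index of moving objects, a multi-threaded continuous k-nearest-neighbor query processing algorithm (MCkNN) is proposed. Experimental results and analysis show that the MCkNN algorithm based on the MPMCQ framework outperforms existing algorithms such as CPM and YPK-CNN on multi-core platforms.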
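
The three-stage pipeline described above can be sketched with one thread per stage and queues between them. This is only an illustration of the pipelining idea: the execution stage below is a brute-force k-nearest-neighbor scan rather than a grid-index search, and the names `run_stage`, `run_pipeline`, and `DONE` are hypothetical, not part of MPMCQ or MCkNN.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the query stream

def run_stage(work, inbox, outbox):
    """Apply `work` to every item taken from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))

def run_pipeline(queries, objects):
    """queries: list of (user, (x, y), k); objects: dict id -> (x, y)."""
    def preprocess(q):
        user, (x, y), k = q
        return user, (float(x), float(y)), max(1, int(k))

    def execute(q):
        user, (x, y), k = q
        def key(o):
            ox, oy = objects[o]
            return ((ox - x) ** 2 + (oy - y) ** 2, o)  # id breaks ties
        return user, sorted(objects, key=key)[:k]

    def dispatch(result):
        user, neighbors = result
        return {"user": user, "neighbors": neighbors}

    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    stages = [(preprocess, q0, q1), (execute, q1, q2), (dispatch, q2, q3)]
    threads = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for q in queries:
        q0.put(q)
    q0.put(DONE)

    results = []
    while True:
        item = q3.get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading a FIFO queue, results come out in submission order. While one query is being executed, the next can already be preprocessed, which is the overlap the pipelined strategy aims for.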

7.
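R-tree index of a spatial database engine   Total citations: 6, self-citations: 0, citations by others: 6
This paper introduces the R-tree index structure of a spatial database engine (SDBE), shows how the system uses the R-tree index, and describes a branch-and-bound algorithm that uses the R-tree index to answer nearest-neighbor queries, including the definitions of the cost function and its upper- and lower-bound functions, as well as the algorithm in pseudocode form.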
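
A minimal sketch of branch-and-bound nearest-neighbor search over an R-tree-like hierarchy of bounding boxes follows. It assumes the common choice of bounds: the lower bound of a subtree is the minimum distance from the query point to its bounding rectangle, and the upper bound is the best distance found so far. The paper's own cost and bound functions are not given in the abstract, so these, the dictionary-based node layout, and the names `mindist` and `nearest` are assumptions.

```python
import heapq
import itertools

def mindist(point, box):
    """Squared distance from point to rectangle (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - x, 0, x - xmax)
    dy = max(ymin - y, 0, y - ymax)
    return dx * dx + dy * dy

def nearest(root, query):
    """Best-first branch and bound; returns the nearest stored point.

    Node layout (hypothetical): {"box": rect, "leaf": bool, "entries": [...]}
    where entries are (x, y) points in a leaf and child nodes otherwise.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(mindist(query, root["box"]), next(tie), root)]
    best, best_d = None, float("inf")
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # every remaining subtree has a lower bound >= best_d
        if node["leaf"]:
            for p in node["entries"]:
                d = (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
                if d < best_d:
                    best, best_d = p, d
        else:
            for child in node["entries"]:
                b = mindist(query, child["box"])
                if b < best_d:  # prune children that cannot improve
                    heapq.heappush(heap, (b, next(tie), child))
    return best

# Hypothetical two-leaf tree.
leaf1 = {"box": (0, 0, 2, 2), "leaf": True, "entries": [(0, 0), (2, 2)]}
leaf2 = {"box": (5, 5, 8, 8), "leaf": True, "entries": [(5, 5), (8, 8)]}
tree = {"box": (0, 0, 8, 8), "leaf": False, "entries": [leaf1, leaf2]}
hit = nearest(tree, (4, 4))
```

For the query point (4, 4), the second leaf is visited first because its bounding box is closer, and the first leaf is then pruned without being opened.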

8.
In this paper we study the problem of deciding boundedness of (recursive) regular path queries over views in data integration systems, that is, whether a query can be re-expressed without recursion. This problem becomes challenging when the views contain recursion, thereby potentially making recursion in the query unnecessary. We define and solve two related problems of boundedness of regular path queries. One of the problems asks for the existence of a bound, and the other, more restricted one, asks if the query is bounded within a given parameter. For the more restricted version, we show that it is PSPACE-complete and obtain a constructive method for optimizing the queries. For the existential version of boundedness, we show that it is PTIME-reducible to the notorious problem of limitedness in distance automata. This problem has received a lot of attention in the formal language community, but only exponential-time algorithms are currently known.

9.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects, with respect to a group of query points in ad-hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.
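
The notion of dynamic dominance in a subspace can be illustrated with a deliberately simplified sketch: certain (not uncertain) objects, a single query point instead of a group, and the usual definition in which one object dominates another if it is at least as close to the query point in every selected dimension and strictly closer in at least one. The abstract does not give the PGSS definitions, so that definition and the names `dominates` and `dynamic_skyline` are assumptions.

```python
def dominates(a, b, query, dims):
    """True if a dynamically dominates b w.r.t. query in dimensions dims."""
    da = [abs(a[i] - query[i]) for i in dims]
    db = [abs(b[i] - query[i]) for i in dims]
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better

def dynamic_skyline(objects, query, dims):
    """Return the objects not dynamically dominated by any other object."""
    return [
        o for o in objects
        if not any(dominates(p, o, query, dims) for p in objects if p is not o)
    ]

# Hypothetical two-dimensional readings and a query point.
readings = [(4, 6), (2, 9), (5, 9)]
full_space = dynamic_skyline(readings, query=(5, 5), dims=(0, 1))
second_dim_only = dynamic_skyline(readings, query=(5, 5), dims=(1,))
```

Restricting the comparison to a subspace can shrink the skyline: here, using only the second dimension leaves a single object.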

10.
After decades of study, the usability of database systems has received more attention in recent years. In particular, the ability to explain missing objects in a query result, known as answering “why-not” questions, has become a focus of concern. This paper studies the problem of answering why-not questions on KNN queries. In real life, many users would like to use KNN queries to investigate their surroundings. Nevertheless, they often feel disappointed when the result does not include the objects they expected. In this paper, we use the query refinement approach to resolve the problem. Given the original KNN query and a set of missing objects as input, our algorithm offers the user a refined KNN query that includes the missing objects. The experimental results demonstrate the efficiency of our proposed optimizations and algorithms.
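
One simple form of query refinement is to keep the query point fixed and enlarge k just enough for the missing objects to appear. The sketch below does only that; the abstract does not say which query parameters the paper's algorithm modifies, so treating k as the only refinable parameter is an assumption, and `refine_k` is a hypothetical name.

```python
import math

def refine_k(points, query, missing):
    """Return the smallest k whose KNN result contains all missing objects.

    points: dict name -> (x, y); query: (x, y); missing: iterable of names.
    """
    # Rank all objects by distance to the query; names break ties.
    ranked = sorted(points, key=lambda n: (math.dist(points[n], query), n))
    rank = {name: i + 1 for i, name in enumerate(ranked)}
    return max(rank[m] for m in missing)

# Hypothetical points of interest along a street.
pois = {"a": (1, 0), "b": (2, 0), "c": (3, 0), "d": (4, 0)}
new_k = refine_k(pois, query=(0, 0), missing=["c"])
```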

11.
With the popularization of data access and usage, an increasing number of users without expert knowledge of databases are required to perform data interactions. Often, these users face the challenges of writing and reformulating database queries, which consume a considerable amount of time and frequently yield unsatisfactory results. To facilitate this human–database interaction, researchers have investigated the Query By Example (QBE) paradigm in which database queries are (semi) automatically discovered from data examples given by users. This paradigm allows non-database experts to formulate queries without relying on complex query languages. In this context, this work aims to present a systematic review of the recent developments, open challenges, and research opportunities of QBE reported in the literature. This work also describes strategies employed to leverage efficient example acquisition and query reverse engineering. The obtained results show that recent research developments have focused on enhancing the expressiveness of produced queries, minimizing user interaction, and enabling efficient query learning in the context of data retrieval, exploration, integration, and analytics. Our findings indicate that future research should concentrate efforts on providing innovative solutions to the challenges of improving controllability and transparency, considering diverse user preferences in the processes of learning personalized queries, ensuring data quality, and improving the support of additional SQL features and operators.

12.
With the proliferation of mobile devices and wireless technologies, location based services (LBSs) are becoming popular in smart cities. Two important classes of LBSs are Nearest Neighbor (NN) queries and range queries, which provide users with information about the locations of points of interest (POIs) such as hospitals or restaurants. Answers to these queries are more reliable and satisfactory if they come from a trustworthy crowd instead of traditional location service providers (LSPs). We introduce an approach to evaluate NN and range queries with crowdsourced data and computation that eliminates the role of an LSP. In our crowdsourced approach, a user evaluates LBSs in a group. It may happen that group members do not have knowledge of all POIs in a certain area. We present efficient algorithms to evaluate queries with accuracy guarantees in incomplete databases. Experiments show that our approach is scalable and incurs less computational overhead.

13.
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single copy based declustering schemes are non-optimal for range queries. In this paper, we integrate replication with parallel disk declustering for efficient processing of range queries. We note that replication is widely used in database applications for several purposes such as load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering that uses a limited amount of replication, and provide extensions to apply it to real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets. Recommended by: Ahmed Elmagarmid. Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.
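
The cost notion behind declustering can be made concrete with a small sketch. It assumes a single-copy, modulo-style allocation that maps grid bucket (i, j) to disk (a*i + b*j) mod N; the abstract does not define its periodic allocations or its replication scheme, so this formula, the parameters `a` and `b`, and the function names are assumptions used only to show how a range query's cost is measured against the optimum.

```python
from collections import Counter
from math import ceil

def disk_of(i, j, n_disks, a=1, b=1):
    """Disk holding grid bucket (i, j) under an assumed modulo allocation."""
    return (a * i + b * j) % n_disks

def query_cost(i0, j0, i1, j1, n_disks, a=1, b=1):
    """Parallel cost of a range query: the largest per-disk bucket count."""
    load = Counter(
        disk_of(i, j, n_disks, a, b)
        for i in range(i0, i1 + 1)
        for j in range(j0, j1 + 1)
    )
    return max(load.values())

def optimal_cost(i0, j0, i1, j1, n_disks):
    """Lower bound: buckets spread perfectly evenly over all disks."""
    return ceil((i1 - i0 + 1) * (j1 - j0 + 1) / n_disks)

# A 2x2 range query on 5 disks under two choices of (a, b).
cost_11 = query_cost(0, 0, 1, 1, n_disks=5, a=1, b=1)
cost_12 = query_cost(0, 0, 1, 1, n_disks=5, a=1, b=2)
best = optimal_cost(0, 0, 1, 1, n_disks=5)
```

With a = b = 1, two of the four buckets land on the same disk, so the query needs two parallel reads; with a = 1, b = 2 the four buckets fall on four different disks and the query meets the lower bound of one read.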

14.
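For multi-user continuous k-nearest-neighbor query processing over road networks, a scalable processing of multiple continuous queries (SPMCQ) framework is proposed. The SPMCQ framework adopts a pipelined processing strategy that decomposes the execution of continuous k-nearest-neighbor queries into three stages that can run concurrently (preprocessing, query execution, and result dispatching) and uses multi-threading to improve the parallelism of query processing. Based on the SPMCQ framework, moving-object positions and the directed-graph model of the road network are stored and managed with an in-memory hash table and a linear linked list, respectively, and the SCkNN algorithm for processing multiple continuous k-nearest-neighbor queries is proposed. Experimental results show that, when processing multi-user continuous k-nearest-neighbor queries, the algorithm outperforms existing continuous k-nearest-neighbor query processing algorithms for road networks.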

15.
In this paper we discuss the problem of packing a set of small rectangles (pieces) in an enclosing final rectangle. We first present a best-first branch-and-bound exact algorithm and then a heuristic approach, which solve this problem exactly and approximately, respectively. The performance of the proposed approaches is evaluated on several randomly generated problem instances. Computational results show that the proposed exact algorithm is able to solve small and medium problem instances within reasonable execution time. The derived heuristic performs very well in the sense that it produces high-quality solutions within small computational time.

16.
In this paper we present algorithms for building and maintaining efficient collection trees that provide the conduit to disseminate data required for processing monitoring queries in a wireless sensor network. While prior techniques base their operation on the assumption that the sensor nodes that collect data relevant to a specified query need to include their measurements in the query result at every query epoch, in many event monitoring applications such an assumption is not valid. We introduce and formalize the notion of event monitoring queries and demonstrate that they can capture a large class of monitoring applications. We then show techniques which, using a small set of intuitive statistics, can compute collection trees that minimize important resources such as the number of messages exchanged among the nodes or the overall energy consumption. Our experiments demonstrate that our techniques can organize the data collection process while utilizing significantly lower resources than prior approaches.

17.
18.
Approaches for the processing of location-dependent queries usually assume that the location data are expressed precisely, usually using GPS locations. However, this is unrealistic because positioning methods do not have perfect accuracy (e.g., the positioning approach used in cellular networks handles only the cell where mobile users are located). Besides, users may need to express queries based on concepts of locations other than traditional GPS locations, which we call location granules. In this paper, we focus on location granule-based query processing (i.e., processing of queries with location granules) in situations where the location data available are imprecise, which we have called probabilistic location-dependent queries. For that purpose, we exploit the concept of uncertainty location granule, which represents the location uncertainty of an object. In particular, we tackle the problem of processing probabilistic inside (range) constraints. We analyze in detail how those constraints can be processed, taking into account both the existence of location uncertainty affecting the relevant objects and the location granularity specified. An extensive experimental evaluation shows the feasibility of the proposed probabilistic query processing approach and analyzes the advantages of using index structures to speed up the query processing.

19.
Nearest and reverse nearest neighbor queries for moving objects   Total citations: 4, self-citations: 0, citations by others: 4
With the continued proliferation of wireless communications and advances in positioning technologies, algorithms for efficiently answering queries about large populations of moving objects are gaining interest. This paper proposes algorithms for k nearest and reverse k nearest neighbor queries on the current and anticipated future positions of points moving continuously in the plane. The former type of query returns k objects nearest to a query object for each time point during a time interval, while the latter returns the objects that have a specified query object as one of their k closest neighbors, again for each time point during a time interval. In addition, algorithms for so-called persistent and continuous variants of these queries are provided. The algorithms are based on the indexing of object positions represented as linear functions of time. The results of empirical performance experiments are reported.
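
Representing positions as linear functions of time means each object is stored as a reference position plus a velocity, and its position at any time t follows by extrapolation. The sketch below evaluates a k-nearest-neighbor query at one time instant by brute force; it does not use an index and does not compute the answer for every time point of an interval as the paper's algorithms do. The names `position` and `knn_at` and the tuple layout are assumptions.

```python
def position(obj, t):
    """Position at time t of an object stored as ((x0, y0), (vx, vy))."""
    (x0, y0), (vx, vy) = obj
    return (x0 + vx * t, y0 + vy * t)

def knn_at(objects, query, k, t):
    """Names of the k objects nearest to the (moving) query object at time t.

    objects: dict name -> ((x0, y0), (vx, vy)); query uses the same layout.
    """
    qx, qy = position(query, t)

    def squared_distance(name):
        x, y = position(objects[name], t)
        return (x - qx) ** 2 + (y - qy) ** 2

    # Names break ties so the result is deterministic.
    return sorted(objects, key=lambda n: (squared_distance(n), n))[:k]

# Two hypothetical vehicles driving towards each other past a parked query object.
fleet = {"a": ((0, 0), (1, 0)), "b": ((10, 0), (-1, 0))}
parked = ((2, 0), (0, 0))
now = knn_at(fleet, parked, k=1, t=0)
later = knn_at(fleet, parked, k=1, t=8)
```

The nearest neighbor changes from one vehicle to the other between the two time instants, which is why such queries have to report a result for each time point in the interval.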

20.