20 similar documents found (search time: 0 ms)
1.
Patrick Bosc 《International Journal of Intelligent Systems》1996,11(9):613-632
This article deals with query processing techniques for the SQLf language, an extended version of SQL that supports imprecise queries interpreted in the framework of fuzzy sets. SQLf, like SQL, allows the use of nested queries, in which a (fuzzy) condition involved in a select block calls on another select block (the nested one). Two types of processing strategies for nested queries are discussed. The first takes advantage of existing database management systems (DBMS) to process fuzzy queries through an additional layer that translates the initial query into a Boolean one; in this perspective, the performance obtained depends strongly on the efficiency of the underlying DBMS. The other strategy is situated in the context of designing systems with specific algorithms for processing fuzzy queries. In this article, the focus is put on algorithms related to the generic nesting construct “exists.” © 1996 John Wiley & Sons, Inc.
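To make the fuzzy-set reading of a nested "exists" concrete, here is a minimal sketch assuming the usual max-min interpretation of fuzzy quantification; the relation, attribute names, and membership function are hypothetical and are not taken from the article.

    # Degree of "EXISTS t IN nested block satisfying a fuzzy condition":
    # the maximum, over the candidate tuples, of each tuple's satisfaction degree.
    def young(age, full=25, none=40):
        """Assumed trapezoidal membership function for the fuzzy predicate 'young'."""
        if age <= full:
            return 1.0
        if age >= none:
            return 0.0
        return (none - age) / (none - full)

    def fuzzy_exists(degrees):
        """Truth degree of EXISTS over a fuzzy sub-query result."""
        return max(degrees, default=0.0)

    employees = [("Ann", 23), ("Bob", 31), ("Eve", 45)]          # hypothetical nested block
    print(fuzzy_exists(young(age) for _, age in employees))      # 1.0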
2.
James Cheng Zechao Shang Hong Cheng Haixun Wang Jeffrey Xu Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(2):227-252
We study the problem of answering k-hop reachability queries in a directed graph, i.e., whether there exists a directed path of length k from a source query vertex to a target query vertex in the input graph. k-hop reachability is a generalization of classic reachability (where k = ∞). Existing indexes for processing classic reachability queries, as well as those for processing shortest-path distance queries, are either not applicable or not efficient for processing k-hop reachability queries. We propose an efficient index for processing k-hop reachability queries. Our experimental results on a wide range of real datasets show that our method is efficient and scalable in terms of both index construction and query processing.
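As a point of reference for the query semantics only, assuming the usual "within k hops" reading, a depth-bounded breadth-first search answers a k-hop reachability query without any index; the adjacency-list encoding below is an assumption for the sketch, which is not the index structure proposed in the paper.

    from collections import deque

    def k_hop_reachable(adj, s, t, k):
        """Is t reachable from s by a directed path of at most k hops?
        adj maps each vertex to an iterable of its out-neighbours."""
        if s == t:
            return True
        seen, frontier = {s}, deque([(s, 0)])
        while frontier:
            v, d = frontier.popleft()
            if d == k:
                continue                      # no hops left from v
            for w in adj.get(v, ()):
                if w == t:
                    return True
                if w not in seen:
                    seen.add(w)
                    frontier.append((w, d + 1))
        return False

    graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
    print(k_hop_reachable(graph, "a", "d", 2))    # False: d is three hops away
    print(k_hop_reachable(graph, "a", "d", 3))    # True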
3.
Many applications often require finding sets of entities of interest that meet certain constraints. Such set-based queries (SQs) can be broadly classified into two types: optimization SQs that involve some optimization constraint and enumerative SQs that do not have any optimization constraint. While there has been much research on the evaluation of optimization SQs, there is very little work on the evaluation of enumerative SQs, which represent the most fundamental fragment of set-based queries. In this paper, we address the problem of evaluating enumerative SQs using RDBMS. While enumerative SQs can be expressed using SQL, existing relational engines, unfortunately, are not able to efficiently evaluate such queries due to their complexity. In this paper, we propose a novel evaluation approach for enumerative SQs. Our experimental results on PostgreSQL demonstrate that our proposed approach outperforms the conventional approach by up to three orders of magnitude. 相似文献
4.
Dongjoon Hyun No Joon Park Jin Hyun Son Myoung Ho Kim 《Distributed and Parallel Databases》2006,20(3):171-197
Continuous aggregation queries with a tolerable error threshold have many applications in sensor networks. Since the communication cost is important in the lifetime of sensor networks, there have been a few methods to reduce the communication cost for continuous aggregation queries having a tolerable error threshold. In previous methods, the error threshold in each node is periodically adjusted based on the global statistics collected in the central site that are obtained from all the nodes in the network. These methods require that users specify a few parameters, e.g., adjustment period. However, determination of these parameters by users, in practice, is very difficult and undesirable for sensor network applications demanding unattended operations in dynamically changing environments. In this paper, we propose a new in-network data aggregation protocol, called the Distributed Adaptive Filtering (DAF) protocol. It works in a distributed manner and proceeds adaptively in the sense that the filtering condition in each node is adaptively changed by using only local information. It does not require user parameters that are used in the previous methods. We show through various experiments that the proposed method outperforms other existing methods.
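The flavour of node-local adaptive filtering can be illustrated with a small sketch: a node transmits only when its reading escapes its current filter window and adjusts the window width from its own history. The widening/tightening policy and all parameter values here are illustrative assumptions, not the DAF protocol itself.

    class AdaptiveFilter:
        """Illustrative node-local filter: send a reading only when it escapes
        the current window, then widen the window; slowly tighten it otherwise."""

        def __init__(self, width=0.5, grow=1.5, shrink=0.95, min_width=0.1, max_width=10.0):
            self.width, self.grow, self.shrink = width, grow, shrink
            self.min_width, self.max_width = min_width, max_width
            self.last_sent = None

        def update(self, reading):
            """Return the reading if it should be transmitted, else None."""
            if self.last_sent is None or abs(reading - self.last_sent) > self.width:
                self.last_sent = reading
                # a volatile sensor gets a wider window to save messages
                self.width = min(self.max_width, self.width * self.grow)
                return reading
            # a quiet sensor gets a tighter window to keep the aggregate precise
            self.width = max(self.min_width, self.width * self.shrink)
            return None

    node = AdaptiveFilter()
    readings = [20.0, 20.1, 20.2, 23.0, 23.1]
    print([r for r in readings if node.update(r) is not None])   # [20.0, 23.0]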
5.
A number of indexing techniques have been proposed in recent times for optimizing queries on XML and other semi-structured data models. Most semi-structured models use tree-like structures, and their query languages (XPath, XQuery, etc.) make use of regular path expressions that can be exploited to optimize query processing. In this paper, we propose two algorithms, the Entry-Point Algorithm (EPA) and the Two-Point Entry Algorithm, that exploit different types of indices to efficiently process XPath queries. We discuss and compare two approaches to implementing the EPA, namely Root-first and Bottom-first. We present experimental results of the algorithms using XML benchmark queries and data, and compare the results with those of traditional query processing methods, with and without the use of indexes, and with the ToXin indexing approach. Our algorithms show better performance than the traditional methods and the ToXin indexing approach.
6.
Due to the recent massive data generation, preference queries are becoming increasingly important for users because such queries retrieve only a small number of preferable data objects from a huge multi-dimensional dataset. A top-k dominating query, which retrieves the k data objects dominating the highest number of data objects in a given dataset, is particularly important in supporting multi-criteria decision making because this query can find interesting data objects in an intuitive way, exploiting the advantages of top-k and skyline queries. Although efficient algorithms for top-k dominating queries have been studied over centralized databases, there are no studies that deal with top-k dominating queries in distributed environments. Data management is becoming increasingly distributed, so it is necessary to support the processing of top-k dominating queries in distributed environments. In this paper, we address, for the first time, the challenging problem of processing top-k dominating queries in distributed networks and propose a method for efficient top-k dominating data retrieval which avoids redundant communication cost and latency. Furthermore, we also propose an approximate version of our method, which further reduces communication cost. Extensive experiments on both synthetic and real data have demonstrated the efficiency and effectiveness of our proposed methods.
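The query semantics (not the paper's distributed method) can be pinned down with a naive centralized computation: score every object by how many others it dominates and return the k highest-scoring objects. Smaller attribute values are assumed to be preferable in this sketch, and the data are hypothetical.

    def dominates(p, q):
        """p dominates q if p is no worse in every dimension and strictly better in one."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def top_k_dominating(points, k):
        scored = [(sum(dominates(p, q) for q in points), p) for p in points]
        scored.sort(key=lambda s: -s[0])
        return scored[:k]

    data = [(1, 5), (2, 2), (3, 3), (4, 1), (5, 5)]
    print(top_k_dominating(data, 2))   # each result pairs a dominance count with its point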
7.
In this paper, we study a variant of reachability queries called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2 whose edge labels are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we transform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components with bipartite graphs. We also propose a Dijkstra-like algorithm to compute the path-label transitive closure by re-defining the “distance” of a path. Compared with existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure + online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is an NP-hard problem; therefore, we propose a sampling-based solution to find a sub-optimal partition. Moreover, we address index maintenance issues to answer LCR queries over dynamic graphs. Extensive experiments confirm the superiority of our method.
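For reference, the online-traversal half of such a framework can be read as an ordinary BFS that is only allowed to follow edges whose labels lie in the query set S. The graph encoding below is an assumption made for the sketch; the paper's indexes exist precisely to avoid running this traversal over the whole graph.

    from collections import deque

    def lcr_reachable(edges, u1, u2, S):
        """edges maps each vertex to a list of (neighbour, label) pairs."""
        allowed, seen, queue = set(S), {u1}, deque([u1])
        while queue:
            v = queue.popleft()
            if v == u2:
                return True
            for w, label in edges.get(v, ()):
                if label in allowed and w not in seen:
                    seen.add(w)
                    queue.append(w)
        return False

    g = {"u1": [("a", "likes"), ("b", "knows")],
         "a":  [("u2", "knows")],
         "b":  [("u2", "likes")]}
    print(lcr_reachable(g, "u1", "u2", {"knows"}))            # False: no all-'knows' path
    print(lcr_reachable(g, "u1", "u2", {"knows", "likes"}))   # True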
8.
LING TokWang 《Science in China Series F: Information Sciences》2009,52(10):1830-1847
As huge volumes of data are organized or exported in tree-structured form, it is quite necessary to extract useful information from these data collections using effective and efficient query processing methods. A natural way of retrieving desired information from XML documents is using twig pattern (TP), which is, actually, the core component of existing XML query languages. Twig pattern possesses the inherent feature that query nodes on the same path have concrete precedence relationships. It is this featu...
9.
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graph-structured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient shortest path query processing on an EGS and put forward a solution framework called FVF. Two algorithms, namely FVF-F and FVF-H, are proposed. While the FVF-F algorithm works on a sequence of flat graph clusters, the FVF-H algorithm works on a hierarchy of such clusters. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in shortest path query processing on EGSs. Comparing FVF-F and FVF-H, the latter gives a larger speedup, is more flexible in terms of memory requirements, and is far less sensitive to parameter values.
10.
Hakan Ferhatosmanoglu Ali Şaman Tosun Guadalupe Canahuate Aravind Ramachandran 《Distributed and Parallel Databases》2006,20(2):117-147
A common technique used to minimize I/O in data intensive applications is data declustering over parallel servers. This technique involves distributing data among several disks so as to parallelize query retrieval and thus improve performance. We focus on optimizing access to large spatial data, and the most common type of queries on such data, i.e., range queries. An optimal declustering scheme is one in which the processing for all range queries is balanced uniformly among the available disks. It has been shown that single-copy-based declustering schemes are non-optimal for range queries. In this paper, we integrate replication in conjunction with parallel disk declustering for efficient processing of range queries. We note that replication is largely used in database applications for several purposes like load balancing, fault tolerance and availability of data. We propose theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal for a number of disks. We propose a framework for replicated declustering, using a limited amount of replication, and provide extensions to apply it on real data, which include arbitrary grids and a large number of disks. Our framework also provides an effective indexing scheme that enables fast identification of data of interest in parallel servers. In addition to optimal processing of single queries, we show that this framework is effective for parallel processing of multiple queries. We present experimental results comparing the proposed replication scheme to other techniques for both single queries and multiple queries, on synthetic and real data sets.
Supported by U.S. Department of Energy (DOE) Award No. DE-FG02-03ER25573, and National Science Foundation (NSF) grant CNS-0403342.
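To illustrate what replicated declustering buys for range queries (with a toy allocation, not the paper's periodic allocations): tiles of a 2-D grid are assigned to M disks by one rule for the primary copy and a shifted rule for the replica, and a range query can then choose, tile by tile, whichever copy keeps the per-disk load most even. All constants and assignment rules below are illustrative assumptions.

    M = 5                                        # number of parallel disks (assumed)

    def primary(i, j):
        return (i + 2 * j) % M                   # toy allocation for the first copy

    def replica(i, j):
        return (primary(i, j) + 1) % M           # shifted copy, always on a different disk

    def range_query_load(i_lo, i_hi, j_lo, j_hi):
        """Greedy per-tile choice between the two copies to balance disk load."""
        load = [0] * M
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                a, b = primary(i, j), replica(i, j)
                pick = a if load[a] <= load[b] else b
                load[pick] += 1
        return load

    print(range_query_load(0, 3, 0, 3))          # 16 tiles spread across the 5 disks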
11.
12.
Jing Yuan Guangzhong Sun Tao Luo Defu Lian Guoliang Chen 《Journal of Intelligent Information Systems》2012,39(3):687-710
Efficient processing of top-k queries has drawn increasing attention from both industry and academia due to its varied applications. Low access cost is a crucial concern in top-k query processing. Typically, when answering a top-k query, there exist two types of accesses: sorted access and random access. In some scenarios, the latter is not supported by the data source. Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et al., J Comput Syst Sci 66:614–656, 2003) for this situation. In this paper, we motivate our work by a key observation of the NRA algorithm: the number of accesses could be further reduced by selectively (instead of in parallel) performing sorted accesses to different lists of the dataset. Based on this insight, we propose a Selective NRA (SNRA) algorithm aiming to cut down unnecessary access cost. Later, we optimize the SNRA algorithm in terms of runtime cost and present the SNRA-opt algorithm. Furthermore, we address the problem of instance optimality theoretically and turn SNRA (and SNRA-opt) into instance-optimal algorithms, termed Hybrid-SNRA (HSNRA) and HSNRA-opt. Extensive experimental results show that our algorithms perform significantly fewer sorted accesses than NRA (and its state-of-the-art variations). In terms of runtime cost, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than the NRA algorithm. In addition, we discuss the parameter selection problem of the SNRA algorithms, both theoretically and experimentally.
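The baseline the paper starts from can be summarized in a compact (round-robin, not selective) NRA sketch: only sorted accesses are made, each object keeps a lower and an upper score bound, and the scan stops once the k-th best lower bound is at least every other object's upper bound. Summation is assumed as the aggregation function and ties may be resolved either way; this is an illustration of the classic NRA idea, not the SNRA algorithms proposed in the paper.

    def nra(lists, k):
        """lists: per-attribute (object, score) lists, each sorted by descending score."""
        seen, m = {}, len(lists)                     # seen: obj -> {list index: score}
        max_depth = max(len(l) for l in lists)
        for depth in range(max_depth):
            for i, l in enumerate(lists):            # one parallel sorted access per list
                if depth < len(l):
                    obj, score = l[depth]
                    seen.setdefault(obj, {})[i] = score
            last = [l[min(depth, len(l) - 1)][1] for l in lists]    # current list thresholds
            lower = {o: sum(s.values()) for o, s in seen.items()}
            upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in s)
                     for o, s in seen.items()}
            top = sorted(lower, key=lower.get, reverse=True)[:k]
            if len(top) == k:
                kth = lower[top[-1]]
                if kth >= sum(last) and all(upper[o] <= kth for o in seen if o not in top):
                    return [(o, lower[o]) for o in top]             # safe to stop early
        return sorted(lower.items(), key=lambda x: -x[1])[:k]

    lists = [[("a", 0.9), ("b", 0.8), ("c", 0.1)],
             [("b", 0.7), ("a", 0.6), ("c", 0.2)]]
    print(nra(lists, k=1))                           # [('a', 1.5)] ('a' and 'b' tie here)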
13.
Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS needs to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions for expressing similarity predicates into the existing SQL syntax, and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBMS. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright © 2008 John Wiley & Sons, Ltd.
14.
Due to the pervasive data uncertainty in many real applications, efficient and effective query answering on uncertain data has recently gained much attention from the database community. In this paper, we propose a novel and important query in the context of uncertain databases, namely the probabilistic group subspace skyline (PGSS) query, which is useful in applications like sensor data analysis. Specifically, a PGSS query retrieves those uncertain objects that are, with high confidence, not dynamically dominated by other objects with respect to a group of query points in ad hoc subspaces. In order to enable fast PGSS query answering, we propose effective pruning methods to reduce the PGSS search space, which are seamlessly integrated into an efficient PGSS query procedure. Furthermore, to achieve low query cost, we provide a cost model, in light of which uncertain data are pre-processed and indexed. Extensive experiments have been conducted to demonstrate the efficiency and effectiveness of our proposed approaches.
15.
The flexibility of the XML data model allows a more natural representation of uncertain data compared with the relational model. Matching a twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probability value because of the uncertainty of the data. Twig answers with small probability values are useless to users, who usually only want the answers with the k largest probability values. Existing algorithms for ordinary XML documents are not directly applicable, because of the need to handle probability distributional nodes and to efficiently calculate the top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of directly finding the twig answers with the top-k probability values against probabilistic XML documents. We propose a new encoding scheme called PEDewey for probabilistic XML. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries: ProTJFast, which processes probabilistic XML data based on element streams in document order, and PTopKTwig, which is based on element streams ordered by path probability values. Experiments have been conducted to study the performance of these algorithms.
16.
Xiang Zhao Chuan Xiao Xuemin Lin Wei Wang Yoshiharu Ishikawa 《The VLDB Journal The International Journal on Very Large Data Bases》2013,22(6):727-752
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entries, and find similarity matches. In this paper, we study graph similarity queries with edit distance constraints. Inspired by the q-gram idea for string similarity problems, our solution extracts paths from graphs as features for indexing. We establish a lower bound on the number of common features needed to generate candidates. Efficient algorithms are proposed to handle three types of graph similarity queries by exploiting both matching and mismatching features, as well as degree information, to improve the filtering and verification of candidates. We demonstrate that the proposed algorithms significantly outperform existing approaches through extensive experiments on real and synthetic datasets.
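The string-side intuition being borrowed is count filtering: two strings within edit distance tau must share a minimum number of q-grams, so candidate pairs sharing fewer can be discarded before any expensive verification. The sketch below shows that principle for strings only; how the paper lifts it to path features of graphs is its own contribution.

    from collections import Counter

    def qgrams(s, q=2):
        return [s[i:i + q] for i in range(len(s) - q + 1)]

    def passes_count_filter(s, t, tau, q=2):
        """Necessary (not sufficient) condition for edit_distance(s, t) <= tau:
        the strings share at least max(|s|, |t|) - q + 1 - q*tau q-grams."""
        shared = sum((Counter(qgrams(s, q)) & Counter(qgrams(t, q))).values())
        return shared >= max(len(s), len(t)) - q + 1 - q * tau

    print(passes_count_filter("kitten", "sitting", tau=1))   # False: safely pruned
    print(passes_count_filter("kitten", "sitting", tau=3))   # True: needs verification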
17.
HweeHwa Pang Xuhua Ding Baihua Zheng 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(3):437-456
The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes in two phases: find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear scoring functions over disk-resident sorted lists. We introduce a model for estimating the depths to which each sorted list needs to be processed in the two phases, so that (most of) the required records can be fetched efficiently through sequential or batched I/Os. We also devise a mechanism to quickly rank the data that qualify for the query answer and to eliminate those that do not, in order to reduce the computation demand of the query processor. Extensive experiments with four different datasets confirm that our schemes achieve substantial performance speed-up of between two times and two orders of magnitude over existing TAs, at the expense of a memory overhead of 4.8 bits per attribute value. Moreover, our scheme is robust to different data distributions and query characteristics.
18.
Kian-Lee Tan Cheng Hian Goh Beng Chin Ooi 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(3):261-278
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model for evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with one level and with multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers.
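The core mechanic of online aggregation, a running estimate whose confidence interval tightens as more tuples are processed, can be sketched for a single AVG under a normal approximation over randomly ordered rows; the nested-query and multi-threading machinery described above is the paper's contribution and is not reflected here.

    import math, random

    def online_avg(stream, z=1.96):                  # ~95% confidence level (assumed)
        """Yield (rows seen, running AVG, half-width of its confidence interval)."""
        n = 0
        mean = m2 = 0.0
        for x in stream:
            n += 1
            delta = x - mean                         # Welford's running mean/variance
            mean += delta / n
            m2 += delta * (x - mean)
            if n >= 30:                              # enough samples for the CLT
                half = z * math.sqrt((m2 / (n - 1)) / n)
                yield n, mean, half

    random.seed(0)
    rows = [random.gauss(100, 15) for _ in range(10000)]
    for n, mean, half in online_avg(rows):
        if n % 2500 == 0:
            print(f"after {n} rows: AVG ~ {mean:.2f} +/- {half:.2f}")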
19.
R. B. Buitendijk 《Computational Economics》1988,1(2):79-96
SQL is often used to express questions imposed by users. It is well known that this process is not error-free in that it may result in syntactically or semantically incorrect SQL queries. This article is meant to probe deeper into SQL queries that are incorrect, due to delusion errors. A classification of natural language questions will be introduced. The purpose of this classification is two-fold. Firstly, it gives insight into possible errors and enables us to distinguish between types of errors. Secondly, it is a means of determining the influence of measures (for instance, SQL training) that can be taken to reduce the number of aforementioned errors. For discussion, several of these measures will be presented and related to the classification. Abbreviations: RA, relational algebra; SQL, structured query language.
20.
We present an algorithm for converting a semantically meaningful SQL query into an equivalent algebraic expression. The relational algebra we employ consists of the following operators: union, intersection, difference, Cartesian product, selection, and projection. The SQL queries we consider can have an arbitrary level of nesting but are restricted in three ways. We assume that they do not contain an ORDER BY or a GROUP BY clause, all SELECTs are in fact SELECT DISTINCTs, and that no aggregate functions are used. The first two assumptions are made in order to remain faithful to the definition of a relation as a set of rows. The last assumption is needed since there is no standard for incorporating aggregate functions into a relational algebra. The research results in this paper will be useful in implementing an SQL user interface for database management systems that internally employ relational algebra. They can also be used in optimizing the evaluation of SQL queries.
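A worked example of the kind of translation described, over hypothetical relations Emp(ename, dno) and Dept(dno, city): the query "names of employees working in a department located in Paris", written in SQL as a SELECT DISTINCT with a nested IN sub-query, can be expressed with only the six operators above as

    π_ename( σ_[Emp.dno = Dept.dno ∧ Dept.city = 'Paris'](Emp × Dept) )

The nesting disappears because the inner query block is folded into a selection over a Cartesian product, which is also a convenient starting point for further algebraic optimization; this particular rewrite is one valid equivalent, not necessarily the one produced by the algorithm in the article.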