期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Extending the data warehouse for service provisioning data

Yannis 《Data & Knowledge Engineering》2006,59(3):700-724

相似文献

2.

Continuous aggregate nearest neighbor queries 总被引：1，自引：0，他引：1

Hicham G. Elmongui Mohamed F. Mokbel Walid G. Aref 《GeoInformatica》2013,17(1):63-95

This paper addresses the problem of continuous aggregate nearest-neighbor (CANN) queries for moving objects in spatio-temporal data stream management systems. A CANN query specifies a set of landmarks, an integer k, and an aggregate distance function f (e.g., min, max, or sum), where f computes the aggregate distance between a moving object and each of the landmarks. The answer to this continuous query is the set of k moving objects that have the smallest aggregate distance f. A CANN query may also be viewed as a combined set of nearest neighbor queries. We introduce several algorithms to continuously and incrementally answer CANN queries. Extensive experimentation shows that the proposed operators outperform the state-of-the-art algorithms by up to a factor of 3 and incur low memory overhead. 相似文献

3.

Designing data warehouses 总被引：9，自引：0，他引：9

Dimitri Timos 《Data & Knowledge Engineering》1999,31(3):279-301

A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented. 相似文献

4.

基于不确定数据的查询处理综述 总被引：5，自引：0，他引：5

崔斌卢阳《计算机应用》2008,28(11):2729-2731

不确定数据在一些重要应用领域中是固有存在的,如传感器网络和移动物体追踪。在不确定数据上使用传统的查询方法会使查询结果出现偏差,不能满足用户的需求。因此,基于不确定数据的查询处理受到了越来越多的关注。与在确定数据上查询不同,不确定数据上的研究工作将概率引入到数据模型中来衡量不确定对象成为结果集中元素的可能性。由于问题定义和数据模型的不同,不确定数据上的查询类型也多种多样。从问题定义、数据模型、剪枝策略和算法等角度,对基于不确定数据的范围查询、top-k查询以及skyline查询进行了介绍。相似文献

5.

Optimization in Data Cube System Design

Edward Hung David W. Cheung Ben Kao 《Journal of Intelligent Information Systems》2004,23(1):17-45

The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized precomputed multidimensional views of data in a data warehouse. We can derive a set of data cubes to answer each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost to answer those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes. However, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to be sacrificed and not taken into consideration. We have defined an optimization problem in data cube system design. Given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, it is necessary to determine a set of data cubes such that the system can answer a largest subset of the queries without violating the two bounds. This is an NP-hard problem. We propose approximate Greedy algorithms GR, 2GM and 2GMM, which are shown to be both effective and efficient by experiments done on a census data set and a forest-cover-type data set. 相似文献

6.

Query processing over incomplete autonomous databases: query rewriting using learned data dependencies

Garrett Wolf Aravind Kalavagattu Hemal Khatri Raju Balakrishnan Bhaumik Chokshi Jianchun Fan Yi Chen Subbarao Kambhampati 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(5):1167-1190

Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally we would like the mediator to retrieve such possibleanswers and gauge their relevance by accessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework QPIAD that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possibleanswers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies to demonstrate that our approach is able to effectively retrieve relevant possibleanswers with high precision, high recall, and manageable cost. 相似文献

7.

Answering Queries Using Limited External Query Processors

《Journal of Computer and System Sciences》1999,58(1):69-82

When answering queries using external information sources, the contents of the queries can be described by views. To answer a query, we must rewrite it using the set of views presented by the sources. When the external information sources also have the ability to answer some (perhaps limited) sets of queries that require performing operations on their data, the set of views presented by the source may be infinite (albeit encoded in some finite fashion). Previous work on answering queries using views has only considered the case where the set of views is finite. In order to exploit the ability of information sources to answer more complex queries, we consider the problem of answering conjunctive queries using infinite sets of conjunctive views. Our first result is that an infinite set of conjunctive views can be partitioned into a finite number of equivalence classes, such that picking one view from every nonempty class is sufficient to determine whether the query can be answered using the views. Second, we show how to compute the set of equivalence classes for sets of conjunctive views encoded by a datalog program. Furthermore, we extend our results to the case when the query and the views use the built-in predicates <, ⩽, =, and ≠, and they are interpreted over a dense domain. Finally, we extend our results to conjunctive queries and views with the built-in predicates <, ⩽, and = interpreted over the integers. In doing so we present a result of independent interest, namely, an algorithm to minimize such queries. 相似文献

8.

SINA: Semantic interpretation of user queries for question answering on interlinked data

《Journal of Web Semantics》2015

The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-experts users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas and each one may only contain a part of the answer for a certain user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the different datasets on both the schema and instance levels. We present Sina, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-languages queries into conjunctive SPARQL queries over a set of interlinked data sources. Sina uses a hidden Markov model to determine the most suitable resources for a user-supplied query from different datasets. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets to query. We evaluate Sina over three different datasets. We can answer 25 queries from the QALD-1 correctly. Moreover, we perform as well as the best question answering system from the QALD-3 competition by answering 32 questions correctly while also being able to answer queries on distributed sources. We study the runtime of SINA in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data. 相似文献

9.

Towards practical private processing of database queries over public data

Shiyuan Wang Divyakant Agrawal Amr El Abbadi 《Distributed and Parallel Databases》2014,32(1):65-89

Privacy is a major concern when users query public online data services. The privacy of millions of people has been jeopardized in numerous user data leakage incidents in many popular online applications. To address the critical problem of personal data leakage through queries, we enable private querying on public data services so that the contents of user queries and any user data are hidden and therefore not revealed to the online service providers. We propose two protocols for private processing of database queries, namely BHE and HHE. The two protocols provide strong query privacy by using Paillier’s homomorphic encryption, and support common database queries such as range and join queries by relying on the bucketization of public data. In contrast to traditional Private Information Retrieval proposals, BHE and HHE only incur one round of client server communication for processing a single query. BHE is a basic private query processing protocol that provides complete query privacy but still incurs expensive computation and communication costs. Built upon BHE, HHE is a hybrid protocol that applies ciphertext computation and communication on a subset of the data, such that this subset not only covers the actual requested data but also resembles some frequent query patterns of common users, thus achieving practical query performance while ensuring adequate privacy levels. By using frequent query patterns and data specific privacy protection, HHE is not vulnerable to the traditional attacks on k-Anonymity that exploit data similarity and skewness. Moreover, HHE consistently protects user query privacy for a sequence of queries in a single query session. 相似文献

10.

Adaptive relaxation for querying heterogeneous XML data sources

Chengfei Liu Jianxin Li Jeffrey Xu Yu Rui Zhou 《Information Systems》2010

Searching XML data with a structured XML query can improve the precision of results compared with a keyword search. However, the structural heterogeneity of the large number of XML data sources makes it difficult to answer the structured query exactly. As such, query relaxation is necessary. Previous work on XML query relaxation poses the problem of unnecessary computation of a big number of unqualified relaxed queries. To address this issue, we propose an adaptive relaxation approach which relaxes a query against different data sources differently based on their conformed schemas. In this paper, we present a set of techniques that supports this approach, which includes schema-aware relaxation rules for relaxing a query adaptively, a weighted model for ranking relaxed queries, and algorithms for adaptive relaxation of a query and top-k query processing. We discuss results from a comprehensive set of experiments that show the effectiveness and the efficiency of our approach. 相似文献

11.

Detecting redundant materialized views in data warehouse evolution

《Information Systems》2001,26(5):363-381

A data warehouse (DW) can be abstractly seen as a set of materialized views defined over a set of remote data sources. A DW is intended to satisfy a set of queries. The views materialized in a DW relate to each other in a complex manner, through common subexpressions, in order to guarantee high query performance and low view maintenance cost. DWs are time varying. As time passes new materialized views are added in order to satisfy new queries, or for performance reasons, while old queries are dropped. The evolution of a DW can result in a redundant set of materialized views. In this paper, we address the problem of detecting redundant materialized views in a given DW view selection, that is, materialized views that can be removed from DW without negatively affecting the query evaluation or the view maintenance process. Using an AND/OR dag representation for multiple queries and views, we first formalize the process of propagating source relation changes to the materialized views by exploiting common subexpressions between views and by using other materialized views that are not affected by these changes. Then, we provide an algorithm for detecting materialized views that are not needed in the process of propagating source relation changes to the DW. We also show how trivially redundant views can be identified in this process. Finally, we use these results to provide a procedure for detecting materialized views that are redundant in a DW. Our approach considers a broad class of views that includes grouping/aggregation views and is not dependent on a specific cost model. 相似文献

12.

Ranking queries on uncertain data

Ming Hua Jian Pei Xuemin Lin 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(1):129-153

Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top-k queries), a popular type of queries on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain data using three parameters: rank threshold k, probability threshold p, and answer set size threshold l. Systematically, we identify four types of ranking queries on uncertain data. First, a probability threshold top-k query computes the uncertain records taking a probability of at least p to be in the top-k list. Second, a top-(k, l) query returns the top-l uncertain records whose probabilities of being ranked among top-k are the largest. Third, the p-rank of an uncertain record is the smallest number k such that the record takes a probability of at least p to be ranked in the top-k list. A rank threshold top-k query retrieves the records whose p-ranks are at most k. Last, a top-(p, l) query returns the top-l uncertain records with the smallest p-ranks. To answer such ranking queries, we present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation-based algorithm. To answer top-(k, l) queries and top-(p, l) queries, we propose PRist+, a compact index. An efficient index construction algorithm and efficacious query answering methods are developed for PRist+. An empirical study using real and synthetic data sets verifies the effectiveness of the probabilistic ranking queries and the efficiency of our methods. 相似文献

13.

Finding an efficient rewriting of OLAP queries using materialized views in data warehouses

Chang-Sup Myoung Ho Yoon-Joon 《Decision Support Systems》2002,32(4)

OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally. 相似文献

14.

CineCubes: Aiding data workers gain insights from OLAP queries

《Information Systems》2015

In this paper we demonstrate that it is possible to enrich query answering with a short data movie that gives insights to the original results of an OLAP query. Our method, implemented in an actual system, CineCubes, includes the following steps. The user submits a query over an underlying star schema. Taking this query as input, the system comes up with a set of queries complementing the information content of the original query, and executes them. For each of the query results, we execute a set of highlight extraction algorithms that identify interesting patterns and values in the data of the results. Then, the system visualizes the query results and accompanies this presentation with a text commenting on the result highlights. Moreover, via a text-to-speech conversion the system automatically produces audio for the constructed text. Each combination of visualization, text and audio practically constitutes a movie, which is wrapped as a PowerPoint presentation and returned to the user. 相似文献

15.

View selection for designing the global data warehouse 总被引：1，自引：0，他引：1

Dimitri Spyros Timos 《Data & Knowledge Engineering》2001,39(3):219-240

A global data warehouse (DW) integrates data from multiple distributed heterogeneous databases and other information sources. A global DW can be abstractly seen as a set of materialized views. The selection of views for materialization in a DW is an important decision in the design of a DW. Current commercial products do not provide tools for automatic DW design. We provide a general method that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries. This process is complex since ‘common subexpressions' between the queries need to be detected and exploited. Our method is then applied to solve the problem of selecting such a materialized view set that fits in the space allocated to the DW for materialization and minimizes the combined overall query evaluation and view maintenance cost. We design algorithms which are implemented and we report on their experimental evaluation. 相似文献

16.

Ultrawrap: SPARQL execution on relational data

《Journal of Web Semantics》2013

The Semantic Web’s promise of web-wide data integration requires the inclusion of legacy relational databases,¹ i.e. the execution of SPARQL queries on RDF representation of the legacy relational data. We explore a hypothesis: existing commercial relational databases already subsume the algorithms and optimizations needed to support effective SPARQL execution on existing relationally stored data. The experiment is embodied in a system, Ultrawrap, that encodes a logical representation of the database as an RDF graph using SQL views and a simple syntactic translation of SPARQL queries to SQL queries on those views. Thus, in the course of executing a SPARQL query, the SQL optimizer uses the SQL views that represent a mapping of relational data to RDF, and optimizes its execution. In contrast, related research is predicated on incorporating optimizing transforms as part of the SPARQL to SQL translation, and/or executing some of the queries outside the underlying SQL environment.Ultrawrap is evaluated using two existing benchmark suites that derive their RDF data from relational data through a Relational Database to RDF (RDB2RDF) Direct Mapping and repeated for each of the three major relational database management systems. Empirical analysis reveals two existing relational query optimizations that, if applied to the SQL produced from a simple syntactic translations of SPARQL queries (with bound predicate arguments) to SQL, consistently yield query execution time that is comparable to that of SQL queries written directly for the relational representation of the data. The analysis further reveals the two optimizations are not uniquely required to achieve a successful wrapper system. The evidence suggests effective wrappers will be those that are designed to complement the optimizer of the target database. 相似文献

17.

Query processing over object views of relational data 总被引：2，自引：0，他引：2

Gustav Fahl Tore Risch 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):261-281

This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it was stored in an object-oriented (OO) database. A query against the object view is translated to one or several queries against the relational database. The results of these queries are then processed to form an answer to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data residing in the object view with data retrieved from the relational database. We discuss the key issues when object views of relational databases are developed, namely: how to map relational structures to sub-type/supertype hierarchies in the view, how to represent relational database access in OO query plans, how to provide the concept of object identity in the view, how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to process and optimize queries against the object view. The results are based on experiences from a running prototype implementation. Edited by: M.T. ?zsu. Received April 12, 1995 / Accepted April 22, 1996 相似文献

18.

CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Joachim Hammer Lixin Fu 《Distributed and Parallel Databases》2003,14(3):221-254

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST⁺⁺ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST⁺⁺ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems. 相似文献

19.

Web数据集成系统基于QC模型的物化视图选择 总被引：2，自引：0，他引：2

高军唐世渭杨冬青王腾蛟《计算机研究与发展》2005,42(2):308-314

在Web数据集成系统中,物化视图能够有效地减少网络传输代价,提高系统的查询效率．如何选择查询进行物化,使得选中的查询满足集成层的空间限制,同时获取最大物化收益,成为集成系统中一个迫切需要解决的问题．传统方法没有考虑到海量XML查询之间的包含关系,其选择的物化视图中可能包含冗余的信息．针对上述问题,提出了①Web数据集成系统中海量查询集合的QC(query containment)模型,该模型能够捕捉查询之间最常见的包含关系;②基于QC模型的物化视图选择算法,算法考虑了物化视图选择相关的主要因素,包括查询提交的频率、空间代价、查询重写能力和查询结果的完备性,提出了查询位图的物化视图组织方式,从而获取更加合理的物化视图选择方案．实验结果证明了该方法的有效性．相似文献

20.

Querying Probabilistic Business Processes for Sub-Flows

Daniel Deutch 《Theory of Computing Systems》2013,52(3):367-402

A Business Process (BP for short) consists of a set of activities which, combined in a flow, achieve some business goal. A given BP may have a large, possibly infinite, number of possible execution flows (EX-flows for short), each having some probability to occur at run time. This paper studies query evaluation over such probabilistic BPs. We focus on two important classes of queries, namely boolean queries that compute the probability that a random EX-flow of a BP satisfies a given property, and projection queries focusing on portions of EX-flows that are of interest to the user. For the latter queries the answer consists of the top-k instances of these portions that are most likely to occur at run-time. We study the complexity of query evaluation for both kinds of queries, showing in particular that projection queries may be harder to evaluate than boolean queries. We present a picture of which combinations of BP classes and query features lead to PTIME algorithms and which to NP-hard or infeasible problems. 相似文献