1.

Detecting redundant materialized views in data warehouse evolution

《Information Systems》2001,26(5):363-381

A data warehouse (DW) can be abstractly seen as a set of materialized views defined over a set of remote data sources. A DW is intended to satisfy a set of queries. The views materialized in a DW relate to each other in a complex manner, through common subexpressions, in order to guarantee high query performance and low view maintenance cost. DWs vary over time: as time passes, new materialized views are added in order to satisfy new queries, or for performance reasons, while old queries are dropped. The evolution of a DW can result in a redundant set of materialized views. In this paper, we address the problem of detecting redundant materialized views in a given DW view selection, that is, materialized views that can be removed from the DW without negatively affecting the query evaluation or the view maintenance process. Using an AND/OR dag representation for multiple queries and views, we first formalize the process of propagating source relation changes to the materialized views by exploiting common subexpressions between views and by using other materialized views that are not affected by these changes. Then, we provide an algorithm for detecting materialized views that are not needed in the process of propagating source relation changes to the DW. We also show how trivially redundant views can be identified in this process. Finally, we use these results to provide a procedure for detecting materialized views that are redundant in a DW. Our approach considers a broad class of views that includes grouping/aggregation views and is not dependent on a specific cost model.

2.

Incremental Design of a Data Warehouse

Dimitri Theodoratos Timos Sellis 《Journal of Intelligent Information Systems》2000,15(1):7-27

A data warehouse (DW) can be seen as a set of materialized views defined over remote base relations. When a query is posed, it is evaluated locally, using the materialized views, without accessing the original information sources. DWs are dynamic entities that evolve continuously over time. As time passes, new queries need to be answered by them. Some of these queries can be answered using exclusively the materialized views. In general, though, new views need to be added to the DW. In this paper we investigate the problem of incrementally designing a DW when new queries need to be answered and possibly extra space is allocated for view materialization. Based on an AND/OR dag representation of multiple queries, we model the problem as a state space search problem. We design incremental algorithms for selecting a set of new views to additionally materialize in the DW that: (a) fits in the extra space, (b) allows a complete rewriting of the new queries over the materialized views, and (c) minimizes the combined new query evaluation and new view maintenance cost. Finally, we discuss methods for pruning the search space so that efficiency is improved.

3.

Detecting common subexpressions for multiple query optimization over loosely-coupled heterogeneous data sources

Mahesh B. Chaudhari Suzanne W. Dietrich 《Distributed and Parallel Databases》2016,34(2):119-143

The research presented in this paper supports the identification of common subexpressions as candidates for potential materialized views that form the basis of multiple query optimization in a loosely-coupled distributed system where query expressions access heterogeneous data sources, including relations and data-centric XML. This paper introduces a unifying mixed multigraph formalism to represent SQL, XQuery, and LINQ queries in a common query graph model and a heuristics-based algorithm to detect common subexpressions. The identified common subexpressions represent an opportunity for defining a materialized view to avoid repeating computation. The common subexpressions may access only relations, only XML, or a combination of relations and XML. The mixed multigraph model and the heuristic rules presented in this paper have distinct advantages over the existing approaches that consider only relational or XML data sources individually. The mixed multigraph model can represent SQL, XQuery, and LINQ queries in a single graph model and the heuristic rules are designed to consider the identical and subsumed conditions at the same time. A prototype implementation of the algorithm illustrates the applicability of the approach using various examples from the research literature as well as scenarios over a Criminal Justice enterprise that include common subexpressions across relational and XML data sources.

4.

Keyword proximity search in XML trees (total citations: 3; self-citations: 0; citations by others: 3)

Hristidis V. Koudas N. Papakonstantinou Y. Divesh Srivastava 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(4):525-539

Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords co-occur. In this paper, we focus on XML trees and define XML keyword proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider efficiently executing keyword proximity queries on labeled trees (XML) in various settings: 1) when the XML database has been preprocessed and 2) when no indices are available on the XML database. We perform a detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches.
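
As a rough illustration of what a minimum connecting tree is, the following Python sketch computes the MCT for one fixed choice of matching nodes in a small tree. It is not the paper's algorithm: the `parent` map, the node names, and the function names are hypothetical, and the sketch simply joins each match to the lowest common ancestor of all matches.

```python
from typing import Dict, List, Sequence, Set, Tuple


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """List the node and all of its ancestors, from the node up to the root."""
    path = [node]
    while node in parent:  # the root has no entry in `parent`
        node = parent[node]
        path.append(node)
    return path


def minimum_connecting_tree(
    matches: Sequence[str], parent: Dict[str, str]
) -> Tuple[str, Set[str]]:
    """Return (root, nodes) of the smallest subtree connecting all matches.

    Assumes every match belongs to the same tree, so a common ancestor exists.
    """
    paths = [path_to_root(m, parent) for m in matches]
    # Ancestors (including the nodes themselves) shared by every match.
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking upward from the first match, the first shared node is the
    # lowest common ancestor, which is the root of the connecting tree.
    root = next(n for n in paths[0] if n in common)
    nodes: Set[str] = set()
    for path in paths:
        for n in path:
            nodes.add(n)
            if n == root:
                break  # do not climb above the root of the connecting tree
    return root, nodes


# Hypothetical fragment of a play encoded as child -> parent.
parent = {
    "act": "play", "scene": "act",
    "speech1": "scene", "speech2": "scene",
    "line1": "speech1", "line2": "speech2",
}
root, nodes = minimum_connecting_tree(["line1", "line2"], parent)
```

Here the two matching lines are connected through `scene`, so nothing above `scene` is part of the result. A full keyword proximity query would repeat this for each combination of matches to the individual keywords.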

5.

SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents

Guoliang Li Chen Li Jianhua Feng Lizhu Zhou 《Information Sciences》2009,179(21):3745-3762

Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs), or variants thereof, of the XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answer. In this paper we study how to use the rich structural relationships embedded in XML documents to facilitate the processing of keyword queries. We develop a novel method, called SAIL, to index such structural relationships for efficient XML keyword search. We propose the concept of minimal-cost trees to answer keyword queries and devise structure-aware indices to maintain the structural relationships for efficiently identifying the minimal-cost trees. For effectively and progressively identifying the top-k answers, we develop techniques using link-based relevance ranking and keyword-pair-based ranking. To reduce the index size, we incorporate a numbering scheme, namely schema-aware dewey code, into our structure-aware indices. Experimental results on real data sets show that our method outperforms state-of-the-art approaches significantly, in both answer quality and search efficiency.
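
The LCA step mentioned in this abstract can be sketched with plain Dewey codes, where each node is labeled by the path of child positions from the root and the LCA of two nodes is the longest common prefix of their labels. This is a minimal Python sketch of that general idea only; it does not reproduce SAIL or its schema-aware Dewey variant, and the function names and sample labels are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Dewey = Tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """Return the Dewey code of the lowest common ancestor of two nodes."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break  # the labels diverge here, so the common prefix ends
        prefix.append(x)
    return tuple(prefix)


def lca_of_all(nodes: Sequence[Dewey]) -> Dewey:
    """Fold the pairwise LCA over a non-empty sequence of Dewey codes."""
    result = nodes[0]
    for node in nodes[1:]:
        result = lca(result, node)
    return result


# Hypothetical elements that each contain one of the query keywords.
matches = [(1, 2, 1), (1, 2, 3, 1), (1, 2, 3, 2)]
answer_root = lca_of_all(matches)
```

With these labels, `answer_root` is `(1, 2)`: the subtree rooted at that element would be returned as the answer under plain LCA semantics.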

6.

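A materialized view selection method based on multiple maintenance strategies (total citations: 1; self-citations: 0; citations by others: 1)

Cui Xiaojun, Xue Yongsheng, Zhang Dongzhan, Huang Zongyi 《计算机科学》 (Computer Science) 2006,33(2):114-117

Materialized views are an important means of improving OLAP query efficiency in data warehouse environments, so materialized view selection is one of the key decisions in data warehouse design. The selection method proposed in this paper aims to choose suitable views to materialize so that the total cost of query processing and materialized view maintenance is minimized. A benefit model for materialized views is proposed and, on this basis, selection methods built on multiple view maintenance strategies are presented: IRMVS, a selection algorithm based on both incremental maintenance and recomputation; IMVS, a selection algorithm based on the incremental strategy; RMVS, a selection algorithm based on the recomputation strategy; and IMDVS, an algorithm for selecting materialized descendant views based on the incremental strategy. Theoretical analysis and experiments show that these algorithms are effective and feasible.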

7.

An Effective Semantic Cache for Exploiting XPath Query/View Answerability

Li Guoliang, Feng Jianhua 《计算机科学技术学报》 (Journal of Computer Science and Technology) 2010,25(2):347-361

Maintaining a semantic cache of materialized XPath views inside or outside the database is a novel, feasible and efficient approach to facilitating XML query processing. However, most of the existing approaches have the following disadvantages: 1) they cannot discover enough potential cached views to answer subsequent queries effectively; or 2) they are inefficient for view selection due to the complexity of XPath expressions. In this paper, we propose SCEND, an effective Semantic Cache based on ...

8.

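Materialized view selection based on the QC model in Web data integration systems (total citations: 2; self-citations: 0; citations by others: 2)

Gao Jun, Tang Shiwei, Yang Dongqing, Wang Tengjiao 《计算机研究与发展》 (Journal of Computer Research and Development) 2005,42(2):308-314

In Web data integration systems, materialized views can effectively reduce network transmission cost and improve query efficiency. How to select queries to materialize so that the selected queries satisfy the space limit of the integration layer while yielding the maximum materialization benefit has become a pressing problem in such systems. Traditional methods do not take into account the containment relationships among massive numbers of XML queries, so the materialized views they select may contain redundant information. To address these problems, this paper proposes (1) a QC (query containment) model for massive query sets in Web data integration systems, which captures the most common containment relationships between queries; and (2) a materialized view selection algorithm based on the QC model. The algorithm considers the main factors relevant to materialized view selection, including query submission frequency, space cost, query rewriting capability, and completeness of query results, and introduces a query-bitmap organization for materialized views, thereby obtaining a more reasonable selection scheme. Experimental results demonstrate the effectiveness of the method.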

9.

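A selection strategy for materialized views in data warehouses

Wang Yunfeng, Zhang Zuping 《计算技术与自动化》 (Computing Technology and Automation) 2004,23(3):43-46

Data warehouses store large numbers of materialized views to speed up OLAP query response, and materialized view selection is an important problem in data warehouse design. This paper proposes an effective materialized view selection algorithm that selects views by searching over the levels of the data cube. Analysis and testing show that the algorithm achieves good results and efficiency.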

10.

Designing data warehouses (total citations: 9; self-citations: 0; citations by others: 9)

Dimitri Timos 《Data & Knowledge Engineering》1999,31(3):279-301

A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.
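
The three conditions (a) to (c) can be made concrete with a toy exhaustive search over candidate view sets. The Python sketch below is not the state space search of the paper and includes no pruning or multiquery optimization; the view names, sizes, and costs are invented, and the cost model (maintenance cost per view plus the cheapest available view per query) is an assumption made for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical candidate views: name -> (size, maintenance cost).
views: Dict[str, Tuple[int, int]] = {"V1": (4, 3), "V2": (3, 2), "V3": (5, 4)}

# Hypothetical query costs: query -> {view: cost of answering the query from it}.
query_costs: Dict[str, Dict[str, int]] = {
    "Q1": {"V1": 2, "V3": 1},
    "Q2": {"V2": 2, "V3": 3},
}


def best_view_set(
    views: Dict[str, Tuple[int, int]],
    query_costs: Dict[str, Dict[str, int]],
    space: int,
) -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Return (total cost, view set) of the cheapest feasible set, or None."""
    best = None
    for r in range(1, len(views) + 1):
        for subset in combinations(sorted(views), r):
            if sum(views[v][0] for v in subset) > space:
                continue  # violates (a): does not fit in the allocated space
            total = sum(views[v][1] for v in subset)  # view maintenance cost
            feasible = True
            for costs in query_costs.values():
                usable = [costs[v] for v in subset if v in costs]
                if not usable:
                    feasible = False  # violates (b): some query is unanswerable
                    break
                total += min(usable)  # evaluate each query from its cheapest view
            if feasible and (best is None or total < best[0]):
                best = (total, subset)  # (c): keep the minimum combined cost
    return best


choice = best_view_set(views, query_costs, space=7)
```

With a space budget of 7, only `V3` alone and the pair `V1`, `V2` satisfy (a) and (b), and `V3` alone has the lower combined cost. Enumerating all subsets is exponential in the number of candidate views, which is why search formulations with pruning heuristics matter in practice.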

11.

Algorithms and applications for answering ranked queries using ranked views

Vagelis Hristidis Yannis Papakonstantinou 《The VLDB Journal The International Journal on Very Large Data Bases》2004,13(1):49-70

Ranked queries return the top objects of a database according to a preference function. We present and evaluate (experimentally and theoretically) a core algorithm that answers ranked queries in an efficient pipelined manner using materialized ranked views. We use and extend the core algorithm in the described PREFER and MERGE systems. PREFER precomputes a set of materialized views that provide guaranteed query performance. We present an algorithm that selects a near optimal set of views under space constraints. We also describe multiple optimizations and implementation aspects of the downloadable version of PREFER. Then we discuss MERGE, which operates at a metabroker and answers ranked queries by retrieving a minimal number of objects from sources that offer ranked queries. A speculative version of the pipelining algorithm is described. Received: 10 June 2002, Accepted: 11 June 2002, Published online: 30 September 2003. Edited by: A. Mendelzon. Work supported by NSF Grant No. 9734548.

12.

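IMVPP: a new strategy for constructing the search space for materialized view selection

Xia Xiaoling, Zhang Hong 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) 2010,4(5):473-479

The cost of materialized view selection algorithms in data warehouses is closely related to the size of the search space. This paper proposes IMVPP, a method for constructing the candidate view search space based on the common subexpressions of the input queries. The common subexpressions computed by Algorithm 1 can be shared by other queries and can be used to rewrite the input queries, which helps shrink the view search space and improve query efficiency. Theoretical analysis and experimental results show that the method is effective and feasible.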

13.

Keyword query with structure: towards semantic scoring of XML search results

Xiping Liu Changxuan Wan Dexi Liu 《Information Technology and Management》2016,17(2):151-163

Keyword search is an effective paradigm for information discovery and has been introduced recently to query XML documents. Scoring of XML search results is an important issue in XML keyword search. The traditional “bag-of-words” model cannot differentiate the roles of keywords or the relationships between keywords, and is therefore not suitable for XML keyword queries. In this paper, we present a new scoring method based on a novel query model, called keyword query with structure (QWS), which is specially designed for XML keyword query. The method is based on a new view that the QWS model takes of a keyword query, namely that a keyword query is a composition of several query units, each representing a query condition. We believe that this method captures the semantic relevance of the search results. The paper first introduces an algorithm reformulating a keyword query to a QWS. Then, a scoring method is presented which measures the relevance of search results according to how many and how well the query conditions are matched. The scoring method is also extended to clusters of search results. Experimental results verify the effectiveness of our methods.

14.

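A merging algorithm for materialized views (total citations: 1; self-citations: 0; citations by others: 1)

Chen Changqing, Cheng Ken 《计算机应用》 (Journal of Computer Applications) 2005,25(4):814-816

For practical database application systems with large numbers of materialized views, this paper proposes a view merging method that reduces the total number of views and shrinks the search space of materialized views. It also proposes a merge tree and a fast, effective merging algorithm based on the merge tree. Experiments show that merging materialized views is an effective way to quickly find the materialized views that may answer a query, and that it can significantly improve query processing performance.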

15.

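An improvement to the XML document filtering algorithm YFilter

Su Mingshi, Zhang Shouzhi 《计算机工程》 (Computer Engineering) 2005,31(21):63-65

Using indexing techniques, a dual-index structure is built over the input XML document to improve the YFilter algorithm and optimize XML document filtering performance. With the help of the index structure, the algorithm looks ahead at the structural information of element nodes in the document and excludes in advance the element nodes that cannot lead to any match, thereby avoiding a large amount of unnecessary query processing. Experimental results show that the algorithm filters well when the input XML document is large.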

16.

Finding an efficient rewriting of OLAP queries using materialized views in data warehouses

Chang-Sup Myoung Ho Yoon-Joon 《Decision Support Systems》2002,32(4)

OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
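
One ingredient of such usability conditions, the comparison of aggregation granularities along dimension hierarchies, can be sketched as follows. This Python sketch covers only that single check and ignores selection granularities and selection regions; the hierarchies, level names, and function names are hypothetical, and it assumes a distributive aggregate such as SUM that can be rolled up from finer groups.

```python
from typing import Dict, List

# Hypothetical dimension hierarchies, ordered from finest to coarsest level.
HIERARCHIES: Dict[str, List[str]] = {
    "time": ["day", "month", "year", "ALL"],
    "location": ["store", "city", "country", "ALL"],
}


def is_finer_or_equal(dim: str, view_level: str, query_level: str) -> bool:
    """True if view_level is at least as fine as query_level in dimension dim."""
    levels = HIERARCHIES[dim]
    return levels.index(view_level) <= levels.index(query_level)


def view_can_answer(view_gran: Dict[str, str], query_gran: Dict[str, str]) -> bool:
    """A view is usable for roll-up only if it is at least as fine in every dimension."""
    return all(
        is_finer_or_equal(dim, view_gran[dim], query_gran[dim])
        for dim in HIERARCHIES
    )


monthly_by_city = {"time": "month", "location": "city"}
yearly_by_country = {"time": "year", "location": "country"}
daily_by_country = {"time": "day", "location": "country"}

usable = view_can_answer(monthly_by_city, yearly_by_country)
not_usable = view_can_answer(monthly_by_city, daily_by_country)
```

A view grouped by month and city can be rolled up to year and country, but it cannot answer a query that needs daily totals, because the daily detail was already aggregated away.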

17.

ELCA evaluation for keyword search on probabilistic XML data

Rui Zhou Chengfei Liu Jianxin Li Jeffrey Xu Yu 《World Wide Web》2013,16(2):171-193

As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to ask how to support keyword queries on probabilistic XML data. With regard to keyword query on deterministic XML documents, ELCA (Exclusive Lowest Common Ancestor) semantics allows more relevant fragments rooted at the ELCAs to appear as results and is more popular compared with other keyword query result semantics (such as SLCAs). In this paper, we investigate how to evaluate ELCA results for keyword queries on probabilistic XML documents. After defining probabilistic ELCA semantics in terms of possible world semantics, we propose an approach to compute ELCA probabilities without generating possible worlds. Then we develop an efficient stack-based algorithm that can find all probabilistic ELCA results and their ELCA probabilities for a given keyword query on a probabilistic XML document. Finally, we experimentally evaluate the proposed ELCA algorithm and compare it with its SLCA counterpart in aspects of result probability, time and space efficiency, and scalability.

18.

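A materialized view selection strategy for association rule mining

Chen Jia, Li Min 《计算机工程与应用》 (Computer Engineering and Applications) 2012,48(24):134-138

Association rule mining is a typical data mining task, and its response time is one of the important issues in data mining systems. To address this issue efficiently, this paper introduces the concept of materialized views for association rules together with a corresponding cost model, and proposes a materialized view selection algorithm for data mining environments that achieves good query performance under a storage space constraint. Experimental results show that the algorithm selects materialized views effectively and thus greatly improves the efficiency of association rule mining algorithms.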

19.

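A semantically extended technique for rewriting queries using materialized views

Xun Yaling, Zhang Jifu, Liu Aiqin 《计算机工程与应用》 (Computer Engineering and Applications) 2008,44(12):157-160

Group-by aggregate queries have become one of the core research problems in the data warehouse field, and materialized views are an effective means of improving their performance. Using the hierarchical relationships among dimension attributes, this paper extends conventional query rewriting using materialized views, discusses the conditions under which a query can be rewritten using a single view, and gives a rewriting method. On this basis, it proposes an optimized selection algorithm for rewriting queries using multiple materialized views. Experiments show that the algorithm further improves the efficiency of group-by aggregate queries.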

20.

A heuristic approach to selecting views for materialization

Mark Roantree Jun Liu 《Software》2014,44(10):1157-1179

XML data warehouses are becoming more popular as data is harvested from the web or as output from web services. As these warehouses tend to grow significantly over time, various techniques for expediting queries have been developed. One such technique is to materialize some or all of the queries in advance of query processing. These views are then subject to change either when underlying data changes or view definitions themselves are modified by users. The work in this paper focuses on changes to view definitions, known as view adaptation. Our approach is to segment the materialized view into fragments to minimize the effect of view changes. One crucial aspect of this approach is how to select the best fragments for materialization. In this paper, we introduce a new approach to selecting fragments based on heuristics derived from costs associated with the view graph. Copyright © 2013 John Wiley & Sons, Ltd.