
1.

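林小静, 薛永生 《计算机工程与设计》 (Computer Engineering and Design) 2007,28(13):3056-3059

To improve the response efficiency of decision support and OLAP queries, data warehouses commonly adopt materialized views. The materialized view selection strategy is therefore one of the important problems in data warehouse research; its goal is to select a set of materialized views that minimizes the sum of storage cost, maintenance cost, and query cost. This paper proposes VSMF (views selection base on multi-factor), a new materialized view selection algorithm that uses the MVPP (multi-view processing plan) as the search space for view selection. Under a storage space constraint, the algorithm achieves multi-query optimization and view maintenance optimization at the same time.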

2.

Efficient approaches for materialized views selection in a data warehouse

Ming-Chuan Hung, Man-Lin Huang, Nien-Lin Hsueh 《Information Sciences》2007,177(6):1333-1348

View materialization is an effective method to increase query efficiency in a data warehouse and improve OLAP query performance. However, one encounters the problem of space insufficiency if all possible views are materialized in advance. Reducing query time by means of selecting a proper set of materialized views with a lower cost is crucial for efficient data warehousing. In addition, the costs of data warehouse creation, query, and maintenance have to be taken into account while views are materialized. In this paper, we propose efficient algorithms to select a proper set of materialized views, constrained by storage and cost considerations, to help speed up the entire data warehousing process. We derive a cost model for data warehouse query and maintenance as well as efficient view selection algorithms that effectively exploit the gain and loss metrics. The main contribution of our paper is to speed up the selection process of materialized views. Concurrently, this will greatly reduce the overall cost of data warehouse query and maintenance.
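
As a rough illustration of selecting views by gain and loss under a storage limit, the sketch below ranks candidates by net benefit per unit of storage and picks them greedily. It is not the algorithm from the paper above: the `CandidateView` fields, the ranking rule, and all numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateView:
    # All fields are hypothetical quantities for illustration only.
    name: str
    size: float              # storage needed to materialize the view (> 0)
    query_gain: float        # query cost saved if the view is materialized
    maintenance_loss: float  # extra maintenance cost the view incurs


def select_views(candidates, storage_budget):
    """Greedily pick views with the best net benefit per unit of storage."""
    chosen, used = [], 0.0
    # Keep only views whose gain exceeds their loss, best ratio first.
    ranked = sorted(
        (v for v in candidates if v.query_gain > v.maintenance_loss),
        key=lambda v: (v.query_gain - v.maintenance_loss) / v.size,
        reverse=True,
    )
    for view in ranked:
        if used + view.size <= storage_budget:
            chosen.append(view.name)
            used += view.size
    return chosen, used


candidates = [
    CandidateView("sales_by_day", size=40, query_gain=120, maintenance_loss=20),
    CandidateView("sales_by_region", size=10, query_gain=50, maintenance_loss=5),
    CandidateView("raw_join", size=80, query_gain=90, maintenance_loss=100),
    CandidateView("sales_by_product", size=30, query_gain=60, maintenance_loss=15),
]
chosen, used = select_views(candidates, storage_budget=60)
print(chosen, used)
```

A greedy ratio rule is fast but can miss the best combination; it is shown here only because it makes the trade-off between gain, loss, and storage easy to see.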

3.

Detecting redundant materialized views in data warehouse evolution

《Information Systems》2001,26(5):363-381

A data warehouse (DW) can be abstractly seen as a set of materialized views defined over a set of remote data sources. A DW is intended to satisfy a set of queries. The views materialized in a DW relate to each other in a complex manner, through common subexpressions, in order to guarantee high query performance and low view maintenance cost. DWs are time varying. As time passes new materialized views are added in order to satisfy new queries, or for performance reasons, while old queries are dropped. The evolution of a DW can result in a redundant set of materialized views. In this paper, we address the problem of detecting redundant materialized views in a given DW view selection, that is, materialized views that can be removed from DW without negatively affecting the query evaluation or the view maintenance process. Using an AND/OR dag representation for multiple queries and views, we first formalize the process of propagating source relation changes to the materialized views by exploiting common subexpressions between views and by using other materialized views that are not affected by these changes. Then, we provide an algorithm for detecting materialized views that are not needed in the process of propagating source relation changes to the DW. We also show how trivially redundant views can be identified in this process. Finally, we use these results to provide a procedure for detecting materialized views that are redundant in a DW. Our approach considers a broad class of views that includes grouping/aggregation views and is not dependent on a specific cost model.

4.

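A hybrid algorithm for materialized view selection in a data warehouse  Total citations: 2, self-citations: 1, citations by others: 2

徐海涛, 郑宁 《计算机工程与设计》 (Computer Engineering and Design) 2005,26(10):2752-2755

Materialized views are an effective way to improve query efficiency in a data warehouse, and materialized view selection is one of the most important decisions in the data warehouse design phase. Based on research and experiments, this paper proposes a hybrid algorithm that combines a genetic algorithm with simulated annealing to solve the materialized view selection problem. Theoretical analysis and experimental results show that the hybrid algorithm searches better than a traditional genetic algorithm and yields higher-quality solutions.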
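
One common way to combine a genetic algorithm with simulated annealing is to generate children by crossover and mutation, then decide whether a child replaces its parent using the annealing acceptance rule. The sketch below shows that combination on a toy view selection problem. It is not the algorithm from the paper above; the `VIEWS` data, the penalty-based `cost` function, and every parameter value are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical candidate views: (size, query_gain, maintenance_loss).
VIEWS = [(40, 120, 20), (10, 50, 5), (80, 90, 100), (30, 60, 15)]
BUDGET = 60  # hypothetical storage limit


def cost(bits):
    """Lower is better: negative net benefit plus a penalty for exceeding the budget."""
    size = sum(v[0] for v, b in zip(VIEWS, bits) if b)
    net = sum(v[1] - v[2] for v, b in zip(VIEWS, bits) if b)
    return -net + 1000 * max(0, size - BUDGET)


def hybrid_search(generations=50, pop_size=8, temp=50.0, cooling=0.9, seed=0):
    rng = random.Random(seed)
    n = len(VIEWS)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    # Start from the better of the random population and the empty selection.
    best = min(pop + [(0,) * n], key=cost)
    for _ in range(generations):
        nxt = []
        for parent in pop:
            mate = rng.choice(pop)
            cut = rng.randrange(1, n)
            child = list(parent[:cut] + mate[cut:])  # one-point crossover
            child[rng.randrange(n)] ^= 1             # bit-flip mutation
            child = tuple(child)
            delta = cost(child) - cost(parent)
            # Annealing acceptance: always keep improvements; keep a worse
            # child with probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                nxt.append(child)
            else:
                nxt.append(parent)
        pop = nxt
        best = min(pop + [best], key=cost)
        temp *= cooling  # cool down so worse children become less likely
    return best, cost(best)


best, best_cost = hybrid_search()
print(best, best_cost)
```

Early on, the high temperature lets the population escape local optima; as it cools, the search behaves more like a plain genetic algorithm that keeps only improvements.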

5.

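Design and implementation of a view decomposition system under data warehouse self-maintenance

毛莉, 潘久辉 《计算机工程与设计》 (Computer Engineering and Design) 2007,28(15):3800-3802

Data warehouse self-maintenance is essentially achieved by maintaining materialized views. However, existing self-maintenance strategies for materialized views cannot effectively reduce redundant data at the data warehouse integration side and the data source monitoring side, which slows the overall response of the data warehouse environment. A view decomposition system based on the data warehouse self-maintenance method improves on the existing view decomposition model: it decomposes globally defined materialized views into sets of locally defined single-source views to reduce unnecessary data stored in the data warehouse, implements the decomposition and rewriting of existing materialized view self-maintenance strategies, and improves the efficiency of data warehouse self-maintenance.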

6.

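Implementing a decision support system with data warehouse technology  Total citations: 13, self-citations: 2, citations by others: 11

曹重英, 陈洛资, 肖锋, 单莹 《计算机系统应用》 (Computer Systems & Applications) 2000,9(1):10-12

This paper briefly describes the concepts and techniques of data warehousing, online analytical processing (OLAP), and data mining. It proposes and implements a new method for designing and building a decision support system that uses data warehouse technology and its tools in combination with the four-base architecture of a traditional DSS.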

7.

Applying evolutionary algorithms to materialized view selection in a data warehouse  Total citations: 8, self-citations: 0, citations by others: 8

J.-T. Horng, Y.-J. Chang, B.-J. Liu 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2003,7(8):574-581

Effective analysis of genome sequences and associated functional data requires access to many different kinds of biological information. A data warehouse [14,16] plays an important role in the storage and analysis of genome sequence and functional data. A data warehouse stores many materialized views to support efficient decision-support or OLAP queries. The view-selection problem, which is to select the fittest set of materialized views from a variety of MVPPs, forms a challenge in data warehouse research. In this paper, we present a genetic algorithm to choose materialized views. We also use experiments to demonstrate the power of our approach. We would like to thank the authors of the paper [15], i.e. J. Yang, K. Karlapalem, and Q. Li. In this study, we borrow the mathematical model of their work in [15].

8.

Atrak: a MapReduce-based data warehouse for big data

Mohammadhossein Barkhordari, Mahdi Niamanesh 《The Journal of supercomputing》2017,73(10):4596-4610

As warehouse data volumes expand, single-node solutions can no longer analyze the immense volume of data. Therefore, it is necessary to use shared nothing architectures such as MapReduce. Inter-node data segmentation in MapReduce creates node connectivity issues, network congestion, improper use of node memory capacity and inefficient processing power. In addition, it is not possible to change dimensions and measures without changing previously stored data and big dimension management. In this paper, a method called Atrak is proposed, which uses a unified data format to make Mapper nodes independent to solve the data management problem mentioned earlier. The proposed method can be applied to star schema data warehouse models with distributive measures. Atrak increases query execution speed by employing node independence and the proper use of MapReduce. The proposed method was compared to established methods such as Hive, Spark-SQL, HadoopDB and Flink. Simulation results confirm improved query execution speed of the proposed method. Using data unification in MapReduce can be used in other fields, such as data mining and graph processing.

9.

Mapping external views to a common data model

G. Pelagatti, P. Paolini, G. Bracchi 《Information Systems》1978,3(2):141-151

In the context of a multilevel database management system architecture, the problem arises of translating, or mapping, operations at the user's representation level (the External Schema) into operations at the system's common logical level (the Conceptual Schema). In order to support different structured user's data models, the need for an unstructured, view-independent and yet carefully defined data model at the Conceptual Schema level is recognized. In this paper a binary model for the Conceptual Schema is illustrated through the specification of a set of primitives, and the elements of a language for the definition of a binary Schema and of the corresponding operations are given. Procedures are then illustrated for translating external into conceptual operations through exploitation of the primitives of the binary model. Two types of mapping specifications are illustrated: the operational mapping, in which the translation is explicitly defined, and the structural mapping, in which only the structural correspondences between elements of the External and the Conceptual Schema are defined. The automatic mapping between n-ary relational views and the binary Conceptual Schema is finally discussed.

10.

Seven steps to optimizing data warehouse performance

《Computer》2001,34(12):76-79

Most operational systems store data in a normalized model in which certain rules eliminate redundancy and simplify data relationships. While beneficial for the online transaction processing workload, this model can inhibit those same OLTP databases from running analytical queries effectively. Because the analytical systems did not need to support the OLTP workload, many developers began preplanning for the answer sets. Preplanning, however, created problems in four areas: creating summary tables of preaggregated data, placing indexes in the system to eliminate scanning large data volumes, putting data into one table instead of having tables that join together, and storing the data in sorted order. All these activities require prior knowledge of the analysis and reports being requested. Unfortunately, most data warehouse implementations ignore the longer-term goals of analysis and flexibility in the rush to provide initial value. Taking time to consider the project's real purpose, then building a correct foundation for it, can assure a better future for the data warehouse. To meet user demands for more timely and flexible analysis, companies can use a step-by-step approach to move from maintaining detailed information to using summary-level data.

11.

The case for building a data warehouse

《IT Professional》2001,3(4):31-34

Unfortunately, IT shops in many organizations lack data warehousing expertise because they have deployed existing IT staff resources to address operational source systems and enterprise resource planning (ERP) systems. Because many strategic business solutions depend heavily on a solid data warehouse foundation, many organizations will find themselves lagging behind or out of business, if they do not implement a data warehouse. The paper discusses the benefits and disadvantages of data warehousing. It considers the keys to implementation success along with a few real world examples.

12.

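Research on duplicate record cleaning algorithms in data warehouses

钟嘉庆, 张义芳, 卢志刚 《微型机与应用》 (Microcomputer & Its Applications) 2009,28(7)

This paper improves the "sort, identify, merge" algorithm for duplicate record cleaning to address its shortcomings. The improved algorithm raises the efficiency of record sorting while preserving the record matching rate. When identifying duplicate records, it considers five factors: the number of characters in the matched fields, their frequency of occurrence in the two fields, the importance (weight) of each field in the record, the semantics of Chinese fields, and the tendency for the semantic emphasis to fall toward the end of a field. When merging duplicate records, it combines clustering with a practical algorithm, which effectively improves the accuracy and robustness of duplicate record cleaning in a data warehouse.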
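
The basic "sort, identify, merge" pipeline can be sketched as below: sort records so likely duplicates sit close together, compare each record with a small window of already-kept records using a weighted field similarity, and merge matches. This is a plain sorted-neighborhood sketch, not the improved algorithm from the paper above; it covers only field weights, not the Chinese-semantics factors. The field names, `WEIGHTS`, `window`, and `threshold` are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field weights expressing how important each field is.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def similarity(a, b):
    """Weighted average of per-field string similarity, in [0, 1]."""
    return sum(
        w * SequenceMatcher(None, a[f], b[f]).ratio()
        for f, w in WEIGHTS.items()
    )


def deduplicate(records, sort_key="name", window=3, threshold=0.75):
    # 1. Sort: bring likely duplicates near each other.
    ordered = sorted(records, key=lambda r: r[sort_key])
    merged = []
    for rec in ordered:
        # 2. Identify: compare only against the last `window` kept records.
        for kept in merged[-window:]:
            if similarity(rec, kept) >= threshold:
                # 3. Merge: fill in fields that are empty in the kept record.
                for field, value in rec.items():
                    if not kept[field]:
                        kept[field] = value
                break
        else:
            merged.append(dict(rec))  # no match found: keep as a new record
    return merged


records = [
    {"name": "Zhang Wei", "address": "12 Main St", "phone": ""},
    {"name": "Li Na", "address": "8 Lake Rd", "phone": "555-9876"},
    {"name": "Zhang Wei", "address": "12 Main St", "phone": "555-1234"},
]
cleaned = deduplicate(records)
print(cleaned)
```

The window keeps the number of comparisons roughly linear in the number of records, at the risk of missing duplicates that sort far apart.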

13.

Safeguarding the data warehouse

Christopher Harmon 《Computer Fraud & Security》1998,1998(6):16-19

Data warehousing and the Internet are regarded as the two most important changes to the information systems landscape in the past few years. Although they are agents of powerful change, their deployment presents many challenges and opportunities. Here is outlined one set of challenges that has emerged with the convergence of the Internet and data warehousing security.

14.

Evaluating data warehouse toolkits

Oates J. 《Software, IEEE》1998,15(1):52-54

Historically, building a data warehouse has been a risky proposition. According to the Gartner Group and similar sources, the failure rate for data warehouse projects runs as high as 60 percent. The dilemma facing businesses today is this: although implementing an enterprise-wide database is risky, an organization that does so gains a significant business advantage over competitors who do not have data warehouse capability or who continue to rely on localized, group-specific data marts. The author discusses the evaluation of data warehouse toolkits and considers critical toolkit components.

15.

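A sample model for data warehouses  Total citations: 1, self-citations: 1, citations by others: 1

宋卫林, 徐惠民 《计算机工程与设计》 (Computer Engineering and Design) 2004,25(2):220-222

This paper puts forward the view that a data warehouse is a sample base, and presents a sample model (SM) for the conceptual design of data warehouses. SM is event-centered: each class of events is a subject of the data warehouse, and the representation space of an event is defined by describing the entities that participate in it and the relationships over them (hierarchy, classification, and composition).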

16.

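An example of building a data warehouse  Total citations: 7, self-citations: 0, citations by others: 7

王骏 《计算机工程与设计》 (Computer Engineering and Design) 2006,27(19):3663-3665,3712

This paper introduces the concepts and content of data warehouse technology and, taking the construction of data warehouse system D as an example, studies the theory and practice of building a data warehouse system. It analyzes the actual situation of enterprise W and designs an overall system architecture suited to the enterprise's needs. It describes how the system development model was chosen and the methods and process used to design the data model. Focusing on data extraction, loading, and online analytical processing (OLAP) in data warehouse construction, it explains the design and development process of the data warehouse in detail.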

17.

Decision support queries on a tape-resident data warehouse

《Information Systems》2005,30(2):133-149

Data warehouses collect masses of operational data, allowing analysts to extract information by issuing decision support queries on the otherwise discarded data. In many application areas (e.g. telecommunications), the warehoused data sets are multiple terabytes in size. Parts of these data sets are stored on very large disk arrays, while the remainder is stored on tape-based tertiary storage (which is one to two orders of magnitude less expensive than on-line storage). However, the inherently sequential nature of access to tape-based tertiary storage makes the efficient access to tape-resident data difficult to accomplish through conventional databases. In this paper, we present a way to make access to a massive tape-resident data warehouse easy and efficient. Ad hoc decision support queries usually involve large scale and complex aggregation over the detail data. These queries are difficult to express in SQL, and frequently require self-joins on the detail data (which are prohibitively expensive on the disk-resident data and infeasible to compute on tape-resident data), or unnecessary multiple passes through the detail data. An extension to SQL, the extended multi feature SQL (EMF SQL) expresses complex aggregation computations in a clear manner without using self-joins. The detail data in a data warehouse usually represents a record of past activities, and therefore is temporal. We show that complex queries involving sequences can be easily expressed in EMF SQL. An EMF SQL query can be optimized to minimize the number of passes through the detail data required to evaluate the query, in many cases to only one pass. We describe an efficient query evaluation algorithm along with a query optimization algorithm that minimizes the number of passes through the detail data, and which minimizes the amount of main memory required to evaluate the query. These algorithms are useful not only in the context of tape-resident data warehouses but also in data stream systems which require similar processing techniques.
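
To make the idea of evaluating several grouped aggregates in one sequential pass concrete, the sketch below scans the detail rows once and updates a small per-group state, with no self-join and no second scan. It does not reproduce EMF SQL or the evaluation algorithm described above; the `(customer, duration)` row layout, the aggregate names, and the `long_call_seconds` cutoff are assumptions for illustration.

```python
from collections import defaultdict


def one_pass_aggregates(rows, long_call_seconds=300):
    """Scan (customer, duration) rows once, keeping several aggregates per group."""
    stats = defaultdict(lambda: {"calls": 0, "total": 0, "long_calls": 0})
    for customer, duration in rows:  # single sequential pass over the detail data
        s = stats[customer]
        s["calls"] += 1
        s["total"] += duration
        if duration >= long_call_seconds:
            s["long_calls"] += 1  # conditional aggregate, computed in the same pass
    return dict(stats)


rows = [("alice", 120), ("bob", 600), ("alice", 480), ("bob", 30), ("alice", 60)]
result = one_pass_aggregates(rows)
print(result)
```

Only the per-group state lives in memory, so the memory needed grows with the number of groups rather than with the number of detail rows.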

18.

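On the ETL process in data warehouse construction  Total citations: 2, self-citations: 0, citations by others: 2

张云 《计算机系统应用》 (Computer Systems & Applications) 2005,14(8):77-79

This paper introduces the ETL process in data warehouse construction, including the concept and goals of ETL and how to implement ETL correctly to ensure the success of the data warehouse.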

19.

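Application of data warehouse technology in call centers

谢燕, 郑有才, 张立勇, 杜军朝 《计算机工程与设计》 (Computer Engineering and Design) 2006,27(21):4150-4152,4160

Existing call centers hold large volumes of historical data but lack the capacity to analyze and process it, which leaves enterprise decisions without data support. To address this, based on a study of call centers and data warehouse technology and on the characteristics of the call center's call management system, the paper designs the architecture of a call center data warehouse and discusses in detail the design and implementation of the data warehouse's architecture, logical model, physical model, and online analytical processing (OLAP) system.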

20.

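Implementation of data warehouse technology in an engineering project

项天成, 崔德光 《计算机工程与设计》 (Computer Engineering and Design) 2002,23(5):21-23,32

This paper briefly introduces the basic concepts and characteristics of data warehouses, analyzes data warehouse architecture with reference to the flow management system of the North China Air Traffic Management Bureau, and, after summarizing the technical problems encountered during system development, proposes solutions to some of them.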