首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The complexity of the data warehouse (DW) development process requires to follow a methodological approach in order to be successful. A widely accepted approach for this development is the hybrid one, in which requirements and data sources must be accommodated to a new DW model. The main problem is that we lose the relationships between requirements, elements in the multidimensional (MD) conceptual models and data sources in the process, since no traceability is explicitly specified. Therefore, this hurts requirements validation capability and increases the complexity of Extraction, Transformation and Loading processes. In this paper, we propose a novel trace metamodel for DWs and focus on the relationships between requirements and MD conceptual models. We propose a set of Query/View/Transformation rules to include traceability in DWs in an automatic way, allowing us to obtain a MD conceptual model of the DW, as well as a trace model. Therefore, we are able to trace every requirement to the MD elements, further increasing user satisfaction. Finally, we show the implementation in our Lucentia BI tool.  相似文献   

2.
Successful data warehouse (DW) design needs to be based upon a requirement analysis phase in order to adequately represent the information needs of DW users. Moreover, since the DW integrates the information provided by data sources, it is also crucial to take these sources into account throughout the development process to obtain a consistent reconciliation of data sources and information needs. In this paper, we start by summarizing our approach to specify user requirements for data warehouses and to obtain a conceptual multidimensional model capturing these requirements. Then, we make use of the multidimensional normal forms to define a set of Query/View/Transformation (QVT) relations to assure that the conceptual multidimensional model obtained from user requirements agrees with the available data sources that will populate the DW. Thus, we propose a hybrid approach to develop DWs, i.e., we firstly obtain the conceptual multidimensional model of the DW from user requirements and then we verify and enforce its correctness against data sources by using a set of QVT relations based on multidimensional normal forms. Finally, we provide some snapshots of the CASE tool we have used to implement our QVT relations.  相似文献   

3.
Successful data warehouse (DW) design needs to be based upon a requirement analysis phase in order to adequately represent the information needs of DW users. Moreover, since the DW integrates the information provided by data sources, it is also crucial to take these sources into account throughout the development process to obtain a consistent reconciliation of data sources and information needs. In this paper, we start by summarizing our approach to specify user requirements for data warehouses and to obtain a conceptual multidimensional model capturing these requirements. Then, we make use of the multidimensional normal forms to define a set of Query/View/Transformation (QVT) relations to assure that the conceptual multidimensional model obtained from user requirements agrees with the available data sources that will populate the DW. Thus, we propose a hybrid approach to develop DWs, i.e., we firstly obtain the conceptual multidimensional model of the DW from user requirements and then we verify and enforce its correctness against data sources by using a set of QVT relations based on multidimensional normal forms. Finally, we provide some snapshots of the CASE tool we have used to implement our QVT relations.  相似文献   

4.
ContextData warehouses are systems which integrate heterogeneous sources to support the decision making process. Data from the Web is becoming increasingly more important as sources for these systems, which has motivated the extensive use of XML to facilitate data and metadata interchange among heterogeneous data sources from the Web and the data warehouse. However, the business information that data warehouses manage is highly sensitive and must, therefore, be carefully protected. Security is thus a key issue in the design of data warehouses, regardless of the implementation technology. It is important to note that the idiosyncrasy of the unstructured and semi-structured data requires particular security rules that have been specifically tailored to these systems in order to permit their particularities to be captured correctly. Unfortunately, although security issues have been considered in the development of traditional data warehouses, current research lacks approaches with which to consider security when the target platform is based on XML technology.ObjectiveWe shall focus on defining transformations to obtain a secure XML Schema from the conceptual multidimensional model of a data warehouse.MethodWe have first defined the rationale behind the transformation rules and how they have been developed in natural language, and we have then established them clearly and formally by using the QVT language. Finally, in order to validate our proposal we have carried out a case study.ResultsWe have proposed an approach for the model driven development of Secure XML Data Warehouses, defining a set of QVT transformation rules.ConclusionThe main benefit of our proposal is that it is possible to model security requirements together with the conceptual model of the data warehouse during the early stages of a project, and automatically obtain the corresponding implementation for XML.  相似文献   

5.
Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integration contexts beyond data warehouses. It introduces two new visual languages that are used to specify the ETL process, which can also be represented by means of UML activity diagrams. In particular, the first visual language supports data manipulation activities, whereas the second one provides traceability information of attributes to highlight the impact of potential transformations on integrated schemas depending on them. Once the whole ETL process has been visually specified, the designer might invoke the automatic generation of an activity diagram representing a possible orchestration of it based on its dependencies. The designer can edit such a diagram to modify the proposed orchestration provided that changes do not alter data dependencies. The final specification can be translated into code that is executable on the data sources. Finally, the effectiveness of the proposed approach has been validated through a user study in which we have compared the effort needed to design an ETL process in our approach with respect to the one required with main visual approaches described in the literature.Copyright © 2013 John Wiley & Sons, Ltd.  相似文献   

6.
A federation of data warehouses is understood as a set of data warehouses, which can be processed as a whole in the logic level. Physically, the federation does not gather data into one place. This paper presents a formal framework for data and knowledge processing in data warehouse federations. The management system for a data warehouse federation consists of an user interface enabling presentation of user queries, a program for query decomposition and a program for integrating knowledge coming from different data warehouses as the answers to a user query. We propose a model for query decomposition process and knowledge integration. It contains also the algorithm for knowledge inconstancy processing. This kind of inconsistency often occurs since very often the knowledge extracted from different data warehouses refers to the same subject, but is not consistent.  相似文献   

7.
The development of data warehouses begins with the definition of multidimensional models at the conceptual level in order to structure data, which will facilitate decision makers with an easier data analysis. Current proposals for conceptual multidimensional modelling focus on the design of static data warehouse structures, but few approaches model the queries which the data warehouse should support by means of OLAP (on-line analytical processing) tools. OLAP queries are, therefore, only defined once the rest of the data warehouse has been implemented, which prevents designers from verifying from the very beginning of the development whether the decision maker will be able to obtain the required information from the data warehouse. This article presents a solution to this drawback consisting of an extension to the object constraint language (OCL), which has been developed to include a set of predefined OLAP operators. These operators can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model. Furthermore, OLAP tools require the implementation of queries to assure performance optimisations based on pre-aggregation. It is interesting to note that the OLAP queries defined by our approach can be automatically implemented in the rest of the data warehouse, in a coherent and integrated manner. This implementation is supported by a code-generation architecture aligned with model-driven technologies, in particular the MDA (model-driven architecture) proposal. Finally, our proposal has been validated by means of a set of sample data sets from a well-known case study.  相似文献   

8.

To make informed decisions, managers establish data warehouses that integrate multiple data sources. However, the outcomes of the data warehouse-based decisions are not always satisfactory due to low data quality. Although many studies focused on data quality management, little effort has been made to explore effective data quality control strategies for the data warehouse. In this study, we propose a chance-constrained programming model that determines the optimal strategy for allocating the control resources to mitigate the data quality problems of the data warehouse. We develop a modified Artificial Bee Colony algorithm to solve the model. Our work contributes to the literature on evaluation of data quality problem propagation in data integration process and data quality control on the data sources that make up the data warehouse. We use a data warehouse in the healthcare organization to illustrate the model and the effectiveness of the algorithm.

  相似文献   

9.
Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in the Production Rule Representation language (PRR). Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e. how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge enables an early modeling of user requirements in a data warehouse project. A prototype has been developed based on the Java Expert System Shell (Jess).  相似文献   

10.
Automated requirements traceability methods that utilize Information Retrieval (IR) methods to generate and maintain traceability links are often more efficient than traditional manual approaches, however the traces they generate are imprecise and significant human effort is needed to evaluate and filter the results. This paper investigates and compares three term-based enhancement methods that are designed to improve the performance of a probabilistic automated tracing tool. Empirical studies show that the enhancement methods can be effective in increasing the accuracy of the retrieved traces; however the effectiveness of each method varies according to specific project characteristics. The analysis of such characteristics has lead to the development of two new project-level metrics which can be used to predict the effectiveness of each enhancement method for a given data set. A procedure to automatically extract critical keywords and phrases from a set of traceable artifacts is also presented to enhance the automated trace retrieval algorithm. The procedure is tested on two new datasets.  相似文献   

11.
Converting XML DTDs to UML diagrams for conceptual data integration   总被引:2,自引:0,他引:2  
Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web, e.g., in B2B e-commerce. Modern enterprises need to combine data from many sources in order to answer important business questions, creating a need for integration of web-based XML data. Previous web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques that focus on the conceptual level in order to communicate the structure and properties of the available data to users at a higher level of abstraction. The most widely used conceptual model at the moment is the Unified Modeling Language (UML).

This paper presents algorithms for automatically constructing UML diagrams from XML DTDs, enabling fast and easy graphical browsing of XML data sources on the web. The algorithms capture important semantic properties of the XML data such as precise cardinalities and aggregation (containment) relationships between the data elements. As a motivating application, it is shown how the generated diagrams can be used for the conceptual design of data warehouses based on web data, and an integration architecture is presented. The choice of data warehouses and On-Line Analytical Processing as the motivating application is another distinguishing feature of the presented approach.  相似文献   


12.
Web数据仓库的异步迭代查询处理方法   总被引:2,自引:0,他引:2  
何震瀛  李建中  高宏 《软件学报》2002,13(2):214-218
数据仓库信息量的飞速膨胀对数据仓库提出了巨大挑战.如何提高Web环境下数据仓库的查询效率成为数据仓库研究领域重要的研究问题.对Web数据仓库的体系结构和查询方法进行了研究和探讨.在分析几种Web数据仓库实现方法的基础上,提出了一种Web数据仓库的层次体系结构,并在此基础上提出了Web数据仓库的异步迭代查询方法.该方法充分利用了流水线并行技术,在Web数据仓库的查询处理过程中不同层次的结点以流水线方式运行,并行完成查询的处理,提高了查询效率.理论分析表明,该方法可以有效地提高Web数据仓库的查询效率.  相似文献   

13.
Yao Liu  Hui Xiong 《Information Sciences》2006,176(9):1215-1240
A data warehouse stores current and historical records consolidated from multiple transactional systems. Securing data warehouses is of ever-increasing interest, especially considering areas where data are sold in pieces to third parties for data mining practices. In this case, existing data warehouse security techniques, such as data access control, may not be easy to enforce and can be ineffective. Instead, this paper proposes a data perturbation based approach, called the cubic-wise balance method, to provide privacy preserving range queries on data cubes in a data warehouse. This approach is motivated by the following observation: analysts are usually interested in summary data rather than individual data values. Indeed, our approach can provide a closely estimated summary data for range queries without providing access to actual individual data values. As demonstrated by our experimental results on APB benchmark data set from the OLAP council, the cubic-wise balance method can achieve both better privacy preservation and better range query accuracy than random data perturbation alternatives.  相似文献   

14.
为了构建支持企业决策分析的数据仓库,分析了传统数据仓库模型的局限性,提出了一个基于统一视图模型的数据仓库体系结构。该体系结构是在传统数据仓库模型的数据源和数据仓库之间增加一个统一标准层,并利用统一视图—资源数据和数据仓库—统一视图的两级映射,保证了数据的透明访问和模型本身良好的可用性,进而支持灵活的多数据仓库的构建。基于该体系结构,给出了统一视图模型的建立和数据仓库三层之间两级映射的方法,提出了一种新的基于统一视图模型的数据映射—抽取—装载数据仓库ETL建模过程,并开发了相应的数据仓库构建系统。应用表明,  相似文献   

15.
Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and the industry. One of the most effective approaches is to extract and integrate information of interest from each source in advance and store them in a centralized repository (known as a data warehouse). When a query is posed, it is evaluated directly at the warehouse without accessing the original information sources. One of the techniques that this approach uses to improve the efficiency of query processing is materialized view(s). Essentially, materialized views are used for data warehouses, and various methods for relational databases have been developed. In this paper, we first discuss an object deputy approach to realize materialized object views for data warehouses which can also incorporate object-oriented databases. A framework has been developed using Smalltalk to prepare data for data warehousing, in which an object deputy model and database connecting tools have been implemented. The object deputy model can provide an easy-to-use way to resolve inconsistency and conflicts while preparing data for data warehousing, as evidenced by our empirical study.  相似文献   

16.
Schema Evolution in Data Warehouses   总被引:2,自引:0,他引:2  
In this paper, we address the issues related to the evolution and maintenance of data warehousing systems, when underlying data sources change their schema capabilities. These changes can invalidate views at the data warehousing system. We present an approach for dynamically adapting views according to schema changes arising on source relations. This type of maintenance concerns both the schema and the data of the data warehouse. The main issue is to avoid the view recomputation from scratch especially when views are defined from multiple sources. The data of the data warehouse is used primarily in organizational decision-making and may be strategic. Therefore, the schema of the data warehouse can evolve for modeling new requirements resulting from analysis or data-mining processing. Our approach provides means to support schema evolution of the data warehouse independently of the data sources. Received 20 March 2000 / Revised 5 January 2001 / Accepted in revised form 20 April 2001  相似文献   

17.
数据仓库系统正广泛用于联机分析处理系统,为了能将多个数据仓库集成到一起,需要解决技术上和语义上的一些问题。一种基本的解决方法是建立一种标准化的、独立于各供应商的多维数据描述格式。本文介绍一个基于XML的文档模板集——xCube,它可在任何网络上交换数据仓库数据。由于XCube被组织成模块化的形式,所以立方体的多维模式、维数据和事实数据能分步传输,因此立方体都能很容易从一个数据仓库传输到另一个数据仓库。  相似文献   

18.
19.
The traditional economic order quantity model assumes that the retailer's storage capacity is unlimited. However, as we all know, the capacity of any warehouse is limited. In practice, there usually exist various factors that induce the decision-maker of the inventory system to order more items than can be held in his/her own warehouse. Therefore, for the decision-maker, it is very practical to determine whether or not to rent other warehouses. In this article, we try to incorporate two levels of trade credit and two separate warehouses (own warehouse and rented warehouse) to establish a new inventory model to help the decision-maker to make the decision. Four theorems are provided to determine the optimal cycle time to generalise some existing articles. Finally, the sensitivity analysis is executed to investigate the effects of the various parameters on ordering policies and annual costs of the inventory system.  相似文献   

20.

Context

Data warehouse conceptual design is based on the metaphor of the cube, which can be derived from either requirement-driven or data-driven methodologies. Each methodology has its own advantages. The first allows designers to obtain a conceptual schema very close to the user needs but it may be not supported by the effective data availability. On the contrary, the second ensures a perfect traceability and consistence with the data sources—in fact, it guarantees the presence of data to be used in analytical processing—but does not preserve from missing business user needs. To face this issue, the necessity emerged in the last years to define hybrid methodologies for conceptual design.

Objective

The objective of the paper is to use a hybrid methodology based on different multidimensional models in order to gather all advantages of each of them.

Method

The proposed methodology integrates the requirement-driven strategy with the data-driven one, in that order, possibly performing alterations of functional dependencies on UML multidimensional schemas reconciled with data sources.

Results

As case study, we illustrate how our methodology can be applied to the university environment. Furthermore, we evaluate quantitatively the benefits of this methodology by comparing it with some popular and conventional methodologies.

Conclusion

In conclusion, we highlight how the hybrid methodology improves the conceptual schema quality. Finally, we outline our present work devoted to introduce automatic design techniques in the methodology on the basis of the logical programming.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号