1.

Reconciling requirement-driven data warehouses with data sources via multidimensional normal forms

Jose-Norberto Juan Jens 《Data & Knowledge Engineering》2007,63(3):725-751

Successful data warehouse (DW) design needs to be based upon a requirement analysis phase in order to adequately represent the information needs of DW users. Moreover, since the DW integrates the information provided by data sources, it is also crucial to take these sources into account throughout the development process to obtain a consistent reconciliation of data sources and information needs. In this paper, we start by summarizing our approach to specify user requirements for data warehouses and to obtain a conceptual multidimensional model capturing these requirements. Then, we make use of the multidimensional normal forms to define a set of Query/View/Transformation (QVT) relations to ensure that the conceptual multidimensional model obtained from user requirements agrees with the available data sources that will populate the DW. Thus, we propose a hybrid approach to develop DWs, i.e., we first obtain the conceptual multidimensional model of the DW from user requirements and then we verify and enforce its correctness against data sources by using a set of QVT relations based on multidimensional normal forms. Finally, we provide some snapshots of the CASE tool we have used to implement our QVT relations.

2.

A trace metamodel proposal based on the model driven architecture framework for the traceability of user requirements in data warehouses

Alejandro Maté Juan Trujillo 《Information Systems》2012

The complexity of the data warehouse (DW) development process requires following a methodological approach in order to be successful. A widely accepted approach for this development is the hybrid one, in which requirements and data sources must be accommodated to a new DW model. The main problem is that we lose the relationships between requirements, elements in the multidimensional (MD) conceptual models and data sources in the process, since no traceability is explicitly specified. This hurts requirements validation capability and increases the complexity of Extraction, Transformation and Loading processes. In this paper, we propose a novel trace metamodel for DWs and focus on the relationships between requirements and MD conceptual models. We propose a set of Query/View/Transformation rules to include traceability in DWs in an automatic way, allowing us to obtain an MD conceptual model of the DW, as well as a trace model. Therefore, we are able to trace every requirement to the MD elements, further increasing user satisfaction. Finally, we show the implementation in our Lucentia BI tool.

3.

Tracing conceptual models' evolution in data warehouses by using the model driven architecture

《Computer Standards & Interfaces》2014,36(5):831-843

Developing a data warehouse is an ongoing task where new requirements are constantly being added. A widely accepted approach for developing data warehouses is the hybrid approach, where requirements and data sources must be accommodated to a reconciled data warehouse model. During this process, relationships between conceptual elements specified by user requirements and those supplied by the data sources are lost, since no traceability mechanisms are included. As a result, the designer wastes additional time and effort to update the data warehouse whenever user requirements or data sources change. In this paper, we propose an approach to preserve traceability at the conceptual level for data warehouses. Our approach includes a set of traces and their formalization, in order to relate the multidimensional elements specified by user requirements with the concepts extracted from data sources. Therefore, we can easily identify how changes should be incorporated into the data warehouse, and derive it according to the new configuration. In order to minimize the effort required, we define a set of general Query/View/Transformation rules to automate the derivation of traces along with data warehouse elements. Finally, we describe a CASE tool that supports our approach and provide a detailed case study to show the applicability of the proposal.

4.

Development of Secure XML Data Warehouses with QVT

《Information and Software Technology》2013,55(9):1651-1677

Context

Data warehouses are systems which integrate heterogeneous sources to support the decision making process. Data from the Web are becoming increasingly important as sources for these systems, which has motivated the extensive use of XML to facilitate data and metadata interchange among heterogeneous data sources from the Web and the data warehouse. However, the business information that data warehouses manage is highly sensitive and must, therefore, be carefully protected. Security is thus a key issue in the design of data warehouses, regardless of the implementation technology. It is important to note that the idiosyncrasy of unstructured and semi-structured data requires particular security rules that have been specifically tailored to these systems in order to permit their particularities to be captured correctly. Unfortunately, although security issues have been considered in the development of traditional data warehouses, current research lacks approaches with which to consider security when the target platform is based on XML technology.

Objective

We shall focus on defining transformations to obtain a secure XML Schema from the conceptual multidimensional model of a data warehouse.

Method

We have first defined the rationale behind the transformation rules and how they have been developed in natural language, and we have then established them clearly and formally by using the QVT language. Finally, in order to validate our proposal we have carried out a case study.

Results

We have proposed an approach for the model driven development of Secure XML Data Warehouses, defining a set of QVT transformation rules.

Conclusion

The main benefit of our proposal is that it is possible to model security requirements together with the conceptual model of the data warehouse during the early stages of a project, and automatically obtain the corresponding implementation for XML.

5.

A requirement-driven approach to the design and evolution of data warehouses

《Information Systems》2014

Designing data warehouse (DW) systems in highly dynamic enterprise environments is not an easy task. At each moment, the multidimensional (MD) schema needs to satisfy the set of information requirements posed by the business users. At the same time, the diversity and heterogeneity of the data sources need to be considered in order to properly retrieve needed data. Frequent arrival of new business needs requires that the system is adaptable to changes. To cope with such an inevitable complexity (both at the beginning of the design process and when potential evolution events occur), in this paper we present a semi-automatic method called ORE, for creating DW designs in an iterative fashion based on a given set of information requirements. Requirements are first considered separately. For each requirement, ORE expects the set of possible MD interpretations of the source data needed for that requirement (in a form similar to an MD schema). Incrementally, ORE builds the unified MD schema that satisfies the entire set of requirements and meets some predefined quality objectives. We have implemented ORE and performed a number of experiments to study our approach. We have also conducted a limited-scale case study to investigate its usefulness to designers.

6.

Pre-physical data base design heuristics

《Information & Management》1995,28(6):351-359

Data base design encompasses both business and technical aspects. Conceptual data modelling creates a conceptual schema to abstract the user view of data within the business context. This conceptual schema is mapped to the logical data model structure in order to obtain a set of normalized relations. These ensure the integrity of the data by avoiding update anomalies. Regardless of the methodology used, the process or transaction model will impact on the data base design process and its refinement. Since the process or transaction model will reflect the business requirements and specific users' view of the data, it will determine the relevance of having a fully normalized set of data, or the need for trading off some degree of normalization, with the aim of improving performance in the query process. The model will also indicate the activity requirements on the relations in the logical data model, their frequency, and characteristics. A pre-physical design step is described, and a set of heuristics is proposed in order to obtain a refined data base design.

7.

A UML profile for multidimensional modeling in data warehouses

Sergio Juan Il-Yeol 《Data & Knowledge Engineering》2006,59(3):725-769

The multidimensional (MD) modeling, which is the foundation of data warehouses (DWs), MD databases, and On-Line Analytical Processing (OLAP) applications, is based on several properties different from those in traditional database modeling. In the past few years, there have been some proposals, providing their own formal and graphical notations, for representing the main MD properties at the conceptual level. However, unfortunately none of them has been accepted as a standard for conceptual MD modeling.

In this paper, we present an extension of the Unified Modeling Language (UML) using a UML profile. This profile is defined by a set of stereotypes, constraints and tagged values to elegantly represent main MD properties at the conceptual level. We make use of the Object Constraint Language (OCL) to specify the constraints attached to the defined stereotypes, thereby avoiding an arbitrary use of these stereotypes. We have based our proposal on UML for two main reasons: (i) UML is a well-known standard modeling language familiar to most database designers, so designers can avoid learning a new notation, and (ii) UML can be easily extended so that it can be tailored for a specific domain with concrete peculiarities such as the multidimensional modeling for data warehouses. Moreover, our proposal is Model Driven Architecture (MDA) compliant and we use the Query/View/Transformation (QVT) approach for an automatic generation of the implementation in a target platform. Throughout the paper, we will describe how to easily accomplish the MD modeling of DWs at the conceptual level. Finally, we show how to use our extension in Rational Rose for MD modeling.

8.

An architecture for automatically developing secure OLAP applications from models

《Information and Software Technology》2015

Context

Decision makers query enterprise information stored in Data Warehouses (DW) by using tools (such as On-Line Analytical Processing (OLAP) tools) which use specific views or cubes from the corporate DW or Data Marts, based on the multidimensional modeling. Since the information managed is critical, security constraints have to be correctly established in order to avoid unauthorized accesses.

Objective

In previous work we have defined a Model-Driven based approach for developing a secure DWs repository by following a relational approach. Nevertheless, it is also important to define security constraints in the metadata layer that connects the DWs repository with the OLAP tools, that is, over the same multidimensional structures that final users manage. This paper defines a proposal to develop secure OLAP applications and incorporates it into our previous approach.

Method

Our proposal is composed of models and transformations. Our models have been defined using the extension capabilities from UML (conceptual model) and extending the OLAP package of CWM with security (logical model). Transformations have been defined by using a graphical notation and implemented into QVT and MOFScript. Finally, this proposal has been evaluated through case studies.

Results

A complete MDA architecture for developing secure OLAP applications. The main contributions of this paper are: improvement of a UML profile for conceptual modeling; definition of a logical metamodel for OLAP applications; and definition and implementation of transformations from conceptual to logical models, and from logical models to the secure implementation into a specific OLAP tool (SSAS).

Conclusion

Our proposal allows us to develop secure OLAP applications, providing a complete MDA architecture composed of several security models and automatic transformations towards the final secure implementation. Security aspects are identified early and fitted into a more robust solution that provides better information assurance and saves time in maintenance.

9.

Hybrid methodology for data warehouse conceptual design by UML schemas

Francesco Di Tria Ezio Lefons Filippo Tangorra 《Information and Software Technology》2012,54(4):360-379

Context

Data warehouse conceptual design is based on the metaphor of the cube, which can be derived from either requirement-driven or data-driven methodologies. Each methodology has its own advantages. The first allows designers to obtain a conceptual schema very close to the user needs, but it may not be supported by the data that is actually available. In contrast, the second ensures perfect traceability and consistency with the data sources (in fact, it guarantees the presence of data to be used in analytical processing) but does not guard against missing business user needs. To address this issue, the need has emerged in recent years to define hybrid methodologies for conceptual design.

Objective

The objective of the paper is to use a hybrid methodology based on different multidimensional models in order to gather all advantages of each of them.

Method

The proposed methodology integrates the requirement-driven strategy with the data-driven one, in that order, possibly performing alterations of functional dependencies on UML multidimensional schemas reconciled with data sources.

Results

As a case study, we illustrate how our methodology can be applied to the university environment. Furthermore, we quantitatively evaluate the benefits of this methodology by comparing it with some popular and conventional methodologies.

Conclusion

In conclusion, we highlight how the hybrid methodology improves the conceptual schema quality. Finally, we outline our present work devoted to introducing automatic design techniques in the methodology on the basis of logic programming.

10.

Extending OCL for OLAP querying on conceptual multidimensional models of data warehouses

Jesús Pardillo Jose-Norberto Mazón 《Information Sciences》2010,180(5):584-5028

The development of data warehouses begins with the definition of multidimensional models at the conceptual level in order to structure data, which makes data analysis easier for decision makers. Current proposals for conceptual multidimensional modelling focus on the design of static data warehouse structures, but few approaches model the queries which the data warehouse should support by means of OLAP (on-line analytical processing) tools. OLAP queries are, therefore, only defined once the rest of the data warehouse has been implemented, which prevents designers from verifying from the very beginning of the development whether the decision maker will be able to obtain the required information from the data warehouse. This article presents a solution to this drawback consisting of an extension to the object constraint language (OCL), which has been developed to include a set of predefined OLAP operators. These operators can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model. Furthermore, OLAP tools require the implementation of queries to assure performance optimisations based on pre-aggregation. It is interesting to note that the OLAP queries defined by our approach can be automatically implemented in the rest of the data warehouse, in a coherent and integrated manner. This implementation is supported by a code-generation architecture aligned with model-driven technologies, in particular the MDA (model-driven architecture) proposal. Finally, our proposal has been validated by means of a set of sample data sets from a well-known case study.

11.

Model-driven multidimensional modeling of secure data warehouses

Eduardo Fernández-Medina Juan Trujillo Mario Piattini 《European Journal of Information Systems》2007,16(4):374-389

Data Warehouses (DW), Multidimensional (MD) databases, and On-Line Analytical Processing (OLAP) applications provide companies with many years of historical information for the decision-making process. Owing to the relevant information managed by these systems, they should provide strong security and confidentiality measures from the early stages of a DW project in the MD modeling and enforce them. In recent years, there have been some proposals to accomplish the MD modeling at the conceptual level. Nevertheless, none of them considers security measures as an important element in their models, and therefore, they do not allow us to specify confidentiality constraints to be enforced by the applications that will use these MD models. In this paper, we present an Access Control and Audit (ACA) model for the conceptual MD modeling. Then, we extend the Unified Modeling Language (UML) with this ACA model, representing the security information (gathered in the ACA model) in the conceptual MD modeling, thereby allowing us to obtain secure MD models. Moreover, we use the OSCL (Object Security Constraint Language) to specify our ACA model constraints, avoiding in this way an arbitrary use of them. Furthermore, we align our approach with the Model-Driven Architecture, the Model-Driven Security and the Model-Driven Data Warehouse, offering a proposal highly compatible with the more recent technologies.

12.

A meta-path embedding-based method for analyzing mobile application requirement preferences

Song Rui, Li Tong, Dong Xin, Ding Zhiming 《Journal of Computer Research and Development》2021,58(4):749-762

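With the rapid development of the Internet and mobile application platforms, the massive volume of user data generated around mobile applications has become an important data source for accurately analyzing users' requirement preferences. Although many researchers have analyzed and mined user requirements from such data, existing methods usually study only a few dimensions of the data and fail to effectively mine multidimensional mobile application information and the associations among its dimensions. This paper proposes a meta-path embedding-based method for analyzing mobile application requirement preferences, which enables personalized mobile application recommendation for users. Specifically, we first analyze the semantic topics in the textual information of mobile applications to identify the dimensions along which user requirement preferences are analyzed. Second, based on the semantic features of mobile application information, we build a conceptual model that integrates multidimensional mobile application information and covers the multidimensional data that can characterize user requirement preferences. Based on the semantics of the conceptual model, we design a set of meaningful meta-paths to accurately capture the semantics of user requirement preferences. Finally, we use meta-path embedding to profile user behavior and thereby achieve personalized mobile application recommendation. We evaluate the method on a real-world data set from the Apple App Store comprising 1507 mobile applications and 153501 user reviews. The experimental results show that the proposed method outperforms existing models on all metrics, improving the average F1 score by 0.02 and the average normalized discounted cumulative gain (NDCG) by 0.1.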
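The NDCG metric reported in this abstract can be illustrated with a minimal sketch. It uses the common log2-discount formulation; the abstract does not say which NDCG variant or cutoff the authors used, so the function names and the optional cutoff `k` are assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    # The item at 0-based position `rank` is discounted by log2(rank + 2),
    # so the top-ranked item is divided by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal one."""
    ranked = list(relevances) if k is None else list(relevances)[:k]
    # Ideal ranking: the same grades sorted from most to least relevant.
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that places the only relevant application first scores 1.0, and moving it down the list lowers the score, which is why the metric rewards recommenders that put preferred applications near the top.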

13.

Designing data warehouses (total citations: 9; self-citations: 0; citations by others: 9)

Dimitri Timos 《Data & Knowledge Engineering》1999,31(3):279-301

A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.
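The three selection criteria (a) to (c) in this abstract can be made concrete with a small sketch. This is a plain exhaustive enumeration over candidate view sets, not the incremental algorithms or pruning heuristics of the paper; the function name, the input layout, and the cost figures in the usage note are hypothetical.

```python
from itertools import combinations


def select_views(views, queries, space_limit):
    """Pick the cheapest feasible set of views to materialize.

    views:   view name -> (size, maintenance_cost)
    queries: query name -> {view name: cost of answering the query from it}
    Returns (total_cost, set_of_views), or None if no set is feasible.
    """
    best = None
    names = list(views)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            # (a) the selected views must fit in the allocated space
            if sum(views[v][0] for v in subset) > space_limit:
                continue
            total = sum(views[v][1] for v in subset)  # view maintenance cost
            feasible = True
            for costs in queries.values():
                usable = [costs[v] for v in subset if v in costs]
                if not usable:
                    # (b) every query must be answerable from a selected view
                    feasible = False
                    break
                total += min(usable)  # cheapest way to evaluate this query
            # (c) keep the set with the lowest combined cost
            if feasible and (best is None or total < best[0]):
                best = (total, set(subset))
    return best
```

With hypothetical inputs `views = {"v1": (10, 5), "v2": (4, 1), "v3": (4, 1)}` and `queries = {"q1": {"v1": 1, "v2": 3}, "q2": {"v1": 1, "v3": 3}}`, a space limit of 10 selects `{"v1"}` at cost 7, while a limit of 8 forces the pair `{"v2", "v3"}` at cost 8. The enumeration is exponential in the number of candidate views, which is the reason a practical method needs to prune the search space.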

14.

The data warehouse virtualization framework for operational business intelligence

Farrah Farooq 《Expert Systems》2013,30(5):451-472

In order to explore the most current information and react faster to changing business conditions, organizations consider real-time data warehousing a powerful technique to achieve operational business intelligence (BI). We propose in this paper a novel real-time data warehouse (RTDW) framework based on the virtualization concept. Our approach introduces a conceptual modelling technique, known as ring modelling, for real-time data management and multidimensional analysis. This technique produces a flexible semi-structured data model that accommodates unknown business process data and relationships as they evolve, handles schema changes and aggregate management efficiently, and scales well with increasing data volumes. With the help of a telecommunication business example, we evaluated our proposed approach in an extensive experimental study where we compared our approach, Ring Model, with existing structured multidimensional conceptual models (MCMs), i.e. relational OLAP and multidimensional OLAP, and with a semi-structured MCM, i.e. XML Cubes, in terms of scalability, data storage estimations, data update loading time, and query response times. Our performance results show that encouraging speedups are achieved.

15.

Developing secure data warehouses with a UML extension

Eduardo Fernández-Medina Juan Trujillo Rodolfo Villarroel Mario Piattini 《Information Systems》2007

Data Warehouses (DWs), Multidimensional (MD) Databases, and On-Line Analytical Processing Applications are used as a very powerful mechanism for discovering crucial business information. Considering the extreme importance of the information managed by these kinds of applications, it is essential to specify security measures from the early stages of the DW design in the MD modeling process, and enforce them. In recent years, some proposals for representing main MD modeling properties at the conceptual level have been put forward. Nevertheless, none of these proposals considers security issues as an important element in its model, so they do not allow us to specify confidentiality constraints to be enforced by the applications that will use these MD models. In this paper, we will discuss the specific confidentiality problems regarding DWs as well as present an extension of the Unified Modeling Language for specifying security constraints in the conceptual MD modeling, thereby allowing us to design secure DWs. One key advantage of our approach is that we accomplish the conceptual modeling of secure DWs independently of the target platform where the DW has to be implemented, allowing the implementation of the corresponding DWs on any secure commercial database management system. Finally, we will present a case study to show how a conceptual model designed with our approach can be directly implemented on top of Oracle 10g.

16.

Supporting the design of data integration requirements during the development of data warehouses: a communication theory-based approach

Christoph Rosenkranz Roland Holten Marc Räkers Wolf Behrmann 《European Journal of Information Systems》2017,26(1):84-115

Data warehouses (DW) form the backbone of data integration that is necessary for analytical applications, and play important roles in the information technology landscape of many industries. We introduce an approach for addressing the fundamental problem of semantic heterogeneity in the design of data integration requirements during DW development. In contrast to ontology-driven or schema-matching approaches, which propose the automatic resolution of differences ex-post, our approach addresses the core problem of data integration requirements: understanding and resolving different contextual meanings of data fields. We ground the approach firmly in communication theory and build on practices from agile software development. Besides providing relevant insights for the design of data integration requirements, our findings point to communication theory as a sound underlying foundation for a design theory of information systems development.

17.

Efficient biased sampling for approximate clustering and outlier detection in large data sets (total citations: 7; self-citations: 0; citations by others: 7)

Kollios G. Gunopulos D. Koudas N. Berchtold S. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(5):1170-1187

We investigate the use of biased sampling according to the density of the data set to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional data sets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the data set. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest and can be tuned for specific data mining tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach in detail, evaluate it analytically, and show how it can be optimized for approximate clustering and outlier detection. Finally, we present a thorough experimental evaluation of the proposed method, applying density-biased sampling on real and synthetic data sets, and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.
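The idea that a point's inclusion probability depends on local density can be sketched as follows. This is only an illustration under stated assumptions: density is approximated by counting points per grid cell, and the inclusion probability is taken proportional to that count raised to a tunable `exponent`. The grid estimator, the parameter names, and the function name are hypothetical, and the abstract does not specify the paper's actual estimator.

```python
import random
from collections import Counter


def density_biased_sample(points, cell_size, target_size, exponent, seed=0):
    """Sample points with probability tied to the density of their grid cell.

    exponent > 0 favours dense regions, exponent < 0 favours sparse regions,
    and exponent == 0 reduces to uniform random sampling.
    """
    rng = random.Random(seed)

    def cell_of(point):
        # Map a point to the grid cell that contains it.
        return tuple(int(coord // cell_size) for coord in point)

    # Crude local density estimate: number of points sharing the same cell.
    counts = Counter(cell_of(p) for p in points)
    weights = [counts[cell_of(p)] ** exponent for p in points]
    total = sum(weights)

    sample = []
    for point, weight in zip(points, weights):
        # Scale the weights so the expected sample size is about target_size.
        prob = min(1.0, target_size * weight / total)
        if rng.random() < prob:
            sample.append(point)
    return sample
```

A negative exponent oversamples sparse regions, which is the kind of bias that could help outlier detection, while a positive one concentrates the sample in dense regions, which could help clustering. The sample size is only met in expectation because each point is kept or dropped independently.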

18.

Conceptual data modelling in theory and practice

D. Batra G.M. Marakas 《European Journal of Information Systems》1995,4(3):185-193

Conceptual data modelling (CDM) refers to the phase of the information systems development process that involves the abstraction and representation of the real world data pertinent to an organization. When CDM is properly and rigorously performed, the delivered system is expected to be functionally richer, less error-prone, more fully attuned to meet user needs, more able to adjust to changing user requirements and less expensive. However, there is little evidence that conceptual data modelling for the enterprise is actually conducted. There is the feeling that the ‘corporate reality’ is much different. In many organizations, CDM is never employed. In others, it is applied on a haphazard, project-by-project basis, thus leading to considerable redundancy. The academic community has mainly focused on proposing semantic data models but has not demonstrated a rigorous basis for conceptual data modelling. Specifically, the community has failed to show how a conceptual data model can map to an accurate logical data model. It is the purpose of this paper to discuss and compare the perspectives of academic and practitioner communities regarding the application of conceptual data modelling.

19.

Adding semantic modules to improve goal-oriented analysis of data warehouses using I-star

《Journal of Systems and Software》2014

The success rate of data warehouse (DW) development is improved by performing a requirements elicitation stage in which the users’ needs are modeled. Currently, among the different proposals for modeling requirements, there is a special focus on goal-oriented models, and in particular on the i* framework. In order to adapt this framework for DW development, we previously developed a UML profile for DWs. However, like the general i* framework, the proposal lacks modularity. This has an especially negative impact on DW development, since DW requirement models tend to include a huge number of elements with crossed relationships between them. In turn, the readability of the models is decreased, harming their utility and increasing the error rate and development time. In this paper, we propose an extension of our i* profile for DWs considering the modularization of goals. We provide a set of guidelines in order to correctly apply our proposal. Furthermore, we have performed an experiment in order to assess the validity of our proposal. The benefits of our proposal are an increase in the modularity and scalability of the models which, in turn, increases the error correction capability, and makes complex models easier to understand by DW developers and non-expert users.

20.

Modeling multiple interactions with a Markov random field in query expansion for session search

Jingfei Li Xiaozhao Zhao Peng Zhang Dawei Song 《Computational Intelligence》2018,34(1):345-362

How to automatically understand and answer users' questions (e.g., queries issued to a search engine) expressed in natural language has become an important yet difficult problem across the research fields of information retrieval and artificial intelligence. In a typical interactive Web search scenario, namely, session search, to obtain relevant information, the user usually interacts with the search engine for several rounds in the forms of, e.g., query reformulations, clicks, and skips. These interactions are usually mixed and intertwined with each other in a complex way. Ideally, an intelligent search engine can be seen as an artificial intelligence agent that is able to infer what information the user needs from these interactions. However, there still exists a big gap between the current state of the art and this goal. In this paper, in order to bridge the gap, we propose a Markov random field-based approach to capture dependence relations among interactions, queries, and clicked documents for automatic query expansion (as a way of inferring the information needs of the user). An extensive empirical evaluation is conducted on large-scale web search data sets, and the results demonstrate the effectiveness of our proposed models.