期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

: towards the modeling, design, control and execution of ETL processes

Panos Vassiliadis Zografoula Vagena Spiros Skiadopoulos Nikos Karayannidis Timos Sellis 《Information Systems》2001,26(8)

Extraction-Transformation-loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Literature and personal experience have guided us to conclude that the problems concerning the ETL tools are primarily problems of complexity, usability and price. To deal with these problems we provide a uniform metamodel for ETL processes, covering the aspects of data warehouse architecture, activity modeling, contingency treatment and quality management. The ETL tool we have developed, namely , is capable of modeling and executing practical ETL scenarios by providing explicit primitives for the capturing of common tasks. provides three ways to describe an ETL scenario: a graphical point-and-click front end and two declarative languages: XADL (an XML variant), which is more verbose and easy to read and SADL (an SQL-like language) which has a quite compact syntax and is, thus, easier for authoring. 相似文献

2.

: towards the modeling, design, control and execution of ETL processes

Panos Vassiliadis Zografoula Vagena Spiros Skiadopoulos Nikos Karayannidis Timos Sellis 《Information Systems》2001,26(8):537-561

Extraction-Transformation-loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. Literature and personal experience have guided us to conclude that the problems concerning the ETL tools are primarily problems of complexity, usability and price. To deal with these problems we provide a uniform metamodel for ETL processes, covering the aspects of data warehouse architecture, activity modeling, contingency treatment and quality management. The ETL tool we have developed, namely , is capable of modeling and executing practical ETL scenarios by providing explicit primitives for the capturing of common tasks. provides three ways to describe an ETL scenario: a graphical point-and-click front end and two declarative languages: XADL (an XML variant), which is more verbose and easy to read and SADL (an SQL-like language) which has a quite compact syntax and is, thus, easier for authoring. 相似文献

3.

A visual language‐based system for extraction–transformation–loading development

Vincenzo Deufemia Massimiliano Giordano Giuseppe Polese Genoveffa Tortora 《Software》2014,44(12):1417-1440

Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integration contexts beyond data warehouses. It introduces two new visual languages that are used to specify the ETL process, which can also be represented by means of UML activity diagrams. In particular, the first visual language supports data manipulation activities, whereas the second one provides traceability information of attributes to highlight the impact of potential transformations on integrated schemas depending on them. Once the whole ETL process has been visually specified, the designer might invoke the automatic generation of an activity diagram representing a possible orchestration of it based on its dependencies. The designer can edit such a diagram to modify the proposed orchestration provided that changes do not alter data dependencies. The final specification can be translated into code that is executable on the data sources. Finally, the effectiveness of the proposed approach has been validated through a user study in which we have compared the effort needed to design an ETL process in our approach with respect to the one required with main visual approaches described in the literature.Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

4.

A Domain-Specific Metamodel for Multimedia Processing Systems 总被引：1，自引：0，他引：1

Amatriain X. 《Multimedia, IEEE Transactions on》2007,9(6):1284-1298

In this paper, we introduce 4MPS, a metamodel for multimedia processing systems. The goal of 4MPS is to offer a generic system metamodel that can be instantiated to describe any multimedia processing design. The metamodel combines the advantages of the object-oriented paradigm and metamodeling techniques with system engineering principles and graphical models of computation. 4MPS is based on the classification of multimedia processing objects into two main categories: Processing objects that operate on data and controls, and Data objects that passively hold media content. Processing objects encapsulate a method or algorithm. They also include support for synchronous data processing and asynchronous event-driven Controls as well as a configuration mechanism and an explicit life cycle state model. Data input to and output from Processing objects is done through Ports. Data objects offer a homogeneous interface to media data, and support for metaobject-like facilities such as reflection and serialization. The metamodel can be expressed in the language of graphical models of computation such as the Dataflow Networks and presents a comprehensive conceptual framework for media signal processing applications. 4MPS has its practical validation in several existing environments, including the author's CLAM framework. 相似文献

5.

A formalism to describe design patterns based on role concepts 总被引：1，自引：0，他引：1

Soon-Kyeong Kim David Carrington 《Formal Aspects of Computing》2009,21(5):397-420

相似文献

6.

A family of experiments to validate measures for UML activity diagrams of ETL processes in data warehouses

Lilia Muñoz Jose-Norberto Mazón Juan Trujillo 《Information and Software Technology》2010,52(11):1188-1203

In data warehousing, Extract, Transform, and Load (ETL) processes are in charge of extracting the data from the data sources that will be contained in the data warehouse. Their design and maintenance is thus a cornerstone in any data warehouse development project. Due to their relevance, the quality of these processes should be formally assessed early in the development in order to avoid populating the data warehouse with incorrect data. To this end, this paper presents a set of measures with which to evaluate the structural complexity of ETL process models at the conceptual level. This study is, moreover, accompanied by the application of formal frameworks and a family of experiments whose aim is to theoretical and empirically validate the proposed measures, respectively. Our experiments show that the use of these measures can aid designers to predict the effort associated with the maintenance tasks of ETL processes and to make ETL process models more usable. Our work is based on Unified Modeling Language (UML) activity diagrams for modeling ETL processes, and on the Framework for the Modeling and Evaluation of Software Processes (FMESP) framework for the definition and validation of the measures. 相似文献

7.

大数据环境下基于元模型控制的数据质量保障技术研究

杨冬菊徐晨阳《计算机工程与科学》2019,41(2):197-206

数据集成环节,越来越丰富的异构源数据给集成后数据质量的提升带来了新的挑战和困难。针对传统ETL模型在数据集成后出现的数据冗余、无效、重复、缺失、不一致、错误值及格式出错等数据质量问题,提出了基于元数据模型控制的ETL集成模型,并对数据集成过程中的各种映射规则进行了详细的定义,通过将抽取、转换、加载环节的元模型和映射机制相结合,能够有效地保证集成后数据的数据质量。提出的元模型已经应用到科技资源管理数据集成业务中。通过科技资源管理数据集成实例分析,验证了此数据集成方案能够有效地支撑大数据环境下数据仓库的构建和集成后数据质量的提升。相似文献

8.

Data generator for evaluating ETL process quality

《Information Systems》2017

Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while manually providing a synthetic set of data is known as a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable to enable end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework and here we report our experimental findings showing the effectiveness and scalability of our approach. 相似文献

9.

Modelling and querying geographical data warehouses

Joel da Silva Anjolina G. de Oliveira Robson N. Fidalgo Ana Carolina Salgado Valéria C. Times 《Information Systems》2010

A number of proposals for integrating geographical (Geographical Information Systems—GIS) and multidimensional (data warehouse—DW and online analytical processing—OLAP) processing are found in the database literature. However, most of the current approaches do not take into account the use of a GDW (geographical data warehouse) metamodel or query language to make available the simultaneous specification of multidimensional and spatial operators. To address this, this paper discusses the UML class diagram of a GDW metamodel and proposes its formal specifications. We then present a formal metamodel for a geographical data cube and propose the Geographical Multidimensional Query Language (GeoMDQL) as well. GeoMDQL is based on well-known standards such as the MultiDimensional eXpressions (MDX) language and OGC simple features specification for SQL and has been specifically defined for spatial OLAP environments based on a GDW. We also present the GeoMDQL syntax and a discussion regarding the taxonomy of GeoMDQL query types. Additionally, aspects related to the GeoMDQL architecture implementation are described, along with a case study involving the Brazilian public healthcare system in order to illustrate the proposed query language. 相似文献

10.

一种基于Agent和活动优先度的ETL过程并行方法

陈刚杜鑫霖曾司凤安宝冉《计算机工程与科学》2017,39(9):1594-1601

ETL是数据仓库获得高质量数据的关键环节,在数据仓库的构建和实施中占有重要地位。针对传统ETL串行执行方式的不足,提出一种基于Agent和活动优先度相结合的ETL并行执行方法。该方法计算ETL执行过程中各个活动的优先度,利用Agent理论和多线程并行计算技术实现并行执行具有相同优先度且相互间没有依赖关系的ETL活动。实验结果表明,该方法在数据量较大时具有较好的加速比,提高了ETL过程的执行效率。相似文献

11.

An algebraic semantics for MOF 总被引：1，自引：0，他引：1

Artur Boronat José Meseguer 《Formal Aspects of Computing》2010,22(3-4):269-296

In model-driven development, software artifacts are represented as models in order to improve productivity, quality, and cost effectiveness. In this area, the meta-object facility (MOF) standard plays a crucial role as a generic framework within which a wide range of modeling languages can be defined. The MOF standard aims at offering a good basis for model-driven development, providing some of the building concepts that are needed: what is a model, what is a metamodel, what is reflection in the MOF framework, and so on. However, most of these concepts are not yet fully formally defined in the current MOF standard. In this paper we define a reflective, algebraic, executable framework for precise metamodeling based on membership equational logic (mel) that supports the MOF standard. Our framework provides a formal semantics of the following notions: metamodel, model, and conformance of a model to its metamodel. Furthermore, by using the Maude language, which directly supports mel specifications, this formal semantics is executable. This executable semantics has been integrated within the Eclipse modeling framework as a plugin tool called MOMENT2. In this way, formal analyses, such as semantic consistency checks, model checking of invariants and LTL model checking, become available within Eclipse to provide formal support for model-driven development processes. 相似文献

12.

A multi-surrogate approximation method for metamodeling 总被引：2，自引：0，他引：2

Dong Zhao Deyi Xue 《Engineering with Computers》2011,27(2):139-153

Metamodeling methods have been widely used in engineering applications to create surrogate models for complex systems. In the past, the input–output relationship of the complex system is usually approximated globally using only a single metamodel. In this research, a new metamodeling method, namely multi-surrogate approximation (MSA) metamodeling method, is developed using multiple metamodels when the sample data collected from different regions of the design space are of different characteristics. In this method, sample data are first classified into clusters based on their similarities in the design space, and a local metamodel is identified for each cluster of the sample data. A global metamodel is then built using these local metamodels considering the contributions of these local metamodels in different regions of the design space. Compared with the traditional approach of global metamodeling using only a single metamodel, this MSA metamodeling method can improve the modeling accuracy considerably. Applications of this metamodeling method have also been demonstrated in this research. 相似文献

13.

移动经营分析系统中ETL的分析和设计

张勇杨昆锦王文杰《计算机工程与应用》2006,42(3):202-204,232

文章通过描述ETL原理,对比流行的各种ETL工具,针对移动数据的特点,提出了一种ETL平台的设计思想;指出了ETL平台要重点考虑的功能,并在ETL建设原则的指导下,结合移动业务实际分析和设计移动ETL平台,解决了移动ETL过程的瓶颈问题,在运行效率,日常开发和维护的方便性,应用扩展的灵活性方面得到大大的提高。相似文献

14.

结构图ETL概念模型的设计方法 总被引：2，自引：0，他引：2

下载免费PDF全文

张忠平赵瑞珍《计算机工程与应用》2009,45(6):161-164

ETL过程是数据仓库获得高质量数据的重要环节,是任何数据仓库工程不可缺少的成功因素。为了便于ETL过程的设计和维护,降低ETL过程的设计、维护代价,提出一种基于结构图的ETL概念模型设计方法,给出一个用于描述ETL过程的模型,并基于CWM完成用以存储元数据的ETL元模型的设计。通过图形化ETL过程中的元素和关联,该模型清晰直观地反映了各个源数据库与目标数据仓库的内部结构和组成、数据的来源与流向、源数据和目标数据之间的映射和转换关系,辅助设计人员更好地进行ETL过程的设计和编码实现,使整个ETL的设计与维护过程更加方便、灵活、有效。相似文献

15.

Scheduling strategies for efficient ETL execution

Anastasios Karagiannis Panos Vassiliadis Alkis Simitsis 《Information Systems》2013

Extract-transform-load (ETL) workflows model the population of enterprise data warehouses with information gathered from a large variety of heterogeneous data sources. ETL workflows are complex design structures that run under strict performance requirements and their optimization is crucial for satisfying business objectives. In this paper, we deal with the problem of scheduling the execution of ETL activities (a.k.a. transformations, tasks, operations), with the goal of minimizing ETL execution time and allocated memory. We investigate the effects of four scheduling policies on different flow structures and configurations and experimentally show that the use of different scheduling policies may improve ETL performance in terms of memory consumption and execution time. First, we examine a simple, fair scheduling policy. Then, we study the pros and cons of two other policies: the first opts for emptying the largest input queue of the flow and the second for activating the operation (a.k.a. activity) with the maximum tuple consumption rate. Finally, we examine a fourth policy that combines the advantages of the latter two in synergy with flow parallelization. 相似文献

16.

ETL workflow reparation by means of case-based reasoning

Artur?Wojciechowski Email author View author&#;s OrcID profile 《Information Systems Frontiers》2018,20(1):21-43

Data sources (DSs) being integrated in a data warehouse frequently change their structures/schemas. As a consequence, in many cases, an already deployed ETL workflow stops its execution, yielding errors. Since in big companies the number of ETL workflows may reach dozens of thousands and since structural changes of DSs are frequent, an automatic repair of an ETL workflow after such changes is of high practical importance. In our approach, we developed a framework, called E-ETL, for handling the evolution of an ETL layer. In the framework, an ETL workflow is semi-automatically or automatically (depending on a case) repaired as the result of structural changes in DSs, so that it works with the changed DSs. E-ETL supports two different repair methods, namely: (1) user defined rules, (2) and Case-Based Reasoning. In this paper, we present how Case-Based Reasoning may be applied to repairing ETL workflows. In particular, we contribute an algorithm for selecting the most suitable case for a given ETL evolution problem. The algorithm applies a technique for reducing cases in order to make them more universal and capable of solving more problems. The algorithm has been implemented in prototype E-ETL and evaluated experimentally. The obtained results are also discussed in this paper. 相似文献

17.

Assessing the use of slicing-based visualizing techniques on the understanding of large metamodels

《Information and Software Technology》2015

ContextMetamodels are cornerstones of various metamodeling activities. Such activities consist of, for instance, transforming models into code or comparing metamodels. These activities thus require a good understanding of a metamodel and/or its parts. Current metamodel editing tools are based on standard interactive visualization features, such as physical zooms.ObjectiveHowever, as soon as metamodels become large, navigating through large metamodels becomes a tedious task that hinders their understanding. So, a real need to support metamodel comprehension appears.MethodIn this work we promote the use of model slicing techniques to build interactive visualization tools for metamodels. Model slicing is a model comprehension technique inspired by program slicing. We show how the use of Kompren, a domain-specific language for defining model slicers, can ease the development of such interactive visualization features.ResultsWe specifically make four main contributions. First, the proposed interactive visualization techniques permit users to focus on metamodel elements of interest, which aims at improving the understandability. Second, these proposed techniques are developed based on model slicing, a model comprehension technique that involves extracting a subset of model elements of interest. Third, we develop a metamodel visualizer, called Explen, embedding the proposed interactive visualization techniques. Fourth, we conducted experiments. showing that Explen significantly outperforms EcoreTools, in terms of time, correctness, and navigation effort, on metamodeling tasks.ConclusionThe results of the experiments, in favor of Explen, show that improving metamodel understanding can be done using slicing-based interactive navigation features. 相似文献

18.

Weaving variability into domain metamodels

Gilles Perrouin Gilles Vanwormhoudt Brice Morin Philippe Lahire Olivier Barais Jean-Marc Jézéquel 《Software and Systems Modeling》2012,11(3):361-383

Domain-specific modeling languages (DSMLs) are the essence of MDE. A DSML describes the concepts of a particular domain in a metamodel, as well as their relationships. Using a DSML, it is possible to describe a wide range of different models that often share a common base and vary on some parts. On the one hand, some current approaches tend to distinguish the variability language from the DSMLs themselves, implying greater learning curve for DSMLs stakeholders and a significant overhead in product line engineering. On the other hand, approaches integrating variability in DSMLs lack generality and tool support. We argue that aspect-oriented modeling techniques enabling flexible metamodel composition and results obtained by the software product line community to manage and resolve variability form the pillars for a solution for integrating variability into DSMLs. In this article, we consider variability as an independent and generic aspect to be woven into the DSML. In particular, we detail how variability is woven and how to perform product line derivation. We validate our approach through the weaving of variability into two different metamodels: Ecore??widely used for DSML definition??and SmartAdapters, our aspect model weaver. These results emphasize how new abilities of the language can be provided by this means. 相似文献

19.

A schema aware ETL workflow generator

Naiqiao Du Xiaojun Ye Jianmin Wang 《Information Systems Frontiers》2014,16(3):453-471

Extract, Transform and Load (ETL) processes organized as workflows play an important role in data warehousing. As ETL workflows are usually complex, various ETL facilities have been developed to address their control-flow process modeling and execution control. To evaluate the quality of ETL facilities, Synthetic ETL workflow test cases, consisting of control-flow and data-flow aspects are needed to check ETL facility functionalities at construction time and to validate the correctness and performance of ETL facilities at run time. Although there are some synthetic workflow and data set test case generation approaches existed in literatures, little work is done to consider both aspects at the same time specifically for ETL workflow generators. To address this issue, this paper proposes a schema aware ETL workflow generator with which users can characterize their ETL workflows by various parameters and get ETL workflow test cases with control-flow of ETL activities, complied schemas and associated recordsets. Our generator consists of three steps. First, with type and ratio of individual activities and their connection characteristic parameter specification, the generator will produce ETL activities and form ETL skeleton which determine how generated activities are cooperated with each other. Second, with schema transformation characteristic parameter specification, e.g. ranges of numbers of attributes, the generator will resolve attribute dependencies and refine input/output schemas with complied attributes and their data types. In the last step, recordsets are generated following cardinality specifications. ETL workflows in specific patterns are produced in the experiment in order to show the ability of our generator. Also experiments to generate thousands of ETL workflow test cases in seconds have been done to verify the usability of the generator. 相似文献

20.

Metamodeling semantics of multiple inheritance 总被引：1，自引：0，他引：1

Roland Ducournau Jean Privat 《Science of Computer Programming》2011,76(7):555-586

Inheritance provides object-oriented programming with much of its great reusability power. When inheritance is single, its specifications are simple and everybody roughly agrees on them. In contrast, multiple inheritance yields ambiguities that have prompted long-standing debates, and no two languages agree on its specifications. In this paper, we present a semantics of multiple inheritance based on metamodeling. A metamodel is proposed which distinguishes the “identity” of properties from their “values” or “implementations”. It yields a clear separation between syntactic and semantic conflicts. The former can be solved in any language at the expense of a common syntactic construct, namely full name qualification. However, semantic conflicts require a programmer’s decision, and the programming language must help the programmer to some extent. This paper surveys the approach based on linearizations, which has been studied in depth, and proposes some extensions. As it turns out that only static typing takes full advantage of the metamodel, the interaction between multiple inheritance and static typing is also considered, especially in the context of virtual types. The solutions proposed by the various languages with multiple inheritance are compared with the metamodel results. Throughout the paper, difficulties encountered under the open-world assumption are stressed. 相似文献