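Design and Implementation of a Data-Quality-Oriented ETL Framework   Total citations: 1, self-citations: 0, citations by others: 1
To address the shortcomings of traditional extract-transform-load (ETL) architectures in data quality control, this paper proposes an ETL architecture oriented toward data quality management. Based on the characteristics of the ETL process, it designs a multi-data-source interface module, an ETL metadata description module, an ETL task description module, and a data quality control module, among others. The architecture takes data quality as its core: it builds a data analysis model and uses a rule inference engine to generate data cleaning plans from the analysis results, thereby effectively assessing and managing the quality of the data flow. An ETL tool, DQETL, was developed on this design. DQETL is designed with the Unified Modeling Language (UML) and provides a friendly interface for centralized management of the ETL process. Finally, an example illustrates the general steps of data quality management under this framework.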

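How to develop Internetware in the open, dynamic, and complex Internet environment is a challenging problem in software technology. Starting from the whole life cycle of Internetware and focusing on its formal model, this paper briefly introduces the basic theory of abstract state machines (ASM), then characterizes the component model of Internetware and gives an ASM-based formal description of it. On this basis, the refinement of coarse-grained abstract components is transformed into the problem of finding a component composition scheme. At the architecture meta-level, a bidirectional verification method is proposed that verifies composition schemes at different levels of abstraction both horizontally and vertically, so as to ensure the correctness of the target system and of the refinement process. This formal modeling and bidirectional verification avoid and eliminate errors in early software design as far as possible. System experiments show that the method offers some guidance for Internetware development.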

Computer architects have been constantly looking for new approaches to design high-performance machines. Data flow and VLSI offer two mutually supportive approaches towards a promising design for future super-computers. When very high speed computations are needed, data flow machines may be relied upon as an adequate solution in which extremely parallel processing is achieved.

This paper presents a formal analysis for data flow machines. Moreover, the following three machines are considered: (1) MIT static data flow machine; (2) TI's DDP static data flow machine; (3) LAU data flow machine.

These machines are investigated by making use of a reference model. The contributions of this paper include: (1) Developing, for the first time, a Data Flow Random Access Machine (DFRAM) model to serve as a formal modeling tool. With this model one can calculate the time cost of various static data flow machines, as well as their performance. (2) Constructing a practical Data Flow Simulator (DFS) on the basis of the DFRAM model. The DFS is modular and portable and is simple to implement. It is used not only to study the performance of the underlying data flow machines but also to verify the DFRAM model.

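Xue Xin (薛欣), He Guoping (贺国平). Computer Engineering and Design (《计算机工程与设计》), 2007, 28(13): 3031-3032, 3050
Hierarchy design is the key problem in applying hierarchical support vector machines: classification performance varies greatly across hierarchies. This paper analyzes hierarchical support vector machines and their problems, and proposes a simple and effective method for judging the spatial distribution between classes. Simulation experiments show that a hierarchical support vector machine designed with this method achieves higher classification accuracy than one designed with traditional methods.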

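After analyzing the need for a data mining system oriented toward e-government requirements, this paper discusses the system's characteristics and design requirements in terms of data sources, data structures, and target users. Combining these with the data mining process, it constructs a framework for an e-government data mining system whose main functional modules are data management, data preprocessing, data mining, and the user interface. Finally, it analyzes the drawbacks of the traditional two-tier client/server (C/S) software architecture and proposes a system implementation scheme based on a multi-tier architecture.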

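Design and Implementation of a Data Processing Unit for a Flight Display   Total citations: 1, self-citations: 0, citations by others: 1
To meet the requirements of small general aviation aircraft for flight display performance, power consumption, size, and cost, a dual-processor data processing unit for a flight display is implemented. The paper first introduces the design requirements that small general aviation aircraft place on flight displays, then describes in detail the system structure of the flight display and the hardware and software implementation of the data processing unit, and finally verifies the unit's functions by implementing a typical primary flight display interface. The verification results show that the data processing unit offers strong processing capability, high integration, low power consumption, and good extensibility, and has broad application prospects.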

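Communication systems based on the Software Communications Architecture (SCA) suffer from poor real-time performance, high redundancy, and an inability to recover from faults. To address these shortcomings, Data Distribution Service (DDS) technology is introduced into the SCA architecture to form an extended scheme. While remaining fully compatible with the SCA specification, the scheme uses DDS as the transport between waveform components, which effectively improves the real-time performance of SCA-based communication systems, reduces system redundancy, and enables dynamic reconfiguration of individual software modules. The scheme has been successfully applied in a simulated communication system.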

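Li Baohui (李保珲), Xu Kefu (徐克付), Zhang Peng (张鹏), Guo Li (郭莉), Hu Yue (胡玥), Fang Binxing (方滨兴). Journal of Software (《软件学报》), 2016, 27(6): 1384-1401
Virtual machine introspection (VMI) is a security approach that has drawn wide attention from academia and industry and plays an important role in intrusion detection, kernel integrity protection, and other areas. A core difficulty in implementing it is the "semantic gap" between low-level state data and the high-level semantics required, which limits the development and wide adoption of VMI. This paper therefore classifies existing VMI techniques into four categories according to how they reconstruct semantics, and reviews the key problems and related work in each category. It then compares the four categories in terms of security, performance, and the amount of high-level semantic information obtainable. The results show that each category varies widely along these dimensions, so security researchers need to weigh the characteristics of all four when designing a VMI scheme that meets their needs. Finally, the paper describes in detail how VMI is applied in the security field and points out several open problems in security, practicality, and transparency that require further study.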

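This paper introduces the design and implementation of a routing-switching array node chip for compute/control/network storage in big-data-processing data centers, and of a secure switching array prototype built from such chips. The routing-switching array system uses software-defined networking (SDN), accessed remotely over the Internet, to centrally program and control modules such as internal routing control and security in the high-speed secure switching network, meeting the data center's demand for data transmission bandwidth. It also removes the network transmission bottleneck during parallel computation and avoids long-term occupation and waste of data center network and other resources, laying a foundation for next-generation data center solutions. In addition, the paper briefly reports recent research on trial big data applications in financial trading systems.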

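Short-Term Wind Speed Forecasting for Wind Farms Based on EMD and LS-SVM   Total citations: 2, self-citations: 0, citations by others: 2
To improve the accuracy of short-term wind speed forecasting for wind farms, this paper combines empirical mode decomposition (EMD) with data mining to model and forecast wind speed time series. The wind speed series is decomposed by EMD into several intrinsic mode components in different frequency bands. A least squares support vector machine (LS-SVM) forecasting model is built for the stationary component in each band, and the forecasts of all models are summed with equal weights to obtain the final forecast. Simulation results show a mean absolute percentage error (MAPE) of 1.507% for short-term wind speed forecasting, which improves on the accuracy of this type of forecast and demonstrates the effectiveness of the method.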
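The final two steps of the abstract above, summing the per-component forecasts with equal weights and scoring the result with MAPE, can be sketched as follows. This is a minimal illustration only: it does not implement EMD or LS-SVM, and the function names and the sample numbers in the usage note are hypothetical, not taken from the paper.

```python
def combine_equal_weight(component_forecasts):
    """Sum per-component forecasts point by point (every component has weight 1).

    component_forecasts: list of equal-length lists, one per decomposed component.
    """
    if not component_forecasts:
        raise ValueError("need at least one component forecast")
    length = len(component_forecasts[0])
    if any(len(c) != length for c in component_forecasts):
        raise ValueError("all component forecasts must have the same length")
    return [sum(values) for values in zip(*component_forecasts)]


def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    # Assumes no actual value is zero (wind speeds of exactly 0 would need care).
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

For example, with two hypothetical component forecasts `[[1.0, 2.0], [0.5, 0.5]]`, `combine_equal_weight` returns the point-wise sums, and `mape` then compares that combined series with the measured wind speeds.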

In an iterative design process, there is a large amount of engineering data to be processed. Well-managed engineering data helps companies stay competitive in the market. It has been recognized that a product data model is the basis for establishing an engineering database. To fully support complete product data representation throughout the product life cycle, an international product data representation and exchange standard, STEP, is applied to model the representation of a product. This paper describes the architecture of an engineering data management (EDM) system, which consists of an integrated product database. Six STEP-compatible data models are constructed to demonstrate the integrability of the EDM system using a common data modeling format. These data models are product definition, product structure, shape representation, engineering change, approval, and production scheduling. They are defined according to the integrated resources of STEP/ISO 10303 (Parts 41-44), which support a complete product information representation and a standard data format. Thus, application systems, such as CAD/CAM and MRP systems, can interact with the EDM system by accessing the database based on the STEP data exchange standard.

Traditional methods for creating diesel engine models include analytical methods such as multi-zone models and intelligence-based models such as artificial neural network (ANN) based models. However, the analytical models require excessive assumptions, while the ANN models have many drawbacks, such as a tendency to overfit and the difficulty of determining the optimal network structure. In this paper, several emerging advanced machine learning techniques, including least squares support vector machine (LS-SVM), relevance vector machine (RVM), basic extreme learning machine (ELM) and kernel based ELM, are newly applied to the modelling of diesel engine performance. Experiments were carried out to collect sample data for model training and verification. Limited by the experiment conditions, only 24 sample data sets were acquired, resulting in data scarcity. Six-fold cross-validation is therefore adopted to address this issue. Some of the sample data are also found to suffer from the problem of data exponentiality, where the engine performance output grows exponentially with engine speed and engine torque. This seriously degrades the prediction accuracy. Thus, logarithmic transformation of dependent variables is utilized to pre-process the data. In addition, a hybrid of leave-one-out cross-validation and Bayesian inference is, for the first time, proposed for the selection of hyperparameters of kernel based ELM. A comparison among the advanced machine learning techniques, along with two traditional types of ANN models, namely back propagation neural network (BPNN) and radial basis function neural network (RBFNN), is conducted. The models are evaluated on time complexity, space complexity, and prediction accuracy. The evaluation results show that kernel based ELM with the logarithmic transformation and hybrid inference is far better than basic ELM, LS-SVM, RVM, BPNN and RBFNN in terms of prediction accuracy and training time.
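Two of the data-handling steps named in the abstract above, six-fold cross-validation over 24 samples and a logarithmic transformation of the dependent variable, can be sketched as below. This is a hedged illustration under stated assumptions: the paper does not say how folds were assigned (contiguous folds are assumed here) or which logarithm base was used (natural log is assumed), and all function names are hypothetical.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Assumption: folds are contiguous blocks; the source does not specify
    the assignment. Any remainder is spread over the first folds.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def log_transform(values):
    """Natural-log transform of strictly positive targets."""
    return [math.log(v) for v in values]


def inverse_log_transform(values):
    """Map model outputs from log space back to the original scale."""
    return [math.exp(v) for v in values]
```

With 24 samples and `k = 6`, each fold holds out 4 samples and trains on the remaining 20. A model would be fitted on `log_transform(y_train)` and its predictions mapped back with `inverse_log_transform` before computing errors on the original scale.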

The deterministic and probabilistic prediction of ship motion is important for safe navigation and stable real-time operational control of ships at sea. However, the volatility and randomness of ship motion, the non-adaptive nature of single predictors and the poor coverage of quantile regression pose serious challenges to uncertainty prediction, so research in this field is limited. In this paper, a multi-predictor integration model based on hybrid data preprocessing, reinforcement learning and an improved quantile regression neural network (QRNN) is proposed to explore the deterministic and probabilistic prediction of ship pitch motion. To validate the performance of the proposed multi-predictor integrated prediction model, an experimental study is conducted with three sets of actual ship longitudinal motions recorded during sea trials in the South China Sea. The experimental results indicate that the root mean square errors (RMSEs) of the proposed model for deterministic prediction are 0.0254°, 0.0359°, and 0.0188°, respectively. Taking series #2 as an example, the prediction interval coverage probabilities (PICPs) of the proposed model for probabilistic prediction at the 90%, 95%, and 99% confidence levels (CLs) are 0.9400, 0.9800, and 1.0000, respectively. This study indicates that the proposed model can provide trusted deterministic predictions and can effectively quantify the uncertainty of ship pitch motion, which has the potential to provide practical support for ship early warning systems.
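The two metrics reported in the abstract above, RMSE for point forecasts and PICP for prediction intervals, follow their standard definitions and can be sketched as below. This is only an illustration of how such figures are computed: the function names are hypothetical, and the code does not reproduce the paper's model or data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def picp(actual, lower, upper):
    """Prediction interval coverage probability.

    Fraction of observations that fall inside [lower, upper]; for a
    well-calibrated interval it should be close to, or above, the nominal
    confidence level.
    """
    if not (len(actual) == len(lower) == len(upper)) or not actual:
        raise ValueError("inputs must be non-empty and equal length")
    covered = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return covered / len(actual)
```

Read this way, a PICP of 0.9400 at the 90% confidence level means that 94% of the observed pitch values fell inside the predicted interval.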

