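Research on Data Quality Control in Data Warehouses   Total citations: 18; self-citations: 1; citations by others: 18
As data warehouses are applied more deeply, data quality has become a key issue that determines whether a data warehouse project succeeds and whether its data can be used effectively. This paper first discusses the data quality problems that arise in a data warehouse environment and why ensuring data quality matters. It then proposes metrics and evaluation indicators for data quality. Finally, it presents a data quality maturity model for controlling data quality during data warehouse implementation and operation, together with methods for ensuring the quality of warehouse data.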
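The abstract mentions metrics and evaluation indicators for data quality but does not list them. As a minimal sketch, assume three commonly used dimensions: completeness, uniqueness, and validity. These may or may not match the paper's own indicators. Each one can be computed as a ratio between 0 and 1 over a batch of records. The function names, sample rows, and age rule below are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical record type: column name -> value, where None means "missing".
Record = Dict[str, Optional[object]]


def completeness(records: List[Record], columns: List[str]) -> float:
    """Share of (record, column) cells that are populated."""
    total = len(records) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for c in columns if r.get(c) is not None)
    return filled / total


def uniqueness(records: List[Record], key: str) -> float:
    """Share of non-missing key values that are distinct (1.0 means no duplicates)."""
    keys = [r[key] for r in records if r.get(key) is not None]
    return len(set(keys)) / len(keys) if keys else 1.0


def validity(records: List[Record], column: str, rule: Callable[[object], bool]) -> float:
    """Share of non-missing values in `column` that satisfy a business rule."""
    values = [r[column] for r in records if r.get(column) is not None]
    return sum(1 for v in values if rule(v)) / len(values) if values else 1.0


# Hypothetical warehouse rows: one missing age, one missing city,
# a duplicated id, and an out-of-range age.
rows = [
    {"id": 1, "age": 34, "city": "Beijing"},
    {"id": 2, "age": None, "city": "Shanghai"},
    {"id": 2, "age": -5, "city": None},
    {"id": 3, "age": 51, "city": "Wuhan"},
]
report = {
    "completeness": completeness(rows, ["id", "age", "city"]),
    "uniqueness(id)": uniqueness(rows, "id"),
    "validity(age)": validity(rows, "age", lambda a: 0 <= a <= 150),
}
print(report)
```

Scores like these can be tracked per load and compared against thresholds. A maturity model would typically decide which dimensions are measured and how strictly they are enforced.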

Value creation is a major factor not only in the sustainability of organizations but also in maximizing profit, retaining customers, fulfilling business goals, and increasing revenue. When value is to be created from Big Data, value creation must be understood across a broader range of complexity. This raises the question of how organizations can use such massive quantities of data to create business value. The present study seeks to provide a model for creating organizational value using Big Data Analytics (BDA). To this end, after reviewing the related literature and interviewing experts, a BDA-based organizational value creation model is developed. Five hypotheses are then formulated and a questionnaire is prepared. The questionnaire is administered to the research population (IT managers and experts, particularly those specializing in data analysis) to test the hypotheses. In the next phase, the relationships between model variables are examined using structural equation modeling (measurement and structural models). The results indicate that the initial requirement for creating organizational value with BDA is to examine BDA infrastructure, organizational capabilities, and BDA capabilities. On that basis, a BDA strategy is formulated, and organizational value is ultimately created.

Research on Big Data in the Cloud will be an important topic over the next few years. The topic includes work that demonstrates architectures, applications, services, experiments and simulations in the Cloud to support Big Data adoption. A common approach to Big Data in the Cloud is to deliver Everything as a Service, which allows better access, performance and efficiency when analysing and understanding the data. Organisations that adopt Big Data this way find that the boundaries between private clouds, public clouds and the Internet of Things (IoT) can be very thin. Volume, variety, velocity, veracity and value are the major factors in Big Data systems, but other challenges remain to be resolved. The papers in this special issue address a variety of issues and concerns in Big Data. These include searching and processing Big Data, implementing and modelling event and workflow systems, visualisation, modelling and simulation, and aspects of social media.

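The arrival of the big data era has drawn attention from the education sector, and universities have gradually begun to emphasize related teaching and research. Big data as a concept is closely tied to computer science and can help learners organize and analyze information. Building on this and on practical teaching experience, this paper explores the role of the big data era in computer science teaching and suitable methods for it. It does so by introducing educational models, innovative teaching methods, and a learning mindset that keeps pace with the times. The paper offers a reference for researchers in computer science education and lays a foundation for training big data researchers.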

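The significance of big data is gradually being recognized. This paper briefly introduces big data and studies it from the perspectives of technologies and tools, solutions, and application cases. It also discusses several problems that big data poses for computer science.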

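Big data is characterized by large volume, wide variety, high generation speed, and great overall value but low value density. Applying big data in civil affairs means using data analysis methods to mine useful information from big data. That information supports decision-making in civil affairs administration and so realizes the value of the data. This paper mainly introduces a public service model for civil affairs, a technical framework, online analytical processing (OLAP) of big data, and a big data mining model. It offers some reference value for processing civil affairs public service data.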

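Data is the foundation and core of information system operation and a valuable resource for an organization's stable development. The volume of data in information systems is growing geometrically, especially in today's big data environment with rapidly developing information technology. As a result, massive data migration is a practical problem that enterprises must face when storage space runs short, when legacy systems are replaced by new ones, and when information systems are upgraded or re-engineered. A key research topic is how to migrate massive data quickly, correctly, and completely under business constraints while preserving data integrity, consistency, and continuity. From the perspective of massive data management, this paper describes methods for massive data migration and compares the characteristics of different migration schemes.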
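One common way to migrate a large table quickly while checking that the copy is complete and consistent works in three steps. Move the table in key-ordered batches, commit each batch as its own transaction, and compare row counts and checksums at the end. The abstract does not describe the paper's own schemes, so the sketch below assumes this batch-and-verify approach. It uses two in-memory SQLite databases as a hypothetical source and target, and the function names, table, and batch size are illustrative. Table and column names are interpolated into SQL, so they must come from trusted configuration.

```python
import hashlib
import sqlite3


def table_digest(conn, table, key):
    """Row count plus a SHA-256 over all rows, read in key order on both sides."""
    h = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        h.update(repr(row).encode("utf-8"))
        count += 1
    return count, h.hexdigest()


def migrate(src, dst, table, key, batch_size=1000):
    """Copy `table` from src to dst in keyset-paginated batches, then verify."""
    cols = [info[1] for info in src.execute(f"PRAGMA table_info({table})")]
    placeholders = ", ".join("?" for _ in cols)
    key_pos = cols.index(key)
    last = None
    while True:
        if last is None:
            batch = src.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (batch_size,)
            ).fetchall()
        else:
            batch = src.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last, batch_size),
            ).fetchall()
        if not batch:
            break
        with dst:  # one transaction per batch: commits on success, rolls back on error
            dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
        last = batch[-1][key_pos]
    # Completeness (row count) and consistency (content checksum) check.
    return table_digest(src, table, key) == table_digest(dst, table, key)


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 2501)])
src.commit()
ok = migrate(src, dst, "orders", "id", batch_size=1000)
print(ok)  # True
```

Keyset pagination (`WHERE id > last`) keeps each batch query cheap on large tables, unlike `OFFSET`. Committing per batch means a batch that fails is rolled back as a unit rather than left half-written.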

The rapid development and extensive application of geographic information systems (GIS) and the advent of the big data era have produced multi-source spatial data. Differences in data source, data accuracy and data model make integrating, fusing and sharing this data more difficult. At the same time, research on multi-source spatial data fusion methods has important practical significance. It can reduce the production cost of geographic data, speed up the updating of existing geographic information, and improve the quality of GIS big data. This paper systematically describes how multi-source spatial data fusion methods formed and how they are developing. Drawing on a large body of related technical literature from China and abroad, it summarizes and discusses these methods and looks ahead to the prospects of data fusion in the big data environment. The review offers some reference value for related research.

The quality of data is directly related to the quality of the models drawn from that data. For that reason, much research is devoted to improving data quality and correcting the errors it may contain. One of the most common problems is noise in classification tasks, where noise refers to the incorrect labeling of training instances. This problem is very disruptive, as it changes the decision boundaries of the problem. Big Data problems pose a new challenge for data quality because data is accumulated massively and without supervision. The Big Data scenario also brings new problems to classic data preprocessing algorithms, which are not designed to work with such amounts of data. Yet these algorithms are key to moving from Big Data to Smart Data. This paper proposes an iterative ensemble filter for removing noisy instances in Big Data scenarios. Experiments carried out on six Big Data datasets show that our noise filter outperforms the current state-of-the-art noise filter in Big Data domains. It has also proved to be an effective solution for transforming raw Big Data into Smart Data.
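The abstract does not specify the ensemble members or the distributed implementation. The sketch below shows the general idea of an iterative ensemble noise filter on a single machine, using k-nearest-neighbour (k = 1 and 3) and nearest-centroid classifiers as assumed stand-in ensemble members. In each pass, every instance is classified by models trained on the other cross-validation folds. Instances that a majority of the ensemble misclassifies are removed as label noise, and passes repeat until nothing more is removed. All names and the toy dataset are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (features, class label)
Classifier = Callable[[List[Sample], Point], int]


def knn(k: int) -> Classifier:
    """Return a k-nearest-neighbour classifier (Euclidean distance, majority vote)."""
    def predict(train: List[Sample], x: Point) -> int:
        nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def nearest_centroid(train: List[Sample], x: Point) -> int:
    """Predict the class whose mean feature vector is closest to x."""
    groups = {}
    for feats, label in train:
        groups.setdefault(label, []).append(feats)
    centroids = {lbl: tuple(sum(c) / len(c) for c in zip(*pts)) for lbl, pts in groups.items()}
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], x))


def iterative_ensemble_filter(data: Sequence[Sample], ensemble: Sequence[Classifier],
                              n_folds: int = 5, max_iter: int = 10, seed: int = 0) -> List[Sample]:
    rng = random.Random(seed)
    current = list(data)
    for _ in range(max_iter):
        idx = list(range(len(current)))
        rng.shuffle(idx)
        folds = [idx[i::n_folds] for i in range(n_folds)]
        noisy = set()
        for fold in folds:
            held = set(fold)
            train = [current[i] for i in idx if i not in held]  # train on the other folds
            for i in fold:
                feats, label = current[i]
                wrong = sum(1 for clf in ensemble if clf(train, feats) != label)
                if wrong * 2 > len(ensemble):  # majority says "mislabelled"
                    noisy.add(i)
        if not noisy:  # converged: this pass found no more noise
            break
        current = [s for i, s in enumerate(current) if i not in noisy]
    return current


# Toy data: two well-separated 4x4 grids with one flipped label in each.
data = [((i * 0.5, j * 0.5), 0) for i in range(4) for j in range(4)]
data += [((5 + i * 0.5, 5 + j * 0.5), 1) for i in range(4) for j in range(4)]
data[data.index(((0.5, 0.5), 0))] = ((0.5, 0.5), 1)
data[data.index(((5.5, 5.5), 1))] = ((5.5, 5.5), 0)
cleaned = iterative_ensemble_filter(data, [knn(1), knn(3), nearest_centroid])
print(len(cleaned))
```

Majority voting is a conservative choice because an instance is dropped only when most members agree. A consensus rule (all members wrong) would remove even fewer instances.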

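The value of big data is not limited to the purpose for which it was originally collected. It also lies in the fact that, once collected, the data can be put to other uses and reused repeatedly. Many countries, including the United States, have elevated big data analysis and management to the level of national strategy and plan its development holistically at the national level.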

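Reservoir sampling builds a sample set and effectively solves the problem of sampling with equal probability from a stream of data elements whose total number is unknown. However, when the data volume is large, single-machine reservoir sampling has high time complexity and places a heavy load on the machine. To meet the needs of big data sampling, this paper proposes a distributed reservoir sampling algorithm that achieves parallel sampling on multiple machines by increasing the sampling ratio. The paper also proves theoretically that distributed reservoir sampling preserves equal-probability sampling. To handle datasets whose elements contribute differently, the algorithm is extended to a weighted distributed reservoir sampling algorithm. Building on reservoir sampling, the paper proposes a distributed sampling algorithm for big data that requires only linear time and space proportional to the sample size. Experiments verify its feasibility and effectiveness.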
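Single-machine reservoir sampling (Algorithm R) keeps the first k items. For each later item n (0-indexed), it replaces a random slot with probability k/(n + 1), which leaves every item seen so far in the sample with equal probability. The abstract's distributed variant works by increasing the sampling ratio, but it gives no details. The sketch below therefore swaps in a different, well-known technique for the distributed and weighted cases: Efraimidis and Spirakis weighted random sampling (A-Res). In A-Res, each item draws the key u^(1/w), with u uniform in [0, 1), and the k largest keys form the sample. Because keys are drawn independently per item, each partition can keep its own top k and a coordinator can merge them. Setting every weight to 1 gives uniform sampling. Function names are hypothetical, and weights must be positive.

```python
import heapq
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir: List[T] = []
    for n, item in enumerate(stream):
        if n < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, n)  # inclusive bounds, so P(j < k) = k / (n + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


def weighted_partition_sample(partition: Iterable[Tuple[T, float]], k: int,
                              rng: random.Random) -> List[Tuple[float, T]]:
    """A-Res on one partition: keep the k items with the largest keys u ** (1 / w).

    Returns (key, item) pairs so that partitions can be merged later.
    """
    heap: List[Tuple[float, T]] = []  # min-heap; heap[0] holds the smallest kept key
    for item, weight in partition:
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return heap


def merge_samples(partials: Iterable[List[Tuple[float, T]]], k: int) -> List[T]:
    """The global top-k keys across partitions form a weighted sample of the union."""
    return [item for _, item in heapq.nlargest(k, (p for part in partials for p in part))]


rng = random.Random(7)
print(reservoir_sample(range(100_000), 5, rng))

# Hypothetical 4-node setup: each node samples locally, a coordinator merges.
partitions = [[(f"node{p}-item{i}", 1.0 + i % 3) for i in range(1000)] for p in range(4)]
partials = [weighted_partition_sample(part, 5, rng) for part in partitions]
sample = merge_samples(partials, 5)
print(sample)
```

Each node uses O(k) memory and one pass over its data, and only k (key, item) pairs per node cross the network. This matches the linear-time, sample-size-space profile described in the abstract, although the mechanism differs from the paper's.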

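With the development of smart cities, multi-source, heterogeneous, and redundant big data has emerged. Using data mining, decision analysis, and related techniques to turn big data into data assets will become a necessary means of making smart cities intelligent and interconnected in the big data era. This paper describes the concept of the smart city, smart city data resources, and big data processing technologies. It analyzes integrated applications of big data in smart cities and sets out the current state of big data applications in key smart city sectors, with an outlook on their future. It also provides solutions and ideas for deeper applications of big data in future smart cities.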

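With the widespread application and in-depth development of various medical information systems, medical data is growing rapidly. Medical informatics has entered the era of "big data", and big data has brought prominent security problems. This paper discusses strategies for improving big data security management in medical information systems. It focuses on system architecture, backup mechanisms, and a defense-in-depth system for databases.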

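Building undergraduate programs in Data Science and Big Data Technology has become a hot topic in China. This paper first surveys data science programs at world-class universities. It then analyzes, from the perspective of distinctive courses, the data science programs of eight of them: the University of California, Berkeley, Johns Hopkins University, the University of Washington, New York University, Stanford University, Carnegie Mellon University, Columbia University, and City, University of London. It proposes 10 distinctive courses that the new Data Science and Big Data Technology program should emphasize. It also analyzes 8 common misconceptions in current data science education in China and offers countermeasures and recommendations.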

Recent developments in computing science and web technology provide the environmental community with continuously expanding resources for data collection and analysis. These resources pose unprecedented challenges to the design of analysis methods, workflows, and interaction with data sets. In light of the recent UK Research Council funded Environmental Virtual Observatory pilot project, this paper gives an overview of currently available implementations of web-based technologies for processing large and heterogeneous datasets. It then discusses their relevance to environmental data processing, simulation and prediction. We found that processing the simple datasets used in the pilot was relatively straightforward using a combination of R, RPy2, PyWPS and PostgreSQL. However, NoSQL databases and more versatile frameworks, such as implementations based on OGC standards, may provide a wider and more flexible set of features. Such features particularly facilitate working with larger volumes of data and more heterogeneous data sources.

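In recent years, the requirements for timeliness and accuracy in water resource assessment have kept rising. Data volumes have grown accordingly, and so have the demands on data processing technology. To meet the collection, storage, and computing needs of large data volumes in dynamic water resource assessment, this paper analyzes the data and computing requirements of this assessment in depth. It proposes a layered big data computing framework with three layers: data collection, storage and computing, and data application. The paper focuses on technology for collecting multi-source, real-time data for dynamic water resource assessment, so that data from multiple sources is collected and processed in a unified way. For the algorithm models used in data analysis, it studies a hybrid CPU and GPU computing architecture that supports machine learning and deep learning computation. The results have been deployed at the Ministry of Water Resources, are running well, and provide computing support for dynamic water resource assessment.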

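As large retail enterprises grow in scale, data transmission has become a core module of their management information systems (MIS). To meet the differing data transmission requirements of large retail enterprises, this paper proposes and implements a highly extensible data transmission model.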

Recent developments such as the open access and open data movements, together with the unprecedented growth of data known as Big Data, have shifted focus to methods for handling such data effectively in agro-environmental research. Big Data technologies and the increased use of cloud-based and high-performance computing create new opportunities for data-intensive science in the multidisciplinary agro-environmental domain. This paper presents a theoretical framework to structure and analyse data-intensive cases. It applies the framework to three case studies that together cover a broad range of technologies and aspects of Big Data usage. The case studies indicate that the most persistent issues in data-intensive research revolve around two areas: capturing the huge heterogeneity of interdisciplinary data, and creating trust between data providers and data users. It is therefore recommended that efforts in the agro-environmental domain concentrate on the issues of variety and veracity.

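With the rapid development of Web 2.0 technologies, emerging services such as social networks, the Internet of Things, and the mobile Internet keep appearing. Web data is growing explosively and has become sought-after "big data". The enormous value of Web big data has drawn growing attention to how Web data can be acquired, mined, and used. In the big data environment, Web data is characterized by large scale, wide variety, and high-speed data streams. These characteristics have driven deeper research into Web data extraction and integration, data analysis, and data interpretation. At the same time, the integration and mining of Web big data still face challenges in data scale, data diversity, data timeliness, and privacy protection.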

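To solve practical problems, big data analytics systems need to acquire data, yet the real data collected in practice is usually incomplete. In addition, solutions to most problems are typically either problem-driven or based solely on data analysis. Runtime parameters are adjusted and set largely blindly, which makes it difficult for applications to behave intelligently. To address this, the paper proposes the concept and framework of parallel data. Real data is used to generate genuine virtual big data through computational experiments. Then, drawing on Merton's law, the desired solution is placed in a generalized duality with the problem, which steers big data toward the actual problem. Real and virtual data interact dynamically and evolve in parallel, so the virtual and the real generate each other and the data changes continuously. This process ultimately gives the data intelligence, so that unknown problems can be solved. Parallel data is not only a form of data representation but also a mechanism and mode of data evolution. Its defining feature is the interaction between the virtual and the real, and the dynamical trajectories of all data constitute a data dynamical system. Parallel data provides a new paradigm for data processing, representation, mining, and application.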

