Similar Documents
20 similar documents found (search time: 46 ms)
1.
With the arrival of the big data era, more powerful computers and more mature big data platform tools have made it possible for enterprises to mine value from massive data. Hadoop-based big data platforms in particular can process terabyte- and petabyte-scale data even on inexpensive commodity hardware. In early Hadoop platform deployments, functionality usually came first and security controls were neglected; only in 2009, when the Yahoo team proposed a Kerberos-based authentication scheme, did comprehensive security governance for Hadoop big data platforms get under way. This paper reviews the history of the Hadoop big data platform, describes the traditional security problems that existed before 2009, and attempts a systematic survey of the security of the components in today's Hadoop ecosystem and the security solutions available for each, in the hope of providing a reference for designing Hadoop platform security governance schemes, so that advanced security controls can protect the private data of enterprises and users.

2.
Beyond the hype of Big Data, something within business intelligence projects is indeed changing. This is mainly because Big Data is not only about data, but about a complete conceptual and technological stack that includes raw and processed data, storage, ways of managing data, processing and analytics. An even trickier challenge is managing the quality of the data in Big Data environments. More than ever, assessing Quality-in-Use gains importance, since the real contribution of data (its business value) can only be estimated in its context of use. Although different Data Quality models exist for assessing the quality of regular data, none has been adapted to Big Data. To fill this gap, we propose the "3As Data Quality-in-Use model", composed of three Data Quality characteristics for assessing the levels of Data Quality-in-Use in Big Data projects: Contextual Adequacy, Operational Adequacy and Temporal Adequacy. The model can be integrated into any sort of Big Data project, as it is independent of any pre-conditions or technologies. The paper shows how to use the model with a working example. The model addresses the main challenges of a Data Quality program aimed at Big Data. The main conclusion is that the model is an appropriate way to obtain the Quality-in-Use levels of the input data of a Big Data analysis, and those levels can be read as indicators of the trustworthiness and soundness of the analysis results.
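The adequacy idea behind the 3As model can be sketched as follows. This is a minimal illustration, not the paper's actual metrics: the field names, the scoring rules, and the `min`-based aggregation are all assumptions, and Operational Adequacy is omitted for brevity.

```python
from datetime import datetime, timedelta, timezone

def contextual_adequacy(records, required_fields):
    """Fraction of records carrying every field the analysis context needs."""
    ok = sum(1 for r in records
             if all(f in r and r[f] is not None for f in required_fields))
    return ok / len(records)

def temporal_adequacy(records, max_age, now):
    """Fraction of records still fresh enough for the intended use."""
    ok = sum(1 for r in records if now - r["ts"] <= max_age)
    return ok / len(records)

def quality_in_use(records, required_fields, max_age, now):
    # Simple aggregate: the weakest adequacy bounds overall trustworthiness.
    return min(contextual_adequacy(records, required_fields),
               temporal_adequacy(records, max_age, now))

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
records = [
    {"user": "a", "amount": 10.0, "ts": now - timedelta(hours=1)},
    {"user": "b", "amount": None, "ts": now - timedelta(hours=2)},   # missing field
    {"user": "c", "amount": 5.0,  "ts": now - timedelta(days=9)},    # stale
]
print(quality_in_use(records, ["user", "amount"], timedelta(days=7), now))
```

A low score flags input data whose analysis results should be treated with caution in that context of use.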

3.
Building on intrusion detection systems and forensic systems, this paper proposes a data-mining-based intrusion detection and forensics system. When an intrusion occurs, the system both defends against it and collects evidence in real time. Data mining techniques are applied to the detection component and to the forensics component, making the generation and distribution of detection models automatic and intelligent; this improves detection efficiency and speeds up data analysis in the forensics system.

4.
朱慧雯  田骏  张涛  蒋卫祥 《软件》2020,(3):99-101
With the rise of the online recruitment market, big data analysis can effectively help users understand the most popular occupations and related information. This paper proposes a technical design based on a Hadoop-SpringMVC-Vue architecture with separated front end and back end. It first analyses the requirements of an intelligent analysis platform for internet recruitment big data; it then builds the big data platform on a Hadoop cluster, designs the system architecture with the SpringMVC framework, and builds the front end with the Vue framework; finally, it describes the implementation of the system. The SpringMVC framework simplifies the development of the platform, effectively reduces the coupling between layers, and improves the maintainability of the system.

5.
A storage and retrieval system for massive structured data (cited by: 4)
Big Data is a new type of data that has emerged in cloud computing in recent years, and traditional relational database systems are no longer adequate for it in terms of storage scale and retrieval efficiency. Current distributed NoSQL databases provide a distributed data storage environment but cannot support multi-column queries. This paper designs and implements a distributed storage and retrieval system for massive structured data (MDSS). The system uses column storage and combines a centralized distributed B+-tree index with local indexes to improve retrieval efficiency. On this basis, it discusses a task decomposition mechanism for complex query conditions, supporting multi-attribute retrieval, fuzzy retrieval, and statistical analysis over big data. Experimental results show that the proposed distributed structured data management technique and query task decomposition mechanism significantly improve query efficiency over large data sets in distributed settings, and suit storage scenarios for massive structured data such as log data and stream records.
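The multi-attribute retrieval idea can be sketched in miniature: keep each column separately, build a sorted per-column index, answer a range predicate per column, and intersect the row-id sets. This is an in-memory toy, not MDSS itself; the class and method names are illustrative assumptions.

```python
import bisect

class ColumnStore:
    def __init__(self, columns):
        self.columns = columns                       # name -> list of values
        # Local index per column: (value, row_id) pairs in sorted order.
        self.index = {name: sorted((v, i) for i, v in enumerate(values))
                      for name, values in columns.items()}

    def range_rows(self, name, lo, hi):
        """Row ids whose value in column `name` lies in [lo, hi]."""
        idx = self.index[name]
        left = bisect.bisect_left(idx, (lo, -1))
        right = bisect.bisect_right(idx, (hi, float("inf")))
        return {row for _, row in idx[left:right]}

    def multi_attribute(self, predicates):
        """Intersect per-column range results; predicates is {name: (lo, hi)}."""
        sets = [self.range_rows(n, lo, hi) for n, (lo, hi) in predicates.items()]
        return sorted(set.intersection(*sets))

store = ColumnStore({"size": [10, 250, 40, 300], "level": [1, 3, 2, 1]})
print(store.multi_attribute({"size": (30, 300), "level": (2, 3)}))
```

In the real system the per-column work would be distributed, with the decomposed sub-queries routed to the nodes holding each column.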

6.
7.
李敏  倪少权  邱小平  黄强 《计算机应用》2015,35(5):1267-1272
To address the low real-time performance of heterogeneous big data processing in Internet of Things (IoT) environments, this paper explores data processing and persistence based on the Hadoop framework and proposes a context-based Hadoop big data processing system model, HDS. HDS uses the Hadoop framework for parallel data processing and persistence, and abstracts the heterogeneous data of the IoT environment as "contexts", which become the objects HDS processes; the notions of "context distance" and "context neighborhood system (CNS)" are defined. Because the Hadoop framework itself offers limited real-time performance, HDS adds a "context queue (CQ)" as auxiliary storage to improve data processing latency. Using the spatio-temporal properties of contexts, a context neighborhood system is built for user requests so that tasks can be reorganized. Taking vehicle scheduling for refined-oil distribution as an example, MapReduce parallel experiments verify HDS's processing and real-time performance. The results show that in an IoT environment HDS clearly outperforms a traditional single-node model (SDS) on big data processing: with 10 servers its computing performance exceeds that of SDS by more than 200 times, and CQ as auxiliary storage is confirmed to improve real-time processing by more than 270 times.
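The "context distance" and "context neighborhood system" notions can be sketched as follows. The paper does not give its exact distance definition here, so this uses an assumed weighted space-time distance; all names and weights are illustrative.

```python
import math

def context_distance(a, b, w_space=1.0, w_time=0.1):
    """Assumed distance: weighted sum of spatial and temporal separation."""
    spatial = math.dist(a["loc"], b["loc"])
    temporal = abs(a["t"] - b["t"])
    return w_space * spatial + w_time * temporal

def context_neighborhood(center, contexts, radius):
    """CNS sketch: all contexts within `radius` of `center`."""
    return [c["id"] for c in contexts if context_distance(center, c) <= radius]

contexts = [
    {"id": "c1", "loc": (0.0, 0.0),  "t": 0},
    {"id": "c2", "loc": (3.0, 4.0),  "t": 10},   # distance 5.0 + 1.0 = 6.0
    {"id": "c3", "loc": (30.0, 0.0), "t": 0},    # distance 30.0
]
print(context_neighborhood(contexts[0], contexts[1:], radius=10.0))
```

Grouping a user request with its neighborhood is what lets related tasks be reorganized and processed together.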

8.
In this paper, we present a Big Data analysis paradigm related to smart cities using cloud computing infrastructures. The proposed architecture follows the MapReduce parallel model implemented with the Hadoop framework. We analyse two case studies: a quality-of-service assessment of the public transportation system using historical bus location data, and a passenger-mobility estimation using ticket sales data from smartcards. Both case studies use real data from the transportation system of Montevideo, Uruguay. The experimental evaluation demonstrates that the proposed model can process large volumes of data efficiently.
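The quality-of-service case study can be sketched in MapReduce style: map each bus checkpoint record to a (line, delay) pair, then reduce by line to an average delay. The record schema and the delay metric are assumptions for illustration; the real system runs this over Hadoop rather than in-process.

```python
from collections import defaultdict

def map_record(record):
    """Map phase: one (line, delay) pair per bus checkpoint record."""
    line, scheduled, actual = record
    yield line, actual - scheduled               # delay in minutes

def reduce_delays(pairs):
    """Reduce phase: average delay per bus line."""
    groups = defaultdict(list)
    for line, delay in pairs:
        groups[line].append(delay)
    return {line: sum(d) / len(d) for line, d in groups.items()}

records = [("103", 0, 2), ("103", 10, 16), ("405", 0, 1)]
pairs = [kv for r in records for kv in map_record(r)]
print(reduce_delays(pairs))
```

Because the map step is record-local and the reduce step only needs per-line groups, the computation parallelizes naturally over historical location data.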

9.
Research on data warehouse technology for geographic information (cited by: 21)
This paper analyses data management in geographic information systems (GIS). Based on a description of the essence and characteristics of data warehouse technology, it discusses the functions and features of a GIS data warehouse, presents the basic architecture of a GIS data warehouse and a warehouse-based model for sharing GIS resources, and elaborates on the key technologies involved.

10.
Query optimization in Big Data has become a promising research direction due to the popularity of massive data analytical systems such as Hadoop. It is hard to efficiently execute JOIN queries on top of Hive, the Hadoop query language, over limited Big Data storage. In our previous work, the HiveQL Optimization for JOIN query over Multi-session Environment (HOME) system was introduced on top of Hadoop to improve performance by storing intermediate results and avoiding repeated computation. Time overheads and Big Data storage limitations are the main drawbacks of the HOME system, especially when additional physical storage is used or extra virtualized storage is rented. In this paper, an index-based system for reusing data, called indexing HiveQL Optimization for JOIN over Multi-session Big Data Environment (iHOME), is proposed to overcome the HOME overheads by storing only the indexes of the joined rows instead of the full intermediate results. The proposed iHOME system addresses eight cases of JOIN queries, classified into three groups: Similar-to-iHOME, Compute-on-iHOME, and Filter-of-iHOME. Experimental results with the TPC-H benchmark show that iHOME reduces the execution time of the eight JOIN queries on Hive. The stored data size in iHOME is also reduced relative to the HOME system, saving Big Data storage. Thus, as stored data grows, the iHOME system guarantees space scalability and overcomes the storage limitation.
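The core trade-off iHOME exploits can be shown in miniature: run the join once and cache only the matching (left_id, right_id) pairs, then materialize results from the base tables on reuse instead of storing full intermediate rows. This is a local sketch of the idea, not the Hive integration; function and table names are assumptions.

```python
def hash_join_ids(left, right, key):
    """First execution: compute the join, keep only row-id pairs as the index."""
    buckets = {}
    for j, row in enumerate(right):
        buckets.setdefault(row[key], []).append(j)
    return [(i, j) for i, row in enumerate(left) for j in buckets.get(row[key], [])]

def replay_join(left, right, id_pairs):
    """Reuse in a later session: rebuild rows from the index, no recomputation."""
    return [{**left[i], **right[j]} for i, j in id_pairs]

orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}]
custs  = [{"cust": 1, "name": "x"}, {"cust": 2, "name": "y"}]
idx = hash_join_ids(orders, custs, "cust")     # far smaller than full rows
print(replay_join(orders, custs, idx))
```

The index grows with the number of matches, not with row width, which is where the storage saving over caching full intermediate results comes from.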

11.
Cloud computing offers massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

12.
The quality of the data is directly related to the quality of the models drawn from it. For that reason, much research is devoted to improving the quality of the data and amending the errors it may contain. One of the most common problems is the presence of noise in classification tasks, where noise refers to the incorrect labeling of training instances. This problem is very disruptive, as it changes the decision boundaries of the problem. Big Data problems pose a new challenge in terms of data quality due to the massive and unsupervised accumulation of data. The Big Data scenario also brings new problems to classic data preprocessing algorithms, as they are not prepared for working with such amounts of data, and these algorithms are key to moving from Big to Smart Data. In this paper, an iterative ensemble filter for removing noisy instances in Big Data scenarios is proposed. Experiments carried out on six Big Data datasets show that the proposed noise filter outperforms the current state-of-the-art noise filter in Big Data domains. It also proves to be an effective way of transforming raw Big Data into Smart Data.
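The general shape of an iterative ensemble noise filter can be sketched in pure Python: an ensemble of simple classifiers votes on each training instance, instances the whole ensemble disagrees with are removed, and the process repeats until stable. The base learners (1-NN and nearest-centroid), the consensus voting rule, and the leave-one-out scheme are assumptions for illustration; the paper's distributed implementation differs.

```python
def nn_predict(train, x):
    """1-nearest-neighbour label for x over (features, label) pairs."""
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[1]

def centroid_predict(train, x):
    """Label of the class whose centroid is closest to x."""
    cents = {}
    for feats, label in train:
        cents.setdefault(label, []).append(feats)
    def dist(label):
        pts = cents[label]
        c = [sum(v) / len(pts) for v in zip(*pts)]
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(cents, key=dist)

def ensemble_filter(data):
    """Iteratively drop instances the whole ensemble labels differently."""
    data = list(data)
    while True:
        noisy = []
        for i, (x, y) in enumerate(data):
            rest = data[:i] + data[i + 1:]
            votes = [nn_predict(rest, x), centroid_predict(rest, x)]
            if all(v != y for v in votes):        # consensus: mislabeled
                noisy.append(i)
        if not noisy:
            return data
        data = [d for i, d in enumerate(data) if i not in noisy]

clean = [((0.0,), "a"), ((0.1,), "a"), ((1.0,), "b"), ((1.1,), "b")]
noisy = clean + [((0.05,), "b")]                  # one mislabeled point
print(len(ensemble_filter(noisy)))
```

The consensus rule (remove only when every member disagrees) trades recall for precision, which matters when the filter itself must not delete correctly labeled data.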

13.
E-crime is increasing, and e-criminals are becoming better at masking their activities. The task of forensic data analysis is becoming more difficult, and a systematic approach to evidence validation is necessary. With no standard validation framework, the skills and interpretations of forensic examiners go unchecked. Standard practices in forensics have emerged in recent years, but none has addressed the development of a model of valid digital evidence. Various security and forensic models exist, but they do not address the validity of the digital evidence collected. Research has addressed the validation and verification of forensic software tools but has failed to address the validation of the forensic evidence itself. Evidence collected using forensic software tools can be questioned using an anti-forensic approach. The research presented in this paper is not intended to question the skills of forensic examiners in using forensic software tools, but rather to guide them to look at evidence in an anti-forensic way. This paper proposes a formal procedure to validate evidence of computer crime.

14.
Value creation is a major factor not only in the sustainability of organizations but also in the maximization of profit, customer retention, the fulfillment of business goals, and revenue. When value is to be created from Big Data scenarios, value creation must be understood over a broader range of complexity. A question that arises here is how organizations can use this massive quantity of data to create business value. The present study seeks to provide a model for creating organizational value using Big Data Analytics (BDA). To this end, after reviewing the related literature and interviewing experts, a BDA-based organizational value creation model is developed. Accordingly, five hypotheses are formulated and a questionnaire is prepared. The questionnaire is then given to the statistical population of the study (IT managers and experts, particularly those specializing in data analysis) to test the research hypotheses. In the next phase, the connections between model variables are scrutinized using structural equation modeling (measurement and structural models). The results indicate that investigating the infrastructure of Big Data Analytics, as well as the capabilities of the organization and of Big Data Analytics, is the initial requirement for creating organizational value using BDA. On that basis, the Big Data Analytics strategy is formulated and, ultimately, organizational value is created.

15.
The tremendous growth of data being generated today makes storage and computing a mammoth task. With its distributed processing capability, Hadoop provides an efficient solution for such large data. Hadoop's default data placement strategy places data blocks randomly across the nodes without considering execution parameters, resulting in several shortcomings such as increased execution time and query latency. Moreover, most of the data required for a task may not be locally available, creating a data-locality problem. We therefore propose a data placement strategy based on the dependencies of data blocks across the nodes. Our strategy dynamically analyses the history log and establishes the relationships between tasks and the blocks each task requires through a Block Dependency Graph (BDG). Our CORE algorithm then re-organizes the HDFS layout by redistributing the data blocks to obtain an optimal data placement, improving performance for Big Data sets in a distributed environment. The strategy was tested on a 20-node cluster with different real-world MapReduce applications. The results show that the proposed strategy reduces query execution time by 23% and improves data locality by 50.7% compared to the default placement.
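The dependency-driven placement idea can be sketched as: build a Block Dependency Graph from the task history (edge weight = how often two blocks are used by the same task), then greedily co-locate the most strongly linked blocks. This is only the core idea under simplifying assumptions (unit-size blocks, a single replica, a toy greedy rule); the paper's CORE algorithm is more elaborate.

```python
from collections import defaultdict
from itertools import combinations

def build_bdg(history):
    """history: task -> list of block ids. Edge weight = co-use count."""
    bdg = defaultdict(int)
    for blocks in history.values():
        for a, b in combinations(sorted(blocks), 2):
            bdg[(a, b)] += 1
    return bdg

def place(blocks, bdg, nodes, capacity):
    """Greedy layout: heaviest dependency edges first, both blocks on one node."""
    placement, load = {}, {n: 0 for n in nodes}
    def put(block, node):
        if block not in placement and load[node] < capacity:
            placement[block] = node
            load[node] += 1
    for (a, b), _ in sorted(bdg.items(), key=lambda kv: -kv[1]):
        target = placement.get(a) or placement.get(b) or min(nodes, key=lambda n: load[n])
        put(a, target)
        put(b, target)
    for blk in blocks:                     # blocks never co-used with another
        put(blk, min(nodes, key=lambda n: load[n]))
    return placement

history = {"t1": ["b1", "b2"], "t2": ["b1", "b2"], "t3": ["b3", "b4"]}
layout = place(["b1", "b2", "b3", "b4"], build_bdg(history), ["n1", "n2"], capacity=2)
print(layout)
```

Co-locating blocks that the same tasks read is exactly what raises data locality: a task scheduled on either node finds all its inputs there.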

16.
Today, a paradigm shift is being observed in science, where the focus is gradually shifting from operations to data, which also greatly influences decision making. Data is being generated proactively from several sources in various forms, especially social media, and in modern data science vocabulary is recognized as Big Data. Today, Big Data permeates ever larger aspects of human life for scientific and commercial purposes, especially for massive-scale data analytics beyond the exabyte magnitude. As the footprint of Big Data applications continuously expands, reliance on cloud environments is also increasing to obtain appropriate, robust and affordable services for dealing with Big Data challenges. Cloud computing avoids any need to locally maintain overly scaled computing infrastructure, including not only dedicated space but also expensive hardware and software. Several data models to process Big Data have already been developed, and a number of such models are still emerging, potentially relying on heterogeneous underlying storage technologies, including cloud computing. In this paper, we investigate the growing role of cloud computing in the Big Data ecosystem. We also propose a novel XCLOUDX {XCloudX, X…X} classification to gauge the intuitiveness of the scientific names of cloud-assisted NoSQL Big Data models and analyze whether XCloudX always uses cloud computing underneath, or vice versa. XCloudX symbolizes those NoSQL Big Data models that embody the term "cloud" in their name, where X is any alphanumeric variable. The discussion is strengthened by a set of important case studies. Furthermore, we study the emergence of the as-a-Service era, motivated by the cloud computing drive, and explore new members beyond the traditional cloud computing stack developed in the past couple of years.

17.
Recent developments such as the open access and open data movements and the unprecedented growth of data, which has come forward as Big Data, have shifted focus to methods for effectively handling such data for use in agro-environmental research. Big Data technologies, together with the increased use of cloud-based and high-performance computing, create new opportunities for data-intensive science in the multi-disciplinary agro-environmental domain. A theoretical framework is presented to structure and analyse data-intensive cases and is applied to three case studies that together cover a broad range of technologies and aspects of Big Data usage. The case studies indicate that the most persistent issues in data-intensive research revolve around capturing the huge heterogeneity of interdisciplinary data and around creating trust between data providers and data users. It is therefore recommended that efforts in the agro-environmental domain concentrate on the issues of variety and veracity.

18.
Research on data management systems for stream data (cited by: 2)
Traditional relational database systems are typically used to store relatively static data with no notion of time. In some new application domains, information is produced as sequences of data and must be processed continuously and in real time, which exceeds what traditional systems can handle. A data stream management system (DSMS) is a data management system designed for stream data; it can process input streams effectively and support continuous queries. This paper analyses the overall architecture of data stream management systems, focusing on data models for stream data and on stream queries.
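The difference between a one-shot query and a continuous query can be sketched with a sliding-window aggregate that emits an updated answer for every arriving tuple. The window size and the averaging query are illustrative choices, not taken from the paper.

```python
from collections import deque

def continuous_avg(stream, window=3):
    """Continuous query: sliding-window average, one result per input tuple."""
    buf = deque(maxlen=window)                # oldest tuple expires automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]
print(list(continuous_avg(readings)))
```

Unlike a stored-table query, the operator never sees the whole input; it keeps only bounded state (the window) and produces an unbounded output stream.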

19.
The rapid development of big data and cloud computing provides technical support for mining the rich scientific and economic value of meteorological data, and has driven the wide adoption of Hadoop, its distributed file system (HDFS, Hadoop Distributed File System), and its distributed computing model in meteorological data processing. Because meteorological data exhibit the 4V characteristics of big data, new data processing algorithms are needed to improve processing efficiency. Based on a study of decision tree algorithms, a random forest model is built on a Hadoop cloud platform, offering a new possibility for applying data mining algorithms on cloud platforms. The meteorological big data cloud platform is designed around the CART (Classification And Regression Trees) mining algorithm, adopts the Hadoop system architecture and the MapReduce workflow, and is deployed as a cluster. The overall architecture comprises an infrastructure layer, a data management and processing layer, and an application layer; the design reduces the time needed to build decision trees and provides platform functions such as efficient processing and mining analysis of meteorological data.
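The CART building block behind such a platform is the impurity-minimizing split. A minimal local sketch follows (the platform distributes this over MapReduce); the weather-flavoured feature and labels are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Threshold on a single numeric feature minimizing weighted Gini impurity."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (thr, score)
    return best

temps = [20, 22, 30, 33]                        # e.g. a temperature feature
labels = ["no-storm", "no-storm", "storm", "storm"]
print(best_split(temps, labels))
```

A random forest repeats this split search over bootstrap samples and random feature subsets, which is what makes the training embarrassingly parallel across a Hadoop cluster.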

20.
Research on power system applications based on data warehouse decision analysis (cited by: 2)
杨静 《微机发展》2002,12(5):31-33
This paper first discusses the advantages of a decision analysis system based on a data warehouse and presents an implementation model. It then discusses the key technologies in building a data warehouse, proposes a new method for data cleaning and staging, compares three storage models for data cubes, and finally gives quantitative results.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号