Similar literature
20 similar documents found (search time: 15 ms)
1.
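Mining frequent closed itemsets is an effective way to discover association rules among data items. Cloud computing platforms based on the MapReduce model offer a new approach to mining association rules from massive data. This paper proposes and implements a parallel algorithm for mining frequent closed itemsets on the Hadoop cloud computing platform. The algorithm consists of four main steps: parallel counting, construction of the global frequent item list, parallel mining of local frequent closed itemsets, and parallel filtering of global frequent closed itemsets. Experiments on several data sets show that the method substantially improves mining efficiency and achieves good speedup.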
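
The first two of the four steps listed above (parallel counting and building the global frequent item list) can be sketched in a few lines. This is a minimal single-process illustration, not the paper's implementation: the function names, the `min_support` threshold, and the toy partitions are all assumptions made for the example, and the partitions are processed sequentially rather than on a Hadoop cluster.

```python
from collections import Counter
from functools import reduce


def count_partition(transactions):
    """Map side: count item occurrences within one data partition."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # an item counts once per transaction
    return counts


def global_frequent_items(partitions, min_support):
    """Reduce side: sum the partition counts and keep the frequent items."""
    total = reduce(lambda a, b: a + b, map(count_partition, partitions), Counter())
    frequent = {item: n for item, n in total.items() if n >= min_support}
    # Order by descending support; ties are broken by item name.
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: two partitions of two transactions each.
partitions = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "d"], ["b", "c"]],
]
print(global_frequent_items(partitions, min_support=2))
```

The remaining two steps (local mining and global filtering of closed itemsets) would consume this ordered item list; they are omitted because the abstract does not describe them in enough detail to sketch.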

2.
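A study of the Hadoop-based cloud computing model   Total citations: 1, self-citations: 0, citations by others: 1
Cloud computing is a development of parallel computing, distributed computing, and grid computing. This paper describes in detail the programming ideas, working principles, steps, and methods of MapReduce. It discusses the MapReduce programming model, the core design of Hadoop, the open-source distributed computing platform from Apache, and uses algorithm experiments to analyze and study how the MapReduce model works and how it is applied.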

3.
Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available to end users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested, and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

4.
Nowadays, many organizations analyze their data with the MapReduce paradigm, most of them using the popular Apache Hadoop framework. As the data size managed by MapReduce applications is steadily increasing, the need to improve Hadoop performance also grows. Existing modifications of Hadoop (e.g., Mellanox Unstructured Data Accelerator) attempt to improve performance by changing some of its underlying subsystems. However, they cannot always cope with all of its performance bottlenecks, or they hinder its portability. Furthermore, new frameworks like Apache Spark or DataMPI can achieve good performance improvements, but they do not keep compatibility with existing MapReduce applications. This paper proposes Flame-MR, a new event-driven MapReduce architecture that increases Hadoop performance by avoiding memory copies and pipelining data movements, without modifying the source code of the applications. The performance evaluation on two representative systems (an HPC cluster and a public cloud platform) provides experimental evidence of significant performance increases, reducing the execution time by up to 54% on the Amazon EC2 cloud.

5.
Internet protocol television viewers spend considerable time browsing through the many existing channels, which is inefficient and time-consuming. Although a recommendation system can solve the channel-switching problem, it performs slowly unless it is adapted to read large data sets. This study proposes a novel cloud-assisted channel-recommendation system for a cloud computing environment, channel association rules (CARs), to speed up channel switching and thereby help users find their favorite channels in less time. The CARs algorithm is compared with the conventional (COV) solution and the most frequently selected (MFS) algorithm on MovieLens data sets. The experimental results indicate that the predictive accuracy of CARs is superior to that of the COV and MFS algorithms. In addition, CARs uses parallel computing in MapReduce to distribute large amounts of user history logs across multiple computers for processing. The experimental results show that the proposed algorithm can handle big data efficiently in finite time when a large number of cloud servers are rented from commercial cloud providers such as Amazon Elastic Compute Cloud (EC2) and Microsoft HDInsight.

6.
Adapting scientific computing problems to clouds using MapReduce   Total citations: 1, self-citations: 0, citations by others: 1
Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-hungry scientific computing problems. To study this, we established a scientific computing cloud (SciCloud) project and environment on our internal clusters. The main goal of the project is to study the scope for establishing private clouds at universities. With these clouds, students and researchers can efficiently use the existing resources of university computer networks to solve computationally intensive scientific, mathematical, and academic problems. However, to run scientific computing applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. This paper summarizes the challenges associated with reducing iterative algorithms to the MapReduce model. Algorithms used in scientific computing are divided into classes according to how they can be adapted to the MapReduce model; examples from each class are reduced to the MapReduce model, and their performance is measured and analyzed. The study focuses mainly on the Hadoop MapReduce framework but also compares it with an alternative MapReduce framework called Twister, which is designed specifically for iterative algorithms. The analysis shows that Hadoop MapReduce has significant trouble with iterative problems, although it is well suited to embarrassingly parallel problems, and that Twister can handle iterative problems much more efficiently. This work shows how to adapt algorithms from each class to the MapReduce model and what affects the efficiency and scalability of algorithms in each class, and it allows us to judge which framework is more efficient for each class by mapping the advantages and disadvantages of the two frameworks.
This study is important for scientific computing, which often uses complex iterative methods to solve critical problems; adapting such methods to cloud computing frameworks is not a trivial task.
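
To illustrate what reducing an iterative algorithm to MapReduce involves, the sketch below expresses a one-dimensional k-means as a loop in which every iteration is a complete map-and-reduce pass. The abstract does not name k-means; it is chosen here only as a familiar iterative method, and all function names, parameters, and data are hypothetical. The driver counts how many passes ("jobs") were needed, which hints at why per-job overhead matters for iterative problems.

```python
from collections import defaultdict


def kmeans_iteration(points, centroids):
    """One MapReduce-style pass: map assigns points, reduce averages them."""
    groups = defaultdict(list)
    for p in points:  # map: emit (index of nearest centroid, point)
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        groups[nearest].append(p)
    # reduce: new centroid is the mean of its points; unchanged if it has none
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else c
        for i, c in enumerate(centroids)
    ]


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Driver: launch one pass per iteration until the centroids stop moving."""
    jobs = 0
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        jobs += 1
        converged = all(abs(a - b) <= tol for a, b in zip(updated, centroids))
        centroids = updated
        if converged:
            break
    return centroids, jobs


print(kmeans([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))
```

In this toy run the centroids settle after two passes. In a real deployment each pass would be a separately scheduled job over the full input, so the number of passes multiplies whatever fixed cost a framework pays per job.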

7.
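This paper first introduces the background, concepts, basic principles, and architecture of cloud computing, and then uses Google's system as an example to describe in detail how cloud computing is implemented. Cloud computing is the commercial realization of computer science concepts such as parallel computing, distributed computing, and grid computing. Google has its own cloud computing platform, which provides an implementation mechanism and an infrastructure model for cloud computing. The paper describes the Google cloud computing platform: the GFS distributed file system, the BigTable distributed database, and the Map/Reduce programming model. Finally, it analyzes the challenges facing the development of cloud computing.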

8.
The need for an efficient and scalable community health awareness model has become a crucial issue in today's health care applications. Many health care service providers need to provide their services over long periods, in real time, and interactively. Many of these applications are based on the emerging Wireless Body Area Network (WBAN) technology. WBANs have developed as an effective solution for a wide range of healthcare, military, sports, general health, and social applications. On the other hand, handling data at large scale (currently known as Big Data) requires an efficient collection and processing model with scalable computing and storage capacity. Therefore, new computing paradigms such as Cloud Computing and the Internet of Things (IoT) are needed. In this paper we present a novel cloud-supported model for efficient community health awareness in the presence of large-scale WBAN data generation. The objective is to process this big data in order to detect abnormal data using a MapReduce infrastructure and user-defined functions with minimum processing delay. The goal is to make large volumes of monitored WBAN data available to the end user or the decision maker in a reliable manner. While reducing data packet processing energy, the proposed work minimizes the data processing delay by choosing a cloudlet or local cloud model and a MapReduce infrastructure. The overall delay is thus minimized, which allows abnormal data to be detected in the cloud in real time. We present a multi-layer computing model composed of a Local Cloud (LC) layer and an Enterprise Cloud (EP) layer that aims to process the data collected from Monitored Subjects (MSs) at large scale to generate useful facts and observations or to find abnormal phenomena within the monitored data. Performance results show that integrating MapReduce capabilities with the cloud computing model reduces the processing delay.
The proposed MapReduce infrastructure has also been applied in a lower layer, such as the LC, in order to reduce the amount of communication and the processing delay. Performance results show that applying the MapReduce infrastructure in the lower tier significantly decreases the overall processing delay.

9.
蔡键  王树梅 《数字社区&智能家居》2009,5(9):7093-7095,7107
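This paper first introduces the background, concepts, basic principles, and architecture of cloud computing, and then uses Google's system as an example to describe in detail how cloud computing is implemented. Cloud computing is the commercial realization of computer science concepts such as parallel computing, distributed computing, and grid computing. Google has its own cloud computing platform, which provides an implementation mechanism and an infrastructure model for cloud computing. The paper describes the Google cloud computing platform: the GFS distributed file system, the BigTable distributed database, and the Map/Reduce programming model. Finally, it analyzes the challenges facing the development of cloud computing.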

10.
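MapReduce: a new programming model for distributed parallel computing   Total citations: 3, self-citations: 0, citations by others: 3
MapReduce is a distributed parallel programming model proposed by Google for processing large-scale data in parallel. Inspired by functional programming languages, the MapReduce model splits a large-scale data processing job into a number of Map tasks that can run independently and assigns them to different machines, where they produce intermediate files in some format; a number of Reduce tasks then merge these intermediate files to produce the final output files. When using the MapReduce model for large-scale data processing, users can concentrate on writing the Map and Reduce functions, while the other complex problems of parallel computing, such as the distributed file system, job scheduling, fault tolerance, and inter-machine communication, are handled by the MapReduce system, which greatly reduces the overall programming difficulty. MapReduce is increasingly becoming the mainstream programming model for cloud computing platforms. The open-source MapReduce system provided by the Apache Hadoop project still needs further improvement.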
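
The division of labor described above, where the user writes only the Map and Reduce functions and the system does the rest, can be made concrete with a small single-process sketch. The `run_mapreduce` driver below is a hypothetical stand-in for the framework: it groups intermediate pairs in memory instead of writing intermediate files, and it does none of the scheduling, fault tolerance, or inter-machine communication that a real system provides. Word counting is used only as a conventional example.

```python
from collections import defaultdict


def map_words(line):
    """User-written Map function: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, values):
    """User-written Reduce function: sum all counts emitted for one word."""
    return word, sum(values)


def run_mapreduce(inputs, mapper, reducer):
    """Toy driver: run the map phase, group by key, then run the reduce phase."""
    intermediate = defaultdict(list)
    for record in inputs:                     # each record could be its own Map task
        for key, value in mapper(record):
            intermediate[key].append(value)   # grouping stands in for intermediate files
    return dict(reducer(key, values) for key, values in sorted(intermediate.items()))


result = run_mapreduce(["map reduce map", "reduce map"], map_words, reduce_counts)
print(result)
```

Only `map_words` and `reduce_counts` are specific to the problem; swapping them for other functions reuses the driver unchanged, which is the programming convenience the abstract refers to.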

11.
Cloud computing techniques take the form of distributed computing, using multiple computers to execute computations simultaneously on the service side. To process the increasing quantity of multimedia data, numerous large-scale multimedia data storage and computing techniques have been developed for cloud computing. Of all these techniques, Hadoop plays a key role. Hadoop, a computing cluster formed from low-priced hardware, can carry out parallel computing on petabytes of multimedia data. Hadoop features high reliability, high efficiency, and high scalability. The large-scale multimedia data computing techniques include not only the core techniques, Hadoop and MapReduce, but also data collection techniques such as File Transfer Protocol and Flume. In addition, distributed system configuration allocation, automatic installation, and monitoring platform building and management techniques are all included. As a result, only by integrating all of these techniques can a reliable large-scale multimedia data platform be offered. In this paper, we introduce how cloud computing can make a breakthrough by proposing a multimedia social network dataset on the Hadoop platform and implementing a prototype version. Detailed specifications and design issues are discussed as well. An important finding of this article is that multimedia social networking analysis takes less time on a cloud Hadoop platform than on a single computer. The advantages of cloud computing over traditional data processing practices are demonstrated in this article. Applicable framework designs and the tools available for large-scale data processing are also proposed. We report the experimental multimedia data, including data sizes and processing times.

12.
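To address the waste of computing resources caused by ordering constraints in the execution mechanism of the MapReduce distributed computing model on the Hadoop platform, this work optimizes the model by improving fine-grained parallel data processing on each execution node, combining it with Java shared-memory multithreaded programming, and proposes MapReduce+OpenMP, a distributed parallel computing model that combines coarse-grained and fine-grained parallelism. The model's performance and efficiency are validated by analyzing taxi GPS trajectory data of various sizes on a four-node Hadoop cluster. The experimental results show that the MapReduce+OpenMP model does improve computing efficiency on large data sets and is an effective refinement and optimization of the big data analysis model on the Hadoop platform.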
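
The combination of coarse-grained and fine-grained parallelism can be sketched as a map task that fans its input split out to worker threads. This is only a structural illustration under stated assumptions: the paper works with Java multithreading and OpenMP, whereas the sketch uses Python's `concurrent.futures`; the record layout `(taxi_id, lat, lon)`, the per-taxi point count, and the `threads` parameter are all hypothetical. CPython threads share one interpreter lock, so this shows the shape of the design rather than a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(records):
    """Fine-grained unit of work: count GPS points per taxi in one chunk."""
    return Counter(taxi_id for taxi_id, _lat, _lon in records)


def map_task(split, threads=4):
    """Coarse-grained map task that divides its split among worker threads."""
    size = max(1, -(-len(split) // threads))  # ceiling division, at least 1
    chunks = [split[i:i + size] for i in range(0, len(split), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)             # merge the per-thread results
    return total


def reduce_task(map_outputs):
    """Reduce: merge the per-split counts produced by all map tasks."""
    total = Counter()
    for output in map_outputs:
        total.update(output)
    return dict(total)


# Hypothetical input: two splits, as if handled by two nodes.
splits = [
    [("t1", 39.90, 116.40), ("t2", 39.91, 116.41), ("t1", 39.92, 116.42)],
    [("t2", 39.93, 116.43), ("t3", 39.94, 116.44)],
]
print(reduce_task(map_task(split) for split in splits))
```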

13.
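Application research based on the Hadoop MapReduce model   Total citations: 4, self-citations: 0, citations by others: 4
MapReduce is a distributed programming model that simplifies parallel computing. It is an important Google technology and is typically used for data-intensive distributed parallel computing. This paper discusses the MapReduce programming model, the core design of Hadoop, the open-source distributed computing platform from Apache, and uses algorithm experiments to analyze and study how the MapReduce model works and how it is applied.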

14.
Research on cloud computing
王倩  曹彦 《软件》2013,34(5):116-118
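The cloud computing model is the result of the evolution of concepts such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), distributed computing, parallel computing, and grid computing. It is a brand-new model of computing and will become the dominant way for people to obtain services. This paper first introduces the concept of cloud computing as understood in industry, then analyzes cloud computing and related forms of computing, and finally looks ahead to the prospects for its development.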

15.
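Cloud computing relies on computer network systems and has become an important part of people's lives. With the accelerating development of networked and virtualized living, Internet, IT, mobile phone, and network operator giants such as Google, Microsoft, Apple, Amazon, and IBM have begun to reposition the strategic core of their business development. As an IT business computing model, cloud computing distributes computing tasks across computer network systems made up of various types of wide area and local area networks, so that users can obtain computing power, storage space, and information services over the network on demand. Cloud computing users connect to the network through PCs, mobile phones, and other terminals to use cloud resources. As cloud computing is widely adopted, the security of the cloud environment and data security have become prominent problems, and how to safeguard cloud computing security is an issue in urgent need of a solution. This paper introduces the concepts related to cloud computing, analyzes the data security risks of cloud computing, and proposes prevention strategies.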

16.
In recent years, the problems of using generic (i.e., relational) storage techniques for very specific applications have been detected and outlined and, as a consequence, some alternatives to relational DBMSs (e.g., HBase) have bloomed. Most of these alternatives sit in the cloud and benefit from cloud computing, which is nowadays a reality that helps us save money by eliminating fixed hardware and software costs and paying only per use. On top of this, specific querying frameworks that exploit the brute force of the cloud (e.g., MapReduce) have also been devised. The question that arises next is whether this (rather naive) exploitation of the cloud is an alternative to tuning DBMSs, or whether it still makes sense to consider other options when retrieving data in these settings. In this paper, we study the feasibility of solving OLAP queries with Hadoop (the Apache project implementing MapReduce) while benefiting from secondary indexes and partitioning in HBase. Our main contribution is the comparison of different access plans and the definition of criteria (i.e., cost estimation) to choose among them in terms of consumed resources (namely CPU, bandwidth, and I/O).

17.
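Research and implementation of a network platform for cloud computing   Total citations: 14, self-citations: 1, citations by others: 13
Cloud computing provides three types of service: Infrastructure as a Service, Platform as a Service, and Software as a Service. Many cloud instances build their infrastructure from high-performance computing nodes, but the traditional way of using high-performance computers constrains the development of cloud platform services. This paper designs and implements NPCC, a network platform for cloud computing based on high-performance computers, as an exploratory attempt to solve the problems of providing cloud platform services in high-performance computing environments. NPCC adopts the high-performance virtual zone (HPVZ) technique and a multi-objective cooperative parallel workload scheduling strategy, among others. It changes the traditional shared-use model of high-performance computers and provides users with a network platform environment for cloud computing that is easy to use, general-purpose, secure, customizable, and graphical.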

18.
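Cloud computing provides an efficient solution for storing and analyzing massive data, and research on data mining algorithms has significant theoretical and practical value. The SLIQ algorithm finds the best split point by traversing the candidates one by one and computing a scalability index for each; this approach is too time-consuming, and the algorithm becomes very inefficient as the data volume grows. This paper studies decision rule mining algorithms in the cloud computing environment and introduces the MapReduce programming model. On that basis, with the aim of parallelizing SLIQ mining in the cloud computing environment, it presents how the improved SLIQ algorithm is applied to the MapReduce programming model.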
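
The exhaustive split search that the abstract criticizes can be sketched as follows. This is an assumption-laden illustration: it takes the per-candidate index to be the Gini index used in the original SLIQ formulation, which the abstract does not state explicitly, and the function names and toy data are hypothetical. It is written for clarity, not speed, and recomputes the index from scratch for every candidate, which is exactly the cost that grows with data volume.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Try every midpoint between adjacent distinct values of one attribute."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best_score, best_threshold = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weighted impurity of the two sides of the candidate split.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_score, best_threshold


print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

Because each attribute can be scanned independently, a search of this kind is a natural candidate for distributing across Map tasks, although the abstract does not say how the improved algorithm partitions the work.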

19.
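Research on a Hadoop-based high-performance platform for massive data processing   Total citations: 2, self-citations: 0, citations by others: 2
High-performance computing on massive data holds great application value, but current cloud computing systems offer only massive data processing capability and lack sufficient high-performance computing capability. By integrating GPUs, which have very strong parallel computing capability, with cloud computing, this paper proposes a heterogeneous high-performance cloud computing architecture based on CPU/GPU cooperation. Building on open-source Hadoop, the parts of the MapReduce functions that need to be parallelized are marked with annotations. A custom GPU class loader converts the marked code into CUDA code and compiles and runs it dynamically. The platform integrates the computing power of GPUs into the MapReduce framework and can process massive data efficiently.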

20.
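This paper first presents an analysis of the current state of informatization among small and medium-sized enterprises (SMEs) in the western region, the needs and drivers behind the development of enterprise cloud computing, the characteristics and classification of cloud computing, and cloud service models. It then describes an analysis of a cloud information service platform for SMEs in the western region, covering the architecture of the cloud information service system, analysis of the cloud platform's software and hardware resources, associated data access in the cloud infrastructure management platform, and the promotion of cloud e-commerce applications among SMEs in the western region.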
