
1.

万聪王翠荣王聪吕艳霞贾朔《计算机工程与科学》2014,36(12):2286-2295

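MapReduce is a framework for distributed processing of large-scale data and is now widely used across many fields. In a cluster that provides MapReduce services, guaranteeing the deadline constraints of users with different priorities is a challenge for MapReduce job scheduling. To address this problem, a multi-priority job scheduling algorithm based on queueing networks (MPSA) is proposed. First, algorithms based on the MapReduce model are analyzed and summarized into three common patterns, and a Jackson queueing network is used to build a mathematical model of MapReduce-based algorithms; this network model yields the resource demand of each priority queue. Next, an AR(1) model is used for prediction so that the algorithm adapts dynamically to varying user request volumes. A binary search is then used to compute, step by step, the number of slots allocated to each priority in the map phase and the reduce phase. Finally, a real-time scheduling algorithm applicable to the MapReduce model is implemented. Experimental results show that, compared with the traditional FIFO and fair scheduling algorithms, the proposed algorithm meets the deadline constraints of users with different priorities more effectively when user arrival rates and task sizes vary.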
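
The prediction and slot-allocation steps named in this abstract can be illustrated with a minimal sketch. It is not the paper's MPSA implementation: the function names `fit_ar1`, `forecast_next`, and `min_slots` are hypothetical, the AR(1) fit is plain least squares, and the deadline check passed to the binary search is a toy stand-in for the Jackson-network model, assumed to be monotone in the slot count.

```python
from typing import Callable, Optional, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0  # constant history: fall back to predicting the mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = cov / var_x
    return mean_y - phi * mean_x, phi


def forecast_next(series: Sequence[float]) -> float:
    """One-step-ahead AR(1) forecast of the next arrival count."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


def min_slots(meets_deadline: Callable[[int], bool], max_slots: int) -> Optional[int]:
    """Smallest slot count in [1, max_slots] that meets the deadline.

    Assumes meets_deadline is monotone: once true, it stays true for more slots.
    Returns None if even max_slots is not enough.
    """
    if not meets_deadline(max_slots):
        return None
    lo, hi = 1, max_slots
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_deadline(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical usage: forecast arrivals, then size the map phase for one priority.
arrivals = [10, 12, 14, 16, 18]           # made-up jobs per interval
expected_jobs = forecast_next(arrivals)   # AR(1) forecast for the next interval
work_seconds = expected_jobs * 30         # assumed 30 s of map work per job
slots = min_slots(lambda s: work_seconds / s <= 100, max_slots=64)
print(expected_jobs, slots)
```

Binary search is only valid here because adding slots never makes the deadline harder to meet; with a non-monotone model a linear scan would be needed instead.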

2.

Online And Offline Scheduling Schemes to Maximize the Weighted Delivered Video Packets Towards Maritime CPSs

Tingting Yang Hailong Feng Chengming Yang Ge Guo Tieshan Li 《计算机系统科学与工程》2018,33(2):157-164

This paper studies online and offline scheduling schemes for maritime cyber-physical systems (CPSs) that transmit video packets generated on board a vessel. While the vessel sails from its origin port to its destination port, the video packets can be delivered via shoreside infostations. Each video packet has its own release time, deadline, weight, and processing time, and a packet is transmitted successfully only if it is delivered before its deadline. The problem is mapped to a mathematical job-machine scheduling problem. Facing the distinctive challenges of the maritime scenario, we focus on heterogeneous networking and optimal resource scheduling to provide insights into data transmission scheduling in this system. To maximize the total weight of delivered packets, we develop three algorithms: an offline algorithm, an online ADMISSION algorithm for unbounded processing times, and an Exponential-Capacity algorithm for bounded processing times. Moreover, we derive the approximation ratio and competitive ratios of the proposed algorithms, respectively. Finally, we verify the performance of the proposed solutions for resource scheduling through comparative simulation.

3.

Designing a MapReduce performance model in distributed heterogeneous platforms based on benchmarking approach

Gandomi Abolfazl Movaghar Ali Reshadi Midia Khademzadeh Ahmad 《The Journal of supercomputing》2020,76(9):7177-7203

MapReduce framework is an effective method for big data parallel processing. Enhancing the performance of MapReduce clusters, along with reducing their job execution time, is a fundamental challenge to this approach. In fact, one is faced with two challenges here: how to maximize the execution overlap between jobs and how to create an optimum job scheduling. Accordingly, one of the most critical challenges to achieving these goals is developing a precise model to estimate the job execution time due to the large number and high volume of the submitted jobs, limited consumable resources, and the need for proper Hadoop configuration. This paper presents a model based on MapReduce phases for predicting the execution time of jobs in a heterogeneous cluster. Moreover, a novel heuristic method is designed, which significantly reduces the makespan of the jobs. In this method, first by providing the job profiling tool, we obtain the execution details of the MapReduce phases through log analysis. Then, using machine learning methods and statistical analysis, we propose a relevant model to predict runtime. Finally, another tool called job submission and monitoring tool is used for calculating makespan. Different experiments were conducted on the benchmarks under identical conditions for all jobs. The results show that the average makespan speedup for the proposed method was higher than an unoptimized case.

4.

MapReduce in MPI for Large-scale graph algorithms (cited by: 1; self-citations: 0; citations by others: 1)

Steven J. Plimpton Karen D. Devine 《Parallel Computing》2011,37(9):610-632

5.

A cost-minimization method for cloud processing of multi-source big data (面向多源大数据云端处理的成本最小化方法)

肖文华包卫东朱晓敏邵屹杨陈超 Jianhong Wu 《软件学报》2017,28(3):544-562

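Cloud computing provides a powerful and efficient solution for big data processing. Under this model, a data manager (DM) can rent multiple data centers to process geographically dispersed data in real time. However, because data generation is dynamic and resource prices fluctuate, deciding which data centers to migrate data to and how much computing resource to provision for processing it becomes a major problem for a DM that wants to process multi-source data at low cost. This paper first transforms the problem into a joint stochastic optimization problem, then uses the Lyapunov optimization framework to decompose it into two independent subproblems, and finally designs an online algorithm based on the solutions. Theoretical analysis shows that the proposed algorithm can continually approach the offline optimum while guaranteeing the data processing delay. Experiments on the WorldCup98 and YouTube datasets confirm the theoretical analysis and the superiority of the method.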

6.

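An analytical model for minimizing the total completion time of multiple MapReduce jobs and its applications (最小化多MapReduce任务总完工时间的分析模型及其应用)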

田文洪陈瑜王心阳薛瑞尼赵勇《计算机工程与科学》2014,36(4):571-578

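As large-scale MapReduce clusters are widely used for big data processing, and especially when multiple jobs must share the same Hadoop cluster, a key problem is how to minimize the cluster's working time and improve the service efficiency of MapReduce jobs. Multiple MapReduce jobs can be modeled as a single scheduling problem, and observation shows that the total completion time of the jobs depends closely on their execution order. The goal is to design an analytical model of the job scheduling system that minimizes the total completion time of a batch of MapReduce jobs. A better scheduling strategy and implementation are proposed so that the whole scheduling system satisfies the conditions of the classical Johnson algorithm, which can then be used to obtain the optimal total completion time in linear time. In addition, for the problem of balancing across two or more resource pools, a linear-time solution is proposed that outperforms the known approximation scheme. The theoretical model can be applied to improving system responsiveness, saving energy, and load balancing, and the corresponding application examples support this.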
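
The ordering rule behind the classical Johnson algorithm mentioned in this abstract can be sketched as follows. This is a textbook two-stage flow-shop version, not the paper's scheduler: it assumes all map slots behave as one machine and all reduce slots as a second machine, that a job's reduce stage starts only after its own map stage finishes, and it reads the abstract's "total completion time" as the makespan of the batch. The names `johnson_order` and `makespan` and the job durations are hypothetical. The sketch sorts the jobs, so it runs in O(n log n).

```python
from typing import List, Tuple

Job = Tuple[str, float, float]  # (name, map_time, reduce_time)


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage (map, reduce) flow shop."""
    # Jobs whose map stage is no longer than their reduce stage run first,
    # shortest map stage first.
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    # The remaining jobs run last, longest reduce stage first.
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: List[Job]) -> float:
    """Finish time of the last reduce stage when jobs run in the given order."""
    map_done = 0.0
    reduce_done = 0.0
    for _, map_time, reduce_time in order:
        map_done += map_time
        # Reduce waits for both this job's map stage and the previous reduce.
        reduce_done = max(reduce_done, map_done) + reduce_time
    return reduce_done


# Made-up batch of five jobs (durations in arbitrary time units).
jobs = [("A", 4, 5), ("B", 4, 1), ("C", 30, 4), ("D", 6, 30), ("E", 2, 3)]
order = johnson_order(jobs)
print([name for name, _, _ in order], makespan(order), makespan(jobs))
```

Comparing `makespan(order)` with `makespan(jobs)` shows how much the execution order alone can change the finish time of the same batch.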

7.

Adapting scientific computing problems to clouds using MapReduce (cited by: 1; self-citations: 0; citations by others: 1)

Satish Narayana Srirama Pelle Jakovits Eero Vainikko 《Future Generation Computer Systems》2012,28(1):184-192

Cloud computing, with its promise of virtually infinite resources, seems to suit well in solving resource greedy scientific computing problems. To study this, we established a scientific computing cloud (SciCloud) project and environment on our internal clusters. The main goal of the project is to study the scope of establishing private clouds at the universities. With these clouds, students and researchers can efficiently use the already existing resources of university computer networks, in solving computationally intensive scientific, mathematical, and academic problems. However, to be able to run the scientific computing applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. This paper summarizes the challenges associated with reducing iterative algorithms to the MapReduce model. Algorithms used by scientific computing are divided into different classes by how they can be adapted to the MapReduce model; examples from each such class are reduced to the MapReduce model and their performance is measured and analyzed. The study mainly focuses on the Hadoop MapReduce framework but also compares it to an alternative MapReduce framework called Twister, which is specifically designed for iterative algorithms. The analysis shows that Hadoop MapReduce has significant trouble with iterative problems while it suits well for embarrassingly parallel problems, and that Twister can handle iterative problems much more efficiently. This work shows how to adapt algorithms from each class into the MapReduce model, what affects the efficiency and scalability of algorithms in each class and allows us to judge which framework is more efficient for each of them, by mapping the advantages and disadvantages of the two frameworks. 
This study is of significant importance for scientific computing as it often uses complex iterative methods to solve critical problems and adapting such methods to cloud computing frameworks is not a trivial task.

8.

Joint routing, scheduling, and power control for multichannel wireless sensor networks with physical interference (cited by: 1; self-citations: 0; citations by others: 1)

Xiaoling ZHANG Haibin YU Wei LIANG Meng ZHENG 《控制理论与应用(英文版)》2011,9(1):093-105

Reliability and real-time requirements bring new challenges to energy-constrained wireless sensor networks, especially industrial wireless sensor networks. Meanwhile, the capacity of wireless sensor networks can be substantially increased by operating on multiple nonoverlapping channels. In this context, new routing, scheduling, and power control algorithms are required to achieve reliable and real-time communications and to fully utilize the increased bandwidth in multichannel wireless sensor networks. In this paper, we develop a distributed and online algorithm that jointly solves the multipath routing, link scheduling, and power control problems, and that adapts automatically to changes in the network topology and offered load. We particularly focus on finding the resource allocation that realizes a trade-off among energy consumption, end-to-end delay, and network throughput for multichannel networks with a physical interference model. Our algorithm jointly considers 1) delay and energy-aware power control for optimal transmission radius and rate with the physical interference model, 2) throughput-efficient multipath routing based on the given optimal transmission rate between the given source-destination pairs, and 3) reliability-aware and throughput-efficient multichannel maximal link scheduling for time slots and channels based on the designated paths and the new physical interference model that is updated by the optimal transmission radius. Through proofs and simulation, we show that our algorithm is provably efficient compared with the optimal centralized and offline algorithm and other comparable algorithms.

9.

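Research progress on MapReduce big data processing platforms and algorithms (MapReduce大数据处理平台与算法研究进展) (cited by: 1; self-citations: 1; citations by others: 0)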

宋杰孙宗哲毛克明鲍玉斌于戈《软件学报》2017,28(3):514-543

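This paper surveys recent research progress on big data processing platforms and algorithms based on the MapReduce programming model. It first introduces 12 typical MapReduce-based big data processing platforms, compares their implementation principles and applicable scenarios, and abstracts what they have in common. It then introduces MapReduce-based big data analysis algorithms, including search algorithms, data cleaning/transformation algorithms, aggregation algorithms, join algorithms, sorting algorithms, preference queries, optimization algorithms, graph algorithms, and data mining algorithms. These algorithms are classified by how they are implemented in MapReduce, and the factors that affect their performance are analyzed. Finally, big data processing algorithms are abstracted as external-memory algorithms, the characteristics of external-memory algorithms are reviewed, and research ideas and open questions for generally applicable performance optimization of external-memory algorithms are proposed for researchers' reference. Specifically, these cover optimizing the disk I/O of external-memory algorithms, optimizing their locality, and designing incremental iterative algorithms. Existing research on big data processing platforms and algorithms concentrates on dynamic platform performance optimization based on resource allocation and task scheduling, parallelization of specific algorithms, and performance optimization of specific algorithms. The external-memory algorithm optimization proposed here is a static optimization approach that complements existing research and opens a broad space for further work.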

10.

User-QoS-aware dynamic scheduling of deep learning tasks on GPU clusters (用户QoS感知的GPU集群深度学习任务动态调度)

罗磊陈照云王俪璇《计算机工程与科学》2021,43(8):1331-1340

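A dynamic task scheduling method that is aware of user quality of service (QoS) is proposed for a deep learning development platform on a GPU cluster. An offline evaluation module benchmarks deep learning tasks offline and builds a model for predicting computing performance. The online scheduling module uses this performance prediction model, together with each task's expected QoS, to decide both task placement and task execution order. Experiments on a distributed GPU cluster show that the method achieves a higher QoS guarantee rate and higher cluster resource utilization than other baseline strategies.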

11.

MapReduce job scheduling under a hybrid storage mode (混合存储模式下MapReduce作业调度)

杨振宇牛天洋吕敏《计算机系统应用》2023,32(3):70-85

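In heterogeneous Hadoop clusters, MapReduce jobs are processed inefficiently because erasure-coded and replicated storage are mixed and because server nodes differ in real-time computing power. To mitigate this, this paper implements a scheduling strategy that dynamically adjusts the task allocation of MapReduce jobs under high concurrency according to how the data is stored and the real-time load of each node. The strategy modifies the data storage placement policy of the current Hadoop framework and dynamically controls the number of concurrent tasks per node, achieving a more balanced allocation of resources among jobs when many jobs run concurrently. Experimental results show that, compared with Hadoop's two default job scheduling strategies, the proposed scheduling mode shortens job completion time by about 17% and effectively prevents some jobs from starving.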

12.

MapReduce scheduling algorithms: a review

Hashem Ibrahim Abaker Targio Anuar Nor Badrul Marjani Mohsen Ahmed Ejaz Chiroma Haruna Firdaus Ahmad Abdullah Muhamad Taufik Alotaibi Faiz Ali Waleed Kamaleldin Mahmoud Yaqoob Ibrar Gani Abdullah 《The Journal of supercomputing》2020,76(7):4915-4945

Recent trends in big data have shown that the amount of data continues to increase at an exponential rate. This trend has inspired many researchers over the past few years to explore new research directions related to multiple areas of big data. The widespread popularity of big data processing platforms that use the MapReduce framework has created a growing demand to further optimize their performance for various purposes. In particular, enhancing resource and job scheduling is becoming critical, since scheduling fundamentally determines whether applications can achieve their performance goals in different use cases. Scheduling plays an important role in big data, mainly in reducing the execution time and cost of processing. This paper aims to survey the research undertaken in the field of scheduling in big data platforms. Moreover, this paper analyzes scheduling in MapReduce from two aspects: taxonomy and performance evaluation. The research progress in MapReduce scheduling algorithms is also discussed. The limitations of existing MapReduce scheduling algorithms and future research opportunities are pointed out for easy identification by researchers. Our study can serve as a benchmark for expert researchers proposing a novel MapReduce scheduling algorithm, and as a starting point for novice researchers.

13.

Reducing partition skew on MapReduce: an incremental allocation approach

Zhuo WANG Qun CHEN Bo SUO Wei PAN Zhanhuai LI 《Frontiers of Computer Science》2019,13(5):960

MapReduce, a parallel computational model, has been widely used in processing big data in a distributed cluster. Consisting of alternate map and reduce phases, MapReduce has to shuffle the intermediate data generated by mappers to reducers. The key challenge of ensuring balanced workload on MapReduce is to reduce partition skew among reducers without detailed distribution information on mapped data. In this paper, we propose an incremental data allocation approach to reduce partition skew among reducers on MapReduce. The proposed approach divides mapped data into many micro-partitions and gradually gathers the statistics on their sizes in the process of mapping. The micro-partitions are then incrementally allocated to reducers in multiple rounds. We propose to execute incremental allocation in two steps, micro-partition scheduling and micro-partition allocation. We propose a Markov decision process (MDP) model to optimize the problem of multiple-round micro-partition scheduling for allocation commitment. We present an optimal solution with the time complexity of O(K · N²), in which K represents the number of allocation rounds and N represents the number of micro-partitions. Alternatively, we also present a greedy but more efficient algorithm with the time complexity of O(K · N ln N). Then, we propose a minmax programming model to handle the allocation mapping between micro-partitions and reducers and, because the problem is NP-complete, present an effective heuristic solution. Finally, we have implemented the proposed approach on Hadoop, an open-source MapReduce platform, and empirically evaluated its performance. Our extensive experiments show that compared with the state-of-the-art approaches, the proposed approach achieves considerably better data load balance among reducers as well as overall better parallel performance.
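
The idea of spreading micro-partitions across reducers so that the largest reducer load stays small can be illustrated with a standard greedy heuristic. The abstract does not spell out the paper's own heuristic, so this sketch swaps in the well-known largest-first, least-loaded-reducer rule and handles a single allocation round only. The function name `allocate` and the partition sizes are hypothetical.

```python
import heapq
from typing import Dict, List


def allocate(sizes: Dict[str, int], num_reducers: int) -> List[List[str]]:
    """Assign micro-partitions to reducers, largest first, to the least-loaded reducer."""
    # Min-heap of (current load, reducer index); ties go to the lower index.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_reducers)]
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, r = heapq.heappop(heap)
        assignment[r].append(name)
        heapq.heappush(heap, (load + size, r))
    return assignment


# Made-up micro-partition sizes (for example, in MB) gathered during the map phase.
sizes = {"p0": 7, "p1": 5, "p2": 4, "p3": 3, "p4": 1}
assignment = allocate(sizes, num_reducers=2)
loads = [sum(sizes[p] for p in group) for group in assignment]
print(assignment, loads)
```

Placing large partitions first leaves the small ones to even out the remaining imbalance, which is why the sizes are sorted in descending order before assignment.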

14.

Enabling soft queries for data retrieval

Hwanjo Yu Seung-won Hwang Kevin Chen-Chuan Chang 《Information Systems》2007

15.

Design and implementation of a real-time scheduling algorithm for MapReduce (一种MapReduce实时调度算法设计及实现)

刘吉陈香兰代栋孙明明周学海《计算机系统应用》2013,22(8):113-119

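MapReduce is an important batch data processing framework in cloud computing. Letting multiple jobs share a MapReduce cluster while meeting their real-time requirements is a problem that scheduling algorithms urgently need to solve. A two-phase real-time scheduling algorithm is proposed that divides scheduling into inter-job scheduling and intra-job scheduling. For inter-job scheduling, sampling and empirical values are used to estimate subtask execution time; this parameter is used to build a resource allocation model, and job priorities are determined dynamically for scheduling. Subtasks are scheduled with a delay scheduling policy to preserve data locality. Experimental results show that, compared with the fair scheduling algorithm and the FIFO algorithm, the two-phase real-time scheduling algorithm meets the real-time requirements of jobs while maintaining throughput.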

16.

SLA-aware energy-efficient scheduling scheme for Hadoop YARN

Xiaojun Cai Feng Li Ping Li Lei Ju Zhiping Jia 《The Journal of supercomputing》2017,73(8):3526-3546

Apache Hadoop has become ubiquitous in cloud computing, which provides resources as services for multi-tenant applications. YARN (a.k.a. MapReduce 2.0) is one of the key features in the second-generation Hadoop, which provides resource management and scheduling for large-scale MapReduce environments. Two enormous challenges in the YARN scheduler are the abilities to automatically tailor and control resource allocations to different jobs for achieving their Service Level Agreements (SLAs), and to minimize the energy consumption of the overall cloud computing system. In this work, we propose an SLA-aware energy-efficient scheduling scheme which allocates an appropriate amount of resources to MapReduce applications with the YARN architecture. In our task scheduling policy, we consider the data locality information to save MapReduce network traffic. Furthermore, the slack time between the actual execution time of completed tasks and the expected completion time of the application is utilized to improve the energy efficiency of the system. An online userspace governor-based dynamic voltage and frequency scaling (DVFS) scheme is designed in the YARN per-application ApplicationMaster to dynamically change the CPU frequency for upcoming tasks given the slack time from previously completed tasks. Experimental evaluation shows that our proposed scheme outperforms the existing MapReduce scheduling policies in terms of both resource utilization and energy efficiency.

17.

Context‐aware scheduling in MapReduce: a compact review

Muhammad Idris Shujaat Hussain Maqbool Ali Arsen Abdulali Muhammad Hameed Siddiqi Byeong Ho Kang Sungyoung Lee 《Concurrency and Computation》2015,27(17):5332-5349

Big data has drawn considerable attention from the computer science research community, business executives, and decision makers. As the volume of data becomes bigger, it needs performance‐oriented data‐intensive processing frameworks such as MapReduce, which can scale computation on large commodity clusters. Hadoop MapReduce processes data in Hadoop Distributed File System as jobs scheduled according to YARN fair scheduler and capacity scheduler. However, with advancement and dynamic changes in hardware and operating environments, the performance of clusters is greatly affected. Various efforts in literature have been made to address the issues of heterogeneity (i.e., clusters consisting of virtual machines and machines with different hardware), network communication, data locality, better resource utilization, and run‐time scheduling. In this paper, we present a survey to discuss various research efforts made so far to improve Hadoop MapReduce scheduling. We classify scheduling algorithms and techniques proposed in the literature so far based on their addressing areas and present a taxonomy. Furthermore, we also discuss various aspects of open issues and challenges in the scheduling of MapReduce to improve its performance. Copyright © 2015 John Wiley & Sons, Ltd.

18.

FlowS: a fair scheduling method for MapReduce dataflows (FlowS:一种MapReduce数据流公平调度方法)

李奇原刘杰叶丹许舒人《计算机科学》2012,39(9):157-161

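Scheduling mechanisms for MapReduce jobs have long been a focus of academic research. Based on an analysis of the MapReduce dataflow scheduling model, a fair scheduling method for MapReduce dataflows, FlowS, is proposed. The method allocates resources through dataflow pools to isolate MapReduce dataflows from one another, and uses a dynamic dataflow-pool construction algorithm to ensure that resources are allocated fairly. Experiments show that the scheduling method effectively improves the efficiency with which a Hadoop cluster processes MapReduce dataflows.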

19.

A machine-learning-based resource scheduling algorithm for MapReduce (基于机器学习的MapReduce资源调度算法)

于倩蔚承建王开朱林军《计算机应用研究》2016,33(1)

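To address the need for adaptivity in the optimized MapReduce model that allows the map and shuffle phases to overlap, a machine-learning-based resource scheduling algorithm built on this model is proposed. A Bayesian classifier schedules jobs according to how well each job's demand for system resources matches the system environment, and the classifier is updated continuously so that it is adaptive; the overlap between the map and shuffle phases is taken into account. Simulation experiments confirm that the improved algorithm raises the performance of the MapReduce system and achieves a better average response time.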

20.

Profiling and evaluating hardware choices for MapReduce environments: An application-aware approach

《Performance Evaluation》2014

The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a data center, old hardware needs to be eventually replaced by new hardware. The hardware selection process needs to be driven by performance objectives of the existing production workloads. In this work, we present a general framework, called Ariel, that automates system administrators’ efforts for evaluating different hardware choices and predicting completion times of MapReduce applications for their migration to a Hadoop cluster based on the new hardware. The proposed framework consists of two key components: (i) a set of microbenchmarks to profile the MapReduce processing pipeline on a given platform, and (ii) a regression-based model that establishes a performance relationship between the source and target platforms. Benchmarking and model derivation can be done using a small test cluster based on new hardware. However, the designed model can be used for predicting the jobs’ completion time on a large Hadoop cluster and be applied for its sizing to achieve desirable service level objectives (SLOs). We validate the effectiveness of the proposed approach using a set of twelve realistic MapReduce applications and three different hardware platforms. The evaluation study justifies our design choices and shows that the derived model accurately predicts performance of the test applications. The predicted completion times of eleven applications (out of twelve) are within 10% of the measured completion times on the target platforms.
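
A regression that relates measurements on a source platform to measurements on a target platform can be sketched in its simplest form as a single-variable least-squares line. This is an illustrative simplification, not Ariel's actual model, whose form the abstract does not give. The names `fit_line` and `predict_target` are hypothetical, and the microbenchmark durations are made up.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("need at least two distinct source measurements")
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / var_x
    return mean_y - b * mean_x, b


def predict_target(source_time: float, model: Tuple[float, float]) -> float:
    """Predict a duration on the target platform from one measured on the source."""
    a, b = model
    return a + b * source_time


# Made-up microbenchmark durations in seconds (illustrative only, not from the paper).
source = [10.0, 20.0, 30.0, 40.0]   # measured on the old hardware
target = [6.0, 11.0, 16.0, 21.0]    # measured on a small cluster of new hardware
model = fit_line(source, target)
print(predict_target(100.0, model))
```

In practice one such relationship would likely be fitted per processing phase, since different phases stress CPU, disk, and network differently; that refinement is left out here.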