Similar literature
20 similar documents found (search time: 140 ms)
1.
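Data Placement and Task Scheduling Algorithm in Cloud Computing   Cited by: 1 (self-citations: 0, citations by others: 1)
Cloud computing over massive data often suffers from long data transfer times. Most existing data placement and task scheduling algorithms use a static number of replicas and an imprecise measure of transfer cost. To address these shortcomings, this paper proposes a data placement and task scheduling algorithm that dynamically adjusts the number of replicas and uses time as the measure of data transfer. The algorithm adjusts the replica count according to data access frequency and storage size, which reduces the storage space wasted on rarely accessed replicas and also reduces the number of cross-node transfers needed for frequently accessed replicas. Because network bandwidth differs between nodes, data transfer time is adopted as the transfer metric, which makes the measure more precise. Experimental results show that, except when both the task set and the number of network nodes are small, the algorithm effectively reduces data transfer time, and with larger task sets and more network nodes it reduces transfer time by nearly 50%.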
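The two ideas in the abstract above, time as the transfer metric and a replica count that follows access frequency and size, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the `accesses_per_replica` and `large_file_mb` thresholds, and the rule that large files get one replica fewer.

```python
import math


def transfer_time(size_mb: float, bandwidth_mbps: float) -> float:
    """Seconds needed to move size_mb megabytes over a link of bandwidth_mbps megabits/s."""
    return size_mb * 8 / bandwidth_mbps


def best_source(size_mb: float, bandwidth_to_target: dict) -> str:
    """Pick the replica holder with the smallest transfer time to the target node.

    bandwidth_to_target maps a node name to its link bandwidth (Mbit/s) towards
    the node that will run the task.
    """
    return min(
        bandwidth_to_target,
        key=lambda node: transfer_time(size_mb, bandwidth_to_target[node]),
    )


def target_replica_count(
    accesses_per_hour: float,
    size_mb: float,
    min_replicas: int = 1,
    max_replicas: int = 5,
    accesses_per_replica: float = 100.0,  # hypothetical threshold
    large_file_mb: float = 1024.0,        # hypothetical threshold
) -> int:
    """Hypothetical rule: more replicas for hot data, one fewer for large files."""
    wanted = math.ceil(accesses_per_hour / accesses_per_replica)
    if size_mb >= large_file_mb:
        wanted -= 1  # large files cost more storage per replica
    return max(min_replicas, min(max_replicas, wanted))
```

Ranking sources by `transfer_time` rather than by hop count is what lets a distant node on a fast link beat a nearby node on a slow one.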

2.
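Dynamic Task Scheduling Algorithm for Massive Multimedia Data in Cloud Systems   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper models jobs that process massive multimedia data in a cloud computing environment, together with the associated task scheduling and resource allocation algorithms, and on that model proposes a dynamic task scheduling algorithm for massive multimedia data in the cloud. The algorithm plans parallel processing tasks around the block-based, multi-replica storage layout of multimedia files in the cloud system, clusters the data nodes using the mapping between file blocks and replicas as the feature, and dynamically schedules pending tasks using historical feedback from tasks that have already finished. Experimental results show that the proposed algorithm significantly improves system resource utilization and load balancing.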

3.
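The rapid development of artificial intelligence places higher demands on high-performance computing, and task scheduling in heterogeneous computing environments has long been a key problem in that field. This paper proposes a scheduling algorithm based on priority queue division (PQDSA). The algorithm sets the number of priority queues according to the number of entry nodes in the DAG (directed acyclic graph) task set, divides the task queues by the communication and computation costs of the tasks, and then assigns critical-node tasks to suitable queues. This yields better task scheduling queues, which increases parallelism between tasks and reduces the makespan of the task set. Tasks are then scheduled onto processors using an insertion policy so that scheduling executes more efficiently. PQDSA reduces the time consumed between tasks and improves processor scheduling efficiency. In a performance comparison with two classic algorithms, experimental results show that PQDSA is clearly better than both in task makespan and scheduling efficiency.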

4.
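Two-Phase Dynamic Optimization Scheduling Mechanism Based on Cloud Storage   Cited by: 1 (self-citations: 0, citations by others: 1)
How to use storage space efficiently is an active topic in distributed storage research. In a storage cluster, the data nodes cannot all have the same capacity, and because the master node selects data nodes at random, a selected data node's disk may already be nearly full. The master node then rebalances the storage load automatically, which consumes transfer bandwidth, degrades data transfer performance, and can make transfers unreliable. This paper proposes a two-phase dynamic optimization scheduling mechanism based on cloud storage. In the first phase, it computes a replica storage preference ratio and applies a greedy, locally optimal storage scheme to select storage nodes and balance the space used for replica placement. In the second phase, it monitors the storage cluster in real time and dynamically adjusts the nodes on which replicas are placed, so that storage resources are used efficiently. Experiments verify that the mechanism places replicas effectively, reduces data transfer between nodes, and improves file access efficiency.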

5.
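When some nodes in a cluster are low-cost hosts, the random placement policy of HDFS may store frequently accessed data on those nodes. Their limited performance then lengthens access times and lowers cluster efficiency. To mitigate this problem, an improved tiered replica storage scheduling strategy is proposed. To reduce the number of replica scheduling operations, node performance is first evaluated from CPU, memory, network, and storage load as well as network distance, and high-performance nodes are then selected for storage. Replica scheduling is driven by the access frequency of the replicas on each node, combined with hardware configuration, so that frequently accessed replicas are stored as far as possible on high-performance, well-equipped nodes to speed up cluster response. Experimental results show that the improved strategy raises replica access efficiency and improves load balancing in heterogeneous clusters.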

6.
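In mobile ad hoc networks, node mobility and limited energy make the links between nodes unstable, so the data access success rate is low and energy consumption is unbalanced across nodes. To address this problem, a replication algorithm based on the stable neighbors of a node is proposed. It weights each node's access frequency for data items and jointly considers the node's number of stable neighbors and its remaining energy to decide where data replicas are placed. Simulation results show that the algorithm effectively improves the data access success rate and balances energy consumption across nodes.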

7.
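To address the limited optimization of existing task scheduling algorithms on the Hadoop platform, a speculative task scheduling algorithm based on data locality is proposed. The algorithm computes the ratio of Map to Reduce task durations on each node, takes into account the locality of data on different nodes, and uses a more accurate task progress detection method than existing algorithms to identify fast and slow nodes. It then launches, on a fast node, a backup of the straggler task with the longest remaining time, moving computation instead of moving data. Experiments in a Hadoop environment show that the algorithm shortens the average task running time compared with existing algorithms and speeds up task execution.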
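The step of backing up the straggler with the longest remaining time can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes remaining time is estimated from the observed progress rate (progress divided by elapsed time), and the `RunningTask` type, its fields, and `pick_backup` are hypothetical names. It does not model the Map/Reduce duration ratio or the locality terms mentioned in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningTask:
    task_id: str
    node: str
    progress: float   # fraction of work done, between 0 and 1
    elapsed_s: float  # seconds since the task started


def remaining_time(task: RunningTask) -> float:
    """Estimate remaining seconds assuming the task keeps its observed progress rate."""
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_backup(tasks: list, slow_nodes: set) -> Optional[RunningTask]:
    """Return the unfinished task on a slow node with the longest estimated remaining time."""
    candidates = [
        t for t in tasks
        if t.node in slow_nodes and 0.0 < t.progress < 1.0 and t.elapsed_s > 0.0
    ]
    if not candidates:
        return None
    return max(candidates, key=remaining_time)
```

The scheduler would then start a copy of the returned task on a node it has classified as fast, preferably one that already holds the task's input data.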

8.
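This paper proposes a coding-based approach to grid data replication: replica data is encoded with a linear block code and the encoded pieces are dispersed across grid storage nodes, forming a group of coded sub-replicas with erasure-correcting capability. Focusing on linear block codes that are currently active research topics, the paper examines coded replication schemes based on the Cauchy Reed-Solomon Code, the Tornado Code, and the Random Linear Code. It models the replica access performance and replica reliability of the three schemes and compares them with traditional whole-file replication and block-based replication, showing that the proposed coded replication offers better overall performance. Experimental data further show that encoding accounts for only a small fraction of the total replication overhead, indicating that coded data replication is a feasible technique.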

9.
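An Optimized Replica Distribution Method in Peer-to-Peer Networks   Cited by: 1 (self-citations: 0, citations by others: 1)
Data replication is a common strategy for improving data reliability and availability in P2P systems. Most existing replication methods consider only the number of replicas, on the premise that more replicas improve resource access efficiency, but replicating in this way incurs a high cost for maintaining replica consistency. To balance the overhead of consistency maintenance against the access performance gained from multiple replicas, this paper proposes a dynamic replica distribution method. It first presents the design of a replica directory and a method for obtaining replica information, which together provide information on all replicas of a given logical resource. Then, using this global replica information, it replicates data resources that are accessed frequently and have long average response times, and gives a method for computing the number of replicas. Finally, it computes the best locations for placing replicas from user access characteristics, real-time node bandwidth, and similar information, so that the replica distribution adapts to dynamic changes in access requests and network bandwidth. Simulation results show that the method achieves a globally optimized replica distribution and improves resource access performance with a small number of replicas.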

10.
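A load-balancing model for access hot spots in cloud environments is proposed. A node load evaluation module is built on key indicators such as node throughput and response time. When files are stored in HDFS, the block IDs of each file are combined with its storage path to build a block-to-file-directory mapping table kept on the data nodes. A method for selecting migration source and target nodes based on node load and node storage space is proposed, and a dynamic replica migration scheme is designed on top of the rack-awareness mechanism. Finally, an executor issues commands to the relevant data nodes, which carry out the migration tasks and then adjust parameters such as the replication factor after migration. By spreading replicas rapidly, the number of replicas of hot files increases, so the system can deliver higher throughput and shorter response times.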

11.
Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique that reduces access latency and helps manage large data sets by placing data wisely. This paper proposes two algorithms. The first is a novel job scheduling algorithm called Combined Scheduling Strategy (CSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in the queue, the location of the data required by the job, and the computing capacity of sites. The second is a dynamic data replication strategy called the Modified Dynamic Hierarchical Replication Algorithm (MDHRA), which improves file access time and is an enhanced version of the Dynamic Hierarchical Replication (DHR) strategy. Because the storage capacity of each Grid site is limited, replication must be used wisely, so an effective replica replacement strategy is important. MDHRA replaces replicas based on the last time the replica was requested, the number of accesses, and the size of the replica. It selects the best replica location from among the many replicas based on response time, which is determined from the data transfer time, the storage access latency, the replica requests waiting in the storage queue, and the distance between nodes. The simulation results demonstrate that the proposed replication and scheduling strategies perform better than the other algorithms.

12.
Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, so methods specialized for this kind of parallel and distributed system are needed. Another solution is to use a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. To exploit both concepts, this paper develops a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for a new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time the replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
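The two-step replacement described in the abstract above can be sketched roughly as follows. The abstract does not say how the second-step factors are combined, so this sketch makes assumptions that are not from the paper: step 1 only considers replicas whose transfer time is at or below a hypothetical `cheap_transfer_s` cutoff, step 2 ranks the rest by a simple lexicographic ordering, and the `Replica` type and `choose_victims` name are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size_mb: float
    transfer_time_s: float  # time to fetch this file again from its nearest source
    last_request_s: float   # timestamp of the most recent request
    accesses: int           # number of accesses so far


def choose_victims(replicas, needed_mb, cheap_transfer_s=1.0):
    """Return the replicas to delete to free needed_mb, or [] if that is impossible."""
    victims, freed = [], 0.0

    # Step 1: delete replicas that are cheapest to transfer back, cheapest first.
    cheap = sorted(
        (r for r in replicas if r.transfer_time_s <= cheap_transfer_s),
        key=lambda r: r.transfer_time_s,
    )
    for r in cheap:
        if freed >= needed_mb:
            break
        victims.append(r)
        freed += r.size_mb

    # Step 2: if space is still short, rank the remaining replicas by
    # least recently requested, then fewest accesses, then largest size,
    # then shortest transfer time (an assumed ordering).
    if freed < needed_mb:
        chosen = {r.name for r in victims}
        rest = sorted(
            (r for r in replicas if r.name not in chosen),
            key=lambda r: (r.last_request_s, r.accesses, -r.size_mb, r.transfer_time_s),
        )
        for r in rest:
            if freed >= needed_mb:
                break
            victims.append(r)
            freed += r.size_mb

    # Deleting files without making room for the new replica would be wasteful.
    return victims if freed >= needed_mb else []
```

Returning an empty list when the request cannot be satisfied is a design choice of this sketch: the caller can then skip replication and read the file remotely instead.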

13.
Data Grid is a geographically distributed environment that deals with large-scale data-intensive applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Data replication is another key optimization technique that reduces access latency and helps manage large data sets by placing data wisely. In this paper, two algorithms are proposed: first, a novel job scheduling algorithm called Combined Scheduling Strategy (CSS) that considers the number of jobs waiting in the queue, the location of the data required by the job, and computational capability; second, a dynamic data replication strategy called Dynamic Hierarchical Replication Algorithm (DHRA) that improves file access time. DHRA stores each replica at an appropriate site, namely the site in the requesting region that has the highest number of accesses for that particular replica. It can also minimize access latency by selecting the best replica when several sites hold replicas of a dataset. The simulation results demonstrate that the proposed replication and scheduling strategies perform better than the other algorithms.

14.
The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Because the resources available in the Grid are limited and large volumes of data are produced, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique that reduces access latency and helps manage large data sets by placing data wisely. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. This paper proposes two strategies. The first is a novel job scheduling strategy called Weighted Scheduling Strategy (WSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in the queue, the location of the data required by the job, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model, based on the future value of a data file, to decide which files to delete when there is not enough space for a replica. Because good replica placement is important for obtaining the maximum benefit from replication and for reducing storage cost and mean job execution time, placement is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.

15.
Data replication techniques are used in data grids to reduce makespan, storage consumption, access latency, and network bandwidth use. Data replication enhances data availability and thereby increases system reliability. It involves two steps: replica placement and replica selection. Replica placement identifies the best node on which to duplicate data, based on network latency and user requests. Replica selection chooses the best replica location from which to access the data for job execution in the data grid. Various replica placement and selection algorithms are available in the literature. These algorithms measure and analyze parameters such as bandwidth consumption, access cost, scalability, execution time, storage consumption, and makespan. This paper discusses various replica placement and selection strategies along with their merits and demerits, and analyzes their performance with respect to the parameters mentioned above. In particular, it focuses on dynamic replica placement and selection strategies in the data grid environment.

16.
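Replica management is an important part of managing a cloud computing system. When cloud systems process massive data, known data placement and resource scheduling algorithms take insufficient account of replica dynamics and reliability. To address this, a dynamic replica placement mechanism is proposed. The mechanism is based on a regional structure and considers the number and placement of replicas during data processing, as well as the cost that creating replicas imposes on system resources such as memory and bandwidth. First, using the replica information in cloud storage, it replicates data that is accessed frequently and has a long average response time, and gives a method for computing the number of replicas. Then, to narrow the range of candidate nodes for replica distribution, it proposes a dynamic replica placement algorithm, DRA, which filters the nodes within a given range according to the proposed division into domains and selects those that will store the replicas. Experimental results show that the proposed mechanism reduces the storage space wasted on rarely accessed replicas and reduces the cross-node transfer delay for frequently accessed replicas, effectively improving file access efficiency, load balance, reliability, and availability in the cloud storage system.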

17.
Data grids support access to widely distributed storage for large numbers of users accessing potentially many large files. Efficient access is hindered by the high latency of the Internet. To improve access time, replication at nearby sites may be used. Replication also provides high availability, decreased bandwidth use, enhanced fault tolerance, and improved scalability. Resource availability, network latency, and user requests in a grid environment may vary with time. Any replica placement strategy must be able to adapt to such dynamic behavior. In this paper, we describe a new dynamic replica placement algorithm, Popularity Based Replica Placement (PBRP), for hierarchical data grids which is guided by file “popularity”. Our goal is to place replicas close to clients to reduce data access time while still using network and storage resources efficiently. The effectiveness of PBRP depends on the selection of a threshold value related to file popularity. We also present Adaptive-PBRP (APBRP) that determines this threshold dynamically based on data request arrival rates. We evaluate both algorithms using simulation. Results for a range of data access patterns show that our algorithms can shorten job execution time significantly and reduce bandwidth consumption compared to other dynamic replication methods.

18.
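In heterogeneous Hadoop clusters, MapReduce jobs can be processed inefficiently because erasure-coded and replicated storage are used together and because the real-time computing power of server nodes differs. To alleviate this problem, this paper implements a scheduling strategy that dynamically adjusts the assignment of MapReduce tasks under highly concurrent workloads according to how the data is stored and the real-time load of each node. The strategy modifies the data placement policy of the current Hadoop framework and dynamically controls the number of concurrent tasks per node, which yields a more balanced allocation of resources across jobs when many jobs run concurrently. Experimental results show that, compared with the two default Hadoop job scheduling strategies, the proposed scheduler shortens job completion time by about 17% and effectively prevents starvation of some jobs.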

19.
In recent years, grid technology has grown so quickly that it is now used in many scientific experiments and research centers. A large number of storage elements and computational resources are combined to form a grid that provides shared access to extra computing power. In particular, a data grid deals with data-intensive applications and provides intensive resources across widely distributed communities. Data replication is an efficient way to distribute replicas across the data grid, making it possible to access the same data at different locations. Replication reduces data access time and improves system performance. In this paper, we propose a new dynamic data replication algorithm named PDDRA that improves on traditional algorithms. The algorithm rests on one assumption: members of a VO (Virtual Organization) have similar interests in files. Based on this assumption and on file access history, PDDRA predicts the future needs of grid sites and pre-fetches a sequence of files to the requesting grid site, so that the next time the site needs a file, it is available locally. This considerably reduces access latency, response time, and bandwidth consumption. PDDRA consists of three phases: storing file access patterns; requesting a file and performing replication and pre-fetching; and replacement. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid project. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio, and percentage of storage filled.

20.
Cloud computing is becoming a very popular term in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud, because it can offer fast data access, high data availability, and reliability. When all replicas are kept active, they can raise the rate of successful task execution, provided that replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable, and fully virtualized data centers is much more complicated. To provide cost-effective availability, minimize application response time, and balance load in cloud storage, a new replica placement strategy is proposed. It is based on five important parameters: mean service time, failure probability, load variance, latency, and storage usage. However, replication should be used wisely because the storage size of each site is limited, so a site must keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it performs better than other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.
