Similar Articles
1.
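Considering the heterogeneity, autonomy, and dynamism of grid resources, this paper discusses task scheduling when local users have preemptive priority and proposes the TBBS (Time-Balancing Based Scheduling) algorithm. A scheduling optimization model is built that selects the best combination of resources for executing a task, with minimum expected completion time as the objective. A time-balancing strategy decomposes the task and schedules the subtasks onto resources. This reduces the delay caused by waiting when subtasks synchronize and yields good parallel computing performance. A rescheduling strategy is used to adapt to the characteristics of resources in a computational grid.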
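The Python sketch below gives a rough illustration of time balancing. It splits a perfectly divisible task so that every chosen resource finishes at the same time T, which removes waiting at synchronization. It is not the paper's TBBS algorithm. The following are all assumptions:

- the per-resource `speed` and fixed `overhead` fields;
- the divisible-work model;
- the greedy resource selection.

Preemption by local users is ignored.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    speed: float      # work units per second (assumed known from monitoring)
    overhead: float   # fixed start-up/transfer delay in seconds (assumed)

def finish_time(total_work, selected):
    # Resource i finishes at overhead_i + share_i / speed_i. Setting every finish time
    # to the same T and requiring the shares to sum to total_work gives
    #   T = (total_work + sum(overhead_i * speed_i)) / sum(speed_i)
    return ((total_work + sum(r.overhead * r.speed for r in selected))
            / sum(r.speed for r in selected))

def time_balanced_schedule(total_work, resources):
    """Pick a resource set and split divisible work so all chosen resources finish together."""
    if not resources:
        raise ValueError("no resources available")
    pool = sorted(resources, key=lambda r: r.overhead)
    chosen = [pool[0]]
    t = finish_time(total_work, chosen)
    for r in pool[1:]:
        # A resource whose overhead is not below T cannot take a positive share;
        # later resources in the sorted pool have even larger overheads, so stop.
        if r.overhead >= t:
            break
        chosen.append(r)
        t = finish_time(total_work, chosen)
    shares = {r.name: (t - r.overhead) * r.speed for r in chosen}
    return t, shares
```

Under this model, adding a resource whose overhead is below the current T always lowers T. The new T is a weighted average of the old T and that overhead. That is why the greedy loop can stop at the first resource whose overhead is not below T.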

2.
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The grid extends the distributed and parallel computing paradigms, allowing for resource negotiation and dynamic allocation, heterogeneity, open protocols, and services. Grid environments can be used both for compute-intensive tasks and for data-intensive applications by exploiting their resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute- and data-intensive. The grid can therefore offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how grid computing can be used to support distributed data mining. Research activities in grid-based data mining and some challenges in this area are presented, along with some promising future directions for developing grid-based distributed data mining.

3.
Data-intensive Grid applications need access to large data sets that may each be replicated on different resources. Minimizing the overhead of transferring these data sets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple data sets that are each replicated on multiple resources. We break this problem into two parts: one, to match each task (or job) to one compute resource for executing the job and one storage resource each for accessing each data set required by the job and two, to assign the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known heuristic for SCP to match jobs to resources. The second part is tackled by extending existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we experimentally compare the SCP-based matching heuristic to others in conjunction with the task scheduling algorithms and present the results.
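The matching step can be sketched with the classic greedy heuristic for weighted set cover. This heuristic repeatedly picks the resource with the lowest cost per newly covered element. The abstract does not say which SCP heuristic the authors used, so treat this as an assumed stand-in. The `replicas` and `cost` inputs are hypothetical; `cost` could be, for example, the estimated transfer time from each storage resource. The choice of compute resource is left out.

```python
def greedy_set_cover(required, replicas, cost):
    """Choose storage resources whose replica sets cover every required data set.

    required: set of data-set ids the job needs
    replicas: dict storage_resource -> set of data-set ids it holds
    cost:     dict storage_resource -> assumed access cost (e.g. estimated transfer time)
    Returns a dict data_set -> storage resource chosen to serve it.
    """
    uncovered = set(required)
    assignment = {}
    while uncovered:
        # Pick the resource with the lowest cost per newly covered data set.
        best, best_ratio = None, float("inf")
        for res, held in replicas.items():
            gain = len(held & uncovered)
            if gain and cost[res] / gain < best_ratio:
                best, best_ratio = res, cost[res] / gain
        if best is None:
            raise ValueError(f"no replica holds data sets {sorted(uncovered)}")
        for ds in replicas[best] & uncovered:
            assignment[ds] = best
        uncovered -= replicas[best]
    return assignment
```

The greedy rule is a logarithmic-factor approximation for weighted set cover. For the handful of data sets a single job typically needs, it stays cheap to evaluate per job.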

4.
In modern scientific computing communities, scientists are involved in managing massive amounts of very large data collections in a geographically distributed environment. Research in the area of grid computing has given us various ideas and solutions to address these requirements. Data grids mostly deal with large computational problems and provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. Peer-to-peer (P2P) networks have also become a major research topic over the last few years. In a distributed P2P system, a discovery algorithm is required to locate specific information, applications, or users within the system. In this research work, we present our scientific data grid as a large P2P-based distributed system model. Using this model, we study various discovery algorithms for locating data sets in a data grid system. The algorithms we studied are based on the P2P architecture. We investigate these algorithms using our Grid Simulator, developed using PARSEC. In this paper, we illustrate our scientific data grid model and our Grid Simulator. We then analyze the performance of the discovery algorithms in terms of their average number of hops, success rates, and bandwidth consumption.
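Flooding with a time-to-live (TTL) is one common baseline among P2P discovery algorithms. The abstract does not list which algorithms were studied, so the sketch below is only illustrative. It reports the three metrics the paper uses:

- whether the data set was found;
- the hop count to the nearest holder;
- the number of query messages sent, as a proxy for bandwidth.

The adjacency-list graph representation is an assumption.

```python
from collections import deque

def flood_search(graph, holders, source, ttl):
    """Flooding search with a TTL over an unstructured P2P overlay.

    graph:   dict node -> list of neighbour nodes (assumed undirected)
    holders: set of nodes that store the requested data set
    Returns (found, hops_to_nearest_holder, messages_sent).
    """
    if source in holders:
        return True, 0, 0
    seen = {source}
    queue = deque([(source, None, 0)])  # (node, node it came from, hops so far)
    messages, first_hit = 0, None
    while queue:
        node, sender, hops = queue.popleft()
        if hops == ttl:
            continue  # TTL exhausted: this copy of the query is not forwarded
        for nb in graph[node]:
            if nb == sender:
                continue
            messages += 1  # every forwarded copy costs bandwidth, even duplicates
            if nb in seen:
                continue   # duplicate query is dropped on arrival
            seen.add(nb)
            if nb in holders and first_hit is None:
                first_hit = hops + 1  # BFS order makes the first hit the nearest one
            queue.append((nb, node, hops + 1))
    return first_hit is not None, first_hit, messages
```

Running this over many random source and holder placements gives the average hops, success rate, and message count. A random walk or other alternatives could be compared against it on the same overlay.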

5.
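Grid computing is an important new branch of distributed computing: it enables large-scale resource sharing and achieves high performance. Many applications must analyze large data sets that are typically geographically distributed, large in scale, and of ever-increasing complexity. Grid technology provides effective support for such applications. This paper introduces grid infrastructure and distributed data mining.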

6.
Data Grid integrates geographically distributed resources for solving data-intensive scientific applications. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment. It is therefore necessary to propose methods specialized for this kind of parallel and distributed system. Another solution is to use a data replication strategy to create multiple copies of files and store them in convenient locations to shorten file access times. To exploit these two ideas, in this paper we develop a job scheduling policy, called hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for the new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time each replica was requested, the number of accesses, the size of the replica, and the file transfer time. The simulation results show that our proposed algorithm performs better than other algorithms in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
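The two-step replacement policy might look roughly like the sketch below. The abstract does not define "minimum transfer time" or how the four step-two factors are combined. The `cheap_threshold` cut-off and the `value` formula are therefore hypothetical choices, not ADHRS itself. The formula keeps files that are frequently used, expensive to refetch, recently used, and small.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    size: float           # MB
    transfer_time: float  # s to fetch again from the nearest other copy (assumed known)
    accesses: int         # number of past requests
    last_access: float    # timestamp of the most recent request

def choose_victims(replicas, needed, now, cheap_threshold):
    """Return replica names to delete so that at least `needed` MB are freed."""
    victims, freed = [], 0.0
    # Step 1: replicas that are cheapest to re-fetch go first (threshold is an assumption).
    cheap = sorted((r for r in replicas if r.transfer_time <= cheap_threshold),
                   key=lambda r: r.transfer_time)
    for r in cheap:
        if freed >= needed:
            return victims
        victims.append(r.name)
        freed += r.size

    # Step 2: rank the remaining replicas by a hypothetical value score, lowest first.
    def value(r):
        age = now - r.last_access
        return r.accesses * r.transfer_time / (r.size * (1.0 + age))

    rest = sorted((r for r in replicas if r.transfer_time > cheap_threshold), key=value)
    for r in rest:
        if freed >= needed:
            break
        victims.append(r.name)
        freed += r.size
    if freed < needed:
        raise ValueError("not enough deletable space for the new replica")
    return victims
```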

7.
Distributed data mining on grids: services, tools, and applications   (cited 4 times: 0 self-citations, 4 by others)
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

8.
Grid computing is considered a promising trend that enables the sharing of a wide variety of geographically distributed computational and storage resources. Despite the advantages of such a paradigm, several problems have emerged during the last decade, most of them caused by inefficient utilization of grid resources. The present contribution proposes an approach to improve the grid resource selection process. An optimization model for choosing grid resources in an intelligent way has been designed, and a mathematical formulation for monitoring resource efficiency has been established. Furthermore, the model gives grid applications a self-adaptive capability, enabling them to deal with changing environmental conditions. The model applies an artificial intelligence algorithm to ensure efficient selection. In particular, three versions have been implemented, each using a different algorithm. Finally, the model was evaluated through experimental tests in a real grid infrastructure. The results show that the model improves infrastructure throughput by increasing the rate of finished tasks and reducing application execution time. Copyright © 2014 John Wiley & Sons, Ltd.

9.
The Data Grid provides massive aggregated computing resources and distributed storage space to deal with data-intensive applications. Because available resources in the grid are limited and large volumes of data are produced, efficient use of Grid resources becomes an important challenge. Data replication is a key optimization technique for reducing access latency and managing large data by storing data in a judicious manner. Effective scheduling in the Grid can reduce the amount of data transferred among nodes by submitting a job to a node where most of the requested data files are available. In this paper, two strategies are proposed. The first is a novel job scheduling strategy called Weighted Scheduling Strategy (WSS), which uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers the number of jobs waiting in a queue, the location of the data the job requires, and the computing capacity of the sites. The second is a dynamic data replication strategy called Enhanced Dynamic Hierarchical Replication (EDHR), which improves file access time. This strategy is an enhanced version of the Dynamic Hierarchical Replication strategy. It uses an economic model for file deletion when there is not enough space for the replica. The economic model is based on the future value of a data file. The placement of replicas plays an important role in obtaining maximum benefit from replication while reducing storage cost and mean job execution time, so it is also considered in this paper. The proposed strategies are implemented in OptorSim, the European Data Grid simulator. Experimental results show that the proposed strategies achieve better performance by minimizing data access time and avoiding unnecessary replication.
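Below is a hedged sketch of weighted, hierarchical site selection in the spirit of WSS. It first chooses the region that already holds the most bytes of the job's input files. It then scores only the sites in that region by data locality, computing capacity, and queue length. The weights, the assumption that `capacity` is normalized to [0, 1], and the dictionary layout are all illustrative. The abstract does not give the paper's actual weighting.

```python
def site_score(site, job_files, w_data=1.0, w_cpu=1.0, w_queue=1.0):
    """Hypothetical weighted score (higher is better) for running a job at one site."""
    total = sum(job_files.values())
    local = sum(size for f, size in job_files.items() if f in site["files"])
    data_fraction = local / total if total else 1.0
    return (w_data * data_fraction
            + w_cpu * site["capacity"]        # assumed normalized to [0, 1]
            - w_queue * site["queued_jobs"])  # penalty per waiting job

def weighted_schedule(regions, job_files):
    """Two-level selection: best region by local data volume, then best site within it."""
    def region_bytes(sites):
        held = set().union(*(s["files"] for s in sites))
        return sum(size for f, size in job_files.items() if f in held)

    best_region = max(regions, key=lambda name: region_bytes(regions[name]))
    best_site = max(regions[best_region], key=lambda s: site_score(s, job_files))
    return best_region, best_site["name"]
```

Searching one region instead of every site is what shortens the search. The price is that a fast site in another region is never considered.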

10.
Grid computing has emerged as a new field, distinct from conventional distributed computing. It focuses on large-scale resource sharing, innovative applications, and, in some cases, high performance. The Grid serves as a comprehensive system through which organizations can achieve maximum utilization of their resources. Load balancing is a process that involves resource management and effective distribution of load among resources, so it is considered very important in Grid systems. For a Grid, a dynamic, distributed load-balancing scheme provides deadline control for tasks. Because tasks may miss their deadlines, developing, deploying, and executing long-running applications over the grid remains a challenge, and deadline failure recovery is therefore an essential factor in Grid computing. In this paper, we propose a dynamic distributed load-balancing technique called "Enhanced GridSim with Load balancing based on Deadline Failure Recovery" (EGDFR) for computational Grids with heterogeneous resources. EGDFR is an improved version of the existing EGDC; it performs load balancing through a scheduling system that includes a mechanism for recovering Gridlets from deadline failure. Extensive simulation experiments are conducted on the GridSim platform to quantify the performance of the proposed load-balancing strategy. Experiments show that the proposed system can considerably improve Grid performance in terms of total execution time, percentage gain in execution time, average response time, resubmitted time, and throughput. The proposed technique performs 7% better than EGDC when the number of resources is held constant and 11% better when the number of Gridlets is held constant.

11.
An ant algorithm for balanced job scheduling in grids   (cited 1 time: 1 self-citation, 0 by others)
Grid computing uses distributed heterogeneous resources to support complicated computing problems. Grids can be classified into two types: computing grids and data grids. Job scheduling in a computing grid is a very important problem: to use grids efficiently, we need a good job scheduling algorithm to assign jobs to resources. In nature, ants have a tremendous ability to team up to find an optimal path to food sources, and an ant algorithm simulates this behavior. In this paper, we propose a Balanced Ant Colony Optimization (BACO) algorithm for job scheduling in the Grid environment. The main contribution of our work is to balance the load of the entire system while trying to minimize the makespan of a given set of jobs. According to the experimental results, BACO outperforms the other job scheduling algorithms it was compared with.

12.
A Memory-Sharing Grid System Based on Memory Services   (cited 1 time: 0 self-citations, 1 by others)
Chu Rui, Xiao Nong, Lu Xicheng. Chinese Journal of Computers (计算机学报), 2006, 29(7): 1225-1233
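Memory-intensive applications place strict demands on the physical memory of their execution environment. When physical memory is insufficient, heavy disk I/O results and system performance degrades. Traditional network memory addresses this problem by sharing the physical memory of idle nodes within a cluster, but it is strongly affected by cluster load and by the cluster's internal network. By combining network memory with service computing, grid computing, and related technologies, this paper proposes a memory-sharing grid system based on memory services, called the Memory Grid. It also analyzes and discusses the key techniques and algorithms for implementing memory services. The Memory Grid overcomes the shortcomings of network memory and extends the range of applications of grid computing. Simulations driven by the runtime behavior of real applications show that the Memory Grid outperforms network memory.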

13.
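Task scheduling is one of the key technologies that determine whether a grid succeeds. In grid computing, a good task scheduling algorithm must keep the makespan of all tasks as small as possible. It must also balance load across all the machines in the system. This paper analyzes meta-task scheduling algorithms for heterogeneous computing environments. To address the load imbalance that the Min-min algorithm can cause, it proposes a task scheduling algorithm suited to the characteristics of grid computing environments.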
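For reference, the sketch below is the standard Min-min heuristic the abstract refers to, not the authors' improved algorithm. Each round computes, for every unscheduled task, its earliest completion time over all machines. It then commits the task whose earliest completion time is smallest. `etc` is an assumed expected-time-to-compute matrix.

```python
def min_min(etc, ready=None):
    """Min-min meta-task scheduling.

    etc[t][m]: expected time to compute task t on machine m.
    ready[m]:  time at which machine m becomes free (defaults to 0).
    Returns (mapping task -> machine, final machine ready times, makespan).
    """
    n_machines = len(etc[0])
    ready = list(ready) if ready is not None else [0.0] * n_machines
    unscheduled = set(range(len(etc)))
    mapping = {}
    while unscheduled:
        best = None  # (completion time, task, machine)
        for t in sorted(unscheduled):
            # Earliest completion time of task t over all machines.
            m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
            ct = ready[m] + etc[t][m]
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        mapping[t] = m
        ready[m] = ct
        unscheduled.remove(t)
    return mapping, ready, max(ready)
```

Take `etc = [[2, 3], [2, 3], [2, 3], [10, 11]]`. This code places the three short tasks first and ends with a makespan of 14 and machine loads of 14 and 3. Running task 3 alone on machine 1 and the other three tasks on machine 0 would finish at 11. This is the kind of load imbalance the paper sets out to fix.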

14.
Data Mining Based on the Knowledge Grid   (cited 8 times: 0 self-citations, 8 by others)
Wei Dingguo, Peng Hong. Computer Science (计算机科学), 2006, 33(6): 210-213
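Data in industry, science, commerce, and similar fields are usually stored in different locations and must be maintained in a distributed manner at those sites. The very large data sets these fields produce can be analyzed only with powerful distributed and parallel processing systems. The grid provides effective computational support for distributed knowledge discovery applications. To support the development of data mining on grids, this paper presents a system called the Knowledge Grid and discusses how to use it to design and implement data mining applications. It also explains how to search grid resources, compose software and data components, and execute data mining applications on the grid.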

15.
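Grid technology can make full use of the heterogeneous, widely distributed, and constantly changing resources of a wide-area network, achieving full sharing and good cooperation among diverse resources. Even without high-end computing hardware, such integration can use many low-cost standalone machines to process huge amounts of data as quickly as a supercomputer. This paper uses grid components to integrate the standalone machines in an office. It compares the speed of grid computing with that of a single machine on an embarrassingly parallel example: rendering the Mandelbrot set. Experiments show that grid computing outperforms a single machine on compute-intensive problems.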
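Rendering the Mandelbrot set is embarrassingly parallel because each pixel depends only on its own coordinates. The sketch below is written in Python; it is not the grid components used in the paper. It splits the image into row blocks that could be shipped to separate machines. `map_fn` stands in for whatever distribution mechanism is available, and the image bounds are arbitrary choices.

```python
from functools import partial

def escape_count(c, max_iter):
    """Iterations before z -> z*z + c leaves the radius-2 disk (max_iter if it never does)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter

def render_rows(rows, width, height, max_iter):
    """Compute one block of image rows; blocks are independent, so each can go to a different node."""
    block = []
    for y in rows:
        im = -1.5 + 3.0 * y / (height - 1)
        block.append([escape_count(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
                      for x in range(width)])
    return block

def mandelbrot(width, height, max_iter, n_blocks, map_fn=map):
    """Split the image into row blocks, render them via map_fn, and stitch the results."""
    step = -(-height // n_blocks)  # ceiling division
    blocks = [range(i, min(i + step, height)) for i in range(0, height, step)]
    work = partial(render_rows, width=width, height=height, max_iter=max_iter)
    return [row for block in map_fn(work, blocks) for row in block]
```

For local multi-core parallelism, pass the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn`. On platforms that spawn processes, do this inside an `if __name__ == "__main__":` guard. `functools.partial` is used instead of a lambda so the work function can be pickled. Contiguous blocks are simple but can be unbalanced, because rows that cross the set take longer. Interleaving rows across blocks usually evens out the load.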

16.
The grid is a promising infrastructure that can allow scientists and engineers to access resources among geographically distributed environments. Grid computing is a new technology which focuses on aggregating resources (e.g., processor cycles, disk storage, and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available resources with user requests and considering network information statuses during matchmaking in computational grids. The resource broker provides a graphic user interface for accessing available and the appropriate resources via user credentials. This broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. Then we propose a history-based execution time estimation model to predict the execution time of parallel applications, according to previous execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project that integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.
Po-Chi Shih
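One simple reading of a history-based estimator is sketched below. It averages the most recent runs of the same application at the same processor count. If there are none, it scales the time recorded at the nearest processor count, assuming ideal linear speedup. The abstract does not describe the paper's actual model; the class, the window size, and the fallback rule are assumptions. The linear-speedup fallback is only plausible for embarrassingly parallel applications like those the abstract mentions.

```python
from collections import defaultdict, deque
from statistics import mean

class HistoryEstimator:
    """Predict execution time from previous runs of the same application (hypothetical model)."""

    def __init__(self, window=5):
        # (application, processor count) -> most recent run times, oldest dropped first
        self.runs = defaultdict(lambda: deque(maxlen=window))

    def record(self, app, procs, seconds):
        self.runs[(app, procs)].append(seconds)

    def predict(self, app, procs):
        exact = self.runs.get((app, procs))  # .get does not create an empty entry
        if exact:
            return mean(exact)
        # Fall back to the closest recorded processor count, assuming linear speedup.
        known = [(p, mean(t)) for (a, p), t in self.runs.items() if a == app and t]
        if not known:
            return None
        p, t = min(known, key=lambda pt: abs(pt[0] - procs))
        return t * p / procs
```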

17.
Data Grid provides a scalable infrastructure for managing storage resources and data files, and it supports several large-scale applications. Because available resources in the grid are limited, efficient use of grid resources becomes an important challenge. Replication is a technique used in data grids to improve fault tolerance and reduce bandwidth consumption. This paper proposes a Dynamic Hierarchical Replication (DHR) algorithm that places replicas at appropriate sites, i.e., the site with the highest number of accesses for that particular replica. It also minimizes access latency by selecting the best replica when several sites hold replicas. The proposed replica selection strategy chooses the best replica location for the user's running jobs. It does so by considering the replica requests already waiting at the storage and the data transfer time. Simulation results with OptorSim, the European Data Grid simulator, show that the DHR strategy performs better than the other algorithms. It also prevents unnecessary creation of replicas, which leads to efficient storage usage.
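The two DHR decisions described above can be sketched as follows:

- place a new replica at the site that requested the file most often;
- read from the replica with the smallest estimated completion time, counting the requests already queued at that storage site plus the transfer itself.

Sequential service of queued requests and a fixed per-site bandwidth are simplifying assumptions, not details from the paper.

```python
from collections import Counter

def placement_site(access_log):
    """Site that requested the file most often (ties go to the site seen first)."""
    return Counter(access_log).most_common(1)[0][0]

def select_replica(file_size_mb, candidates):
    """Pick the replica site with the smallest estimated completion time.

    candidates: dict site -> (queued_mb, bandwidth_mb_per_s), where queued_mb is the
    volume of requests already waiting at that storage site (assumed served first).
    """
    def estimated_time(site):
        queued_mb, bandwidth = candidates[site]
        return (queued_mb + file_size_mb) / bandwidth

    return min(candidates, key=estimated_time)
```

Counting queued requests is what distinguishes this from picking the nearest or fastest link. A high-bandwidth site with a long queue can lose to a slower but idle one.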

18.
Grid computing has become conventional in distributed systems owing to technological advances and the popularity of networks. Grid computing supports distributed applications by integrating available idle network computing resources into formidable computing power. Through efficient integration and sharing, abundant computing resources can be applied to complicated problems that a single machine cannot handle. However, grid computing draws resources from accessible idle nodes, and node availability varies over time. A node that is currently idle may become occupied within a second and then be unable to provide resources. Node selection must therefore provide effective and sufficient resources over a long period to allow load assignment. This study proposes a hybrid load-balancing policy that integrates static and dynamic load-balancing techniques. A static load-balancing policy is applied to select effective and suitable node sets, which lowers the probability of load imbalance caused by assigning tasks to ineffective nodes. When a node shows that it may be unable to continue providing resources, the dynamic load-balancing policy determines whether that node is no longer effective for load assignment. The system then quickly obtains a replacement node to maintain execution performance.

19.
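Grid computing uses the Internet to organize geographically dispersed high-performance computers into a "virtual supercomputer", thereby sharing computing resources and reducing computing costs. C_Grid, a grid computing model for campus networks, uses the campus backbone as its main communication network. It supports dynamic joining of compute nodes and provides a unified programming interface for parallel, distributed execution of different tasks. Experiments computing π and searching for Mersenne primes show that C_Grid offers good usability and reasonably good real-time performance.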
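A Mersenne-prime search splits naturally into independent work units, which makes it a convenient grid benchmark. The sketch below uses the Lucas-Lehmer test. It divides the exponent range into chunks that a node could process independently. The chunking scheme and the `map_fn` hook are illustrative assumptions, not C_Grid's programming interface.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents tested here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_mersenne_prime(p):
    """Lucas-Lehmer test for M_p = 2**p - 1, valid for prime p (p = 2 handled separately)."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def search_chunk(lo, hi):
    """One work unit a compute node could receive: all Mersenne-prime exponents in [lo, hi)."""
    return [p for p in range(lo, hi) if is_prime(p) and is_mersenne_prime(p)]

def mersenne_search(limit, chunk, map_fn=map):
    """Split [2, limit) into chunks, evaluate them via map_fn, and merge results in order."""
    starts = list(range(2, limit, chunk))
    ends = [min(s + chunk, limit) for s in starts]
    return [p for found in map_fn(search_chunk, starts, ends) for p in found]
```

Composite exponents are filtered out before the Lucas-Lehmer test because `2**p - 1` is always composite when `p` is composite. Chunks with larger exponents cost more, so equal-width chunks are not equal-cost.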

20.
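In recent years, grid computing has increasingly become a viable solution for data- and compute-intensive applications. Both the grid runtime platform itself and the parallel applications running in grid environments require extensive point-to-multipoint group communication. This paper proposes a flexible, fault-tolerant group communication mechanism. It is based on Remote Method Invocation (RMI) and provides efficient, fault-tolerant group communication for distributed parallel applications. Communication methods can be invoked on a local object, a remote object, or a group of objects. Communication is asynchronous. The initiator can either wait for all results or wait only for the results it needs, maximizing respectively the reliability or the efficiency of communication.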
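The two waiting modes can be sketched with Python's `concurrent.futures` standing in for RMI. A call is issued asynchronously to every group member. The caller then either waits for all results or returns as soon as it has the number it needs. Members here are ordinary local objects, whereas in the paper they would be local or remote RMI objects. The `group_invoke` helper and its `need` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def group_invoke(members, method_name, *args, need=None, timeout=None):
    """Asynchronously call method_name(*args) on every member of a group.

    need=None -> wait for all members (favours reliability).
    need=k    -> return once k successful results have arrived (favours efficiency).
    Members whose call raises are skipped, a simple form of fault tolerance.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(members)))
    futures = [pool.submit(getattr(m, method_name), *args) for m in members]
    results = []
    try:
        for fut in as_completed(futures, timeout=timeout):
            if fut.exception() is not None:
                continue  # failed member: ignore it and keep collecting
            results.append(fut.result())
            if need is not None and len(results) >= need:
                break
    finally:
        # Do not block on stragglers; cancel calls that have not started (Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    if need is not None and len(results) < need:
        raise RuntimeError(f"only {len(results)} of {need} required results arrived")
    return results
```

Results come back in arrival order, not member order. The optional `timeout` bounds how long an unresponsive member can stall the caller in either mode.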
