Similar literature
 Found 20 similar documents (search took 234 ms)
1.
The Data Grid has evolved into the solution for data-intensive applications such as High Energy Physics (HEP), astrophysics, and computational genomics. These applications usually analyze large input data sets, which are widely replicated across the Data Grid to improve performance. The scheduling performance of traditional computing jobs can be studied using queuing theory. However, once data transfer is added, job scheduling performance becomes too complex to model this way. In this research, we study the impact of data transfer on the performance of job scheduling in the Data Grid environment. We propose a parallel downloading system that supports replicating data fragments and downloading the replicated fragments in parallel, to improve job scheduling performance. The performance of the parallel downloading system is compared with that of a non-parallel downloading system, using three scheduling heuristics: Shortest Turnaround Time (STT), Least Relative Load (LRL) and Data Present (DP). Our simulation results show that the proposed parallel download approach greatly improves Data Grid performance for all three scheduling algorithms, in terms of the geometric mean of job turnaround time. The advantage of the parallel downloading system is most evident when the Data Grid has relatively low network bandwidth and relatively high computing power.
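
The metric named in this abstract, the geometric mean of job turnaround time, can be illustrated with a minimal Python sketch. The function name and the turnaround values below are hypothetical and are not taken from the paper; the sketch only shows how the metric is computed and compared between two setups.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("turnaround times must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical turnaround times (seconds) for the same three jobs.
non_parallel = [120.0, 300.0, 75.0]
parallel = [60.0, 150.0, 37.5]

# A ratio below 1 means the parallel setup has the lower geometric mean.
ratio = geometric_mean(parallel) / geometric_mean(non_parallel)
```

The geometric mean is less dominated by a few very long jobs than the arithmetic mean, which is one plausible reason to report it for mixed workloads.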

2.
We address the problem of porting parallel distributed applications from static homogeneous cluster environments to dynamic heterogeneous Grid resources. We introduce a generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using a case study application: a Virtual Reactor for simulation of plasma chemical vapour deposition. This application has a modular architecture with a number of loosely coupled components suitable for distribution over the Grid. It requires large parameter space exploration that allows using Grid resources for high-throughput computing. The Virtual Reactor contains a number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the Grid. In this paper we study the performance of one of the parallel solvers, apply the technique developed for adaptive load balancing, evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous Grid resources for high-performance parallel computing.

3.
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

4.
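Grid technology can make full use of heterogeneous, widely distributed, constantly changing resources across a wide-area network, achieving full sharing and good cooperation among those resources. Such integration typically allows a large number of low-cost single machines, none of which has high computing performance, to deliver the rapid processing of massive data otherwise expected of a supercomputer. We use grid components to integrate the standalone machines in an office, and compare the speed of grid computing with single-machine computing on an embarrassingly parallel example: rendering the Mandelbrot set. The experiments show that grid computing has an advantage over a single machine on compute-intensive problems.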
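
To make the "embarrassingly parallel" point concrete, here is a minimal Python sketch of Mandelbrot rendering in which every image row is an independent task. The function names, the viewing window, and the use of a local process pool as a stand-in for grid nodes are all assumptions for illustration; the paper's own grid components are not shown here.

```python
from functools import partial
from multiprocessing import Pool


def escape_count(c, max_iter):
    """Iterations before |z| exceeds 2 under z -> z*z + c (max_iter if never)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_row(row, width, height, max_iter):
    """Compute one image row; rows share no state, so they can run anywhere."""
    y = -1.5 + 3.0 * row / (height - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * col / (width - 1), y), max_iter)
        for col in range(width)
    ]


def render(width, height, max_iter=50, map_fn=map):
    """Render the window [-2, 1] x [-1.5, 1.5]; map_fn decides where rows run."""
    row_fn = partial(mandelbrot_row, width=width, height=height, max_iter=max_iter)
    return list(map_fn(row_fn, range(height)))


def render_parallel(width, height, max_iter=50, workers=4):
    """Same image, with rows farmed out to local worker processes."""
    with Pool(workers) as pool:
        return render(width, height, max_iter, map_fn=pool.map)
```

Because `render` only needs a map-like function, the same row function could in principle be handed to any task farm; the local pool is just the simplest one available in the standard library.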

5.
Over the past few years, research and development in bioinformatics (e.g. genomic sequence alignment) has grown with each passing day, fueling continuing demands for vast computing power to support better performance. This trend usually requires solutions involving parallel computing techniques because cluster computing technology reduces execution times and increases genomic sequence alignment efficiency. One example, mpiBLAST, is a parallel version of NCBI BLAST that combines NCBI BLAST with message passing interface (MPI) standards. However, as most laboratories cannot build up powerful cluster computing environments, Grid computing framework concepts have been designed to meet the need. Grid computing environments coordinate the resources of distributed virtual organizations and satisfy the various computational demands of bioinformatics applications. In this paper, we report on designing and implementing a BioGrid framework, called G‐BLAST, that performs genomic sequence alignments using Grid computing environments and accessible mpiBLAST applications. G‐BLAST is also suitable for cluster computing environments with a server node and several client nodes. G‐BLAST is able to select the most appropriate work nodes, dynamically fragment genomic databases, and self‐adjust according to performance data. To enhance G‐BLAST capability and usability, we also employ a WSRF Grid Service Portal and a Grid Service GUI desk application for general users to submit jobs and host administrators to maintain work nodes. Copyright © 2008 John Wiley & Sons, Ltd.

6.
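Lattice quantum chromodynamics (QCD) is a non-perturbative method for solving QCD from first principles. It simulates the interaction of gluon and fermion fields on a hypercubic lattice, and its results are regarded as a reliable description of strong-interaction phenomena, so lattice calculations matter greatly for theoretical QCD research. However, lattice QCD has a very large number of computational degrees of freedom, which makes computational efficiency hard to improve. Domain decomposition of the lattice is commonly used to make parallel computation scalable, but improving the efficiency of data-parallel computation remains the core problem. Taking Grid, a representative lattice QCD software package, as an example, this paper studies data-parallel computing patterns in lattice QCD. Focusing on the complex tensor computations in lattice QCD and on improving large-scale parallel efficiency, we first give a theoretical analysis of the data-parallel characteristics of the lattice QCD method, then benchmark and analyze the specific data-parallel mechanisms in Grid, such as SIMD and OpenMP, and finally discuss the significance of data-parallel computing patterns for lattice QCD applications.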

7.
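We propose a fast verification method for P/G (power/ground) networks based on network partitioning and distributed parallel computation. Each sub-network is solved by Cholesky decomposition with a strategy for accelerating the sub-network computation. Because the sub-network computations are mutually independent, they are distributed and run in parallel using a parallel structure based on MPI (Message Passing Interface). Experiments show that the method performs very well in both running time and memory usage.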
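
The per-sub-network step can be sketched as a plain Cholesky solve of a small symmetric positive-definite system. This is a textbook version written in pure Python under the assumption that each sub-network reduces to such a system; the paper's acceleration strategy and its MPI distribution are not reproduced, and the matrix and right-hand side in the usage lines are made-up values.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a, for symmetric positive-definite a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def solve_spd(a, b):
    """Solve a * x = b by forward then backward substitution on the Cholesky factor."""
    lower = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L * y = b
        y[i] = (b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T * x = y
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


# Hypothetical 2-node system (made-up matrix and right-hand side).
voltages = solve_spd([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0])
```

Since each call to `solve_spd` touches only its own matrix, independent sub-networks could be solved by separate processes, which is the property the abstract relies on.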

8.
Grid Data Management: Open Problems and New Issues   Cited by: 3 (self-citations: 0, citations by others: 3)
Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical since the techniques must scale up while addressing the autonomy, dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we make precise the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular, using P2P techniques. Work partially funded by ARA “Massive Data” of the French ministry of research (project Respire), the European Strep Grid4All project, the CAPES–COFECUB Daad project and the CNPq–INRIA Gridata project.

9.
A PTS-PGATS based approach for data-intensive scheduling in data grids   Cited by: 1 (self-citations: 0, citations by others: 1)
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which consists of large-scale data-intensive applications, producing and consuming huge amounts of data, distributed across a large number of machines. Data grid computing composes sets of independent tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks and appropriate storage resources selected to serve the files required by the tasks. So the problem can be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler, which is broken into three parts that can run in parallel and uses both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms, which target minimizing makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.

10.
The last decade has seen a substantial increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems, in the fields of science, engineering, and business, which cannot be effectively dealt with using the current generation of supercomputers. In fact, due to their size and complexity, these problems are often very numerically and/or data intensive and consequently require a variety of heterogeneous resources that are not available on a single machine. A number of teams have conducted experimental studies on the cooperative use of geographically distributed resources unified to act as a single powerful computer. This new approach is known by several names, such as metacomputing, scalable computing, global computing, Internet computing, and more recently peer‐to‐peer or Grid computing. The early efforts in Grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. In fact, many applications can benefit from the Grid infrastructure, including collaborative engineering, data exploration, high‐throughput computing, and of course distributed supercomputing. Moreover, due to the rapid growth of the Internet and Web, there has been a rising interest in Web‐based distributed computing, and many projects have been started and aim to exploit the Web as an infrastructure for running coarse‐grained distributed and parallel applications. In this context, the Web has the capability to be a platform for parallel and collaborative work as well as a key technology to create a pervasive and ubiquitous Grid‐based infrastructure. This paper aims to present the state‐of‐the‐art of Grid computing and attempts to survey the major international efforts in developing this emerging technology. Copyright © 2002 John Wiley & Sons, Ltd.

11.
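The Grid is the third information-technology revolution after the Internet and the Web, and it will ultimately change how distributed resources are shared and served. This paper discusses the generation, storage, and processing of massive data and the resulting demands on data grid technology, analyzes the functions of the European DataGrid and the LHC Computing Grid, and reviews the latest developments in grid technology research.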

12.
13.
Scientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, simplifies the use of parallel block iterative algorithms and provides an interface for parallel decision-making algorithms. The ultimate goal is to develop a single sign-on, integrated multi-service environment providing easy access to different kinds of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures combining the benefits of large clusters, Grid or cloud, when needed.

14.
We present a framework for a parallel programming model by remote procedure calls, which bridge large-scale computing resource pools managed by multiple Grid-enabled job scheduling systems. With this system, the user can exploit not only remote servers and clusters, but also the computing resources provided by Grid-enabled job scheduling systems located on different sites. This framework requires a Grid remote procedure call (RPC) system to decouple the computation in a remote node from the Grid RPC mechanism and uses document-based communication rather than connection-based communication. We implemented the proposed framework as an extension of the OmniRPC system, which is a Grid RPC system for parallel programming. We designed a general interface to easily adapt the OmniRPC system to various Grid-enabled job scheduling systems, including XtremWeb, CyberGRIP, Condor and Grid Engine. We show the preliminary performance of these implementations using a phylogenetic application. We found that the proposed system can achieve approximately the same performance as OmniRPC and can handle interruptions in worker programs on remote nodes. Yoshihiro Nakajima is a Research Fellow of the Japan Society for the Promotion of Science.

15.
A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

16.
The existence of good probabilistic models for the job arrival process and the delay components introduced at different stages of job processing in a Grid environment is important for the improved understanding of the Grid computing concept. In this study, we present a thorough analysis of the job arrival process in the EGEE infrastructure and of the time durations a job spends at different states in the EGEE environment. We define four delay components of the total job delay and model each component separately. We observe that the job inter-arrival times at the Grid level can be adequately modelled by a rounded exponential distribution, while the total job delay (from the time it is generated until the time it completes execution) is dominated by the computing element’s register and queuing times and the worker node’s execution times. Further, we evaluate the efficiency of the EGEE environment by comparing the job total delay performance with that of a hypothetical ideal super-cluster and conclude that we would obtain similar performance if we submitted the same workload to a super-cluster of size equal to 34% of the total average number of CPUs participating in the EGEE infrastructure. We also analyze the job inter-arrival times, the CE’s queuing times, the WN’s execution times, and the data sizes exchanged at the kallisto.hellasgrid.gr cluster, which is a node in the EGEE infrastructure. In contrast to the Grid level, we find that at the cluster level the job arrival process exhibits self-similarity/long-range dependence. Finally, we propose simple and intuitive models for the job arrival process and the execution times at the cluster level.
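
As a rough illustration of a "rounded exponential" inter-arrival model, the Python sketch below draws exponential gaps and rounds each to the nearest whole second. Reading "rounded" as rounding to integer seconds is an assumption made here, as are the function names and the mean of 10 seconds; the paper's exact definition and fitted parameters are not given in the abstract.

```python
import random


def rounded_exponential_samples(mean_seconds, count, seed=0):
    """Draw exponential inter-arrival times and round each to a whole second."""
    if mean_seconds <= 0:
        raise ValueError("mean_seconds must be positive")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    rate = 1.0 / mean_seconds
    return [round(rng.expovariate(rate)) for _ in range(count)]


def arrival_times(inter_arrivals, start=0):
    """Turn a list of gaps into cumulative arrival timestamps."""
    times = []
    t = start
    for gap in inter_arrivals:
        t += gap
        times.append(t)
    return times


# Hypothetical workload: 5000 jobs with a mean gap of 10 seconds.
gaps = rounded_exponential_samples(10.0, 5000, seed=1)
timestamps = arrival_times(gaps)
```

Rounding produces gaps of zero, that is, several jobs sharing one timestamp, which is what coarse log resolution would also produce.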

17.
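陶金花 (Tao Jinhua), 苏林 (Su Lin), 李树楷 (Li Shukai). 《计算机应用》 (Journal of Computer Applications), 2007, 27(10): 2578-2580
We analyze the LiDAR data-processing workflow and, building on the Open Grid Services Architecture (OGSA), propose an architecture for a LiDAR data-processing platform. Processing tasks are partitioned and assigned to distributed grid nodes, which compute in parallel and cooperatively to increase processing speed. Finally, the computation under this architecture is illustrated by resampling a laser point cloud to generate a gridded DEM.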

18.
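A discussion of several issues concerning the Grid and other distributed computing technologies   Cited by: 5 (self-citations: 0, citations by others: 5)
1. Introduction. In the article "The Grid: Resource-Sharing Technology for Virtual Organizations", we mainly presented the Grid as defined by Ian Foster et al., together with the related basic concepts and research areas, and discussed the Grid's basic ideas and key technologies. In the article "Grid Architecture Explained in Detail", we described the composition and functions of the grid architecture proposed by the Globus project. These materials aim to explain what the Grid is. In fact, we can also observe and understand the Grid from another side, or from a different angle. For example ...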

19.
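High-performance grid parallel computing   Cited by: 4 (self-citations: 0, citations by others: 4)
This paper analyzes and compares the various approaches to high-performance computing and explains the relationship between the Grid and metacomputing. Through a survey of current grid projects, it discusses grid architecture and grid service semantics. It examines two key characteristics of the Grid, heterogeneity and dynamism, and approaches to handling them. The paper helps clarify the Grid concept and indicates future directions for high-performance parallel computing.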

20.
Parallel Computing, 2007, 33(7-8): 467-487
Approaches to scheduling and load balancing on PC-based cluster systems are well known. Self-scheduling schemes, which suit parallel loops with independent iterations on cluster computer systems, have been designed in the past. In this paper, we propose a new scheme that can adjust the scheduling parameter dynamically on extremely heterogeneous PC-based cluster and Grid computing environments in order to improve system performance. A Grid computing environment consisting of multiple PC-based clusters is constructed using the Globus Toolkit and MPICH-G2 middleware. The experimental results show that our scheduling scheme can achieve higher performance than other similar schemes on Grid computing environments.
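
For readers unfamiliar with self-scheduling, the Python sketch below shows one classic scheme, guided self-scheduling, in which each idle worker takes roughly a 1/P share of the remaining loop iterations. It is given only as background: it is not the new scheme proposed in this paper, and the function name is hypothetical.

```python
import math


def guided_chunks(total_iterations, workers):
    """Chunk sizes handed out by guided self-scheduling: ceil(remaining / workers)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    remaining = total_iterations
    chunks = []
    while remaining > 0:
        size = math.ceil(remaining / workers)
        chunks.append(size)
        remaining -= size
    return chunks


# 100 independent loop iterations shared among 4 workers.
chunks = guided_chunks(100, 4)
```

Large early chunks keep scheduling overhead low, while the small late chunks let faster nodes absorb leftover work, which is why schemes in this family are attractive on heterogeneous clusters.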
