Found 20 similar articles (search time: 234 ms)
1.
Junwei Zhang Bu-Sung Lee Xueyan Tang Chai-Kiat Yeo 《The Journal of Supercomputing》2011,56(3):245-269
Data Grid has evolved to be the solution for data-intensive applications, such as High Energy Physics (HEP), astrophysics,
and computational genomics. These applications usually analyze large volumes of input data, which are widely
replicated across the Data Grid to improve performance. The job scheduling performance of traditional computing jobs can be
studied using queuing theory. However, with the addition of data transfer, the job scheduling performance is too complex to
be modeled. In this research, we study the impact of data transfer on the performance of job scheduling in the Data Grid environment.
We propose a parallel downloading system that supports replicating data fragments and parallel downloading of replicated
data fragments, to improve the job scheduling performance. The performance of the parallel downloading system is compared
with that of a non-parallel downloading system, using three scheduling heuristics: Shortest Turnaround Time (STT), Least Relative Load
(LRL) and Data Present (DP). Our simulation results show that the proposed parallel download approach greatly improves the
Data Grid performance for all three scheduling algorithms, in terms of the geometric mean of job turnaround time. The advantage
of the parallel downloading system is most evident when the Data Grid has relatively low network bandwidth and relatively high
computing power.
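The abstract above reports results as the geometric mean of job turnaround time. As a minimal sketch of that metric only (the turnaround values are made-up illustrative numbers, not results from the paper, and the function name is hypothetical), it could be computed as follows:

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical turnaround times in seconds (illustrative only).
turnaround_non_parallel = [100.0, 400.0, 1600.0]
turnaround_parallel = [50.0, 200.0, 800.0]

# A ratio below 1 means the parallel variant has the lower geometric mean.
ratio = geometric_mean(turnaround_parallel) / geometric_mean(turnaround_non_parallel)
```

The geometric mean is less dominated by a few very long jobs than the arithmetic mean, which may be why it suits workloads with widely varying job sizes.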
2.
Vladimir V. Korkhov Valeria V. Krzhizhanovskaya P.M.A. Sloot 《Journal of Parallel and Distributed Computing》2008
We address the problem of porting parallel distributed applications from static homogeneous cluster environments to dynamic heterogeneous Grid resources. We introduce a generic technique for adaptive load balancing of parallel applications on heterogeneous resources and evaluate it using a case study application: a Virtual Reactor for simulation of plasma chemical vapour deposition. This application has a modular architecture with a number of loosely coupled components suitable for distribution over the Grid. It requires large parameter space exploration that allows using Grid resources for high-throughput computing. The Virtual Reactor contains a number of parallel solvers originally designed for homogeneous computer clusters that needed adaptation to the heterogeneity of the Grid. In this paper we study the performance of one of the parallel solvers, apply the technique developed for adaptive load balancing, evaluate the efficiency of this approach and outline an automated procedure for optimal utilization of heterogeneous Grid resources for high-performance parallel computing.
3.
Arun Krishnan 《Concurrency and Computation》2005,17(13):1607-1623
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.
4.
5.
Over the past few years, research and development in bioinformatics (e.g. genomic sequence alignment) has grown steadily, fueling continuing demands for vast computing power to support better performance. This trend usually requires solutions involving parallel computing techniques because cluster computing technology reduces execution times and increases genomic sequence alignment efficiency. One example, mpiBLAST, is a parallel version of NCBI BLAST that combines NCBI BLAST with message passing interface (MPI) standards. However, as most laboratories cannot build powerful cluster computing environments, Grid computing framework concepts have been designed to meet the need. Grid computing environments coordinate the resources of distributed virtual organizations and satisfy the various computational demands of bioinformatics applications. In this paper, we report on designing and implementing a BioGrid framework, called G‐BLAST, that performs genomic sequence alignments using Grid computing environments and accessible mpiBLAST applications. G‐BLAST is also suitable for cluster computing environments with a server node and several client nodes. G‐BLAST is able to select the most appropriate work nodes, dynamically fragment genomic databases, and self‐adjust according to performance data. To enhance G‐BLAST capability and usability, we also employ a WSRF Grid Service Portal and a Grid Service GUI desk application for general users to submit jobs and host administrators to maintain work nodes. Copyright © 2008 John Wiley & Sons, Ltd.
6.
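Lattice quantum chromodynamics (QCD) is a non-perturbative method for solving QCD from first principles. By simulating the interactions of gluon fields and fermion fields on a hypercubic lattice, it produces results that are regarded as a reliable description of strong-interaction phenomena, and lattice calculations are of great significance for theoretical QCD research. However, lattice QCD calculations involve a very large number of degrees of freedom, which makes computational efficiency difficult to improve. Domain decomposition of the lattice is usually used to make the parallel computation scalable, but how to improve the efficiency of data-parallel computation remains the core problem. Taking Grid, a typical lattice QCD software package, as an example, this paper studies the data-parallel computing patterns in lattice QCD calculations. Focusing on the complex tensor computations in lattice QCD and on improving the efficiency of large-scale parallel computing, it presents a theoretical analysis of the data-parallel characteristics of lattice QCD methods, then reports performance tests and analysis of the specific data-parallel mechanisms in Grid, such as SIMD and OpenMP, and finally discusses the significance of data-parallel computing patterns for lattice QCD applications.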
7.
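A fast verification method for power/ground (P/G) networks is proposed, based on network partitioning and distributed parallel computation. Each subnetwork is solved by Cholesky decomposition with a strategy for accelerating the subnetwork computation. Because the subnetwork computations are mutually independent, an MPI (Message Passing Interface)-based parallel structure is used to distribute them for parallel execution. Experiments show that the method performs very well in both run time and memory usage.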
8.
Grid Data Management: Open Problems and New Issues (cited 3 times in total: 0 self-citations, 3 citations by others)
Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise
information systems. This makes data management critical since the techniques must scale up while addressing the autonomy,
dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related
to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques.
Then we make precise the requirements for Grid data management. Finally, we introduce the main techniques needed to address
these requirements. This implies revisiting distributed database techniques in major ways, in particular, using P2P techniques.
Work partially funded by ARA “Massive Data” of the French ministry of research (project Respire), the European Strep Grid4All
project, the CAPES–COFECUB Daad project and the CNPq–INRIA Gridata project.
9.
Kenli Li Zhao Tong Dan Liu Teklay Tesfazghi Xiangke Liao 《Frontiers of Computer Science in China》2011,5(4):513-525
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed
environment. Grid data are the data used in grid computing, which consists of large-scale data-intensive applications, producing
and consuming huge amounts of data, distributed across a large number of machines. Data grid computing comprises sets of independent
tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the
completion time of the application and improve the performance of the grid, appropriate computing resources should be selected
to execute the tasks and appropriate storage resources selected to serve the files required by the tasks. So the problem can
be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper
proposes a scheduler, which is broken into three parts that can run in parallel and uses both parallel tabu search and a parallel
genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms, which target
minimizing makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid
applications.
10.
The last decade has seen a substantial increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems, in the fields of science, engineering, and business, which cannot be effectively dealt with using the current generation of supercomputers. In fact, due to their size and complexity, these problems are often very numerically and/or data intensive and consequently require a variety of heterogeneous resources that are not available on a single machine. A number of teams have conducted experimental studies on the cooperative use of geographically distributed resources unified to act as a single powerful computer. This new approach is known by several names, such as metacomputing, scalable computing, global computing, Internet computing, and more recently peer‐to‐peer or Grid computing. The early efforts in Grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. In fact, many applications can benefit from the Grid infrastructure, including collaborative engineering, data exploration, high‐throughput computing, and of course distributed supercomputing. Moreover, due to the rapid growth of the Internet and Web, there has been a rising interest in Web‐based distributed computing, and many projects have been started that aim to exploit the Web as an infrastructure for running coarse‐grained distributed and parallel applications. In this context, the Web has the capability to be a platform for parallel and collaborative work as well as a key technology to create a pervasive and ubiquitous Grid‐based infrastructure. This paper aims to present the state‐of‐the‐art of Grid computing and attempts to survey the major international efforts in developing this emerging technology. Copyright © 2002 John Wiley & Sons, Ltd.
11.
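The Grid is the third information technology revolution, following the Internet and the Web, and will ultimately change how distributed resources are shared and served. This paper mainly discusses the generation, storage, and processing of massive data and the resulting demand for Data Grid technology, analyzes the functions of the European DataGrid and the LHC Computing Grid, and explores the latest developments in Grid technology research.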
12.
13.
Hrachya Astsatryan Vladimir Sahakyan Yuri Shoukouryan Michel Daydé Aurelie Hurault Ronan Guivarch Harutyun Terzyan Levon Hovhannisyan 《Journal of Grid Computing》2013,11(2):239-248
Scientific research is becoming increasingly dependent on the large-scale analysis of data using distributed computing infrastructures (Grid, cloud, GPU, etc.). Scientific computing (Petitet et al. 1999) aims at constructing mathematical models and numerical solution techniques for solving problems arising in science and engineering. In this paper, we describe the services of an integrated portal based on the P-Grade (Parallel Grid Run-time and Application Development Environment) portal (http://www.p-grade.hu) that enables the solution of large-scale linear systems of equations using direct solvers, eases the use of parallel block iterative algorithms and provides an interface for parallel decision-making algorithms. The ultimate goal is to develop a single sign-on integrated multi-service environment providing easy access to different kinds of mathematical calculations and algorithms to be performed on hybrid distributed computing infrastructures, combining the benefits of large clusters, Grid or cloud when needed.
14.
Yoshihiro Nakajima Mitsuhisa Sato Yoshiaki Aida Taisuke Boku Franck Cappello 《Journal of Grid Computing》2008,6(2):141-157
We present a framework for a parallel programming model by remote procedure calls, which bridge large-scale computing resource
pools managed by multiple Grid-enabled job scheduling systems. With this system, the user can exploit not only remote servers
and clusters, but also the computing resources provided by Grid-enabled job scheduling systems located on different sites.
This framework requires a Grid remote procedure call (RPC) system to decouple the computation in a remote node from the Grid
RPC mechanism and uses document-based communication rather than connection-based communication. We implemented the proposed
framework as an extension of the OmniRPC system, which is a Grid RPC system for parallel programming. We designed a general
interface to easily adapt the OmniRPC system to various Grid-enabled job scheduling systems, including XtremWeb, CyberGRIP,
Condor and Grid Engine. We show the preliminary performance of these implementations using a phylogenetic application. We
found that the proposed system can achieve approximately the same performance as OmniRPC and can handle interruptions in worker
programs on remote nodes.
Yoshihiro Nakajima is a Research Fellow of the Japan Society for the Promotion of Science.
15.
Fotis E. Psomopoulos Pericles A. Mitkas 《Journal of Systems and Software》2010,83(7):1249-1257
A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.
16.
Konstantinos Christodoulopoulos Vasileios Gkamas Emmanouel A. Varvarigos 《Journal of Grid Computing》2008,6(1):77-101
The existence of good probabilistic models for the job arrival process and the delay components introduced at different stages
of job processing in a Grid environment is important for the improved understanding of the Grid computing concept. In this
study, we present a thorough analysis of the job arrival process in the EGEE infrastructure and of the time durations a job
spends at different states in the EGEE environment. We define four delay components of the total job delay and model each
component separately. We observe that the job inter-arrival times at the Grid level can be adequately modelled by a rounded
exponential distribution, while the total job delay (from the time it is generated until the time it completes execution)
is dominated by the computing element’s register and queuing times and the worker node’s execution times. Further, we evaluate
the efficiency of the EGEE environment by comparing the job total delay performance with that of a hypothetical ideal super-cluster
and conclude that we would obtain similar performance if we submitted the same workload to a super-cluster of size equal to
34% of the total average number of CPUs participating in the EGEE infrastructure. We also analyze the job inter-arrival times,
the CE’s queuing times, the WN’s execution times, and the data sizes exchanged at the kallisto.hellasgrid.gr cluster, which is a node in the EGEE infrastructure. In contrast to the Grid level, we find that at the cluster level the job
arrival process exhibits self-similarity/long-range dependence. Finally, we propose simple and intuitive models for the job
arrival process and the execution times at the cluster level.
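The abstract above models Grid-level job inter-arrival times with a rounded exponential distribution. The sketch below shows one plausible reading of that idea, assuming exponential samples rounded to whole seconds. The mean of 30 seconds, the seed, and both function names are illustrative assumptions, not values or definitions taken from the study.

```python
import random


def rounded_exponential_interarrivals(mean_seconds, n, seed=0):
    """Draw n exponential inter-arrival gaps and round each to a whole second."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rate = 1.0 / mean_seconds  # expovariate takes the rate (1 / mean)
    return [round(rng.expovariate(rate)) for _ in range(n)]


def arrival_times(interarrivals):
    """Turn a list of gaps into cumulative arrival timestamps."""
    times = []
    t = 0
    for gap in interarrivals:
        t += gap
        times.append(t)
    return times


# Hypothetical workload: 1000 jobs with a mean gap of 30 seconds.
gaps = rounded_exponential_interarrivals(mean_seconds=30.0, n=1000)
arrivals = arrival_times(gaps)
```

Rounding produces zero-length gaps whenever two jobs fall within the same second, which is one reason a rounded model can fit logged timestamps better than a continuous one.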
17.
18.
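A Discussion of Several Issues Concerning the Grid and Other Distributed Computing Technologies (cited 5 times in total: 0 self-citations, 5 citations by others)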
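1. Introduction. In the article “The Grid: Resource-Sharing Technology for Virtual Organizations”, we mainly presented the Grid and the related basic concepts and research areas as defined by Ian Foster et al., and discussed the Grid's basic ideas and key technologies. In the article “A Detailed Explanation of the Grid Architecture”, we described the composition and functions of the Grid architecture proposed by the Globus project. This material was intended to explain what the Grid is. In fact, we can also observe and understand the Grid from another side, or from a different perspective. For …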
19.
20.
《Parallel Computing》2007,33(7-8):467-487
Approaches to scheduling and load balancing on PC-based cluster systems are well known. Self-scheduling schemes, which are suitable for parallel loops with independent iterations on cluster computer systems, have been designed in the past. In this paper, we propose a new scheme that can adjust the scheduling parameter dynamically on extremely heterogeneous PC-based cluster and Grid computing environments in order to improve system performance. A Grid computing environment consisting of multiple PC-based clusters is constructed using the Globus Toolkit and MPICH-G2 middleware. The experimental results show that our scheduling scheme can achieve higher performance than other similar schemes on Grid computing environments.
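For readers unfamiliar with self-scheduling, the sketch below illustrates the classic guided self-scheduling rule, in which each idle worker takes a chunk equal to the remaining iterations divided by the number of workers, rounded up. This is a well-known baseline shown for orientation only. It is not the dynamically adjusted scheme proposed in the abstract above, and the function name and example sizes are hypothetical.

```python
import math


def guided_self_scheduling_chunks(total_iterations, num_workers):
    """Return successive chunk sizes under guided self-scheduling.

    Each request receives ceil(remaining / num_workers) iterations, so
    chunks start large and shrink as the loop nears completion.
    """
    if total_iterations < 0 or num_workers < 1:
        raise ValueError("need total_iterations >= 0 and num_workers >= 1")
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        chunk = math.ceil(remaining / num_workers)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# Hypothetical loop of 100 independent iterations shared by 4 workers.
chunks = guided_self_scheduling_chunks(100, 4)
```

Large early chunks keep scheduling overhead low, while the small final chunks help even out finish times across workers of different speeds.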