Similar Documents
20 similar documents found.
1.
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid‐enabled, high‐throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid‐enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini‐Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid‐enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.
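For illustration, a minimal sketch of the embarrassingly parallel structure this abstract relies on: the query set is split into independent chunks, each of which could be submitted as a separate BLAST job. The chunk size, file layout and the notion of "one chunk per Grid job" are illustrative assumptions, not the paper's actual scripts.

```python
# Sketch: split a FASTA query set into independent chunks, one per Grid job.
# File names, chunk size, and the later job-submission step are assumptions.
from pathlib import Path

def read_fasta(path):
    """Yield (header, sequence) records from a FASTA file."""
    header, seq = None, []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        else:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)

def split_queries(path, queries_per_job=100, out_dir="chunks"):
    """Write each group of queries to its own file; each file becomes one BLAST job."""
    Path(out_dir).mkdir(exist_ok=True)
    chunk, chunk_id, paths = [], 0, []
    for header, seq in read_fasta(path):
        chunk.append(f"{header}\n{seq}")
        if len(chunk) == queries_per_job:
            out = Path(out_dir) / f"query_{chunk_id}.fasta"
            out.write_text("\n".join(chunk) + "\n")
            paths.append(out)
            chunk, chunk_id = [], chunk_id + 1
    if chunk:
        out = Path(out_dir) / f"query_{chunk_id}.fasta"
        out.write_text("\n".join(chunk) + "\n")
        paths.append(out)
    return paths  # each path would then be submitted as an independent Grid job
```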

2.
Feng Xiaolong and Gao Jing. Computer Simulation, 2020, 37(2): 231–236
To address the long computation times of short-read alignment tasks in bioinformatics analysis, a distributed computing model was designed on the Spark platform using RDD datasets and the HDFS distributed file system. A divide-and-conquer strategy splits the large computation into multiple non-overlapping small tasks that execute in parallel on a distributed cluster. Data distribution is achieved through a partitioning algorithm that divides the input into equal parts by position offset; read-by-read processing is achieved by encapsulating the short reads in an RDD; and sequence alignment is performed by passing the alignment algorithm to the RDD's map function. The model makes a serial alignment algorithm scalable on a distributed cluster, significantly reduces computation time, and produces results compatible with downstream bioinformatics analyses. Experimental results show that the model is stable and scalable and achieves excellent speedup on a Spark cluster.
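A minimal PySpark sketch of the pattern this abstract describes: short reads are loaded into an RDD and an alignment function is applied per read via map(). The toy scoring function, the HDFS paths and the reference string are placeholders, not the authors' implementation.

```python
# Sketch of the Spark pattern described above: short reads are wrapped in an RDD
# and an alignment function is applied to each read via map(). The alignment
# function is a toy stand-in; a real model would call an actual aligner.
from pyspark import SparkContext

def align_read(read, reference="ACGTACGTACGT"):
    """Toy stand-in for a real alignment algorithm: best exact-match offset."""
    best = (-1, -1)  # (score, offset)
    for i in range(len(reference) - len(read) + 1):
        score = sum(a == b for a, b in zip(read, reference[i:i + len(read)]))
        best = max(best, (score, i))
    return read, best

if __name__ == "__main__":
    sc = SparkContext(appName="rdd-alignment-sketch")
    # Each line of the (hypothetical) HDFS file is one short read; textFile()
    # partitions the input by byte offset, matching the data-partitioning step.
    reads = sc.textFile("hdfs:///data/short_reads.txt")
    results = reads.map(align_read)          # alignment algorithm passed to map()
    results.saveAsTextFile("hdfs:///data/alignment_results")
    sc.stop()
```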

3.
A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

4.
The last decade has seen a substantial increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems, in the fields of science, engineering, and business, which cannot be effectively dealt with using the current generation of supercomputers. In fact, due to their size and complexity, these problems are often very numerically and/or data intensive and consequently require a variety of heterogeneous resources that are not available on a single machine. A number of teams have conducted experimental studies on the cooperative use of geographically distributed resources unified to act as a single powerful computer. This new approach is known by several names, such as metacomputing, scalable computing, global computing, Internet computing, and more recently peer‐to‐peer or Grid computing. The early efforts in Grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. In fact, many applications can benefit from the Grid infrastructure, including collaborative engineering, data exploration, high‐throughput computing, and of course distributed supercomputing. Moreover, due to the rapid growth of the Internet and Web, there has been a rising interest in Web‐based distributed computing, and many projects have been started and aim to exploit the Web as an infrastructure for running coarse‐grained distributed and parallel applications. In this context, the Web has the capability to be a platform for parallel and collaborative work as well as a key technology to create a pervasive and ubiquitous Grid‐based infrastructure. This paper aims to present the state‐of‐the‐art of Grid computing and attempts to survey the major international efforts in developing this emerging technology. Copyright © 2002 John Wiley & Sons, Ltd.

5.
《Parallel Computing》2014,40(10):697-709
In order to run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initializing stage to move database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today’s big data era, such an approach is inefficient. In this paper, we develop a scalable data access framework, SDAFT, to solve the data movement problem for scientific applications that are dominated by “read” operations for data analysis. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: (1) a data centric load-balanced scheduler (DC-scheduler) to enforce data-process locality and (2) a translation layer to translate conventional parallel I/O operations into HDFS I/O. By experimenting with our SDAFT prototype system using real-world databases and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4–10 and double the overall execution performance as compared with existing schemes.
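The data-centric scheduling idea (assign each search task to a node that already holds the database fragment it needs) can be sketched as follows; the fragment-to-node map, task tuples and tie-breaking rule are illustrative assumptions, not SDAFT's actual interfaces.

```python
# Sketch of a data-centric (locality-aware) scheduler: prefer a node that already
# hosts a task's database fragment, breaking ties by current load.
# The fragment-location map and task tuples are illustrative assumptions.
from collections import defaultdict

def schedule(tasks, fragment_locations, nodes):
    """tasks: list of (task_id, fragment_id); fragment_locations: fragment_id -> [nodes]."""
    load = defaultdict(int)
    assignment = {}
    for task_id, fragment_id in tasks:
        candidates = fragment_locations.get(fragment_id, nodes)
        # pick the least-loaded node among those holding the fragment
        node = min(candidates, key=lambda n: load[n])
        assignment[task_id] = node
        load[node] += 1
    return assignment

if __name__ == "__main__":
    nodes = ["n1", "n2", "n3"]
    fragment_locations = {"frag0": ["n1", "n2"], "frag1": ["n3"]}
    tasks = [("t0", "frag0"), ("t1", "frag1"), ("t2", "frag0"), ("t3", "frag0")]
    print(schedule(tasks, fragment_locations, nodes))
```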

6.
The advent of heterogeneous and distributed environments, such as Grid environments, has made feasible the solution of computationally intensive problems in a reliable and cost‐effective manner. In parallel, workflows of increasing complexity are emerging that require specialized systems to deal with them, so as to carry out more composite and mission‐critical applications. In that context, quality‐of‐service (QoS) issues need to be tackled in order to ensure that each application satisfies the corresponding user requirements. Therefore, considering the quality provision aspect as fundamental for enabling Grid applications to become QoS compliant, we present an approach for service selection using QoS criteria. The latter is achieved with a suite of components that allow the different mappings of application workflow processes to Grid services that not only meet the user goals and requirements but also maximize his/her benefit in terms of the offered QoS level. We also demonstrate the operation of the aforementioned suite of components and evaluate its performance and effectiveness using a Grid scenario, based on a 3D image rendering application. Copyright © 2008 John Wiley & Sons, Ltd.

7.
Chee Shin Yeo  Rajkumar Buyya 《Software》2006,36(13):1381-1419
In utility‐driven cluster computing, cluster Resource Management Systems (RMSs) need to know the specific requirements of different users in order to allocate resources according to their needs. This in turn is vital to achieve service‐oriented Grid computing that harnesses resources distributed worldwide based on users' objectives. Recently, numerous market‐based RMSs have been proposed to make use of real‐world market concepts and behavior to assign resources to users for various computing platforms. The aim of this paper is to develop a taxonomy that characterizes and classifies how market‐based RMSs can support utility‐driven cluster computing in practice. The taxonomy is then mapped to existing market‐based RMSs designed for both cluster and other computing platforms to survey current research developments and identify outstanding issues. Copyright © 2006 John Wiley & Sons, Ltd.

8.
Cloud computing, a common business model, provides cloud resources on demand to consumers over the Internet. However, because cloud computing lacks a uniform method of representing knowledge that could offer customers a comprehensive solution for managing and developing cloud applications, it has low reuse potential. This work proposes a Semantic Agent as a Service (SAaaS), which was developed using Unified Modeling Language modelling. The SAaaS architecture is based on research into Cloud Computing, Semantic Web and Multi‐Agent Systems. The architecture can be combined with existing cloud service models, such as Software as a Service, Platform as a Service and Infrastructure as a Service, to design intelligent cloud computing applications. To demonstrate the efficacy of SAaaS, a Semantic‐based Project Resources Sharing Platform, an intelligent cloud computing application based on the SAaaS framework, is implemented to provide project resources on demand, consistent with the needs of project members.

9.
Tao Jinhua, Su Lin, and Li Shukai. Journal of Computer Applications, 2007, 27(10): 2578–2580
This paper analyzes the LiDAR data processing workflow and, drawing on the Open Grid Services Architecture (OGSA), proposes a LiDAR data processing platform in which processing tasks are suitably partitioned and assigned to distributed grid nodes; the nodes compute in parallel and cooperatively, thereby accelerating the computation. Finally, resampling a laser point cloud into a gridded DEM is used as an example to illustrate how an algorithm executes under this architecture.
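As a small, self-contained illustration of the example task mentioned above (resampling a point cloud into a gridded DEM), the sketch below bins points into grid cells and averages their elevations; in the proposed platform each grid node would process one such tile in parallel. The cell size, the averaging rule and the synthetic data are assumptions, not the paper's algorithm.

```python
# Sketch: resample a LiDAR point cloud (x, y, z) into a regular-grid DEM by
# averaging the elevations of the points falling in each cell. Each distributed
# node in the platform described above would process one tile this way.
import numpy as np

def points_to_dem(points, cell_size=1.0):
    """points: (N, 3) array of x, y, z; returns a 2D array of mean elevations (NaN where empty)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    col = ((x - x.min()) / cell_size).astype(int)
    row = ((y - y.min()) / cell_size).astype(int)
    dem_sum = np.zeros((row.max() + 1, col.max() + 1))
    dem_cnt = np.zeros_like(dem_sum)
    np.add.at(dem_sum, (row, col), z)   # accumulate elevations per cell
    np.add.at(dem_cnt, (row, col), 1)   # count points per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(dem_cnt > 0, dem_sum / dem_cnt, np.nan)

if __name__ == "__main__":
    pts = np.random.rand(1000, 3) * [10.0, 10.0, 50.0]   # synthetic tile
    print(points_to_dem(pts, cell_size=1.0).shape)
```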

10.
Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area Grid computing environments. To address this problem, we have designed the Chirp distributed filesystem, which is built from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges, provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide area networks. We describe three applications in bioinformatics, biometrics, and gamma ray physics that each employ Chirp to attack large scale data intensive problems.

11.
Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem, and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
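A small sketch of the partitioning idea in this abstract: at each scheduling step, the chunk of loop iterations handed to an MPI process is proportional to a performance weight derived from its node's core count and relative per-core speed. The node list, the weight formula and the dispatch fraction are illustrative assumptions; the actual scheme also self-schedules the remaining iterations.

```python
# Sketch of performance-weighted partitioning: the chunk of loop iterations given
# to each node at one scheduling step is proportional to its performance weight
# (here, cores * relative per-core speed). Node data are illustrative assumptions.
def weighted_chunks(total_iterations, nodes, alpha=0.5):
    """nodes: list of (name, cores, speed). alpha: fraction dispatched in this step;
    the rest is left for later self-scheduling steps."""
    to_dispatch = int(total_iterations * alpha)
    weights = {name: cores * speed for name, cores, speed in nodes}
    total_w = sum(weights.values())
    chunks, assigned = {}, 0
    for name, w in weights.items():
        share = int(to_dispatch * w / total_w)
        chunks[name] = share
        assigned += share
    # give any rounding remainder to the fastest node
    fastest = max(weights, key=weights.get)
    chunks[fastest] += to_dispatch - assigned
    return chunks, total_iterations - to_dispatch  # per-node chunks, remaining iterations

if __name__ == "__main__":
    nodes = [("node-a", 8, 1.0), ("node-b", 4, 1.2), ("node-c", 2, 0.8)]
    print(weighted_chunks(10000, nodes))
```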

12.
The service‐oriented architecture paradigm can be exploited for the implementation of data and knowledge‐based applications in distributed environments. The Web services resource framework (WSRF) has recently emerged as the standard for the implementation of Grid services and applications. WSRF can be exploited for developing high‐level services for distributed data mining applications. This paper describes Weka4WS, a framework that extends the widely used open source Weka toolkit to support distributed data mining on WSRF‐enabled Grids. Weka4WS adopts the WSRF technology for running remote data mining algorithms and managing distributed computations. The Weka4WS user interface supports the execution of both local and remote data mining tasks. On every computing node, a WSRF‐compliant Web service is used to expose all the data mining algorithms provided by the Weka library. The paper describes the design and implementation of Weka4WS using the WSRF libraries and services provided by Globus Toolkit 4. A performance analysis of Weka4WS for executing distributed data mining tasks in different network scenarios is presented. Copyright © 2008 John Wiley & Sons, Ltd.

13.
This paper proposes ParaClustalW, a new approach to parallelizing the ClustalW multiple sequence alignment algorithm that uses a desktop grid computing platform as both the high-performance programming environment and the runtime platform. It analyzes the task partitioning scheme, parallelization strategy, and implementation techniques for multiple sequence alignment on a desktop grid. The ParaClustalW strategy takes factors such as the number of sequences and their lengths into account to balance the task partitions. Experiments demonstrate that Para...
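One way the length-aware balancing mentioned above could look in practice is sketched below: the pairwise comparisons of a ClustalW-style workflow are treated as tasks whose cost is estimated from the two sequence lengths and packed greedily onto the least-loaded worker. The cost model, the worker count and the focus on the pairwise stage are illustrative interpretations, not the paper's algorithm.

```python
# Sketch of balanced task partitioning for the pairwise stage of a ClustalW-style
# alignment: each sequence pair is a task whose cost is estimated from the two
# lengths; tasks are greedily packed onto the least-loaded worker.
# The cost model (len_i * len_j) and worker count are illustrative assumptions.
import heapq
from itertools import combinations

def partition_pairs(seq_lengths, num_workers):
    """seq_lengths: list of sequence lengths; returns worker -> list of (i, j) pairs."""
    tasks = sorted(combinations(range(len(seq_lengths)), 2),
                   key=lambda p: seq_lengths[p[0]] * seq_lengths[p[1]],
                   reverse=True)
    heap = [(0, w) for w in range(num_workers)]   # (current load, worker id)
    heapq.heapify(heap)
    buckets = {w: [] for w in range(num_workers)}
    for i, j in tasks:
        load, w = heapq.heappop(heap)
        buckets[w].append((i, j))
        heapq.heappush(heap, (load + seq_lengths[i] * seq_lengths[j], w))
    return buckets

if __name__ == "__main__":
    lengths = [120, 450, 300, 90, 600, 250]
    for worker, pairs in partition_pairs(lengths, num_workers=3).items():
        print(worker, pairs)
```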

14.
Several classes of scientific and commercial applications require the execution of a large number of independent tasks. One highly successful and low‐cost mechanism for acquiring the necessary computing power for these applications is the ‘public‐resource computing’, or ‘desktop Grid’ paradigm, which exploits the computational power of private computers. So far, this paradigm has not been applied to data mining applications for two main reasons. First, it is not straightforward to decompose a data mining algorithm into truly independent sub‐tasks. Second, the large volume of the involved data makes it difficult to handle the communication costs of a parallel paradigm. This paper introduces a general framework for distributed data mining applications called Mining@home. In particular, we focus on one of the main data mining problems: the extraction of closed frequent itemsets from transactional databases. We show that it is possible to decompose this problem into independent tasks, which however need to share a large volume of the data. We thus introduce a data‐intensive computing network, which adopts a P2P topology based on super peers with caching capabilities, aiming to support the dissemination of large amounts of information. Finally, we evaluate the execution of a pattern extraction task on such a network. Copyright © 2009 John Wiley & Sons, Ltd.

15.
This paper presents a convergence of distributed key‐value storage systems in clouds and supercomputers. It specifically presents ZHT, a zero‐hop distributed key‐value store system, which has been tuned for the requirements of high‐end computing systems. ZHT aims to be a building block for future distributed systems, such as parallel and distributed file systems, distributed job management systems, and parallel programming systems. ZHT has some important properties, such as being lightweight, allowing nodes to join and leave dynamically, being fault tolerant through replication, persistent, and scalable, and supporting unconventional operations such as append, compare‐and‐swap, and callback in addition to the traditional insert/lookup/remove. We have evaluated ZHT's performance on a variety of systems, ranging from a Linux cluster with 64 nodes and an Amazon EC2 virtual cluster with up to 96 nodes to an IBM Blue Gene/P supercomputer with 8K nodes. We compared ZHT against other key‐value stores and found it offers superior performance for the features and portability it supports. This paper also presents several real systems that have adopted ZHT, namely, FusionFS (a distributed file system), IStore (a storage system with erasure coding), MATRIX (distributed scheduling), Slurm++ (distributed HPC job launch), and Fabriq (distributed message queue management); all of these real systems have been simplified because of key‐value storage systems and have been shown to outperform other leading systems by orders of magnitude in some cases. It is important to highlight that some of these systems are rooted in HPC systems from supercomputers, while others are rooted in clouds and ad hoc distributed systems; through our work, we have shown how versatile key‐value storage systems can be in such a variety of environments. Copyright © 2015 John Wiley & Sons, Ltd.
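The "zero-hop" idea (every client can map a key directly to its home node without routing hops, because the membership is known) can be sketched with simple hash-based placement plus successor replication. The hash function, replica count and membership list below are illustrative assumptions, not ZHT's actual protocol.

```python
# Sketch of zero-hop key placement with replication: each client knows the full
# membership list, so a key maps directly to its home node (and replicas) with no
# routing hops. Hash function, replica count, and membership are assumptions.
import hashlib

class ZeroHopDirectory:
    def __init__(self, nodes, replicas=3):
        self.nodes = list(nodes)
        self.replicas = min(replicas, len(self.nodes))

    def _home_index(self, key):
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def locate(self, key):
        """Return the primary node and its successor replicas for a key."""
        start = self._home_index(key)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

if __name__ == "__main__":
    directory = ZeroHopDirectory([f"node-{i}" for i in range(8)], replicas=3)
    print(directory.locate("/files/metadata/42"))   # primary node plus replicas for this key
```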

16.
Agent-based grid computing middleware
The WADE system is a dynamically adaptive campus computing grid built on agent technology that hides heterogeneity and distribution. This paper proposes an agent-based method for parallel computing within a campus network, discusses in detail the architecture and main module functions of the agent-based grid computing middleware, explains how agents are used for heterogeneous compilation and cooperative computing, and gives a Java implementation of the agents. Implementing the grid computing middleware with software agents solves the problem of cooperative computing across multiple parallel programming environments on heterogeneous platforms and provides users with a unified service interface, which greatly improves the usability of the system.

17.
QoS guided Min-Min heuristic for grid task scheduling
Task scheduling is an integrated component of computing. With the emergence of Grid and ubiquitous computing, new challenges appear in task scheduling based on properties such as security, quality of service, and lack of central control within distributed administrative domains. A Grid task scheduling framework must be able to deal with these issues. One of the goals of Grid task scheduling is to achieve high system throughput while matching applications with the available computing resources. This matching of resources in a non-deterministically shared heterogeneous environment leads to concerns over Quality of Service (QoS). In this paper a novel QoS guided task scheduling algorithm for Grid computing is introduced. The proposed algorithm is based on a general adaptive scheduling heuristic that includes QoS guidance. The algorithm is evaluated within a simulated Grid environment. The experimental results show that the new QoS guided Min-Min heuristic can lead to significant performance gain for a variety of applications. The approach is also compared with others on the basis of the quality of the predictions formulated from inaccurate information.
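A compact sketch of the QoS-guided Min-Min idea described above: tasks with QoS requirements are matched first, restricted to hosts that can satisfy them, and within each group the classic Min-Min rule (pick the task whose minimum completion time is smallest) is applied. The expected-time matrix and QoS flags are illustrative inputs, not data from the paper.

```python
# Sketch of a QoS-guided Min-Min heuristic: schedule QoS-constrained tasks first on
# QoS-capable hosts, then the rest on any host; within each group, repeatedly pick
# the task whose best (minimum) completion time is smallest. Inputs are illustrative.
def qos_min_min(etc, task_qos, host_qos):
    """etc[t][h]: expected time to compute task t on host h;
    task_qos[t] / host_qos[h]: True if the task requires / the host provides QoS."""
    ready = [0.0] * len(host_qos)                  # host ready times
    schedule = {}
    # QoS tasks first (restricted to QoS hosts), then the remaining tasks on all hosts
    for need_qos in (True, False):
        pending = {t for t, q in enumerate(task_qos) if q == need_qos}
        hosts = [h for h, q in enumerate(host_qos) if q or not need_qos]
        while pending:
            best = None                            # (completion time, task, host)
            for t in pending:
                for h in hosts:
                    ct = ready[h] + etc[t][h]
                    if best is None or ct < best[0]:
                        best = (ct, t, h)
            ct, t, h = best
            schedule[t] = h
            ready[h] = ct
            pending.remove(t)
    return schedule, max(ready)                    # assignment and makespan

if __name__ == "__main__":
    etc = [[3, 5, 9], [2, 8, 4], [6, 3, 7], [4, 4, 4]]   # 4 tasks x 3 hosts
    print(qos_min_min(etc, task_qos=[True, False, True, False], host_qos=[True, True, False]))
```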

18.
Confidence in a pairwise local sequence alignment is a fundamental problem in bioinformatics. For huge DNA sequences, this problem is highly compute‐intensive because it involves evaluating hundreds of local alignments to construct an empirical score distribution. Recent parallel solutions support only kilobyte‐scale sequence sizes and/or are based on sophisticated infrastructures that are not available to most research labs. This paper presents an efficient parallel solution for evaluating the statistical significance for a pair of huge DNA sequences using cloud infrastructures. This solution can receive requests from various researchers via a web portal and allocate resources according to their demand. In this way, the benefits of cloud‐based services can be achieved. The fundamental innovation of this research work is proposing an efficient solution that utilizes both shared and distributed memory architectures via cloud technology to enhance the performance of evaluating the statistical significance for a pair of DNA sequences. Therefore, the restriction on sequence sizes is relaxed to the megabyte scale, which was not previously supported for the statistical significance problem. The performance evaluation of the proposed solution was carried out on Microsoft's cloud and compared with existing parallel solutions. The results show that the processing speed outperforms recent cluster solutions that target the same problem. In addition, the performance metrics exhibit linear behavior for the addressed number of instances. Copyright © 2012 John Wiley & Sons, Ltd.
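The empirical significance estimate mentioned above can be sketched as follows: the alignment score of the real pair is compared with scores obtained after repeatedly shuffling one sequence, and the p-value is the fraction of shuffled scores at least as large. A toy scoring function replaces a real Smith-Waterman aligner, and the shuffle count is an arbitrary assumption.

```python
# Sketch of empirical significance for a pairwise alignment score: shuffle one
# sequence repeatedly, rescore, and estimate the p-value as the fraction of shuffled
# scores >= the observed score. The scoring function is a toy stand-in for a real
# local aligner, and 200 shuffles is an arbitrary choice.
import random

def toy_local_score(a, b):
    """Longest run of consecutive matching characters along any diagonal (toy scorer)."""
    best = 0
    for shift in range(-len(b) + 1, len(a)):
        run = 0
        for i in range(len(a)):
            j = i - shift
            if 0 <= j < len(b) and a[i] == b[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

def empirical_pvalue(seq_a, seq_b, shuffles=200, seed=0):
    rng = random.Random(seed)
    observed = toy_local_score(seq_a, seq_b)
    hits = 0
    for _ in range(shuffles):
        shuffled = list(seq_b)
        rng.shuffle(shuffled)
        if toy_local_score(seq_a, "".join(shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (shuffles + 1)   # pseudocount avoids p = 0

if __name__ == "__main__":
    print(empirical_pvalue("ACGTGCTAGCTAAGGT", "TTGCTAGCTAACCAG"))
```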

19.
The paper presents a new open‐source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as an acyclic directed graph with a possibility to run nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment that includes selection of devices using a chosen optimizer, selection of best grid configurations for compute devices, optimization of data partitioning and the execution. One of possibly many scheduling algorithms can be selected considering execution time, power consumption, and so on. An easy‐to‐use GUI is provided for modeling and monitoring with a repository of ready‐to‐use constructs and computational kernels. The methodology, execution times, and scalability have been demonstrated for a distributed and parallel password‐breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, performance of the framework has been compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.

20.
MapReduce is a parallel computing framework widely used to process massive data in cloud computing environments. However, for data-intensive computation under this framework, the strong data-transfer dependencies among cluster nodes impose a heavy message-processing load on the nodes. An improved MapReduce model based on a message-broker mechanism is proposed to optimize the data flow. Experimental data show that the message-broker-based MapReduce framework improves load balancing for data-intensive applications.
