1.
2.
Richard McClatchey, Ashiq Anjum, Heinz Stockinger, Arshad Ali, Ian Willers, Michael Thomas 《Journal of Grid Computing》2007,5(1):43-64
In Grids, scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data intensive
situations jobs may be pushed to the data and in computation intensive situations data may be pulled to the jobs. This kind
of scheduling, in which there is no consideration of network characteristics, can lead to performance degradation in a Grid
environment and may result in large processing queues and job execution delays due to site overloads. In this paper we describe
a Data Intensive and Network Aware (DIANA) meta-scheduling approach, which takes into account data, processing power and network
characteristics when making scheduling decisions across multiple sites. Through a practical implementation on a Grid testbed,
we demonstrate that queue and execution times of data-intensive jobs can be significantly improved when we introduce our proposed
DIANA scheduler. The basic scheduling decisions are dictated by a weighting factor for each potential target location which
is a calculated function of network characteristics, processing cycles and data location and size. The job scheduler provides
a global ranking of the computing resources and then selects an optimal one on the basis of this overall access and execution
cost. The DIANA approach considers the Grid as a combination of active network elements and takes network characteristics
as a first class criterion in the scheduling decision matrix along with computations and data. The scheduler can then make
informed decisions by taking into account the changing state of the network, locality and size of the data and the pool of
available processing cycles.
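As a rough illustration of the ranking idea described in this abstract, the sketch below scores each candidate site with a weighted sum of network, computation, and data-transfer terms and selects the cheapest site. The `Site` fields, the weights, and the cost formula are illustrative assumptions, not the cost model published for DIANA.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a candidate execution site."""
    name: str
    bandwidth_mbps: float  # bandwidth from the data source to this site
    queue_length: int      # jobs already waiting at the site
    cpu_power: float       # relative processing capacity (higher is faster)
    data_local: bool       # True if the input data already resides at the site


def site_cost(site, data_size_mb, w_net=1.0, w_comp=1.0, w_data=1.0):
    """Assumed cost: weighted sum of network, computation and transfer terms."""
    network_cost = 1.0 / site.bandwidth_mbps
    computation_cost = (site.queue_length + 1) / site.cpu_power
    transfer_cost = 0.0 if site.data_local else data_size_mb / site.bandwidth_mbps
    return w_net * network_cost + w_comp * computation_cost + w_data * transfer_cost


def rank_sites(sites, data_size_mb):
    """Return the sites ordered from lowest to highest estimated cost."""
    return sorted(sites, key=lambda s: site_cost(s, data_size_mb))


sites = [
    Site("A", bandwidth_mbps=100.0, queue_length=10, cpu_power=2.0, data_local=False),
    Site("B", bandwidth_mbps=10.0, queue_length=1, cpu_power=1.0, data_local=True),
    Site("C", bandwidth_mbps=1000.0, queue_length=3, cpu_power=4.0, data_local=False),
]
ranking = rank_sites(sites, data_size_mb=500.0)
best = ranking[0]
```

With these made-up numbers, the fast link to site C outweighs the data locality of site B, which is the kind of trade-off a network-aware scheduler is meant to expose.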
3.
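Based on a study of resource co-scheduling mechanisms in grid computing and the resource co-allocation requirements they impose, this paper proposes a grid-based distributed collaborative computing model built on the Globus Toolkit platform. For the bidding mechanism of an economic-model-based job scheduling scheme, it gives a fairly complete price function model and a corresponding model for predicting job completion time. The price function model lets resource providers act on their own initiative, making resource allocation more reasonable and faster. The distributed computing model also includes a backup-based fault-tolerance mechanism for job scheduling.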
4.
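With the continued development and widespread use of information technology, large volumes of data are continually collected and stored, and data mining tasks over distributed target data keep growing in scale. Traditional data mining cannot handle the mining of massive distributed data, and distributed systems struggle with heterogeneous operating systems and protocols. As grid technology matures, using the powerful resource sharing of heterogeneous virtual organizations in a grid environment to perform collaborative parallel data mining has become a research focus of grid applications. This paper proposes a parallel association rule mining scheme for grid environments based on agent technology, multithreading, and centralized voting, and validates it experimentally under GT4, achieving distributed parallel mining of large-scale data in a grid environment.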
5.
Replication of data is a popular and convenient form of data organization in distributed systems. Along with its advantages, data replication brings specific problems that system designers have to solve. This paper deals with methods for resolving inconsistencies in replicated data. The problem investigated is how to restore data consistency if, after the system has been running for some time, the versions held at some sites differ from one another. We propose solving this problem by determining a consensus of the replicated data versions. We assume that a distance function can be defined between versions of replicated data; different consensus choice functions are then defined and analyzed. A numerical and practical example of applying these methods is also presented.
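One simple consensus choice function consistent with this description picks the stored version whose total distance to all versions is smallest. The sketch below assumes versions are equal-length tuples and uses Hamming distance; both choices are illustrative assumptions, not the functions analyzed in the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length versions differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(versions, distance=hamming):
    """Pick the stored version with the smallest total distance to all versions."""
    return min(versions, key=lambda v: sum(distance(v, other) for other in versions))


# Hypothetical replicas of one record held at four sites.
replicas = [
    ("alice", 30, "Paris"),
    ("alice", 31, "Paris"),
    ("alice", 30, "Paris"),
    ("alice", 30, "Rome"),
]
agreed = consensus(replicas)
```

Here the two identical replicas have the lowest total distance, so their value is chosen as the consensus; a different distance function would in general give a different choice.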
6.
The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of involved resources that include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.
7.
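To support data access to grid files, this paper proposes a high-performance storage system with standard servers and protocols, the 联众 system. It adopts data management methods from cluster environments and accesses multiple sites through multiple standard data servers, using technologies that include GridFTP and the OGSA ByteIO interface. Experimental results show that the 联众 system can serve as a parallel file system in a real grid environment and achieves good data access performance.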
8.
陈雪兆 《数字社区&智能家居》2009,(36)
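The paper explains the concept of grid computing and how it differs from traditional distributed computing. It introduces a distributed association rule mining algorithm, makes several improvements to it, and finally implements the algorithm as a grid service. Experimental results show that a grid service can combine the computing power of several computers to reduce the algorithm's running time.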
9.
10.
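GridFTP, a secure and efficient data transfer protocol for grid environments, is an extension of the standard FTP protocol. This paper introduces the features, implementation, and performance of GridFTP, analyzes the protocol's prospects, and details the steps for configuring a GridFTP service on Linux. Because it offers features that standard FTP lacks, such as third-party control of transfers and parallel transfer, GridFTP has become the principal grid data transfer protocol.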
11.
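Scientific, industrial, and commercial applications need to analyze massive data distributed across heterogeneous sites, which calls for suitable distributed parallel systems to store and manage the data. Grids provide effective computational support for distributed data mining and knowledge discovery. Building on a discussion of the Knowledge Grid architecture, this paper uses the visual grid application environment VEGA to implement a grid-based distributed data mining process.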
12.
13.
14.
15.
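Online transaction processing (OLTP) applications face continuing growth in concurrency and data volume, and highly concurrent read and write operations turn the back-end database into a bottleneck. An in-memory data grid (IMDG) is a new memory-based distributed data access platform and one effective technical approach to relieving the database write bottleneck. However, the distribution of the data touched by access operations in an IMDG cannot be known in advance, so distributed transaction guarantees are required. Targeting the characteristics of IMDG systems, this paper proposes a distributed transaction guarantee mechanism; designs and implements a transaction processing model, request processing and data location methods, and a transaction guarantee protocol; and formally defines the operation interfaces between client and server and among servers. Experiments on the TPC-W transaction processing benchmark show that the new mechanism improves the processing speed of online transaction applications and scales well.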
16.
17.
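Replica location is a key technology of data grids. Using an improved Chord algorithm and drawing on structured P2P techniques, this paper proposes a replica location method based on the structured P2P model. The method alleviates, to some extent, the "detour" problem between local area networks, improves location efficiency, saves transfer time and bandwidth, and optimizes data grid performance.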
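For orientation, the sketch below shows the basic Chord-style rule that such a method builds on: a key is owned by the first node whose identifier is at or after the key's identifier on a hash ring, wrapping around at the end. It uses a global view of the ring instead of finger-table routing, and it does not include the paper's improvements; the ring size, node names, and the `Ring` class are hypothetical.

```python
import bisect
import hashlib

RING_BITS = 16  # assumed size of the identifier space (2**16 positions)


def ring_id(key):
    """Map a string to a position on the identifier ring using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class Ring:
    """Global-view stand-in for a Chord ring (no finger tables)."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(n), n) for n in node_names)
        self.ids = [i for i, _ in self.nodes]

    def successor_of_id(self, ident):
        """First node at or after ident, wrapping to the start of the ring."""
        idx = bisect.bisect_left(self.ids, ident)
        return self.nodes[idx % len(self.nodes)][1]

    def locate(self, replica_key):
        """Node responsible for the index entry of a replica's logical name."""
        return self.successor_of_id(ring_id(replica_key))


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owner = ring.locate("lfn://example/dataset-001")  # hypothetical logical file name
```

In a real Chord deployment each node knows only a logarithmic number of other nodes and forwards the lookup hop by hop; the global sorted list here only demonstrates which node the lookup should end at.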
18.
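Traditional data grid scheduling algorithms tend to get trapped in local optima and converge too slowly. This paper analyzes the characteristics of hierarchical data grids and divides the data grid into layers and its nodes into two levels of roles. For the hierarchical network scheduling model, it designs a resource scheduling optimization algorithm for hierarchical data grids based on a game among nodes (the CTDGRA algorithm). Working within a game-theoretic framework, the algorithm turns the generation of a scheduling plan for data distribution tasks into the problem of selecting an optimal mapping between static data tasks and dynamic node resources. It takes into account the dependencies among data tasks, node capabilities across node domains, and the preference behavior of nodes, weighing the objectives of each node to obtain the globally most favorable or most reasonable strategy and thereby ensure globally optimal QoS for the system. Simulation experiments show that the algorithm encourages ordinary nodes to contribute idle capacity, prevents low-performance nodes from becoming a bottleneck in resource acquisition, and improves system throughput.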
19.
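Functional Dependency Discovery in Distributed Big Data (cited by: 1 in total; self-citations: 0; citations by others: 1)
In relational databases, functional dependency discovery is an important database analysis technique with wide application in knowledge discovery, database semantic analysis, data quality assessment, and database design. Existing functional dependency discovery algorithms mainly target centralized data and are usually suitable only for small data volumes. In the big data setting, discovering functional dependencies in a distributed environment is more challenging. This paper proposes a functional dependency discovery algorithm for big data in a distributed environment. The basic idea is first to discover functional dependencies in parallel at each node using local data and to prune the candidate set based on those results; then to group the candidates by features of their left-hand side (LHS) and run the distributed discovery algorithm in parallel for each group, finally obtaining all functional dependencies. The paper analyzes the number of candidate functional dependencies that can be checked under different groupings, and the algorithm takes both data migration volume and load balancing into account during execution. Experiments on real large datasets show that the proposed algorithm is clearly more efficient than existing methods.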
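The local pruning step rests on a simple fact: a functional dependency that is violated on any single node's data cannot hold on the combined data. The sketch below illustrates that idea under simplifying assumptions; the row format, the candidate list, and the final check on merged rows (a stand-in for the grouped distributed phase) are hypothetical and are not the paper's algorithm.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def discover(partitions, candidates):
    """Prune candidates violated locally, then validate the rest globally."""
    survivors = [
        (lhs, rhs) for lhs, rhs in candidates
        if all(holds(part, lhs, rhs) for part in partitions)
    ]
    # Stand-in for the distributed phase: check survivors on the merged rows.
    all_rows = [row for part in partitions for row in part]
    return [(lhs, rhs) for lhs, rhs in survivors if holds(all_rows, lhs, rhs)]


node1 = [
    {"id": 1, "city": "Oslo", "zip": "0150"},
    {"id": 2, "city": "Oslo", "zip": "0150"},
]
node2 = [
    {"id": 3, "city": "Oslo", "zip": "0151"},
    {"id": 4, "city": "Rome", "zip": "00100"},
]
candidates = [
    (("id",), "city"),
    (("city",), "zip"),  # holds on each node but fails on the merged data
    (("zip",), "city"),
    (("city",), "id"),   # already violated on node1, so pruned locally
]
found = discover([node1, node2], candidates)
```

The `city -> zip` candidate shows why local results alone are insufficient: it survives pruning on both nodes yet fails once rows from different nodes are compared.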
20.
In this paper, we propose a new algorithm, named Grid-based Distributed Max-Miner (GridDMM), for mining maximal frequent itemsets from databases on a Data Grid. A frequent itemset is maximal if none of its
supersets is frequent. GridDMM is specifically suitable for use in Grid environments due to low communication and synchronization
overhead. GridDMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines
the local database to discover the local maximal frequent itemsets, then they form a set of maximal candidate itemsets for
the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the
storage and counting of the global candidate itemsets of different sizes. We built a Data Grid system on a cluster of workstations
using the open-source Globus Toolkit, and evaluated the GridDMM algorithm in terms of performance, scalability, and the overhead
of communication and synchronization. GridDMM demonstrates better performance than other sequential and parallel algorithms,
and its performance is scalable in terms of the database size and the number of nodes.
This research was supported in part by LexisNexis, NCR and AFRL/Wright Brothers Institute (WBI).
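To make the definition of a maximal frequent itemset concrete, the sketch below enumerates frequent itemsets by brute force and keeps only those with no frequent proper superset. It is a single-machine illustration of the definition, not the GridDMM algorithm or its prefix-tree structure; the transactions and the support threshold are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets contained in >= min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(candidate <= t for t in transactions)
            if support >= min_support:
                frequent.append(candidate)
    return frequent


def maximal(frequent):
    """Keep only the itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
frequent = frequent_itemsets(transactions, min_support=2)
max_sets = maximal(frequent)
```

With a support threshold of 2, the frequent itemsets are {a}, {b}, {c}, {a, b} and {a, c}, and only the last two are maximal. Brute-force enumeration is exponential in the number of items, which is why practical miners search the itemset lattice far more selectively.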