Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
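Data Mining Based on the Knowledge Grid   Cited: 8 (self-citations: 0, citations by others: 8)
Wei Dingguo (魏定国), Peng Hong (彭宏). Computer Science (《计算机科学》), 2006, 33(6): 210-213
Data in industry, science, commerce, and similar fields are usually distributed across different locations and must be maintained in a distributed fashion at those sites. Only distributed, parallel processing systems with very high computing power can analyze the very large data sets these fields produce. The grid offers effective support for the computation in distributed knowledge discovery applications. To support the development of data mining on the grid, this paper presents a system called the Knowledge Grid, discusses how to use it to design and implement data mining applications, and explains how to search for grid resources, compose software and data components, and execute data mining applications on the grid.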

2.
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high‐performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high‐level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta‐learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work is to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. To validate this approach, we also present the implementation of two clustering algorithms on the developed architecture. In particular, distributed k‐means and distributed expectation maximization are used as pilot examples to show the suitability of the implemented service‐oriented framework. An extensive evaluation of its performance is provided. Copyright © 2011 John Wiley & Sons, Ltd.
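
The abstract above names distributed k-means as a pilot algorithm but gives no details. The following is a minimal sketch of one common formulation, not the paper's implementation: each site computes per-cluster sums and counts over its own data, and a coordinator merges them to update the centroids. The function names `local_stats` and `distributed_kmeans` are hypothetical, and the loop over `sites` stands in for calls to remote grid services.

```python
def local_stats(points, centroids):
    """Site-side step: assign each local point to its nearest centroid
    and return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def distributed_kmeans(sites, centroids, iterations=10):
    """Coordinator-side loop: merge the statistics of all sites and
    recompute the centroids. Raw points never leave their site."""
    for _ in range(iterations):
        k, dim = len(centroids), len(centroids[0])
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for points in sites:  # assumption: one remote call per site in a real grid
            sums, counts = local_stats(points, centroids)
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        # An empty cluster keeps its previous centroid.
        centroids = [
            [s / total_counts[j] for s in total_sums[j]]
            if total_counts[j]
            else list(centroids[j])
            for j in range(k)
        ]
    return centroids
```

Only the small `(sums, counts)` summaries cross site boundaries, which is why this pattern suits data that cannot be moved to one place.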

3.
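Grid computing is a new-generation computing platform proposed to solve large-scale, resource-intensive problems and is one current direction in parallel and distributed processing; resource management is one of the key technologies of a computational grid. Integrating and managing the many kinds of available resources is the foundation of grid applications, yet the distribution, dynamism, heterogeneity, autonomy, and need for coordination of these resources make the management and scheduling of grid resources a difficult problem. Current market-based economic resource management and scheduling algorithms suit resource management in computational grids well, but they suffer from problems such as scheduling prices that cannot be changed and load balancing. This paper proposes a "resource broker based on an economic model in a grid environment", which relies on a scheduling strategy guided by multidimensional QoS and on heuristic adjustment of resource prices under the economic model to improve and optimize resource allocation in computational grids.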

4.
This paper addresses the problem of moving parallel dynamic security assessment applications from a static, homogeneous cluster environment to a dynamic, heterogeneous grid environment. Functional parallelism and data parallelism are supported by both the message passing interface model and the TCP/IP model. To account for the differences in heterogeneous computing resources and the complexity of large-scale power system communities, a kernel-based multilevel algorithm is proposed for network partitioning. Since the bottleneck in distributed computation is low-speed network communication, a bi-level latency exploitation technique is introduced for numerically solving the system differential equations. The proposed grid-based implementation includes the core simulation engine, grid computing middleware, a Python interface, and Python front-end utilities. Tests on a 39-bus network, a 4000-bus network, and a 10,000-bus network are reported, and the results demonstrate that the proposed scheme can execute the distributed simulations on computational grid infrastructure and provide efficient parallelism.

5.
Parameter-space (p-space) studies involve running a single application several times with different parameter sets. Since the jobs are mutually independent, many computing resources can be recruited to conduct an entire study in a distributed manner. P-space studies are attractive applications for grids, which are networked collections of computing and other resources. Legion is a grid infrastructure that facilitates the secure and easy use of heterogeneous, geographically distributed resources by providing the illusion of a single virtual machine built from those resources. Legion provides tools and services that support advanced p-space studies, i.e., studies that make complex demands such as transparent access to distributed files, fault tolerance, and security. We demonstrate these benefits with a protein-folding experiment in which a molecular simulation package was run over a grid managed by Legion.
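
The structure of a p-space study described above (one application, many independent parameter sets) can be sketched as follows. This is an illustration only and does not use Legion: `simulate` is a hypothetical stand-in for the real application, `run_study` is a hypothetical helper, and a local thread pool stands in for grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(temperature, steps):
    """Hypothetical stand-in for one run of the application."""
    return temperature * steps


def run_study(param_grid, max_workers=4):
    """Expand the parameter grid into independent jobs and run them
    concurrently. Returns a list of (parameters, result) pairs."""
    names = sorted(param_grid)
    combos = [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]
    # The jobs share no state, so they can run in any order on any worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda params: simulate(**params), combos))
    return list(zip(combos, results))
```

Because the jobs are mutually independent, swapping the thread pool for a remote job submitter would not change the shape of the study, only where each `simulate` call runs.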

6.
The service‐oriented architecture paradigm can be exploited for the implementation of data and knowledge‐based applications in distributed environments. The Web services resource framework (WSRF) has recently emerged as the standard for the implementation of Grid services and applications. WSRF can be exploited for developing high‐level services for distributed data mining applications. This paper describes Weka4WS, a framework that extends the widely used open source Weka toolkit to support distributed data mining on WSRF‐enabled Grids. Weka4WS adopts the WSRF technology for running remote data mining algorithms and managing distributed computations. The Weka4WS user interface supports the execution of both local and remote data mining tasks. On every computing node, a WSRF‐compliant Web service is used to expose all the data mining algorithms provided by the Weka library. The paper describes the design and implementation of Weka4WS using the WSRF libraries and services provided by Globus Toolkit 4. A performance analysis of Weka4WS for executing distributed data mining tasks in different network scenarios is presented. Copyright © 2008 John Wiley & Sons, Ltd.

7.
The availability of powerful microprocessors and improvements in the performance of networks have enabled high performance computing on wide-area, distributed systems. Computational grids, by integrating diverse, geographically distributed, and essentially heterogeneous resources, provide the infrastructure for solving large-scale problems. Heterogeneity, however, allows for scalability on the one hand but on the other makes application development and deployment for such an environment extremely difficult. The field of life sciences has seen an explosion in data over the past decade. The data acquired need to be processed, interpreted, and analyzed to be useful. The large resource needs of bioinformatics, allied to the large number of data-parallel applications in this field and the availability of a powerful, high performance computing grid environment, lead naturally to opportunities for developing grid-enabled applications. This survey, done as part of the Life Sciences Research Group (a research group belonging to the Global Grid Forum), attempts to collate information regarding grid-enabled applications in this field. Arun Krishnan, Ph.D.: He completed his undergraduate degree in Electrochemical Engineering at the Central Electrochemical Research Institute in India and went on to do his Ph.D. in Advanced Process Control at the University of South Carolina. He then worked in the control and high performance computing industries for about 3 years before moving to the Bioinformatics Institute in Singapore. He is currently a Young Investigator for the Distributed Computing in Biomedicine Group at BII. His research interests include parallel and distributed computing with special emphasis on grid computing and its application to the biomedical area. He is also interested in developing parallel algorithms for sequence analysis and protein structure prediction.

8.
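Research on a Grid-Based Coal Mine Safety Application Grid Architecture   Cited: 2 (self-citations: 0, citations by others: 2)
Xu Sen (徐森), Wang Jianguo (王建国), Li Xuewen (李学文). Computer Simulation (《计算机仿真》), 2005, 22(12): 234-237
In coal mine safety applications, distributed data, idle and heterogeneous computing resources, and single-purpose computing functions severely constrain industry applications and the development of informatization. The grid is a new generation of high-performance computing environment and information service infrastructure that enables dynamic, cross-regional resource sharing and collaborative work. Building on emerging grid technology and following the OGSA architecture, this paper proposes and constructs a complete Coal Mine Safety Application Grid (CMSAG) architecture covering information acquisition, processing, and application. In light of industry requirements, it focuses on key technologies including resource representation, resource management, storage access, and data caching, and finally demonstrates the application model of the architecture through the workflow of a concrete application example.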

9.
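Grid computing is an important new branch of distributed computing; its main contribution is large-scale resource sharing with high performance. Many applications need to analyze large data sets that are typically geographically distributed, large in scale, and of steadily growing complexity. Grid technology offers effective support for such applications. This paper introduces grid infrastructure and distributed data mining.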

10.
The computational grid is rapidly evolving into a service-oriented computing infrastructure that facilitates resource sharing for solving large-scale data-intensive and computationally intensive problems. Peer-to-peer (P2P) systems have emerged as an infrastructure-enabling technology for enhanced scalability and reliability in file sharing and content distribution. It is envisioned that P2P-enabled service-oriented grid systems would virtualize various resources as services with high scalability and reliability. Many legacy software resources exist nowadays, but turning them into grid-aware services for effective resource sharing has become an issue of vital importance. This paper presents GSLab, a toolkit for automatically wrapping legacy software into services that can be published, discovered, and reused in grid environments. GSLab employs Sun Grid Engine (SGE) to enhance its performance in the execution of wrapped services. Using GSLab, we have automatically wrapped a legacy computer animation rendering code written in C as a service that can be discovered and accessed in an SGE environment. The evaluation results show that the performance of GSLab improves as the number of computing nodes involved increases.
Nick Antonopoulos

11.
Grid computing, which is characterized by large-scale sharing and collaboration of dynamic distributed resources, has quickly become a mainstream technology in distributed computing and is changing the traditional way of software development. In this article, we present a grid-based software testing framework for unit and integration testing, which takes advantage of large-scale, cost-efficient computational grid resources to establish a testbed supporting automated software testing in complex software applications. Within this framework, a dynamic bag-of-tasks model using swarm intelligence is developed to adaptively schedule unit test cases. Various high-confidence computing mechanisms, such as redundancy, intermediate value checks, verification code injection, and consistency checks, are employed to verify the correctness of each test case execution on the grid. Grid workflow is used to coordinate the various test units for integration testing. Overall, we expect that the grid-based software testing framework can provide efficient and trustworthy services that significantly accelerate large-scale software testing.
Yong-Duan Song

12.
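Grid technology can make full use of the heterogeneous, widely distributed, constantly changing resources of a wide-area network to achieve full sharing and good cooperation among all kinds of resources. Such integration can usually use a large number of low-cost stand-alone machines, without high-performance hardware, to achieve the rapid computation on massive data normally associated with a supercomputer. This work uses grid components to integrate the stand-alone machines of an office and compares the speed of grid computing and single-machine computing on an embarrassingly parallel example, rendering the Mandelbrot set. The experiments show that grid computing has an advantage over a single machine on compute-intensive problems.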
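
Why rendering the Mandelbrot set is embarrassingly parallel can be seen in a small sketch: every image row depends only on its own coordinates, so rows can be handed to different workers without any communication. The code below is an illustration under assumptions, not the paper's grid setup; the function names, the viewport (real part in [-2, 1], imaginary part in [-1.5, 1.5]), and the iteration limit are all chosen here for the example, and threads merely stand in for grid nodes (they give no real speedup in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def escape_count(c, max_iter=50):
    """Number of iterations of z -> z*z + c before |z| exceeds 2,
    capped at max_iter for points that never escape."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render_row(y, width, height, max_iter=50):
    """Compute one image row; needs nothing from any other row."""
    im = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / (width - 1), im), max_iter)
        for x in range(width)
    ]


def render(width, height, workers=4):
    """Farm the rows out to workers and collect them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: render_row(y, width, height), range(height)))
```

In a grid setting each `render_row` call would become a job sent to a separate machine, and the only data returned per job is one row of integers.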

13.
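A Distributed Frequent Pattern Mining Algorithm Using Grid Services   Cited: 3 (self-citations: 1, citations by others: 3)
Making full use of the services of a grid computing platform for distributed data mining has recently become a hot topic in data mining. Grid services such as task management, task scheduling, and resource management can greatly facilitate distributed data mining. Building on this research, the paper introduces a distributed frequent pattern mining algorithm based on a grid platform. The algorithm borrows ideas from FP-growth and uses the distributed computing services the grid platform provides, so that it can mine frequent patterns in a distributed manner in a grid computing environment.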
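
The abstract does not describe the algorithm itself. As a hedged illustration of the general idea only, the sketch below shows the first pass that FP-growth-style methods need: counting item supports and ordering the frequent items by descending support. Here each data partition is counted locally and a coordinator merges the counts. This is not the paper's algorithm; `local_item_counts` and `global_frequent_items` are hypothetical names, and the loop over partitions stands in for work done on separate grid nodes.

```python
from collections import Counter


def local_item_counts(transactions):
    """Node-side step: count in how many local transactions each item occurs."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count an item once per transaction
    return counts


def global_frequent_items(partitions, min_support):
    """Coordinator-side step: merge the local counts and keep items whose
    global support reaches min_support, ordered by descending support
    (ties broken alphabetically)."""
    total = Counter()
    for part in partitions:  # assumption: one remote task per partition
        total += local_item_counts(part)
    frequent = {item: n for item, n in total.items() if n >= min_support}
    return sorted(frequent, key=lambda item: (-frequent[item], item))
```

An item that is infrequent on every node can still be frequent globally, which is why the support threshold is applied only after the counts are merged.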

14.
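New Research Progress in Grid Computing   Cited: 21 (self-citations: 0, citations by others: 21)
Grid computing has opened up an entirely new field, distinguished from traditional distributed computing by large-scale cooperative resource sharing, innovative applications, and high-performance computing. This article briefly describes the definition, characteristics, functions, and basic architecture of grid computing, surveys recent progress in grid research and its prospects for commercial application, and analyzes current trends in grid research and China's response.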

15.
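Data mining algorithms are widely used in data analysis. Industry, science, and business need to analyze large, geographically distributed data sets, and the grid can effectively provide infrastructure for high-performance and distributed applications. To use the grid for data mining and knowledge representation, this paper builds on the Knowledge Grid concept and the Globus Toolkit to analyze the architecture of the Knowledge Grid and its main components, designs a software model of a grid data mining system that follows the data mining process, and identifies the services the model should provide. These services hide all details of the underlying grid so that end users need only concern themselves with the knowledge discovery process.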

16.
Grid computing, which is characterized by large-scale sharing and collaboration of dynamic resources, is becoming an emerging computing platform on a global scale for data-intensive and computation-intensive scientific applications. However, the complications of large-scale scientific computations and simulations harnessing massive computing resources are compounded by extensive heterogeneity in environments arising from “the Grid.” Scientists and engineers lack an intuitive grid-based compilation tool, which has contributed to the difficulty of exploiting these diverse resources and developing their applications on the grid. While manual configuration of various toolkits simplifying the end-to-end completion of a job is adequate for a computational grid with a limited number of nodes, the compilation procedure becomes inefficient for a computational grid with an increasing number of heterogeneous computational service providers. On the other hand, a global-scale computational grid is a potentially untrustworthy computing environment. How to take advantage of potentially untrustworthy grid resources to provide trustworthy computational services for large-scale scientific applications is another critical issue. In this article, a remote compiling service for a heterogeneous computational grid is developed. In addition to running compilation tasks, the remote compiling service provides security enforcement and validation facilities, including intermediate value checking, secure source program submission, restricted compilation, and binary inspection, to support trustworthy compilation and execution of grid-based scientific applications. Overall, it is expected that our remote compiling services on the grid can tackle the heterogeneity problem of the grid and provide a secure, trustworthy, reliable, and state-of-the-art mechanism to develop grid-aware scientific applications.
Xiaohong Yuan

17.
18.
A PTS-PGATS based approach for data-intensive scheduling in data grids   总被引:1,自引:0,他引:1  
Grid computing is the combination of computer resources in a loosely coupled, heterogeneous, and geographically dispersed environment. Grid data are the data used in grid computing, which consists of large-scale data-intensive applications that produce and consume huge amounts of data distributed across a large number of machines. Data grid computing comprises sets of independent tasks, each of which requires massive distributed data sets that may each be replicated on different resources. To reduce the completion time of the application and improve the performance of the grid, appropriate computing resources should be selected to execute the tasks and appropriate storage resources selected to serve the files required by the tasks. The problem can therefore be broken into two sub-problems: selection of storage resources and assignment of tasks to computing resources. This paper proposes a scheduler that is broken into three parts that can run in parallel and uses both parallel tabu search and a parallel genetic algorithm. Finally, the proposed algorithm is evaluated by comparing it with other related algorithms that aim to minimize makespan. Simulation results show that the proposed approach can be a good choice for scheduling large data grid applications.

19.
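Wan Hu (万虎), Yu Minghui (余明晖), Yang Qing (杨庆), Fei Qi (费奇). Computer Simulation (《计算机仿真》), 2008, 25(1): 6-10, 26
Distributed simulation is widely used in scientific research, engineering, business, and many other areas. HLA was proposed to enable interoperability among different types of simulation models and simulation application modules in distributed simulation and to improve the reusability of simulation components; it does not address management of the simulation resource layer. It therefore has limitations now that distributed simulation is growing in scale and needs geographically distributed computing and data resources. Grid-based distributed simulation is an emerging technology still in its infancy that aims to port traditional distributed simulation to grid environments in order to exploit the advantages of grid technology. This paper reviews the development and current applications of distributed simulation and grid technology, then focuses on the research frontier of grid-based distributed simulation and discusses the problems it faces and its future directions, offering guidance for further research.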

20.
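The Alchemi model, a grid computing framework based on the .NET Framework, is significant for putting China's vast number of idle PCs to work in computational grid applications. Starting from protocol-oriented grid infrastructure, this paper describes the architecture of Alchemi from a logical perspective and proposes a theoretical solution for the registration service, a core part of the grid architecture, within the Alchemi framework.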
