Similar Literature
1.
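Data mining algorithms are widely used in data analysis. Industry, science and commerce need to analyze large, geographically distributed datasets, and grids can effectively provide high-performance applications and distributed infrastructure. To use grids for data mining and knowledge representation, this paper starts from the concept of the Knowledge Grid and builds on the Globus Toolkit. It analyzes the architecture of the Knowledge Grid and its main components, designs a software model for a grid data mining system that follows the data mining process, and identifies the services the model should provide. These services hide all details of the underlying grid, so end users need only be concerned with the knowledge discovery process.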

2.
Distributed data mining on grids: services, tools, and applications   Total citations: 4; self-citations: 0; citations by others: 4
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

3.
4.
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The grid extends the distributed and parallel computing paradigms, allowing for resource negotiation and dynamic allocation, heterogeneity, open protocols, and services. Grid environments can be used both for compute-intensive tasks and for data-intensive applications by exploiting their resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute- and data-intensive; therefore the grid can offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how grid computing can be used to support distributed data mining. Research activities in grid-based data mining and some challenges in this area are presented, along with some promising future directions for developing grid-based distributed data mining.

5.
Data Mining Based on the Knowledge Grid   Total citations: 8; self-citations: 0; citations by others: 8
Wei Dingguo, Peng Hong. Computer Science (《计算机科学》), 2006, 33(6): 210-213
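Data in industry, science, commerce and other fields is usually located at different sites and must be maintained in a distributed fashion across those sites. Only distributed and parallel processing systems with very high computing power can analyze the very large datasets these fields produce. The grid provides effective support for computation in distributed knowledge discovery applications. To develop data mining on the grid, this paper presents a system called the Knowledge Grid, discusses how to use it to design and implement data mining applications, and explains how to search grid resources, compose software and data components, and execute data mining applications on the grid.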

6.
As computing infrastructure shifts from isolated computers to Internet-linked resources, the software industry is shifting its focus from computing products to dependable services. This is one reason why pervasive computing, grid computing, service computing and cloud computing have been introduced on the open and dynamic Internet-linked platform. The communication capabilities it introduces enable various resources to be exchanged and shared freely and to participate in the community within a social network. Such a platform has three layers: computing infrastructure, software services and the information web. The underlying layer, computing infrastructure, provides computing and communication facilities, and the surface layer, the information web, is where information is recombined and consumed. The software services layer serves as a factory that processes constantly emerging, heterogeneous, dynamic information using distributed, autonomous and evolvable computing facilities. Therefore, services provided by such software systems should be adaptive, situational, trustworthy, autonomous, and so on. To achieve flexible objectives, these services can be coordinated in various styles such as integration, cooperation and orchestration. A portmanteau term, "Internetware", is used …

7.
The Internet provides a global open infrastructure for exchanging and sharing various resources all over the world. Its rapid development and wide application make it a new mainstream platform for using, developing, deploying and executing software systems and applications. With the vision of "Internet as a computer", many technical initiatives, such as pervasive computing, grid computing, service computing and cloud computing, have emerged in this open and dynamic environment. To support the various new application styles and accommodate the fundamental change in the underlying infrastructure, many specific software technologies, such as service-oriented architecture, have been proposed for current practice. While these technologies are useful and widely accepted, they have not formed a systematic solution as mature as object-oriented technology, because a unified software methodology and technology system has yet to be developed.

8.
Grid computing facilitates the aggregation and coordination of resources that are distributed across multiple administrative domains for large‐scale and complex e‐Science experiments. Writing, deploying, and testing grid applications over highly heterogeneous and distributed resources are complex and challenging. The process requires grid‐enabled programming tools that can handle the complexity and scale of the infrastructure. However, while a large amount of research has been undertaken into grid middleware, little work has been directed specifically at the area of grid application development tools. This paper presents the design and implementation of ISENGARD, an infrastructure for supporting e‐Science and grid application development. ISENGARD provides services, tools, and APIs that simplify grid software development. Copyright © 2010 John Wiley & Sons, Ltd.

9.
Over the last few years, the adaptation ability has become an essential characteristic for grid applications due to the fact that it allows applications to face the dynamic and changing nature of grid systems. This adaptive capability is applied within different grid processes such as resource monitoring, resource discovery, or resource selection. In this regard, the present approach provides a self-adaptive ability to grid applications, focusing on enhancing the resources selection process. This contribution proposes an Efficient Resources Selection model to determine the resources that best fit the application requirements. Hence, the model guides applications during their execution without modifying or controlling grid resources. Within the evaluation phase, the experiments were carried out in a real European grid infrastructure. Finally, the results show that not only a self-adaptive ability is provided by the model but also a reduction in the applications' execution time and an improvement in the successfully completed tasks rate are accomplished.
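The abstract does not describe the internals of the Efficient Resources Selection model. As a rough illustration of what "determining the resources that best fit the application requirements" can look like, the Python sketch below filters candidates by hard requirements and ranks the survivors by current load. The `Resource` fields, the ranking rule and `select_resources` are hypothetical assumptions, not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical resource description; field names are illustrative only.
    name: str
    cpus: int
    memory_gb: float
    load: float  # current utilisation in [0, 1], e.g. from a monitoring service


def select_resources(resources, min_cpus, min_memory_gb, k=1):
    """Return up to k resources that meet the hard requirements.

    Survivors are ranked by current load (lowest first), ties broken by
    larger CPU count. Calling this again as loads change gives a crude
    form of run-time adaptation without modifying the resources themselves.
    """
    candidates = [
        r for r in resources
        if r.cpus >= min_cpus and r.memory_gb >= min_memory_gb
    ]
    candidates.sort(key=lambda r: (r.load, -r.cpus))
    return candidates[:k]


pool = [
    Resource("siteA", cpus=8, memory_gb=16, load=0.9),
    Resource("siteB", cpus=16, memory_gb=32, load=0.2),
    Resource("siteC", cpus=4, memory_gb=8, load=0.1),
]
print([r.name for r in select_resources(pool, min_cpus=8, min_memory_gb=16, k=2)])
# ['siteB', 'siteA']: siteC is filtered out, siteB is less loaded than siteA
```

Ranking by load alone is a simplifying choice; a real selector would likely also weigh network distance to the data and the application's observed progress.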

10.
In modern scientific computing communities, scientists are involved in managing massive amounts of very large data collections in a geographically distributed environment. Research in the area of grid computing has given us various ideas and solutions to address these requirements. Data grid mostly deals with large computational problems and provides geographically distributed resources for large-scale data-intensive applications that generate large data sets. Peer-to-peer (P2P) networks have also become a major research topic over the last few years. In a distributed P2P system, a discovery algorithm is required to locate specific information, applications, or users within the system. In this research work, we present our scientific data grid as a large P2P-based distributed system model. By using this model, we study various discovery algorithms for locating data sets in a data grid system. The algorithms we studied are based on the P2P architecture. We investigate these algorithms using our Grid Simulator developed using PARSEC. In this paper, we illustrate our scientific data grid model and our Grid Simulator. We then analyze the performance of the discovery algorithms in terms of their average number of hops, success rates and bandwidth consumption.
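The abstract names the evaluation metrics (average hop count, success rate, bandwidth consumption) but not the algorithms themselves. The sketch below shows how those three metrics can be measured for one common P2P discovery strategy, TTL-limited flooding, assuming an unstructured overlay given as an adjacency list. `flood_search` and its return convention are hypothetical and are not taken from the authors' PARSEC-based Grid Simulator.

```python
from collections import deque


def flood_search(graph, start, holders, ttl):
    """Breadth-first flooding from `start` with a hop limit (TTL).

    graph:   dict mapping node -> list of neighbour nodes
    holders: set of nodes that store the requested data set
    Returns (found, hops, messages): whether a holder was reached, the hop
    count to the first holder found (None if not found), and the number of
    query messages sent, used here as a proxy for bandwidth.
    Simplification: the search stops at the first hit, whereas real
    flooding keeps propagating until the TTL expires.
    """
    if start in holders:
        return True, 0, 0
    visited = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue  # TTL exhausted: this node does not forward the query
        for nb in graph[node]:
            messages += 1  # every forward costs a message, even to visited peers
            if nb in visited:
                continue
            if nb in holders:
                return True, hops + 1, messages
            visited.add(nb)
            frontier.append((nb, hops + 1))
    return False, None, messages


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
print(flood_search(graph, "A", {"E"}, ttl=3))  # (True, 3, 9)
print(flood_search(graph, "A", {"E"}, ttl=2))  # (False, None, 6)
```

Running many such queries and averaging `found`, `hops` and `messages` gives the success rate, average hop count and bandwidth figures the abstract refers to.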

11.
The grid is a promising infrastructure that can allow scientists and engineers to access resources among geographically distributed environments. Grid computing is a new technology which focuses on aggregating resources (e.g., processor cycles, disk storage, and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available resources with user requests and considering network information statuses during matchmaking in computational grids. The resource broker provides a graphical user interface for accessing available and appropriate resources via user credentials. This broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. We then propose a history-based execution time estimation model to predict the execution time of parallel applications, according to previous execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project that integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.
Po-Chi Shih
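The abstract says the execution time model predicts run times "according to previous execution results" but gives no formula. A minimal sketch of a history-based estimator, assuming runs are matched on (application, processor count) and, when no exact match exists, the nearest recorded processor count is scaled by ideal speedup. That fallback only makes sense for embarrassingly parallel jobs, the case the abstract reports on. `HistoryEstimator` and its matching rule are hypothetical, not the paper's model.

```python
from collections import defaultdict
from statistics import mean


class HistoryEstimator:
    """Predict run time from previous runs (hypothetical matching rule)."""

    def __init__(self):
        # (application name, processor count) -> list of observed seconds
        self._history = defaultdict(list)

    def record(self, app, nprocs, seconds):
        self._history[(app, nprocs)].append(seconds)

    def estimate(self, app, nprocs):
        runs = self._history.get((app, nprocs))  # .get does not create keys
        if runs:
            return mean(runs)
        # Fallback: scale the closest recorded processor count, assuming
        # ideal (embarrassingly parallel) speedup.
        known = [(p, mean(t)) for (a, p), t in self._history.items() if a == app]
        if not known:
            return None
        p, t = min(known, key=lambda pt: abs(pt[0] - nprocs))
        return t * p / nprocs


est = HistoryEstimator()
est.record("render", 4, 100.0)
est.record("render", 4, 120.0)
print(est.estimate("render", 4))  # 110.0 (mean of the two recorded runs)
print(est.estimate("render", 8))  # 55.0 (110.0 scaled from 4 to 8 processors)
```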

12.
Semantics and knowledge grids: building the next-generation grid   Total citations: 2; self-citations: 0; citations by others: 2
Just as the Internet is shifting its focus from information and communication to a knowledge delivery infrastructure, we see the Grid moving from computation and data management to a pervasive, worldwide knowledge management infrastructure. We have the technology to store and access data, but we seem to lack the ability to transform data tombs into useful data and extract knowledge from them. We review some of the current and future technologies that will impact the architecture, computational model, and applications of future grids. We attempt to forecast the evolution of computational grids into what we call the next-generation grid, with a particular focus on the use of semantics and knowledge discovery techniques and services. We propose a comprehensive software architecture for the next-generation grid, which integrates currently available services and components in Semantic Web, Semantic Grid, P2P, and ubiquitous systems. We also discuss a case study that shows how some new technologies can improve grid applications.

13.
The computational grid is rapidly evolving into a large-scale computing infrastructure that facilitates resource sharing and problem solving over the Internet. Information services play a crucial role in grid environments for discovery of resources. The dynamic nature and large scale of a grid pose many challenges to information services in terms of scalability and resilience. This paper presents RDSpace, which can be used as a substrate for resource discovery in grid environments. RDSpace builds a shared tuple space on top of a structured peer-to-peer overlay to achieve high scalability in dealing with a large number of computing nodes and to support range queries in discovery of resources. Another novelty of RDSpace lies in its capability to handle churn situations where nodes may join or leave the space frequently. RDSpace is evaluated in terms of scalability and churn handling, and the evaluation results are also presented in this paper.

14.
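The polymer simulation computing grid provides researchers in complex materials with a platform for sharing molecular simulation software, computing resources and information. We designed and implemented a grid job management module covering job submission, scheduling and monitoring, which gives users transparent access to grid resources. The module has been successfully deployed in the polymer simulation computing grid system, and experimental results show that job management provides grid users with better quality of service and achieves optimized use of grid resources.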

15.
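Data grids have gradually been adopted in scientific research. Improving data grid performance to support distributed data management has become a hot topic in data grid research. This paper introduces the concept of grid locality, analyzes how grid locality affects data grid performance, and optimizes data grid performance by enhancing grid locality. It proposes a comprehensive jump-diffusion replica replacement policy (jump-DRP) and a task scheduling policy inspired by biological pheromones (JARIP). Experimental results show that combining jump-DRP and JARIP, both of which take grid locality into account, improves the task processing performance of the grid platform and is robust across different application contexts and task complexities.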

16.
The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of involved resources that include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.
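The abstract states that XML-based metadata describes data sources and tools, but does not give the schema. A minimal sketch of the idea using Python's standard `xml.etree.ElementTree`: a data-source descriptor is parsed into a dict, and a tool's requirements are checked against it. Every element and attribute name (`DataSource`, `Location`, `Attribute`, and so on), the host name and the helper functions are hypothetical and are not the Knowledge Grid's actual metadata format.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: element and attribute names are illustrative,
# not the Knowledge Grid's real schema.
DESCRIPTOR = """
<DataSource name="sales_2006">
  <Location host="node1.example.org" path="/data/sales.csv"/>
  <Format>csv</Format>
  <Attributes>
    <Attribute name="region" type="string"/>
    <Attribute name="amount" type="float"/>
  </Attributes>
</DataSource>
"""


def parse_data_source(xml_text):
    """Turn a data-source descriptor into a plain dict."""
    root = ET.fromstring(xml_text.strip())
    loc = root.find("Location")
    return {
        "name": root.get("name"),
        "host": loc.get("host"),
        "path": loc.get("path"),
        "format": root.findtext("Format"),
        "attributes": {a.get("name"): a.get("type") for a in root.iter("Attribute")},
    }


def matches(meta, required_attrs, fmt=None):
    """Check whether a data source exposes what a mining tool needs."""
    if fmt is not None and meta["format"] != fmt:
        return False
    return all(attr in meta["attributes"] for attr in required_attrs)


meta = parse_data_source(DESCRIPTOR)
print(meta["attributes"])                    # {'region': 'string', 'amount': 'float'}
print(matches(meta, ["amount"], fmt="csv"))  # True
```

Keeping the descriptor declarative lets a design tool search and match resources without contacting the hosts, which is the role the abstract assigns to metadata.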

17.
This paper presents a platform that supports the execution of scientific applications covering different programming models (such as Master/Slave, Parallel/MPI, MapReduce and Workflows) on Cloud infrastructures. The platform includes (i) a high-level declarative language to express the requirements of the applications featuring software customization at runtime, (ii) an approach based on virtual containers to encapsulate the logic of the different programming models, (iii) an infrastructure manager to interact with different IaaS backends, (iv) a configuration software to dynamically configure the provisioned resources and (v) a catalog and repository of virtual machine images. By using this platform, an application developer can adapt, deploy and execute parallel applications agnostic to the Cloud backend.

18.
Parallel Computing, 2007, 33(4-5): 289-301
Large scale grids for in silico drug discovery open opportunities of particular interest to neglected and emerging diseases. In 2005 and 2006, we were able to deploy large scale virtual docking within the framework of the WISDOM initiative against malaria and avian influenza, requiring about 100 years of CPU time on the EGEE, Auvergrid and TWGrid infrastructures. These achievements demonstrated the relevance of large scale grids for virtual screening by molecular docking. They also made it possible to evaluate the performance of the grid infrastructures and to identify specific issues raised by large scale deployment.

19.
The GEOsciences Network (GEON, www.geongrid.org ) is a large‐scale collaborative cyberinfrastructure project involving information technology and geoscience researchers from multiple institutions. The GEON infrastructure provides portal, middleware, and data resources to facilitate scientific discovery for domain scientists using applications, tools, and services. It consists of both a service‐oriented Web/Grid framework and application toolkits, using the Web service and portlet programming model to represent applications. Based on those grid environments, we have developed the SYNSEIS (SYNthetic SEISmogram) tool within the GEON infrastructure to support personalized experiments in seismology. In this paper, we present an overview of SYNSEIS from a user point of view, and demonstrate how one can use a simple management scheme to perform a parameter sweep and distribute the work in computational resources, using a scientific application that was not specifically designed to perform parameter sweeps. The performance advantages to be gained by using this scheme with scientific codes for dealing with a large number of jobs on computational grids are very substantial. In particular, we identify the earthquake simulations in the SYNSEIS tool as an example application that can benefit from running jobs on computational resources and subsequently promote the sharing of computational resources among partner sites involved in the GEON project. Finally, we also discuss the parallel scaling behavior of our primary earthquake simulation application. Copyright © 2010 John Wiley & Sons, Ltd.
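The abstract mentions a "simple management scheme" for running a parameter sweep with an application not designed for it, without giving details. A minimal sketch of the general idea, assuming each parameter combination becomes one independent job and jobs are spread round-robin over partner sites: `make_sweep`, `distribute`, the site names and the parameter names (`depth_km`, `magnitude`) are all hypothetical and not part of SYNSEIS. Round-robin is a simplifying assumption; a real scheduler would weight sites by capacity and queue length.

```python
from itertools import product


def make_sweep(params):
    """Expand {parameter: [values, ...]} into one job dict per combination."""
    names = sorted(params)  # fixed order so job numbering is reproducible
    return [dict(zip(names, values)) for values in product(*(params[n] for n in names))]


def distribute(jobs, resources):
    """Assign jobs to resources round-robin; returns {resource: [jobs]}."""
    plan = {r: [] for r in resources}
    for i, job in enumerate(jobs):
        plan[resources[i % len(resources)]].append(job)
    return plan


# Hypothetical sweep over two earthquake-source parameters.
jobs = make_sweep({"depth_km": [5, 10], "magnitude": [4.0, 5.0, 6.0]})
plan = distribute(jobs, ["siteA", "siteB", "siteC", "siteD"])
print(len(jobs))                                   # 6
print({site: len(js) for site, js in plan.items()})
# {'siteA': 2, 'siteB': 2, 'siteC': 1, 'siteD': 1}
```

Because each job carries a complete parameter set, the unmodified application only has to be launched once per job, which is what makes such a sweep possible with code that was not written for it.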

20.
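From the perspective of key infrastructure for smart water management, this paper describes the overall architecture of the Jiangsu Provincial Water Resources Data Center and discusses key technologies for building a water resources cloud data center, including distributed cloud architecture, data sharing and access, a business support platform, application integration and cloud security. It also summarizes typical water resources applications built on the cloud platform. The work offers a reference for building provincial water resources data centers and the information infrastructure for water resources and water affairs, and summarizes experience toward the goal of a basic water information platform.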


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417