Similar literature
 20 similar documents found (search time: 31 ms)
1.
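Research on a User-Centered Data Mining Ontology under the Universal Knowledge Grid   Total citations: 2, self-citations: 0, citations by others: 2
Mining massive, distributed, heterogeneous data is a pressing problem in the data mining field. The Universal Knowledge Grid (UKB) architecture model is used to build large-scale distributed knowledge discovery and knowledge integration systems in a grid environment. The ontology server is the core module of the architecture and is responsible for managing and querying ontologies. The data mining ontology service is the main service the ontology server provides. This paper presents the design and OWL implementation of a user-centered data mining ontology under the Universal Knowledge Grid. The data mining ontology can support knowledge discovery services for users in different domains and at different levels, making the system open, extensible, and highly usable. An example data mining solution for the anti-money-laundering domain is also described.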

2.
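Data mining algorithms are widely used in data analysis. Industry, science, and commerce need to analyze large, geographically distributed datasets, and the grid can effectively provide high-performance applications and distributed infrastructure. To use the grid for data mining and knowledge representation, this paper builds on the Knowledge Grid concept and the Globus Toolkit to analyze the Knowledge Grid architecture and its main components, designs a software model for a grid data mining system that follows the data mining process, and identifies the services the model should provide. These services hide all details of the underlying grid so that end users need only concern themselves with the knowledge discovery process.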

3.
Centralized data mining techniques are widely used today for the analysis of large corporate and scientific data stored in databases. However, industry, science, and commerce often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed systems. The Grid can play a significant role in providing effective computational infrastructure support for this kind of data mining. Similarly, the advent of multi-agent systems has brought a new paradigm for the development of complex distributed applications. Over the past decades, several models and systems have been proposed that apply agent technology to building distributed data mining (DDM). By combining these two techniques, we investigate the critical issues in building DDM on Grid infrastructure and design an Agent Grid Intelligent Platform as a testbed. We also implement an integrated toolkit, VAStudio, for quickly developing agent-based DDM applications and compare its functionality with that of other systems.

4.
The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of the resources involved, which include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.
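The idea of describing data mining tools with XML metadata can be illustrated with a minimal sketch. The element names (`Catalog`, `DataMiningTool`, `Kind`, `Host`, `Executable`), host names, and paths below are hypothetical and are not taken from the Knowledge Grid schema; the sketch only shows how such descriptions might be built and then searched by tool type.

```python
import xml.etree.ElementTree as ET


def describe_tool(name, kind, host, executable):
    """Build one metadata entry for a data mining tool (hypothetical schema)."""
    tool = ET.Element("DataMiningTool", {"name": name})
    ET.SubElement(tool, "Kind").text = kind
    ET.SubElement(tool, "Host").text = host
    ET.SubElement(tool, "Executable").text = executable
    return tool


def find_tools(catalog, kind):
    """Return the names of all catalogued tools of the requested kind."""
    return [
        tool.get("name")
        for tool in catalog.findall("DataMiningTool")
        if tool.findtext("Kind") == kind
    ]


# Hypothetical catalog with two tools on two grid nodes.
catalog = ET.Element("Catalog")
catalog.append(describe_tool("kmeans-1", "clustering", "node1.example.org", "/opt/tools/kmeans"))
catalog.append(describe_tool("c45-1", "classification", "node2.example.org", "/opt/tools/c45"))

clustering_tools = find_tools(catalog, "clustering")
xml_text = ET.tostring(catalog, encoding="unicode")  # serialized metadata document
```

Keeping the description separate from the tool itself is what lets a planner pick resources by their declared properties without contacting each node.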

5.
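Wide-area networks hold massive amounts of geographically distributed data of many kinds. Analyzing and processing these data requires high-performance distributed parallel processing systems, and the grid can meet this requirement. The Knowledge Grid uses basic grid services (communication, information, authorization, and resource management services) to build specialized distributed parallel knowledge discovery tools and services. Based on the characteristics of the Knowledge Grid, the paper discusses its architecture and the set of services that support knowledge mining applications. Using the meta-learning model of distributed data mining, it describes how the knowledge mining services provided by the Knowledge Grid can be used to carry out distributed data mining.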
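The meta-learning model mentioned in the abstract above can be sketched in a few lines: each site trains a base classifier on its own data, and a meta-level combiner learns from the base classifiers' predictions on a shared validation set. This is only an illustration under simplifying assumptions (1-D features, threshold classifiers, a lookup-table combiner, made-up data); it is not the method of the paper and uses no grid services.

```python
from collections import Counter, defaultdict


def train_stump(samples):
    """Base learner: 1-D threshold classifier that predicts 1 when x >= threshold."""
    best_threshold, best_correct = None, -1
    for threshold in sorted(x for x, _ in samples):
        correct = sum((x >= threshold) == bool(y) for x, y in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda x, t=best_threshold: int(x >= t)


def train_combiner(base_models, validation):
    """Meta-level learner: map each tuple of base predictions to the most common true label."""
    votes = defaultdict(Counter)
    for x, y in validation:
        key = tuple(model(x) for model in base_models)
        votes[key][y] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    def predict(x):
        key = tuple(model(x) for model in base_models)
        if key in table:
            return table[key]
        # Combination never seen during meta-training: fall back to a majority vote.
        return Counter(key).most_common(1)[0][0]

    return predict


# Hypothetical (feature, label) data held at two sites, plus a shared validation set.
site_a = [(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]
site_b = [(0.5, 0), (3.0, 0), (5.5, 1), (8.0, 1)]
validation = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1)]

base_models = [train_stump(site_a), train_stump(site_b)]  # trained locally, no raw data moved
meta_model = train_combiner(base_models, validation)
predictions = [meta_model(x) for x in (1.0, 9.0)]
```

Only the trained models and their predictions cross site boundaries, which is the main appeal of meta-learning when the raw data cannot be centralized.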

6.
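Scientific, industrial, and commercial applications need to analyze massive data distributed across heterogeneous sites, which requires suitable distributed parallel systems to store and manage the data. The grid provides effective computational support for distributed data mining and knowledge discovery. After discussing the Knowledge Grid architecture, this paper implements a grid-based distributed data mining process using VEGA, a visual grid application environment.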

7.
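Grid computing is an important new branch of distributed computing whose main aims are large-scale resource sharing and high performance. Many applications need to analyze large datasets that are typically geographically distributed, large in scale, and of growing complexity. Grid technology provides effective support for such applications. This paper introduces grid infrastructure and distributed data mining.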

8.
In modern scientific computing communities, scientists are involved in managing massive amounts of very large data collections in a geographically distributed environment. Research in the area of grid computing has given us various ideas and solutions to address these requirements. A data grid mostly deals with large computational problems and provides geographically distributed resources for large-scale data-intensive applications that generate large data sets. Peer-to-peer (P2P) networks have also become a major research topic over the last few years. In a distributed P2P system, a discovery algorithm is required to locate specific information, applications, or users within the system. In this research work, we present our scientific data grid as a large P2P-based distributed system model. By using this model, we study various discovery algorithms for locating data sets in a data grid system. The algorithms we studied are based on the P2P architecture. We investigate these algorithms using our Grid Simulator developed using PARSEC. In this paper, we illustrate our scientific data grid model and our Grid Simulator. We then analyze the performance of the discovery algorithms in terms of their average number of hops, success rates, and bandwidth consumption.
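As a rough illustration of what a P2P discovery algorithm and its metrics look like, the sketch below implements TTL-limited flooding over a small overlay and reports the hop count and the number of messages sent. Flooding is chosen here only as a familiar example; the abstract does not say which algorithms were studied, and the overlay, node names, and message accounting are assumptions.

```python
from collections import deque


def flood_search(overlay, holders, start, ttl):
    """Breadth-first flooding limited to `ttl` hops.

    Returns (hops to the nearest node holding the data set, or None if not found,
    number of query messages sent before the search stopped).
    """
    visited = {start}
    frontier = deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if node in holders:
            return hops, messages
        if hops == ttl:
            continue  # TTL exhausted: this node does not forward the query
        for neighbor in overlay[node]:
            messages += 1  # every forwarded query costs one message
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return None, messages


# Hypothetical overlay: adjacency lists of peers; only node "E" holds the data set.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holders = {"E"}

found = flood_search(overlay, holders, "A", ttl=3)
missed = flood_search(overlay, holders, "A", ttl=2)
```

Averaging the hop counts, the fraction of successful searches, and the message totals over many queries gives simple stand-ins for the three metrics named in the abstract.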

9.
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The grid extends the distributed and parallel computing paradigms, allowing for resource negotiation and dynamic allocation, heterogeneity, open protocols, and services. Grid environments can be used both for compute-intensive tasks and data-intensive applications by exploiting their resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute- and data-intensive; therefore, the grid can offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how grid computing can be used to support distributed data mining. Research activities in grid-based data mining and some challenges in this area are presented, along with some promising future directions for developing grid-based distributed data mining.

10.
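Data Mining Based on the Knowledge Grid   Total citations: 8, self-citations: 0, citations by others: 8
魏定国 (Wei Dingguo), 彭宏 (Peng Hong). 《计算机科学》 (Computer Science), 2006, 33(6): 210-213
Data in industry, science, commerce, and other fields are usually distributed across different locations and must be maintained at those locations. Only distributed, parallel processing systems with very high computing power can analyze the very large datasets these fields produce. The grid provides effective support for computation in distributed knowledge discovery applications. To support the development of data mining on the grid, this paper presents a system called the Knowledge Grid, discusses how to use it to design and implement data mining applications, and explains how to search for grid resources, compose software and data components, and execute data mining applications on the grid.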

11.
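Research on Ontology Learning for the Knowledge Grid   Total citations: 12, self-citations: 1, citations by others: 11
Grid computing is evolving from distributed resource sharing aimed purely at large-scale computation into a service-oriented architecture that enables transparent and reliable distributed system integration. Grid intelligence concerns how to acquire, preprocess, represent, and integrate data and information from grid services at different levels (such as HTML/XML/RDF/OWL documents, service response times, and quality of service) and ultimately turn them into useful intelligence (knowledge). Because high-level knowledge will play an increasingly important role in future grid applications, ontologies are key to realizing the Knowledge Grid. This paper proposes WebOntLearn, an ontology learning framework for (semi-)automatically building ontologies from Web documents, and discusses key ontology learning techniques, including extracting domain concepts, extracting relations between concepts, and automatically building taxonomies.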

12.
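Data mining faces a problem: because data mining tasks must process large volumes of data, they take too long to run. Grid computing research aims to combine dispersed, heterogeneous, idle computers into a high-performance computing system, so the high-performance computing capability of a grid can be used to reduce data processing time effectively. This paper proposes and implements DMGrid, a data mining system based on grid computing. The design focuses on parallel computing and also takes into account dynamic configuration of grid computing resources. Unlike existing data mining grids, DMGrid provides an engine that executes the workflows defined in an application, along with application run-time monitoring. Finally, two applications designed for the experiments (customer churn analysis and customer value analysis) demonstrate the feasibility of DMGrid.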

13.
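Data Mining on the Grid*   Total citations: 24, self-citations: 2, citations by others: 22
The grid is a research hotspot in network computing, distributed computing, and high-performance computing. As data in scientific computing grow sharply and the sharing of massive, widely distributed data in future grid computing environments becomes a reality, data mining techniques will play an important role in extracting useful information and discovering new knowledge and patterns. Based on the characteristics of the grid, this paper outlines the features and key technologies of grid data mining, focuses on its architecture and basic process, and finally gives an OGSA-based example of grid data mining.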

14.
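Resource Dissemination and Discovery Mechanisms for Computational Grids   Total citations: 1, self-citations: 0, citations by others: 1
1 Introduction. The resource management system of a computational grid is one of the most important services required to share the grid's resources. Its basic function is to accept resource requests from machines within the computational grid, allocate specific resources to the requesters, and schedule those resources appropriately so that the jobs requesting them can run. Resource dissemination, resource discovery, and resource scheduling make up the core of a computational grid resource management system. Resource dissemination and resource discovery provide the means by which machines within the computational grid can form a view of the available resources and their state. Resource…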

15.
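A Distributed Frequent Pattern Mining Algorithm Using Grid Services   Total citations: 3, self-citations: 1, citations by others: 3
Making full use of the services of grid computing platforms for distributed data mining has recently become a hot topic in data mining. Grid computing services such as task management, task scheduling, and resource management can greatly facilitate distributed data mining. Building on this research, the paper presents a distributed frequent pattern mining algorithm for grid platforms. The algorithm borrows ideas from the FP-growth algorithm and uses the distributed computing services the grid platform provides to mine frequent patterns in a distributed manner in a grid computing environment.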
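The general shape of distributed frequent pattern mining can be sketched with a simpler scheme than the one in the paper: each site counts itemsets in its own transactions, and a coordinator merges the counts and applies the global support threshold. This is a count-distribution sketch, not FP-growth and not the paper's algorithm; the transactions, the `max_size` limit, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Per-site step: count every itemset of up to `max_size` items in local transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so itemsets match across sites
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def merge_and_filter(per_site_counts, min_support):
    """Coordinator step: add up the local counts and keep globally frequent itemsets."""
    total = Counter()
    for counts in per_site_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical transactions held at two sites.
site_1 = [["a", "b", "c"], ["a", "b"]]
site_2 = [["a", "c"], ["b", "c"], ["a", "b"]]

frequent = merge_and_filter([local_counts(site_1), local_counts(site_2)], min_support=3)
```

Only counts travel between sites. An FP-growth-based design would instead build compact prefix trees locally, which avoids enumerating candidate itemsets explicitly.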

16.
Grid computing offers the powerful alternative of sharing resources on a worldwide scale, across different institutions, to run computationally intensive scientific applications without the need for a centralized supercomputer. Much effort has been put into the development of software that deploys legacy applications on a grid-based infrastructure and efficiently uses available resources. One field that can benefit greatly from the use of grid resources is drug discovery, since molecular docking simulations are an integral part of the discovery process. In this paper, we present a scalable, reusable platform to choreograph large virtual screening experiments over a computational grid using the molecular docking simulation software DOCK. Software components are applied on multiple levels to create automated workflows consisting of input data delivery, job scheduling, status query, and collection of output to be displayed in a manageable fashion for further analysis. This was achieved using Opal OP to wrap the DOCK application as a grid service and Perl for data manipulation, alleviating the requirement for extensive knowledge of grid infrastructure. With the platform in place, a screening of the ZINC 2,066,906 compound "drug-like" subset database against an enzyme's catalytic site was successfully performed using the MPI version of DOCK 5.4 on the PRAGMA grid testbed. The screening required 11.56 days of laboratory time and utilized 200 processors over 7 clusters.

17.
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work is to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. To validate this approach, we also present the implementation of two clustering algorithms on the developed architecture. In particular, distributed k-means and distributed expectation maximization are used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance is provided. Copyright © 2011 John Wiley & Sons, Ltd.
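Distributed k-means, one of the two pilot algorithms named above, is commonly organized so that each site reports only per-cluster sums and counts and a coordinator recomputes the centroids. The sketch below shows that pattern in plain Python on 1-D data; it does not use the Web Services Resource Framework, and the data, starting centroids, and function names are assumptions rather than details of the paper.

```python
def local_statistics(points, centroids):
    """Per-site step: sum and count of the local points assigned to each centroid."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        sums[nearest] += p
        counts[nearest] += 1
    return sums, counts


def distributed_kmeans(sites, centroids, iterations=10):
    """Coordinator loop: broadcast centroids, gather local statistics, update centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        stats = [local_statistics(points, centroids) for points in sites]
        for k in range(len(centroids)):
            total = sum(site_sums[k] for site_sums, _ in stats)
            count = sum(site_counts[k] for _, site_counts in stats)
            if count:  # keep the previous centroid if no point was assigned to it
                centroids[k] = total / count
    return centroids


# Hypothetical 1-D data sets held at two sites; raw points never leave their site.
sites = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
final_centroids = distributed_kmeans(sites, centroids=[0.0, 5.0])
```

Because the sums and counts are additive, the merged update equals the one a centralized k-means would compute on the pooled data, while the traffic per iteration stays proportional to the number of clusters rather than the number of points.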

18.
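Based on an analysis of the characteristics of data mining in a scientific data grid environment, this paper proposes a grid service framework for scientific data mining. The scientific data mining grid service delivers a data mining solution for the scientific data grid environment in the form of grid services. Compared with traditional data mining systems, it has many advantages and is better suited to scientific data grids and scientific database environments. It has already been applied to several databases, where it offers not only simple query and retrieval but also statistical data analysis and knowledge discovery, further raising the level of scientific data grid services.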

19.
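The development of grid technology has made grid data mining an important means of processing massive, distributed, heterogeneous data. This paper introduces ontologies into grid data mining, discusses the structure of a grid data mining ontology, proposes a process for building it, and finally discusses its implementation.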

20.
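The development of grid technology has made grid data mining an important means of processing massive, distributed, heterogeneous data. This paper introduces ontologies into grid data mining, discusses the structure of a grid data mining ontology, proposes a process for building it, and finally discusses its implementation.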
