Similar literature
1.
Distributed data mining on grids: services, tools, and applications   Total citations: 4 (self-citations: 0, citations by others: 4)
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids, we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools, starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

2.
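Grid computing is an important new branch of distributed computing. Its main achievement is large-scale resource sharing with high performance. Many applications need to analyze large datasets that are typically geographically distributed and growing in complexity. Grid technology offers effective support for such applications. This paper introduces grid infrastructure and distributed data mining.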

3.
4.
This article describes the Open Science Grid, a large distributed computational infrastructure in the United States which supports many different high-throughput scientific applications, and partners (federates) with other infrastructures nationally and internationally to form multi-domain integrated distributed systems for science. The Open Science Grid consortium not only provides services and software to an increasingly diverse set of scientific communities, but also fosters a collaborative team of practitioners and researchers who use, support and advance the state of the art in large-scale distributed computing. The scale of the infrastructure can be expressed by the daily throughput of around seven hundred thousand jobs, just under a million hours of computing, a million file transfers, and half a petabyte of data movement. In this paper we introduce and reflect on some of the OSG capabilities, usage and activities.

5.
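Research on ontology learning for the knowledge grid   Total citations: 12 (self-citations: 1, citations by others: 11)
Grid computing is evolving from distributed resource sharing aimed purely at large-scale computation into a service-oriented architecture that enables transparent and reliable integration of distributed systems. Grid intelligence concerns how to acquire, preprocess, represent, and integrate data and information from grid services at different levels (such as HTML/XML/RDF/OWL documents, service response time, and quality of service) and ultimately turn them into useful intelligence (knowledge). Because high-level knowledge will play an increasingly important role in future grid applications, ontologies are key to realizing the knowledge grid. The paper proposes WebOntLearn, an ontology learning framework for (semi-)automatic ontology construction from Web documents, and discusses key techniques in ontology learning: extraction of domain concepts, extraction of relations between concepts, and automatic construction of taxonomies.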

6.
The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of involved resources that include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.

7.
Advances in science and engineering have put high demands on tools for high-performance, large-scale data exploration and analysis. Visualization is a powerful technology for analyzing data and presenting results. Today's science and engineering have benefited from state-of-the-art Grid technologies and modern visualization systems. To visualize large amounts of data, rendering technologies are widely used to parallelize visualization tasks over distributed resources on computational Grids. This raises the need to balance the computational load and to minimize network bandwidth requirements. This article explains how new visualization architectures and load-balancing algorithms address these challenges in Grid environments in a principled fashion. The Grid infrastructure that supports large-scale distributed visualization is also introduced. Some typical visualization systems on Grids are referenced for discussion.

8.
Many data and compute intensive Grid applications, such as computational astrophysics, may be able to benefit from networking supported by dynamically provisioned lightpaths. To date, the majority of high performance distributed environments have been based on traditional routed packet networks, provisioned as external services rather than as integrated components within those environments. Because this approach often cannot provide high performance capabilities required by these applications, an alternative distributed infrastructure architecture is being designed based on dynamic lightpaths, supported by optical networks. These designs implement communication services and infrastructure as integral components of distributed infrastructure. The resultant environments resemble large scale specialized instruments. Presented here is one such architecture, implemented on a wide-area, optical Grid test bed, featuring a closely integrated dedicated lightpath mesh. The test bed was used to conduct a series of experiments to explore its potential for supporting adaptive mesh refinement (AMR) astrophysics simulations. While preliminary, the results of these experiments indicate that this architecture may provide the deterministic capabilities required by a wide range of high performance distributed services and applications, especially for computational science.

9.
Distributed data mining for e-business   Total citations: 2 (self-citations: 1, citations by others: 1)
In the internet-based e-business environment, most business data are distributed, heterogeneous and private. To achieve true business intelligence, mining large amounts of distributed data is necessary. Through a thorough literature review, this paper identifies four main issues in distributed data mining (DDM) systems for e-business and classifies modern DDM systems into three classes with representative samples. To address these identified issues, this paper proposes a novel DDM model named DRHPDM (Data source Relevance-based Hierarchical Parallel Distributed data mining Model). In addition, to improve the quality of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer, according to their relevance. To improve the openness, cross-platform ability, and intelligence of the DDM system, web service and multi-agent technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage mining scenario.

10.
Multi-agent systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and corresponding future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.

11.
Structural bioinformatics applies computational methods to analyze and model three-dimensional molecular structures. A huge number of applications are available for working with structural data on a large scale. Using these tools on distributed computing infrastructures (DCIs), however, is often complicated by a lack of suitable interfaces. The MoSGrid (Molecular Simulation Grid) science gateway provides an intuitive user interface to several widely used applications for structural bioinformatics, molecular modeling, and quantum chemistry. It ensures the confidentiality, integrity, and availability of data via a granular security concept, which covers all layers of the infrastructure. The security concept applies SAML (Security Assertion Markup Language) and allows trust delegation from the user interface layer across the high-level middleware layer and the Grid middleware layer down to the HPC facilities. SAML assertions had to be integrated into the MoSGrid infrastructure in several places: the workflow-enabled Grid portal WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment), the gUSE (Grid User Support Environment) DCI services, and the cloud file system XtreemFS. The presented security infrastructure allows a single sign-on process to all involved DCI components and, therefore, lowers the hurdle for users to utilize large HPC infrastructures for structural bioinformatics.

12.
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. To validate this approach, we also presented the implementation of two clustering algorithms on the developed architecture. In particular, distributed k-means and distributed expectation maximization were used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance was provided. Copyright © 2011 John Wiley & Sons, Ltd.
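
For readers unfamiliar with distributed k-means, the following is a minimal sketch of one common formulation, not the service-oriented implementation the abstract describes. It assumes that each site holds its own points and sends only per-cluster sums and counts to a coordinator, which merges them into new centroids. All function names (`local_statistics`, `merge`, `distributed_kmeans`) are hypothetical, and the sites are simulated as in-memory lists rather than grid services.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Sequence[float], centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def local_statistics(data: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Runs at each site: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in data:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def merge(stats: Sequence[Stats], old: Sequence[Point]) -> List[Point]:
    """Runs at the coordinator: combine site statistics into new centroids."""
    new_centroids: List[Point] = []
    for k, old_centroid in enumerate(old):
        total = sum(counts[k] for _, counts in stats)
        if total == 0:
            # No site assigned a point to this cluster; leave it where it was.
            new_centroids.append(tuple(old_centroid))
            continue
        new_centroids.append(tuple(
            sum(sums[k][d] for sums, _ in stats) / total
            for d in range(len(old_centroid))
        ))
    return new_centroids


def distributed_kmeans(sites: Sequence[Sequence[Point]],
                       centroids: Sequence[Point],
                       rounds: int = 10) -> List[Point]:
    """Fixed number of rounds; raw points never leave their site."""
    current = [tuple(c) for c in centroids]
    for _ in range(rounds):
        stats = [local_statistics(data, current) for data in sites]
        current = merge(stats, current)
    return current
```

Because only sums and counts cross site boundaries, the traffic per round depends on the number of clusters and dimensions rather than on the number of points, which is the usual motivation for this pattern on dispersed data sources.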

13.
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The grid extends the distributed and parallel computing paradigms, allowing for resource negotiation and dynamic allocation, heterogeneity, open protocols, and services. Grid environments can be used both for compute-intensive tasks and data-intensive applications by exploiting their resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute and data intensive; therefore, the grid can offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how grid computing can be used to support distributed data mining. Research activities in grid-based data mining and some challenges in this area are presented along with some promising future directions for developing grid-based distributed data mining.

14.
Distributed Data Mining in Peer-to-Peer Networks   Total citations: 9 (self-citations: 0, citations by others: 9)
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data, computing nodes, and users. This article offers an overview of DDM applications and algorithms for P2P environments, focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

15.
Analysis and Provision of QoS for Distributed Grid Applications   Total citations: 5 (self-citations: 0, citations by others: 5)
Grid computing provides the infrastructure necessary to access and use distributed resources as part of virtual organizations. When used in this way, Grid computing makes it possible for users to participate in collaborative and distributed applications such as tele-immersion, visualization, and computational simulation. Some of these applications operate in a collaborative mode, requiring data to be stored and delivered in a timely manner. This class of applications must adhere to stringent real-time constraints and Quality-of-Service (QoS) requirements. A QoS management approach is therefore required to orchestrate and guarantee the timely interaction between such applications and services. We discuss the design and a prototype implementation of a QoS system, and demonstrate how we enable Grid applications to become QoS compliant. We validate this approach through a case study of an image processing task derived from a nanoscale structures application.

16.
Semantics and knowledge grids: building the next-generation grid   Total citations: 2 (self-citations: 0, citations by others: 2)
Just as the Internet is shifting its focus from information and communication to a knowledge delivery infrastructure, we see the Grid moving from computation and data management to a pervasive, worldwide knowledge management infrastructure. We have the technology to store and access data, but we seem to lack the ability to transform data tombs into useful data and extract knowledge from them. We review some of the current and future technologies that will impact the architecture, computational model, and applications of future grids. We attempt to forecast the evolution of computational grids into what we call the next-generation grid, with a particular focus on the use of semantics and knowledge discovery techniques and services. We propose a comprehensive software architecture for the next-generation grid, which integrates currently available services and components in Semantic Web, Semantic Grid, P2P, and ubiquitous systems. We'll also discuss a case study that shows how some new technologies can improve grid applications.

17.
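Yang Jie, Zhao Zheng. 《微处理机》 (Microprocessors), 2007, 28(1): 43-45
TCP is widely used for reliable, end-to-end, congestion-controlled network communication, and nearly all distributed computing networks rely on it. As grid technology develops, wide-area network performance becomes a key component, yet the TCP-related protocol stacks in operating systems are still suited only to yesterday's network speeds, and many high-performance grids use only a small fraction of the available bandwidth. The paper first introduces several buffer-size tuning methods for optimizing network performance and discusses their characteristics, and finally describes in detail a reasonable and practical general-purpose transport mechanism for grids.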
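
As background to why buffer size matters on wide-area links, here is a small worked sketch of the standard bandwidth-delay product rule of thumb. It is not necessarily one of the tuning methods the paper surveys. The function names and the example link figures (1 Gbit/s, 100 ms round trip, 64 KiB window) are illustrative assumptions.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


def max_throughput_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


# Hypothetical wide-area path: 1 Gbit/s with a 100 ms round-trip time.
needed = bdp_bytes(1e9, 0.100)                       # 12,500,000 bytes
limited = max_throughput_bits_per_s(64 * 1024, 0.100)  # about 5.24 Mbit/s
```

On this assumed path, a 64 KiB window caps a single connection at roughly 0.5% of the link rate, which illustrates how a stack tuned for slower networks can leave most of the bandwidth unused.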

18.
Setting up and deploying complex applications on a Grid infrastructure is still challenging and the programming models are rapidly evolving. Efficiently exploiting Grid parallelism is often not straightforward. In this paper, we report on the techniques used for deploying applications on the EGEE production Grid through four experiments coming from completely different scientific areas: nuclear fusion, astrophysics and medical imaging. These applications have in common the need to manipulate huge amounts of data, and all are computationally intensive. All the cases studied show that the deployment of data-intensive applications requires the development of more or less elaborate application-level workload management systems on top of the gLite middleware to efficiently exploit the EGEE Grid resources. In particular, the adoption of high-level workflow management systems eases the integration of large-scale applications while exploiting Grid parallelism transparently. Different approaches for scientific workflow management are discussed. The MOTEUR workflow manager's strategy for dealing efficiently with complex data flows is described in particular detail. Without requiring specific application development, it leads to very significant speed-ups.

19.
The ever-growing need for computation power and access to critical resources has launched, in a very short time, a large number of grid projects, and many have been realized on dedicated network infrastructures. On Internet-based infrastructures, however, there are very few distributed or interactive applications (MPI, DIS, HLA, remote visualization) because end-to-end performance (bandwidth and latency, for example) is insufficient to support such interactivity. For the moment, computing resources and network resources are viewed separately in the Grid architecture, and we believe this is the main bottleneck for achieving end-to-end performance. In this paper, we promote the idea of a Grid infrastructure able to adapt to the applications' needs and thus define the idea of application-aware Grid infrastructures, where the network infrastructure is tightly involved in both communication and processing. We report on our early experiences in building application-aware components based on active networking technologies for providing a low-latency, low-overhead multicast framework for applications running on a computational Grid. Performance results from both simulations and implementation prototypes confirm that introducing application-aware components at specific locations in the network infrastructure can succeed in providing not only performance for end users but also new perspectives in building a communication framework for computational Grids.

20.
Over the last decade, Grid computing paved the way for a new level of large-scale distributed systems. This infrastructure made it possible to securely and reliably take advantage of widely separated computational resources that are part of several different organizations. Resources can be incorporated into the Grid, building a theoretical virtual supercomputer. In time, cloud computing emerged as a new type of large-scale distributed system, inheriting and expanding the expertise and knowledge obtained so far. Some of the main characteristics of Grids naturally evolved into clouds, others were modified and adapted, and others were simply discarded or postponed. Regardless of these technical specifics, Grids and clouds together can be considered one of the most important advances in large-scale distributed computing of the past ten years; however, this step in distributed computing has come along with a completely new level of complexity. Grid and cloud management mechanisms play a key role, and correct analysis and understanding of the system behavior are needed. Large-scale distributed systems must be able to self-manage, incorporating autonomic features capable of controlling and optimizing all resources and services. Traditional distributed computing management mechanisms analyze each resource separately and adjust specific parameters of each one. When trying to adapt the same procedures to Grid and cloud computing, the vast complexity of these systems can make this task extremely complicated. But the complexity of large-scale distributed systems could be merely a matter of perspective. It could be possible to understand the Grid or cloud behavior as a single entity, instead of a set of resources. This abstraction could provide a different understanding of the system, describing large-scale behavior and global events that probably would not be detected by analyzing each resource separately.
In this work we define a theoretical framework that combines both ideas, multiple resources and single entity, to develop large-scale distributed systems management techniques aimed at system performance optimization, increased dependability and Quality of Service (QoS). The resulting synergy could be the key to addressing the most important difficulties of Grid and cloud management.
