1.
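Based on an analysis of the shortcomings of traditional distributed data mining platforms, and drawing on the idea of grid services, this paper proposes a grid-service-based distributed data mining platform and implements a distributed BP network classification algorithm (GBPC-GS) on it. Simulation experiments show that, compared with a single-machine environment, the average running time of the algorithm drops markedly as the number of grid nodes increases, and CPU load also falls by about 40%.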
2.
This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. This similarity in the structure can be exploited by the middleware system to execute the data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique.
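The local/global reduction structure described in this abstract can be illustrated with a minimal Python sketch. It is not FREERIDE's API: the function names (`partition`, `local_reduction`, `global_reduction`), the round-robin split, and the item-counting task are all assumptions chosen for illustration, and the "nodes" are simulated in a single process.

```python
from collections import Counter
from functools import reduce


def partition(records, num_nodes):
    """Divide the data instances among the nodes (round-robin; an assumption)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def local_reduction(records):
    """Fold every record on one node into a local reduction object."""
    counts = Counter()
    for record in records:
        # Counting is commutative and associative, so the order in which
        # records are read does not affect the result.
        counts.update(record)
    return counts


def global_reduction(local_results):
    """Merge the per-node reduction objects into a single global result."""
    return reduce(lambda a, b: a + b, local_results, Counter())


# Hypothetical transactions; each one is a list of items.
transactions = [["milk", "bread"], ["bread"], ["milk", "eggs"], ["bread", "eggs"]]
result = global_reduction(local_reduction(p) for p in partition(transactions, 2))
print(result)
```

Because the merge uses only addition, changing the number of nodes or the order of the records leaves the global counts unchanged, which is the property the abstract relies on.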
3.
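Wide-area networks hold massive amounts of geographically distributed data of many kinds. Analyzing and processing these data requires high-performance distributed parallel processing systems, a need the grid can meet. A knowledge grid uses basic grid services (communication, information, authorization, and resource management services) to build specific distributed parallel knowledge discovery tools and services. Drawing on the characteristics of the knowledge grid, this paper discusses its architecture and the set of services that support knowledge mining applications. Using the meta-learning model of distributed data mining, it describes the process of carrying out distributed data mining with the knowledge mining services the knowledge grid provides.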
4.
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. To validate this approach, we also present the implementation of two clustering algorithms on the developed architecture. In particular, distributed k-means and distributed expectation maximization were used as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance is provided. Copyright © 2011 John Wiley & Sons, Ltd.
5.
Distributed data mining: a survey (total citations: 1; self-citations: 1; citations by others: 0)
Li Zeng Ling Li Lian Duan Kevin Lu Zhongzhi Shi Maoguang Wang Wenjuan Wu Ping Luo 《Information Technology and Management》2012,13(4):403-409
Most data mining approaches assume that the data can be provided from a single source. If data are produced at many physically distributed locations, as at Wal-Mart, these methods require a data center that gathers data from the distributed locations. Sometimes, transmitting large amounts of data to a data center is expensive or even impractical. Therefore, distributed and parallel data mining algorithms were developed to solve this problem. In this paper, we survey the state-of-the-art algorithms and applications in distributed data mining and discuss future research opportunities.
6.
《Engineering Applications of Artificial Intelligence》2005,18(7):791-807
Multi-agent systems (MAS) offer an architecture for distributed problem solving. Distributed data mining (DDM) algorithms focus on one class of such distributed problem-solving tasks: analysis and modeling of distributed data. This paper offers a perspective on DDM algorithms in the context of multi-agent systems. It discusses broadly the connection between DDM and MAS. It provides a high-level survey of DDM, then focuses on distributed clustering algorithms and some potential applications in multi-agent-based problem-solving scenarios. It reviews algorithms for distributed clustering, including privacy-preserving ones. It describes challenges for clustering in sensor-network environments, potential shortcomings of the current algorithms, and corresponding future work. It also discusses confidentiality (privacy preservation) and presents a new algorithm for privacy-preserving density-based clustering.
7.
Donghua Yang 《Information Sciences》2007,177(17):3574-3591
Query processing in data grids is a difficult issue due to the heterogeneous, unpredictable and volatile behaviors of the grid resources. Applying join operations on remote relations in data grids is a unique and interesting problem. However, to the best of our knowledge, little has been done to date on multi-join query processing in data grids. An approach for processing multi-join queries is proposed in this paper. Firstly, a relation-reduction algorithm for reducing the sizes of operand relations is presented in order to minimize data transmission cost among grid nodes. Secondly, a method for scheduling computer nodes in data grids is devised to process multi-join queries in parallel. Thirdly, an innovative method is developed to efficiently execute join operations in a pipeline fashion. Finally, a complete algorithm for processing multi-join queries is given. Analytical and experimental results show the effectiveness and efficiency of the proposed approach.
8.
Data replication and consistency refer to the same data being stored in distributed sites, and kept consistent when one or
more copies are modified. A good file maintenance and consistency strategy can reduce file access times and access latencies,
and increase download speeds, thus reducing overall computing times. In this paper, we propose dynamic services for replicating
and maintaining data in grid environments, and directing replicas to appropriate locations for use. To address a problem with
the Bandwidth Hierarchy-based Replication (BHR) algorithm, a strategy for maintaining replicas dynamically, we propose the
Dynamic Maintenance Service (DMS). We also propose a One-way Replica Consistency Service (ORCS) for data grid environments,
a positive approach to resolving consistency maintenance issues we hope will strike a balance between improving data access
performance and replica consistency. Experimental results show that our services are more efficient than other strategies.
9.
《Future Generation Computer Systems》2007,23(1):61-68
Centralized data mining techniques are widely used today for the analysis of large corporate and scientific data stored in databases. However, industry, science, and commerce often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed systems. The Grid can play a significant role in providing effective computational infrastructure support for this kind of data mining. Similarly, the advent of multi-agent systems has brought a new paradigm for the development of complex distributed applications. During the past decades, several models and systems have been proposed that apply agent technology to building distributed data mining (DDM) systems. Combining these two techniques, we investigate the critical issues in building DDM on Grid infrastructure and design an Agent Grid Intelligent Platform as a testbed. We also implement an integrated toolkit, VAStudio, for quickly developing agent-based DDM applications and compare its functions with those of other systems.
10.
Distributed data mining for e-business (total citations: 1; self-citations: 1; citations by others: 1)
In the internet-based e-business environment, most business data are distributed, heterogeneous and private. To achieve true
business intelligence, mining large amounts of distributed data is necessary. Through a thorough literature review, this paper
identifies four main issues in distributed data mining (DDM) systems for e-business and classifies modern DDM systems into
three classes with representative samples. To address these identified issues, this paper proposes a novel DDM model named
DRHPDM (Data source Relevance-based Hierarchical Parallel Distributed data mining Model). In addition, to improve the quality
of the final result, the data sources are divided into a centralized mining layer and a distributed mining layer, according
to their relevance. To improve the openness, cross-platform ability, and intelligence of the DDM system, web service and multi-agent
technologies are adopted. The feasibility of DRHPDM was verified by building a prototype system and applying it to a web usage
mining scenario.
11.
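This paper builds a model of a distributed intrusion detection system based on data mining. Combining misuse detection with anomaly detection, it applies data mining techniques such as association analysis, sequence analysis, classification, and clustering to perform intelligent detection on security audit data, analyzes intrusion attacks and unauthorized behavior originating from the network, and provides real-time alerts and automatic responses, yielding an adaptive, scalable distributed intrusion detection system. Experiments show that the model achieves a high detection rate for known attack patterns and also has some ability to detect unknown attack patterns.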
12.
This special issue provides a leading forum for timely, in-depth presentation of recent advances in algorithms, theories and
applications in temporal data mining. The selected papers underwent a rigorous refereeing and revision process.
13.
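Bioinformatics is the science of using computers to store, retrieve, and analyze biological information. The launch and execution of the Human Genome Project led to rapid growth in nucleic acid and protein data, and extracting useful information from this mass of data has become a pressing problem for bioinformatics. Data mining fits bioinformatics well, and its potential in the field is attracting growing attention. This paper briefly introduces bioinformatics and data mining and presents several typical classes of data mining applications in bioinformatics.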
14.
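A data mining service discovery algorithm based on domain ontology (total citations: 3; self-citations: 0; citations by others: 3)
With the widespread use of databases, data mining faces the problems of massive and distributed data. Building data mining systems on a service-oriented architecture is one way to address these problems. This paper proposes a data mining service discovery algorithm based on domain ontology; by introducing domain knowledge and defining a data mining ontology, it effectively solves the data mining service discovery problem. It first presents a service discovery framework that incorporates domain knowledge, then defines a data mining method ontology and a quality ontology, and gives an algorithm that discovers data mining services according to domain knowledge and user requirements, offering a fairly complete scheme for data mining service selection.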
15.
Effective data distribution techniques can significantly reduce the total execution time of a program on grid computing environments,
especially for data mining applications. In this paper, we describe a linear programming formulation for the data distribution
problem on grids. Furthermore, a heuristic method, named Heuristic Data Distribution Scheme (HDDS), is proposed to solve this
problem. We implement two types of data mining applications, Association Rule Mining and Decision Tree Construction, and conduct
experiments on grid testbeds. Experimental results show that data mining programs using the proposed HDDS to distribute data
execute more efficiently than those using traditional schemes.
16.
Email is one of the most popular forms of communication nowadays, mainly due to its efficiency, low cost, and compatibility with diverse types of information. To facilitate better usage of email and explore the business potential of emailing, various data mining techniques have been applied to email data. In this paper, we present a brief survey of the major research efforts on email mining. To emphasize the differences between email mining and general text mining, we organize our survey around five major email mining tasks, namely spam detection, email categorization, contact analysis, email network property analysis and email visualization. These tasks are inherently incorporated into various usages of email. We systematically review the commonly used techniques and also discuss the related software tools available.
17.
This paper proposes a generic approach for designing vulnerability testing tools for web services, which includes the definition of the testing procedure and the tool components. Based on the proposed approach, we present the design of three innovative testing tools that implement three complementary techniques (improved penetration testing, attack signatures and interface monitoring, and runtime anomaly detection) for detecting injection vulnerabilities, thus offering an extensive support for different scenarios. A case study has been designed to demonstrate the tools for the particular case of SQL Injection vulnerabilities. The experimental evaluation demonstrates that the tools can effectively be used in different scenarios and that they outperform well-known commercial tools by achieving higher detection coverage and lower false-positive rates.
18.
19.
20.
Feng Zhao Chris Bailey-Kellogg Xingang Huang Iván Ordóñez 《New Generation Computing》1999,17(4):333-347
This paper describes problems, challenges, and opportunities for intelligent simulation of physical systems. Prototype intelligent simulation tools have been constructed for interpreting massive data sets from physical fields and for designing engineering systems. We identify the characteristics of intelligent simulation and describe several concrete application examples. These applications, which include weather data interpretation, distributed control optimization, and spatio-temporal diffusion-reaction pattern analysis, demonstrate that intelligent simulation tools are indispensable for the rapid prototyping of application programs in many challenging scientific and engineering domains.