1.
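A knowledge discovery model based on a data extractor is proposed. The model divides the knowledge discovery process into four stages: data preprocessing, data extraction, data mining, and result analysis. It builds the data extractor with standard SQL to prepare data for different learning algorithms. This reduces how often data mining algorithms query the database directly and avoids direct access to the raw data of large databases, which makes fast data mining on large databases possible. As a result, the knowledge discovery process is faster, data mining is more efficient, and knowledge discovery in large databases becomes practical. Finally, the SQL-C4.5 algorithm is designed: it uses the data extractor to extract the statistics required by the C4.5 decision tree algorithm and builds the C4.5 decision tree from them.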
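The abstract does not describe SQL-C4.5 in detail. As a rough sketch of the idea only, assuming a hypothetical SQLite table `weather(outlook, play)`, the Python code below lets the database compute the (attribute value, class) counts with `GROUP BY` and then derives C4.5's gain ratio from those aggregated counts alone, so the tree-building code never reads individual rows. The table, column names, and helper functions are illustrative assumptions, not the paper's implementation.

```python
import math
import sqlite3
from collections import defaultdict


def class_counts_by_value(conn, table, attr, label):
    """Let the database return the (attribute value, class) contingency counts."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated;
    # they must come from a trusted schema, never from user input.
    sql = f"SELECT {attr}, {label}, COUNT(*) FROM {table} GROUP BY {attr}, {label}"
    counts = defaultdict(dict)
    for value, cls, n in conn.execute(sql):
        counts[value][cls] = n
    return counts


def entropy(class_counts):
    """Shannon entropy (bits) of a list of counts."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c)


def gain_ratio(counts):
    """C4.5 gain ratio computed purely from the aggregated counts."""
    per_value = {v: sum(d.values()) for v, d in counts.items()}
    n = sum(per_value.values())
    overall = defaultdict(int)
    for d in counts.values():
        for cls, c in d.items():
            overall[cls] += c
    info_gain = entropy(list(overall.values())) - sum(
        per_value[v] / n * entropy(list(d.values())) for v, d in counts.items()
    )
    split_info = entropy(list(per_value.values()))
    return info_gain / split_info if split_info else 0.0


def build_demo_db():
    """Hypothetical 14-row table used only for illustration."""
    rows = ([("sunny", "no")] * 3 + [("sunny", "yes")] * 2 + [("overcast", "yes")] * 4
            + [("rain", "yes")] * 3 + [("rain", "no")] * 2)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    return conn
```

With `counts = class_counts_by_value(build_demo_db(), "weather", "outlook", "play")`, `gain_ratio(counts)` is about 0.156. Only one small result set per candidate attribute crosses the database boundary, which is the saving the abstract attributes to the data extractor.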
2.
3.
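Data mining is the process of extracting hidden knowledge from large volumes of raw data. Most data mining tools use rule discovery and decision tree classification to find data patterns and rules, and induction algorithms sit at their core. Compared with traditional statistical methods, classification results obtained through machine learning are more interpretable. When users or decision makers mine a particular dataset without the relevant domain knowledge, it is hard for them to decide which induction algorithm to use, so several algorithms have to be tried. With MLC++, decision makers can easily compare how well different classification algorithms perform on a given dataset and choose a suitable one. System developers can also use MLC++ to design hybrid algorithms.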
4.
Gao Xiangtao 《Computer Engineering and Applications》2009,45(5):243-248
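This study examines six stages of hydrological data mining: goal understanding, data preparation, data preprocessing, model building, evaluation and interpretation, and knowledge application. Shared data processing and mining algorithms connect the data between stages seamlessly, yielding a loosely coupled framework for a hydrological data mining system. The work applies data processing methods from data mining to hydrology and also adopts data processing techniques from hydrology, merging the data processing and evaluation practices of the data mining field with those of the application domain. Using hydrological records from two representative regions in the Jiangsu Province national hydrological database, mining was carried out strictly according to the hydrological data mining process, with the search for hydrologically similar years as the entry point. A thorough analysis, comparison, and evaluation of the results shows that the similar years found with the agglomerative algorithm of hierarchical clustering largely agree with the knowledge of hydrology experts.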
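The abstract names agglomerative hierarchical clustering as the technique for finding hydrologically similar years. The following is a minimal pure-Python sketch of that technique. It assumes Euclidean distance, average linkage, and hypothetical per-year feature vectors (for example, seasonal runoff). The abstract does not state the paper's actual features, distance measure, or linkage.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between two clusters (lists of keys into data)."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the
    closest pair of clusters until n_clusters remain."""
    clusters = [[key] for key in data]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], data),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical (spring, summer, autumn) runoff figures per year.
years = {
    1991: (120, 450, 80),
    1998: (125, 460, 85),
    2003: (60, 200, 40),
    2011: (65, 210, 45),
    2016: (118, 440, 90),
}
```

`agglomerative(years, 2)` groups 1991, 1998, and 2016 together and 2003 with 2011. Years that share a cluster would be the candidate "similar years" that experts then compare against their own judgment.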
5.
6.
《Journal of Computer-Aided Design & Computer Graphics》2016,(1)
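Data mining is a process of discovering information in large amounts of data. Because it relies heavily on automatic algorithms, users find it hard to intuitively understand, explore, and optimize the data and the algorithmic process itself. In recent years, as the field of visualization has flourished, many studies have investigated how visualization can support the data mining process, helping users understand the data more intuitively and explore both the data and the algorithms. This paper first compares how data mining and visualization each fit into the knowledge extraction pipeline. It then surveys recent techniques from two angles: visualization-enhanced general-purpose data mining methods and application-oriented methods. Finally, drawing on related international conferences, it identifies directions that need further exploration.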
7.
Research on process models for knowledge discovery in databases  Total citations: 6, self-citations: 1, citations by others: 5
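1 Introduction. Knowledge discovery in databases, KDD (Knowledge Discovery in Database), emerged in recent years with the development of database and artificial intelligence technology. It is a high-level process for extracting credible, novel, valid, and understandable patterns from large amounts of data. It mainly uses machine learning algorithms or statistical methods to learn knowledge, and the knowledge-learning stage of KDD is generally called data mining (Data Mining). Data mining is a very important processing step in KDD, and the two terms are often used interchangeably: engineering applications usually say data mining, while the research community usually says knowledge discovery in databases. KDD research aims to apply the results of knowledge discovery to real-world data processing and to provide support for sound decision making. For this reason, current research on…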
8.
9.
A fast rule mining algorithm for large transaction databases  Total citations: 1, self-citations: 0, citations by others: 1
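1 Introduction. Data mining (Data Mining), also called knowledge discovery in databases (KDD), means uncovering the insights hidden in mountains of data, and it is fast becoming a business necessity. Association rules (Association Rules) are an important research topic in data mining. Nearly all current association rule mining algorithms consist of two phases: (1) discovering frequent itemsets; (2) generating rules. Most of the computation falls in the first phase, so identifying frequent itemsets quickly is the key to efficiency, and much work has been done on this. Overall, however, most studies build on the Apriori algorithm or its derivatives, which suffer to varying degrees from two problems: (1) they spend a great deal of time processing huge candidate itemsets; (2) they must repeatedly and mechanically scan…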
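To make the first phase concrete, here is a minimal Python sketch of classic Apriori frequent-itemset discovery. This is the baseline the introduction criticizes, not the paper's faster algorithm. The transactions and the absolute support threshold are hypothetical. Each level performs one full pass over the transactions, which is where the repeated scanning mentioned above comes from.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Phase 1 of association-rule mining: return {itemset: support count}
    for every itemset contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the data per level k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
```

On `baskets` with `min_support=2`, the sketch finds the three single items plus {bread, milk} and {bread, butter}. The candidate {bread, milk, butter} is pruned without counting because {milk, butter} is infrequent.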
10.
A data mining service discovery algorithm based on domain ontology  Total citations: 3, self-citations: 0, citations by others: 3
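As databases become widespread, data mining faces the challenges of massive and distributed data. Building data mining systems on a service-oriented architecture is one way to address them. This paper proposes a data mining service discovery algorithm based on domain ontology that introduces domain knowledge and defines data mining ontologies, effectively solving the data mining service discovery problem. It first presents a service discovery framework that incorporates domain knowledge, then defines a data mining method ontology and a quality ontology, and finally gives an algorithm that discovers data mining services according to domain knowledge and user requirements, offering a fairly complete solution for selecting data mining services.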
11.
Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification  Total citations: 3, self-citations: 0, citations by others: 3
Bernstein A. Provost F. Hill S. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(4):503-518
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and nontrivial interactions, both novices and data mining specialists need assistance in composing and selecting DM processes. Extending notions developed for statistical expert systems, we present a prototype intelligent discovery assistant (IDA), which provides users with 1) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and 2) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use the prototype to show that an IDA can indeed provide useful enumerations and effective rankings in the context of simple classification processes. We discuss how an IDA could be an important tool for knowledge sharing among a team of data miners. Finally, we illustrate the claims with a demonstration of cost-sensitive classification using a more complicated process and data from the 1998 KDDCUP competition.
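The abstract describes enumerating only the valid preprocess, algorithm, and postprocess combinations and then ranking them. A toy Python sketch of that enumerate-then-rank idea follows. The operator catalogue, the "requires/provides" compatibility rule, and the cost figures are all hypothetical; the paper itself uses an ontology of DM operators, which this sketch does not reproduce.

```python
from itertools import product

# Hypothetical operator catalogue (not from the paper).
# Preprocessing: name -> (data properties it guarantees, estimated cost in seconds)
PREPROCESS = {"none": (set(), 0), "discretize": ({"discrete"}, 2), "impute": ({"no_missing"}, 1)}
# Induction algorithms: name -> (data properties they require, estimated cost)
ALGORITHMS = {
    "naive_bayes": ({"discrete"}, 1),
    "decision_tree": (set(), 5),
    "logistic_regression": ({"no_missing"}, 3),
}
# Postprocessing: name -> estimated cost
POSTPROCESS = {"none": 0, "cost_threshold": 1}


def valid_processes():
    """Enumerate every pre -> algorithm -> post chain whose algorithm's
    requirements are satisfied by the preprocessing step, ranked by cost."""
    ranked = []
    for (pre, (provides, pre_cost)), (alg, (requires, alg_cost)), (post, post_cost) in product(
        PREPROCESS.items(), ALGORITHMS.items(), POSTPROCESS.items()
    ):
        if requires <= provides:  # subset test: all requirements are met
            ranked.append(((pre, alg, post), pre_cost + alg_cost + post_cost))
    return sorted(ranked, key=lambda entry: entry[1])
```

Of the 18 raw combinations, only 10 are valid here. The cheapest is `("discretize", "naive_bayes", "none")` with cost 3. Ranking by a different criterion, as the abstract suggests, would only change the `key` passed to `sorted`.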
12.
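Traditional data mining methods suffer from low efficiency and a lack of intelligence, and they struggle to meet the demands of mining massive data in networked environments. Starting from the characteristics of data mining technology and agent technology, this paper discusses the advantages of combining the two, applies agent technology to data mining, proposes an agent-based data mining model, and describes the model's organizational structure. The model reduces problem complexity and the need for manual intervention, and it substantially improves the intelligence and efficiency of data mining.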
13.
14.
Data mining using constraints and multidimensional techniques  Total citations: 1, self-citations: 0, citations by others: 1
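To address the limited efficiency of current data mining, this paper proposes mining with constraints and multidimensional techniques, discusses which kinds of constraints can be applied during the mining process, and designs a data mining system architecture.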
15.
Tzung-Pei Hong Kuei-Ying Lin Shyue-Liang Wang 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(10):925-932
Many researchers in database and machine learning fields are primarily interested in data mining because it offers opportunities to discover useful information and important relevant patterns in large databases. Most previous studies have shown how binary valued transaction data may be handled. Transaction data in real-world applications usually consist of quantitative values, so designing a sophisticated data-mining algorithm able to deal with various types of data presents a challenge to workers in this research field. In the past, we proposed a fuzzy data-mining algorithm to find association rules. Since sequential patterns are also very important for real-world applications, this paper thus focuses on finding fuzzy sequential patterns from quantitative data. A new mining algorithm is proposed, which integrates the fuzzy-set concepts and the AprioriAll algorithm. It first transforms quantitative values in transactions into linguistic terms, then filters them to find sequential patterns by modifying the AprioriAll mining algorithm. Each quantitative item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of the original items. The patterns mined out thus exhibit the sequential quantitative regularity in databases and can be used to provide some suggestions to appropriate supervisors.
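The key preprocessing step in this abstract turns each quantity into fuzzy linguistic terms and then keeps only the term with the maximum cardinality per item. A minimal Python sketch of that step follows. The three membership functions (Low, Middle, High) and their breakpoints are hypothetical, and the sketch uses flat transactions rather than per-customer sequences. The subsequent modified AprioriAll phase is not shown.

```python
from collections import defaultdict


# Hypothetical membership functions for purchase quantities; the paper's
# actual fuzzy regions are not given in the abstract.
def low(q):
    return 1.0 if q <= 1 else max(0.0, (6 - q) / 5)


def middle(q):
    return max(0.0, 1 - abs(q - 6) / 5)


def high(q):
    return 1.0 if q >= 11 else max(0.0, (q - 6) / 5)


TERMS = {"Low": low, "Middle": middle, "High": high}


def fuzzify(quantity):
    """Map one quantitative value to a membership degree per linguistic term."""
    return {term: f(quantity) for term, f in TERMS.items()}


def max_cardinality_terms(transactions):
    """For each item, keep the single linguistic term with the largest
    cardinality (sum of membership degrees over all transactions)."""
    cardinality = defaultdict(lambda: defaultdict(float))
    for transaction in transactions:
        for item, quantity in transaction.items():
            for term, degree in fuzzify(quantity).items():
                cardinality[item][term] += degree
    return {item: max(terms, key=terms.get) for item, terms in cardinality.items()}
```

For the hypothetical transactions `[{"A": 2, "B": 10}, {"A": 3, "B": 9}, {"A": 8}]`, item A's cardinalities are roughly Low 1.4, Middle 1.2, High 0.4, so A is represented as A.Low, and B as B.High. Later mining therefore handles one fuzzy region per original item, as the abstract notes.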
16.
A dynamic predictive-control model of a nonlinear and temporal process is considered. Evolutionary computation and data mining algorithms are integrated for solving the model. Data-mining algorithms learn dynamic equations from process data. Evolutionary algorithms are then applied to solve the optimization problem guided by the knowledge extracted by data-mining algorithms. Several properties of the optimization model are shown in detail, in particular, a selection of regressors, time delays, prediction and control horizons, and weights. The concepts proposed in this paper are illustrated with an industrial case study in combustion process.
17.
《Applied Artificial Intelligence》2013,27(5-6):545-561
This paper presents a new means of selecting quality data for mining multiple data sources. Traditional data-mining strategies obtain necessary data from internal and external data sources and pool all the data into a huge homogeneous dataset for discovery. In contrast, our data-mining strategy identifies quality data from (internal and external) data sources for a mining task. A framework is advocated for generating quality data. Experimental results demonstrate that application of this new data collecting technique can not only identify quality data, but can also efficiently reduce the amount of data that must be considered during mining.
18.
19.
Constraint-based, multidimensional data mining  Total citations: 2, self-citations: 0, citations by others: 2
Although many data-mining methodologies and systems have been developed in recent years, the authors contend that by and large, present mining models lack human involvement, particularly in the form of guidance and user control. They believe that data mining is most effective when the computer does what it does best, such as searching large databases or counting, and users do what they do best, such as specifying the current mining session's focus. This division of labor is best achieved through constraint-based mining, in which the user provides constraints that guide a search. Mining can also be improved by employing a multidimensional, hierarchical view of the data. Current data warehouse systems have provided a fertile ground for systematic development of this multidimensional mining. Together, constraint-based and multidimensional techniques can provide a more ad hoc, query-driven process that exploits the semantics of data more effectively than the processes supported by current standalone data-mining systems.
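Constraint-based mining can be made concrete by pushing a user constraint directly into a level-wise search. The Python sketch below illustrates this under stated assumptions. It uses a hypothetical "total price of the itemset must not exceed a budget" constraint. Because that constraint is anti-monotone (if an itemset violates it, every superset does too), offending candidates can be discarded before any support counting and never extended. The constraint, prices, and data are illustrative and are not taken from the article.

```python
def constrained_frequent_itemsets(transactions, min_support, price, budget):
    """Level-wise mining of frequent itemsets that also satisfy the user
    constraint sum(price[i] for i in itemset) <= budget."""

    def satisfies(itemset):
        return sum(price[i] for i in itemset) <= budget

    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    result = {}
    k = 1
    while level:
        # The constraint check needs no data scan, so apply it before counting.
        level = {c for c in level if satisfies(c)}
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        level = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in level})
        k += 1
        # Only itemsets that passed both tests are extended to size k.
        level = {a | b for a in level for b in level if len(a | b) == k}
    return result


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
prices = {"bread": 3, "milk": 2, "butter": 5}
```

With `min_support=2` and `budget=6`, {bread, butter} occurs often enough but costs 8, so it is dropped before its support is ever counted. The result contains bread, milk, butter, and {bread, milk}.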
20.
When using data-mining tools to analyze big data, users often need tools that help them understand individual data attributes and control the progress of the analysis. This requires integrating data-mining algorithms with interactive tools for manipulating the data and the analytical process. This is where visual analytics can help. More than simple visualization of a dataset or some computation results, visual analytics provides users with an environment in which to iteratively explore different inputs or parameters and see the corresponding results. In this research, we explore a design of progressive visual analytics to support the analysis of categorical data with a data-mining algorithm, Apriori. Our study focuses on executing data mining techniques step by step and showing intermediate results at every stage to facilitate sense-making. Our design, called Pattern Discovery Tool, targets a medical dataset. Starting with visualization of data properties and immediate feedback on users' inputs or adjustments, Pattern Discovery Tool can help users detect interesting patterns and factors effectively and efficiently. Afterward, further analyses such as statistical methods can be conducted to test these candidate hypotheses.