Similar Documents
20 similar documents found.
1.
Design and Implementation of the Data Mining Tool DMTools   Total citations: 3, self-citations: 0, citations by others: 3
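This paper introduces DMTools, a general-purpose data mining tool. It implements the main stages of knowledge discovery in databases: visual analysis, data preprocessing, knowledge discovery in databases, data mining, model interpretation, and model evaluation. The paper focuses on the system's architecture and the function of each module. With this tool, hidden and valuable knowledge can be mined from the historical business databases of various industries and used for decision support.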

2.
This research adopts a framework that synthesizes Knowledge Discovery in Databases (KDD), the Cross Industry Standard Process for Data Mining (CRISP-DM), and agile practices. The application of this framework is demonstrated through an institutional case study of three knowledge discovery projects: the Persistence, Retention, and Donor projects. Results from the case study suggest that (a) interaction and iteration are foundations for the success of a knowledge discovery project, especially one with a strong business focus; (b) agile practices facilitate the interactive and iterative nature of a knowledge discovery project; and (c) explicitly adding the business understanding and deployment steps from CRISP-DM to KDD helps data miners stay focused on the ultimate goals of the project: the needs of the business and the users.

3.
EDM: A general framework for Data Mining based on Evidence Theory   Total citations: 16, self-citations: 0, citations by others: 16
Data Mining or Knowledge Discovery in Databases [1, 15, 23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A great deal of research effort is being directed towards building tools for discovering interesting patterns hidden below the surface in databases. However, most of the work in this field has been problem-specific, and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory.

Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge, which allows prior knowledge from the user, or knowledge discovered by another discovery process, to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values.

The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Algorithms developed within it will therefore also be parallel and can be expected to be efficient for large data sets. This is a necessity, since most commercial data sets, relational or otherwise, are very large, and the need is compounded by the fact that the algorithms are complex. The parallelism within the framework also allows it to be used in parallel, distributed and heterogeneous databases. The framework is easily updated, and new discovery methods can be readily incorporated within it, making it ‘general’ in the functional sense as well as in the representational sense considered above. Finally, the framework provides an intuitive way of dealing with missing data during the discovery process, using the concept of Ignorance borrowed from Evidence Theory.

The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes.
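The sketch below illustrates the Evidence Theory machinery the EDM framework builds on. It is not the paper's EDM operators or its extended mass functions. It shows the standard Dempster–Shafer notion of a mass function over a frame of hypotheses, together with Dempster's rule of combination, which is one well-known way of combining evidence. It also shows how Ignorance, meaning all mass placed on the whole frame, can stand in for a missing value. The frame `THETA`, the hypothesis names `rule_holds`/`rule_fails`, and the helper names `vacuous` and `combine` are hypothetical illustrations.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable

# A mass function maps focal sets (frozensets of hypotheses) to belief mass;
# the masses of all focal sets sum to 1.
Mass = Dict[FrozenSet[str], float]


def vacuous(frame: Iterable[str]) -> Mass:
    """Total ignorance: all mass on the whole frame (e.g. for a missing value)."""
    return {frozenset(frame): 1.0}


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to pairs of focal sets that contradict each other.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so that the non-conflicting masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame: does a candidate rule hold in the database or not?
THETA = frozenset({"rule_holds", "rule_fails"})
m_record = {frozenset({"rule_holds"}): 0.6, THETA: 0.4}
m_other = {frozenset({"rule_fails"}): 0.5, THETA: 0.5}

print(combine(m_record, vacuous(THETA)))  # unchanged: ignorance adds nothing
print(combine(m_record, m_other))
```

Combining with the vacuous mass function leaves the other evidence untouched. This is why treating a missing value as Ignorance does not bias the result. Conflicting evidence, by contrast, is discarded and the remainder renormalised.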

The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.


4.
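With the rapid development and widespread application of computer hardware and software, communication technology, and information processing technology, modern data management technology is also advancing quickly. Starting from the new problems and main challenges currently facing database technology, this paper reviews the research status and development trends of modern data management technology from several perspectives, each with its own emphasis: object-relational databases, XML and its applications in data management, applications in the Web and VC, and the Semantic Web.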

5.
This paper develops, tests, and validates a model for the antecedents of open source software (OSS) defects, using Data and Text Mining. The public archives of OSS projects are used to access historical data on over 5,000 active and mature OSS projects. Using domain knowledge and exploratory analysis, a wide range of variables is identified from the process, product, resource, and end-user characteristics of a project, so that the model is robust and considers all aspects of the system. Multiple Data Mining techniques are used to refine the model, and the data is enriched by using Text Mining for knowledge discovery from qualitative information. The study demonstrates the suitability of Data Mining and Text Mining for model building. Results indicate that project type, end-user activity, process quality, team size and project popularity have a significant impact on the defect density of operational OSS projects. Since many organizations, both for-profit and not-for-profit, are beginning to use Open Source Software as an economic alternative to commercial software, these results can be used when deciding what software an organization can reasonably maintain.

6.
Data Mining and Knowledge Discovery - Efficient and interpretable classification of time series is an essential data mining task with many real-world applications. Recently several dictionary- and...

7.
8.
Data Mining and Knowledge Discovery - A vast and growing literature on explaining deep learning models has emerged. This paper contributes to that literature by introducing a global gradient-based...

9.
Data Mining and Knowledge Discovery - Temporal graphs are structures which model relational data between entities that change over time. Due to the complex structure of data, mining statistically...

10.
This research explores a specific step in the Knowledge Discovery in Databases (KDD) process: Data Mining. The data mining process itself deals largely with prediction, estimation, classification, pattern recognition and the development of association rules. This analysis therefore depends heavily on the accuracy of the database and on the sample data chosen for model training and testing. Data mining is based upon searching the concatenation of multiple databases, which usually contain some amount of missing data along with a variable percentage of inaccurate data, pollution, outliers and noise. The issue of missing data must be addressed, because ignoring it can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the Effects of the Neural Network s-Sigmoid Function on KDD in the Presence of Imprecise Data, using a three-factor ANOVA test and Tukey's Honestly Significant Difference statistic.

11.
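Research on knowledge discovery has progressed rapidly in recent years, and knowledge discovery from image data has become a new research hotspot in the field. This paper presents IMDFSSM, an image knowledge discovery model based on Hilbert space theory. The model uses patterns, defined as vectors in a Hilbert space, to quantitatively represent the knowledge in image data and to take part in the knowledge discovery process. The model is then validated using an image mining system as a case study. The results show that the model provides useful guidance for the knowledge discovery process on image data.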

12.
Li Ziyue, Yan Hao, Zhang Chen, Tsung Fugee. Data Mining and Knowledge Discovery, 2022, 36(4): 1247-1278
Data Mining and Knowledge Discovery - Individual passenger travel patterns have significant value in understanding passengers’ behavior, such as learning the hidden clusters of locations,...

13.
Data Mining and Knowledge Discovery - Dealing with relational learning generally relies on tools modeling relational data. An undirected graph can represent these data with vertices depicting...

14.
Data Mining and Knowledge Discovery - Time series are ubiquitous in data mining applications. Similar to other types of data, annotations can be challenging to acquire, thus preventing from...

15.
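Knowledge Discovery in Text: Text Mining Based on Information Extraction   Total citations: 11, self-citations: 0, citations by others: 11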
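1. Introduction. As is well known, the situation of being "data rich but knowledge poor" led to the rise of research on Data Mining. Data mining, also called Knowledge Discovery in Databases, is an important means of extracting or mining implicit information and knowledge from massive amounts of structured information. Data mining technology is already fairly mature. Besides structured data, however, digital information contains large amounts of free-form, unstructured or semi-structured text, such as news articles, e-books, digital library collections, Web page content, email and document databases. Processing this text manually clearly requires a great deal of labor and resources, and the results are uncertain. This gave rise to discovering knowledge from text...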

16.
Data Mining and Knowledge Discovery - Discrete Markov chains are frequently used to analyse transition behaviour in sequential data. Here, the transition probabilities can be estimated using...
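The abstract above is cut off before it names an estimation method. The sketch below therefore shows only the textbook baseline: the maximum-likelihood estimate of first-order transition probabilities. This estimate counts observed transitions and normalises each row, and it is not necessarily the method the paper proposes. The function name `estimate_transitions` and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate of first-order Markov transition probabilities.

    P(next = j | current = i) = count(i -> j) / count(i -> anything).
    States never observed with an outgoing transition get no row.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each element with its successor within the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Toy (hypothetical) sequential data.
sequences = [["A", "B", "A", "A"], ["B", "B", "A"]]
print(estimate_transitions(sequences))
```

Rows estimated from only a few transitions are noisy. This is one reason alternatives to plain counting, such as smoothing or Bayesian estimates, are often considered.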

17.
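Data is everywhere in today's society. Data mining is a relatively new information processing technology that finds potentially valuable patterns or models in massive amounts of data, a goal that is difficult to achieve manually. Weka is a tool for data mining. Users can run data preprocessing, classification, regression, clustering, association rule mining and other tasks with it. Using the datasets that ship with Weka as examples, this paper describes in detail how to use Weka as an easy-to-use data mining tool.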

18.
Data Mining and Knowledge Discovery - We introduce and study knowledge drift (KD), a special form of concept drift that occurs in hierarchical classification. Under KD the vocabulary of concepts,...

19.
Data Mining and Knowledge Discovery - We study the problem of efficiently mining statistically-significant sequential patterns from large datasets, under different null models. We consider one null...
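The abstract is truncated before it describes its null models. The sketch below only illustrates the general idea of testing a sequential pattern's significance against a null model, and it relies on several assumptions. The null model shuffles the items within each sequence independently, which preserves each sequence's length and item counts while destroying order. A Monte Carlo permutation p-value then compares the observed support with supports under that null. Every name here (`contains_subsequence`, `support`, `permutation_p_value`) is hypothetical, and this is not the paper's algorithm. Testing many candidate patterns at once would also need a multiple-testing correction, which this sketch omits.

```python
import random


def contains_subsequence(seq, pattern):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the match, enforcing order.
    return all(item in it for item in pattern)


def support(dataset, pattern):
    """Number of sequences in the dataset that contain the pattern."""
    return sum(contains_subsequence(s, pattern) for s in dataset)


def permutation_p_value(dataset, pattern, n_perm=1000, seed=0):
    """Monte Carlo p-value of the pattern's support under a within-sequence shuffle null."""
    rng = random.Random(seed)
    observed = support(dataset, pattern)
    hits = 0
    for _ in range(n_perm):
        # Null dataset: same items per sequence, random order.
        shuffled = [rng.sample(s, len(s)) for s in dataset]
        if support(shuffled, pattern) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: "a" always precedes "b", which a random order would rarely produce.
data = [["a", "x", "b", "y"] for _ in range(20)]
print(permutation_p_value(data, ("a", "b"), n_perm=200))
```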

20.
Data Mining and Knowledge Discovery - In critical situations involving discrimination, gender inequality, economic damage, and even the possibility of casualties, machine learning models must be...
