Similar Articles
20 similar articles found (search time: 15 ms)
1.
ESTIMATING DBLEARN'S POTENTIAL FOR KNOWLEDGE DISCOVERY IN DATABASES
We propose a procedure for estimating DBLEARN's potential for knowledge discovery, given a relational database and concept hierarchies. This procedure is most useful for evaluating alternative concept hierarchies for the same database. The DBLEARN knowledge discovery program uses an attribute-oriented inductive-inference method to discover potentially significant high-level relationships in a database. A concept forest, with at most one concept hierarchy for each attribute, defines the possible generalizations that DBLEARN can make for a database. The potential for discovery in a database is estimated by examining the complexity of the corresponding concept forest. Two heuristic measures are defined based on the number, depth, and height of the interior nodes. Higher values for these measures indicate more complex concept forests and arguably more potential for discovery. Experimental results using a variety of concept forests and four commercial databases show that in practice both measures permit quite reliable decisions to be made; thus, the simplest may be most appropriate.

2.
A method for finding all deterministic and maximally general rules for a target classification is explained in detail and illustrated with examples. Maximally general rules are rules with minimal numbers of conditions. The method has been developed within the context of the rough sets model and is based on the concepts of a decision matrix and a decision function. The problem of finding all the rules is reduced to the problem of computing prime implicants of a group of associated Boolean expressions. The method is particularly applicable to identifying all potentially interesting deterministic rules in a knowledge discovery system but can also be used to produce possible rules or nondeterministic rules with decision probabilities, by adapting the method to the definitions of the variable precision rough sets model.
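The reduction described above can be made concrete on a toy decision table (all data and names below are invented for illustration; this is a sketch of the idea, not the paper's implementation). Each clause of the Boolean decision function collects the attribute-value pairs of a target object that discern it from one object of another class; the prime implicants of the conjunction of these clauses yield the maximally general rules:

```python
from itertools import product

def prime_implicants(clauses):
    """Prime implicants of a monotone CNF (each clause is a set of literals):
    distribute into DNF, then drop any term absorbed by a proper subset."""
    terms = {frozenset(choice) for choice in product(*clauses)}
    return {t for t in terms if not any(o < t for o in terms)}

# Hypothetical decision table: (size, colour) -> class
rows = [(("small", "red"), "+"), (("large", "blue"), "-"), (("large", "red"), "-")]
target_obj, _ = rows[0]

# One clause per object of the opposite class: the target's attribute-value
# pairs on which the two objects differ (the decision-matrix entry).
clauses = [
    {(a, target_obj[a]) for a in range(len(target_obj)) if target_obj[a] != obj[a]}
    for obj, cls in rows[1:]
]
rules = prime_implicants(clauses)
print(rules)  # a single maximally general rule: size = "small" -> "+"
```

Here the condition colour = "red" is absorbed, leaving a rule with the minimal number of conditions.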

3.
4.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic form is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationships among the attributes in the database are analyzed using rough set techniques, and unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
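The concept tree ascension step can be sketched as follows (the hierarchy, data, and function names are assumptions made for illustration, not the paper's code): each attribute value is replaced by its parent concept, and identical generalized tuples are merged with a vote count.

```python
from collections import Counter

# Hypothetical concept hierarchy: specific value -> parent concept
HIERARCHY = {
    "biology": "science", "physics": "science",
    "painting": "art", "music": "art",
}

def generalize(facts, attr_index, hierarchy):
    """Climb one level of the concept tree for one attribute and merge duplicates."""
    out = Counter()
    for row, count in facts.items():
        row = list(row)
        row[attr_index] = hierarchy.get(row[attr_index], row[attr_index])
        out[tuple(row)] += count
    return out

facts = Counter({("biology", "MSc"): 2, ("physics", "MSc"): 1, ("music", "BSc"): 3})
generalized = generalize(facts, 0, HIERARCHY)
print(generalized)  # ("science", "MSc") merges two source tuples
```

Repeating this step drives the relation toward fewer, more general tuples, which is what keeps the complexity of database learning low.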

5.
Consistency and Completeness in Rough Sets
Consistency and completeness are defined in the context of rough set theory and shown to be related to the lower approximation and upper approximation, respectively. A member of a composed set (union of elementary sets) that is consistent with respect to a concept, surely belongs to the concept. An element that is not a member of a composed set that is complete with respect to a concept, surely does not belong to the concept. A consistent rule and a complete rule are useful in addition to any other rules learnt to describe a concept. When an element satisfies the consistent rule, it surely belongs to the concept, and when it does not satisfy the complete rule, it surely does not belong to the concept. In other cases, the other learnt rules are used. The results in the finite universe are extended to the infinite universe, thus introducing a rough set model for the learning from examples paradigm. The results in this paper have application in knowledge discovery or learning from database environments that are inconsistent, but at the same time demand accurate and definite knowledge. This study of consistency and completeness in rough sets also lays the foundation for related work at the intersection of rough set theory and inductive logic programming.
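The lower and upper approximations these definitions rest on can be sketched in a few lines (the universe and its partition into elementary sets below are toy assumptions):

```python
def lower_upper(elementary_sets, concept):
    """Lower approximation: union of elementary sets wholly inside the concept.
    Upper approximation: union of elementary sets that intersect the concept."""
    lower = set().union(*[e for e in elementary_sets if e <= concept])
    upper = set().union(*[e for e in elementary_sets if e & concept])
    return lower, upper

# Toy universe {1..5} partitioned into elementary (indiscernible) sets
elementary = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
X = {1, 2, 3, 4}
low, up = lower_upper(elementary, X)
print(low, up)  # elements surely in X; elements possibly in X
```

In the paper's terms, a rule covering only the lower approximation is consistent, while one covering the whole upper approximation is complete; elements outside the upper approximation surely do not belong to the concept.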

6.
The architecture of an intelligent multistrategy assistant for knowledge discovery from facts, INLEN, is described and illustrated by an exploratory application. INLEN integrates a database, a knowledge base, and machine learning methods within a uniform user-oriented framework. A variety of machine learning programs are incorporated into the system to serve as high-level knowledge generation operators (KGOs). These operators can generate diverse kinds of knowledge about the properties and regularities existing in the data. For example, they can hypothesize general rules from facts, optimize the rules according to problem-dependent criteria, determine differences and similarities among groups of facts, propose new variables, create conceptual classifications, determine equations governing numeric variables and the conditions under which the equations apply, derive statistical properties and use them for qualitative evaluations, and so on. The initial implementation of the system, INLEN 1b, is described, and its performance is illustrated by applying it to a database of scientific publications.

7.
An Improved Algorithm for Learning Bayesian Network Structures

8.
Minimal-Maximal Rule Learning and Its Application to Decision Tree Rule Simplification
Inspired by the notion of reducts in rough set theory, this paper proposes the concepts of minimal rules and maximal rules, together with minimal-maximal rule learning.

9.
EXTRACTING LAWS FROM DECISION TABLES: A ROUGH SET APPROACH
We present some methods, based on the rough set and Boolean reasoning approaches, for extracting laws from decision tables. First we discuss several procedures for synthesizing decision rules from decision tables. Next we show how near-to-functional relations among the data can be applied to data filtration. Two methods of searching for new classifiers (features) are described: searching for new classifiers in a given set of logical formulas, and searching for functions that approximate near-to-functional relations.

10.
The purpose of this work is to analyse the cognitive process behind domain theories in terms of measurement theory, and to develop a computational machine learning approach that implements it. As a result, the relational data mining approach that the authors proposed in their preceding books is improved. We present the approach as an implementation of the cognitive process as perceived by measurement theory. We analyse the cognitive process in the first part of the paper and present the theory and method of logically most powerful empirical theory discovery in the second. The theory is based on the notion of law-like rules, which conform to all the properties of laws of nature, namely generality, simplicity, maximum refutability, and a minimum number of parameters. This notion is defined for both the deterministic and the probabilistic case. Based on the method, a discovery system has been developed and successfully applied to many practical tasks.

11.
A Formalism for Relevance and Its Application in Feature Subset Selection
Bell, David A.; Wang, Hui. Machine Learning, 2000, 41(2): 175-195
The notion of relevance is used in many technical fields. In the areas of machine learning and data mining, for example, relevance is frequently used as a measure in feature subset selection (FSS). In previous studies, the interpretation of relevance has varied and its connection to FSS has been loose. In this paper a rigorous mathematical formalism is proposed for relevance, which is quantitative and normalized. To apply the formalism in FSS, a characterization is proposed for FSS: preservation of learning information and minimization of joint entropy. Based on the characterization, a tight connection between relevance and FSS is established: maximizing the relevance of features to the decision attribute, and the relevance of the decision attribute to the features. This connection is then used to design an algorithm for FSS. The algorithm is linear in the number of instances and quadratic in the number of features. The algorithm is evaluated using 23 public datasets, resulting in an improvement in prediction accuracy on 16 datasets, and a loss in accuracy on only 1 dataset. This provides evidence that both the formalism and its connection to FSS are sound.
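One plausible way to instantiate such a quantitative, normalized relevance measure is mutual information normalized by class entropy; the sketch below is an illustration under that assumption, not necessarily the paper's exact formalism:

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in bits) of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def relevance(feature, target):
    """Mutual information I(F;C) normalized by H(C):
    0 = irrelevant, 1 = the feature fully determines the class."""
    h_c = entropy(target)
    if h_c == 0:
        return 1.0
    mi = entropy(feature) + h_c - entropy(list(zip(feature, target)))
    return mi / h_c

cls = ["a", "a", "b", "b"]
print(relevance([0, 0, 1, 1], cls))  # 1.0: feature determines the class
print(relevance([0, 1, 0, 1], cls))  # 0.0: feature carries no class information
```

A greedy FSS loop would then keep adding the feature with the highest relevance to the decision attribute until the learning information is preserved.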

12.
DATA-BASED ACQUISITION AND INCREMENTAL MODIFICATION OF CLASSIFICATION RULES
One of the most important problems in the application of knowledge discovery systems is the identification and subsequent updating of rules. Many applications require that the classification rules be derived from data representing exemplar occurrences of data patterns belonging to different classes. The problem of identifying such rules in data has been researched within the field of machine learning, and more recently in the context of rough set theory and knowledge discovery in databases. In this paper we present an incremental methodology for finding all maximally generalized rules and for adaptive modification of them when new data become available. The methodology is developed in the context of rough set theory and is based on the earlier idea of the discernibility matrix introduced by Skowron.
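The discernibility matrix the methodology builds on is straightforward to sketch (the decision table below is an invented toy, not the paper's data):

```python
from itertools import combinations

def discernibility_matrix(objects, decisions):
    """Skowron-style discernibility matrix: for each pair of objects with
    different decisions, record the condition attributes that discern them."""
    m = {}
    for i, j in combinations(range(len(objects)), 2):
        if decisions[i] != decisions[j]:
            m[(i, j)] = {a for a, (u, v) in enumerate(zip(objects[i], objects[j]))
                         if u != v}
    return m

# Hypothetical decision table: (outlook, temperature) -> play?
objs = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild")]
dec = ["no", "yes", "yes"]
matrix = discernibility_matrix(objs, dec)
print(matrix)  # entries only for object pairs whose decisions differ
```

An incremental update only has to add or revise the matrix entries involving the newly arrived objects, which is what makes adaptive rule modification cheap.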

13.
Data Mining in Large Databases Using Domain Generalization Graphs
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.

14.
A Theory of Knowledge Discovery and Its Implementation
Hong Jiarong. Acta Automatica Sinica, 1993, 19(6): 663-669
This paper proposes a theory of knowledge discovery based on simulating the human cognitive process of discovering knowledge, comprising the classification of empirical data, the abstraction of concepts from each class, and the discovery of implication relationships among concepts. The paper also describes KD3, an integrated learning system that implements the theory, and its applications, such as automatically building knowledge bases for expert systems.

15.
Knowledge discovery from image data is a multi-step iterative process. This paper describes the procedure we have used to develop a knowledge discovery system that classifies regions of the ocean floor based on textural features extracted from acoustic imagery. The image is subdivided into rectangular cells called texture elements (texels); a gray-level co-occurrence matrix (GLCM) is computed for each texel in four directions. Secondary texture features are then computed from the GLCM, resulting in a feature-vector representation of each texel instance. Alternatively, a region-growing approach is used to identify irregularly shaped regions of varying size which have a homogeneous texture and for which the texture features are computed. The Bayesian classifier AutoClass is used to cluster the instances. Feature extraction is one of the major tasks in knowledge discovery from images. The initial goal of this research was to identify regions of the image characterized by sand waves. Experiments were designed to use expert judgements to select the most effective set of features, to identify the best texel size, and to determine the number of meaningful classes in the data. The region-growing approach has proven to be more successful than the texel-based approach. This method provides a fast and accurate way of identifying provinces of the ocean floor that are of interest to geologists.
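The texel-level feature extraction can be sketched as follows; this simplified version computes a single-direction GLCM and one secondary feature (contrast), whereas the system described above uses four directions and several features:

```python
import numpy as np

def glcm(texel, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one texel and one
    direction; offset (0, 1) counts horizontally adjacent pixel pairs."""
    dr, dc = offset
    m = np.zeros((levels, levels))
    rows, cols = texel.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[texel[r, c], texel[r2, c2]] += 1
    return m / m.sum()

def contrast(p):
    """One common secondary texture feature computed from the GLCM."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

texel = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [0, 2, 2]])
P = glcm(texel, levels=3)
print(contrast(P))
```

Stacking such features over four directions yields the feature vector per texel that the clustering step consumes.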

16.
The Graph Theorist, GT, is a system that performs mathematical research in graph theory. From the definitions in its input knowledge base, GT constructs examples of mathematical concepts, conjectures and proves mathematical theorems about concepts, and discovers new concepts. Discovery is driven both by examples and by definitional form. The discovery processes construct a semantic net that links all of GT's concepts together.
Each definition is an algebraic expression whose semantic interpretation is a stylized algorithm to generate a class of graphs correctly and completely. From a knowledge base of these concept definitions, GT is able to conjecture and prove such theorems as "The set of acyclic, connected graphs is precisely the set of trees" and "There is no odd-regular graph on an odd number of vertices." GT explores new concepts either to develop an area of knowledge or to link a newly acquired concept into a pre-existing knowledge base. New concepts arise from the specialization of an existing concept, the generalization of an existing concept, and the merger of two or more existing concepts. From an initial knowledge base containing only the definition of "graph," GT discovers such concepts as acyclic graphs, connected graphs, and bipartite graphs.

17.
Extract Rules by Using Rough Set and Knowledge-Based NN
In this paper, rough set theory is used to extract roughly-correct inference rules from information systems. Based on this idea, the learning algorithm ERCR is presented. In order to refine the learned roughly-correct inference rules, a knowledge-based neural network is used. The method presented here effectively combines the advantages of rough set theory and neural networks.

18.
A Knowledge Discovery System for Databases and the Role of Domain Knowledge in It
This paper describes an idealized model of a knowledge discovery system and the functions of its components, and further discusses the important role that domain knowledge plays in such a system.

19.
Xiang Jing; Ren Jie. Computer Engineering and Design, 2006, 27(15): 2905-2908
In recent years, the need to study the gene expression of cancer cells in depth has kept growing. Machine learning algorithms are widely used in many fields today but have rarely been applied in bioinformatics. This paper systematically studies the construction and pruning of decision trees and related issues. Using the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) datasets provided by CAMDA2000 (critical assessment of microarray data analysis), a decision tree classifier based on the ID3 algorithm was designed and implemented, and a post-pruning algorithm was used to simplify the tree. Experiments verify the effectiveness of the algorithm: the classifier discriminates the leukemia microarray data with high accuracy, demonstrating that decision tree algorithms have broad application prospects in medical data mining.
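ID3 chooses splits by information gain; a minimal sketch on invented toy data (one hypothetical gene-expression attribute, not the CAMDA2000 datasets):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attr):
    """Entropy reduction achieved by splitting the samples on one attribute."""
    n = len(samples)
    partition = {}
    for s, y in zip(samples, labels):
        partition.setdefault(s[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in partition.values())
    return entropy(labels) - remainder

# Hypothetical expression level of one gene vs. leukemia subtype
samples = [("high",), ("high",), ("low",), ("low",)]
labels = ["ALL", "ALL", "AML", "AML"]
print(information_gain(samples, labels, 0))  # 1.0: the gene separates the classes
```

ID3 recursively picks the attribute with the highest gain at each node; post-pruning then removes subtrees whose contribution does not survive validation.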

20.
This paper presents an infrastructure and methodology to extract conceptual structure from Web pages, which are mainly constructed by HTML tags and incomplete text. Human beings can easily read Web pages and grasp an idea about the conceptual structure of underlying data, but cannot handle excessive amounts of data due to lack of patience and time. However, it is extremely difficult for machines to accurately determine the content of Web pages due to lack of understanding of context and semantics. Our work provides a methodology and infrastructure to process Web data and extract the underlying conceptual structure, in particular relationships between ontological concepts using Inductive Logic Programming, in order to help automate the processing of the excessive amount of Web data by capturing its conceptual structures.
