Similar articles
 20 similar articles found (search time: 406 ms)
1.
2.
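A rough-set-based method for mining minimal rules   Total citations: 4, self-citations: 0, citations by others: 4
赛煜, 王海洋. 《计算机工程》 (Computer Engineering), 2003, 29(20): 77-79
This paper proposes a rough-set-based method for mining minimal rules. It is a new approach to mining multi-concept classification rules using a rough set model based on classification accuracy, and it can handle inconsistency in decision tables effectively. A heuristic algorithm is used to mine minimal production rules that satisfy a given accuracy. The algorithm was tested on several UCI data sets and compared experimentally with the well-known Rosetta software; the results indicate that the method substantially increases the total amount of data reduction and can effectively simplify the resulting rules.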

3.
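Rough set theory is applied to a rule induction system, and an incremental learning method for acquiring a rule knowledge base with rough sets is proposed. The method handles inconsistency in decision tables effectively and uses a heuristic algorithm to obtain the minimal rules of a decision table. When new objects are added, the rule knowledge base is updated incrementally from the existing rule set, which avoids rerunning the rule acquisition algorithm just to update the rules. The algorithm was tested on several UCI data sets in terms of the number of rules in the rule set, the data condensation rate, and predictive ability. The experiments demonstrate the effectiveness of the algorithm.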

4.
This paper deals with learning first-order logic rules from data lacking an explicit classification predicate. Consequently, the learned rules are not restricted to predicate definitions as in supervised inductive logic programming. First-order logic offers the ability to deal with structured, multi-relational knowledge. Possible applications include first-order knowledge discovery, induction of integrity constraints in databases, multiple predicate learning, and learning mixed theories of predicate definitions and integrity constraints. One of the contributions of our work is a heuristic measure of confirmation, trading off novelty and satisfaction of the rule. The approach has been implemented in the Tertius system. The system performs an optimal best-first search, finding the k most confirmed hypotheses, and includes a non-redundant refinement operator to avoid duplicates in the search. Tertius can be adapted to many different domains by tuning its parameters, and it can deal either with individual-based representations by upgrading propositional representations to first-order, or with general logical rules. We describe a number of experiments demonstrating the feasibility and flexibility of our approach.

5.
The paper presents the results of applying a rule induction and pruning algorithm to the classification of microseismic hazard states in coal mines. Because the examples describing the “hazardous” and “safe” states are imbalanced, a special algorithm was used for rule induction and pruning. The algorithm selects optimal values of the parameters that influence rule induction and pruning, based on training and tuning sets. The basic parameter of the algorithm is a rule quality measure, which determines the form and classification ability of the induced rules. The specificity and sensitivity of a classifier were used to evaluate its quality. The tests show that the adopted method of rule induction and classifier quality evaluation gives better classification of microseismic hazards than the methods currently used in mining practice. Results obtained by the rule-based classifier were also compared with those of a decision tree induction algorithm and a neuro-fuzzy system.
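
The sensitivity and specificity used above to evaluate the classifier can be computed as in the minimal sketch below. The function name, the label strings, and the choice of "hazardous" as the positive class are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
def sensitivity_specificity(y_true, y_pred, positive="hazardous"):
    """Return (sensitivity, specificity) for a two-class problem.

    `positive` names the class of interest (assumed here to be the
    rare "hazardous" state); every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against an empty class so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

On imbalanced data, reporting both values is more informative than plain accuracy, since a classifier that always predicts "safe" would score high accuracy but zero sensitivity.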

6.
A hybrid coevolutionary algorithm for designing fuzzy classifiers   Total citations: 1, self-citations: 0, citations by others: 1
Rule learning is one of the most common tasks in knowledge discovery. In this paper, we investigate the induction of fuzzy classification rules for data mining purposes, and propose a hybrid genetic algorithm for learning approximate fuzzy rules. A novel niching method is employed to promote coevolution within the population, which enables the algorithm to discover multiple rules by means of a coevolutionary scheme in a single run. In order to improve the quality of the learned rules, a local search method was devised to perform fine-tuning on the offspring generated by genetic operators in each generation. After the GA terminates, a fuzzy classifier is built by extracting a rule set from the final population. The proposed algorithm was tested on datasets from the UCI repository, and the experimental results verify its validity in learning rule sets and comparative advantage over conventional methods.

7.
Data-driven discovery of quantitative rules in relational databases   Total citations: 9, self-citations: 0, citations by others: 9
A quantitative rule is a rule associated with quantitative information which assesses the representativeness of the rule in the database. An efficient induction method is developed for learning quantitative rules in relational databases. With the assistance of knowledge about concept hierarchies, data relevance, and expected rule forms, attribute-oriented induction can be performed on the database, which integrates database operations with the learning process and provides a simple, efficient way of learning quantitative rules from large databases. The method involves the learning of both characteristic rules and classification rules. Quantitative information facilitates quantitative reasoning, incremental learning, and learning in the presence of noise. Moreover, learning qualitative rules can be treated as a special case of learning quantitative rules. It is shown that attribute-oriented induction provides an efficient and effective mechanism for learning various kinds of knowledge rules from relational databases.

8.
Fuzzy rule induction in a set covering framework   Total citations: 1, self-citations: 0, citations by others: 1

9.
Competition-Based Induction of Decision Models from Examples   Total citations: 5, self-citations: 0, citations by others: 5
Symbolic induction is a promising approach to constructing decision models by extracting regularities from a data set of examples. The predominant type of model is a classification rule (or set of rules) that maps a set of relevant environmental features into specific categories or values. Classifying loan risk based on borrower profiles, consumer choice from purchase data, or supply levels based on operating conditions are all examples of this type of model-building task. Although current inductive approaches, such as ID3 and CN2, perform well on certain problems, their potential is limited by the incremental nature of their search. Genetic algorithms (GA) have shown great promise on complex search domains, and hence suggest a means for overcoming these limitations. However, effective use of genetic search in this context requires a framework that promotes the fundamental model-building objectives of predictive accuracy and model simplicity. In this article we describe COGIN, a GA-based inductive system that exploits the conventions of induction from examples to provide this framework. The novelty of COGIN lies in its use of training set coverage to simultaneously promote competition in various classification niches within the model and constrain overall model complexity. Experimental comparisons with NewID and CN2 provide evidence of the effectiveness of the COGIN framework and the viability of the GA approach.

10.
A rule quality measure is important to a rule induction system for determining when to stop generalization or specialization. Such measures are also important to a rule-based classification procedure for resolving conflicts among rules. We describe a number of statistical and empirical rule quality formulas and present an experimental comparison of these formulas on a number of standard machine learning datasets. We also present a meta-learning method for generating a set of formula-behavior rules from the experimental results which show the relationships between a formula's performance and the characteristics of a dataset. These formula-behavior rules are combined into formula-selection rules that can be used in a rule induction system to select a rule quality formula before rule induction. We will report the experimental results showing the effects of formula-selection on the predictive performance of a rule induction system.

11.
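This paper studies how to discover classification rules with Bayes' theorem: the theorem is used to derive classification rules, and the rules are then used to classify data. Worked examples cover two cases: data sets with only categorical (nominal) attributes, and data sets with both numeric and categorical attributes. The examples show that Bayes' theorem is an effective data classification method in data mining.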
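
As an illustration of classification with Bayes' theorem on categorical attributes, the sketch below implements a naive Bayes classifier. The conditional-independence assumption, the add-one smoothing, and the names `train` and `posterior` are choices made here for illustration and are not taken from the paper; numeric attributes are not handled.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = Counter()      # key: (attribute index, value, label)
    values = defaultdict(set)     # distinct values seen for each attribute
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def posterior(model, row):
    """Return P(class | row) for every class via Bayes' theorem.

    Attributes are assumed conditionally independent given the class
    (the naive Bayes assumption); add-one smoothing avoids zero counts.
    """
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total                                    # prior P(class)
        for i, v in enumerate(row):
            p *= (value_counts[(i, v, label)] + 1) / (n + len(values[i]))
        scores[label] = p                                # prior * likelihood
    z = sum(scores.values())                             # evidence P(row)
    return {label: s / z for label, s in scores.items()}
```

The predicted class is the one with the largest posterior, for example `max(post, key=post.get)` for a dictionary `post` returned by `posterior`.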

12.
Hybridization of fuzzy GBML approaches for pattern classification problems   Total citations: 4, self-citations: 0, citations by others: 4
We propose a hybrid algorithm of two fuzzy genetics-based machine learning approaches (i.e., Michigan and Pittsburgh) for designing fuzzy rule-based classification systems. First, we examine the search ability of each approach to efficiently find fuzzy rule-based systems with high classification accuracy. It is clearly demonstrated that each approach has its own advantages and disadvantages. Next, we combine these two approaches into a single hybrid algorithm. Our hybrid algorithm is based on the Pittsburgh approach where a set of fuzzy rules is handled as an individual. Genetic operations for generating new fuzzy rules in the Michigan approach are utilized as a kind of heuristic mutation for partially modifying each rule set. Then, we compare our hybrid algorithm with the Michigan and Pittsburgh approaches. Experimental results show that our hybrid algorithm has higher search ability. The necessity of a heuristic specification method of antecedent fuzzy sets is also demonstrated by computational experiments on high-dimensional problems. Finally, we examine the generalization ability of fuzzy rule-based classification systems designed by our hybrid algorithm.

13.
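A heuristic algorithm for extension matrices based on information entropy   Total citations: 1, self-citations: 0, citations by others: 1
In learning from examples, traditional extension matrix theory and its heuristic algorithms assume that the positive and negative example sets are consistent and free of noise. In practical domains, however, noisy data leads to many rules with poor inductive ability. From a statistical perspective, this paper extends the definitions of extension matrix theory and uses information entropy and the Laplace error estimate to construct ECA, a heuristic extension matrix algorithm. The algorithm was applied to learning problems in several real domains and compared with example-based learning systems such as AE5 and AQ15. The results show that the rules generated by ECA are simple and have strong inductive ability, and are relatively …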
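
The two quantities named in this abstract, information entropy and the Laplace error estimate, have standard textbook definitions, sketched below. How ECA combines them inside the extension matrix search is not stated in the abstract, so this is only an illustration of the building blocks; the function names and the default of two classes are assumptions.

```python
import math


def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)


def laplace_error(n_covered, n_correct, n_classes=2):
    """Laplace estimate of a rule's error rate.

    n_covered: examples matched by the rule's condition
    n_correct: matched examples that belong to the rule's predicted class
    The estimate shrinks towards the uniform prior when coverage is low,
    so a rule covering one example is not credited with zero error.
    """
    return 1.0 - (n_correct + 1) / (n_covered + n_classes)
```

For example, a rule that covers 10 examples with 9 correct gets an estimated error of 1 - 10/12, about 0.167, while a rule that covers a single example correctly gets 1 - 2/3, about 0.333, which penalizes rules fitted to isolated noisy examples.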

14.
Classification in imbalanced domains is a recent challenge in data mining. We refer to imbalanced classification when the data contain many examples from one class and few from the other, and the less represented class is the one of greater interest from the point of view of the learning task. One of the most widely used techniques for tackling this problem is to preprocess the data prior to the learning process. This preprocessing can be done through under-sampling, which removes examples, mainly from the majority class, or over-sampling, which replicates or generates new minority examples. In this paper, we propose an under-sampling procedure guided by evolutionary algorithms to perform training set selection for enhancing the decision trees obtained by the C4.5 algorithm and the rule sets obtained by the PART rule induction algorithm. The proposal has been compared with other under-sampling and over-sampling techniques, and the results indicate that the new approach is very competitive in terms of accuracy when compared with over-sampling and that it outperforms standard under-sampling. Moreover, the obtained models are smaller in terms of the number of leaves or rules generated, and they can be considered more interpretable. The results have been contrasted through non-parametric statistical tests over multiple data sets.
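
For orientation, the sketch below shows plain random under-sampling, the kind of standard baseline the abstract compares against. It is not the evolutionary training set selection proposed in the paper; the function name and the `seed` parameter are assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop examples until every class matches the rarest class.

    Baseline only: examples are chosen at random, whereas the paper
    selects the retained training examples with an evolutionary search.
    """
    rng = random.Random(seed)
    target = min(Counter(labels).values())   # size of the rarest class
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()                              # preserve the original order
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

The balanced subset would then be passed to the learner (C4.5 or PART in the paper) in place of the full training set.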

15.
16.
In concept learning and data mining tasks, the learner is typically faced with a choice of many possible hypotheses or patterns characterizing the input data. If one can assume that training data contain no noise, then the primary conditions a hypothesis must satisfy are consistency and completeness with regard to the data. In real-world applications, however, data are often noisy, and the insistence on the full completeness and consistency of the hypothesis is no longer valid. In such situations, the problem is to determine a hypothesis that represents the best trade-off between completeness and consistency. This paper presents an approach to this problem in which a learner seeks rules optimizing a rule quality criterion that combines the rule coverage (a measure of completeness) and training accuracy (a measure of inconsistency). These factors are combined into a single rule quality measure through a lexicographical evaluation functional (LEF). The method has been implemented in the AQ18 learning system for natural induction and pattern discovery, and compared with several other methods. Experiments have shown that the proposed method can be easily tailored to different problems and can simulate different rule learners by modifying the parameter of the rule quality criterion.

17.
It is well-known that heuristic search in ILP is prone to plateau phenomena. An explanation can be given after the work of Giordana and Saitta: the ILP covering test is NP-complete and therefore exhibits a sharp phase transition in its coverage probability. As the heuristic value of a hypothesis depends on the number of covered examples, the regions “yes” and “no” represent plateaus that need to be crossed during search without an informative heuristic value. Several subsequent works have extensively studied this finding by running several learning algorithms on a large set of artificially generated problems and argued that the occurrence of this phase transition dooms every learning algorithm to fail to identify the target concept. We note however that only generate-and-test learning algorithms have been applied and that this conclusion has to be qualified in the case of data-driven learning algorithms. Mostly building on the pioneering work of Winston on near-miss examples, we show that, on the same set of problems, a top-down data-driven strategy can cross any plateau if near-misses are supplied in the training set, whereas they do not change the plateau profile and do not guide a generate-and-test strategy. We conclude that the location of the target concept with respect to the phase transition alone is not a reliable indication of the learning problem difficulty as previously thought. Editors: Stephen Muggleton, Ramon Otero, Simon Colton.

18.

We introduce a rule-based approach for learning and recognition of complex actions in terms of spatio-temporal attributes of primitive event sequences. During learning, spatio-temporal decision trees are generated which satisfy relational constraints of the training data. The resulting rules are used to classify new dynamic pattern fragments, and general heuristic rules are used to combine classification evidences of different pattern fragments.

19.
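To better define the concepts in an ontology, a genetic algorithm (GA) based method for learning ontology concept classification rules is proposed. Instances are taken from an existing ontology base as training samples, and the algorithm searches for a set of rules consistent with the sample data. Each GA individual, i.e. the object of optimization, is a rule set, and the fitness function accounts for four aspects of a rule set: coverage, consistency, conciseness, and diversity. Optimization yields a rule set that can classify concepts. The rule set can then be used to guide and enrich ontology knowledge; for example, when a new instance is introduced into the ontology, the concept classification rules determine which concept it belongs to. Experiments on learning from existing ontologies show that the algorithm converges well and obtains good rule sets.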

20.
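An improved method for acquiring rule knowledge   Total citations: 1, self-citations: 0, citations by others: 1
Knowledge acquisition is the most basic and most important process in building an expert system, yet it is also the “bottleneck” in developing one. This paper proposes an improved technique for automatic machine acquisition of rule knowledge. It treats learning as a heuristic search in a space of symbolic descriptions and can induce decision rules from examples of expert decisions, which greatly simplifies the transfer of knowledge from expert to machine.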
