Similar documents
20 similar documents found
1.
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved by appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame, and mutagenicity prediction) and to two real-life problems (analysis of telephone calls and traffic accident analysis). Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .
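The propositionalization step can be illustrated on a toy version of the East-West trains problem. This is a simplified sketch under stated assumptions, not the authors' feature constructor: each train is a list of cars described by hypothetical attributes (`length`, `roof`), a first-order feature is a conjunction of up to two tests on a single car, and a feature is true for a train if some car of that train satisfies it. The resulting boolean table is what a propositional subgroup discovery learner would then consume.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Literal = Tuple[str, str]      # (car attribute, value), e.g. ("roof", "closed")
Feature = Tuple[Literal, ...]  # conjunction of literals about ONE car

def construct_features(car_attrs: Dict[str, List[str]], max_literals: int = 2) -> List[Feature]:
    """Enumerate conjunctions of up to max_literals attribute tests on a single car."""
    literals = [(attr, value) for attr, values in car_attrs.items() for value in values]
    features: List[Feature] = []
    for k in range(1, max_literals + 1):
        for combo in combinations(literals, k):
            # Skip contradictory conjunctions that test the same attribute twice.
            if len({attr for attr, _ in combo}) == k:
                features.append(combo)
    return features

def propositionalize(trains: List[List[Dict[str, str]]], features: List[Feature]) -> List[List[bool]]:
    """One boolean row per train: a feature holds if SOME car of the train satisfies it."""
    return [
        [any(all(car[attr] == value for attr, value in feat) for car in cars) for feat in features]
        for cars in trains
    ]

# Hypothetical toy data: one train with two cars.
car_attrs = {"length": ["short", "long"], "roof": ["open", "closed"]}
features = construct_features(car_attrs)  # 4 single-literal + 4 two-literal features
east_train = [{"length": "short", "roof": "closed"}, {"length": "long", "roof": "open"}]
table = propositionalize([east_train], features)
```

The existential test over a train's cars is what makes each feature first-order; the learner that runs on `table` never sees the relational structure.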

2.
This paper presents APRIORI-SD, a subgroup discovery algorithm developed by adapting association rule learning to subgroup discovery. The paper contributes to subgroup discovery, to a better understanding of the weighted covering algorithm, and to an understanding of the properties of the weighted relative accuracy heuristic, by analyzing their performance in ROC space. An experimental comparison with the rule learners CN2, RIPPER, and APRIORI-C on UCI data sets demonstrates that APRIORI-SD produces substantially smaller rule sets, in which individual rules have higher coverage and significance. APRIORI-SD is also compared with the subgroup discovery algorithms CN2-SD and SubgroupMiner. Comparisons on U.K. traffic accident data show that APRIORI-SD is a competitive subgroup discovery algorithm.
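The weighted relative accuracy heuristic, WRAcc(Cond → Class) = p(Cond) · (p(Class | Cond) − p(Class)), and a weighted covering loop can be sketched as follows. This is a minimal illustration under assumptions: candidate rules are given as Python predicates (the Apriori-style rule generation of APRIORI-SD is not reproduced), and covered positive examples are reweighted additively as w = 1/(i+1), where i counts how many selected rules already cover the example. APRIORI-SD's exact weighting and stopping criteria may differ.

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, object]
Rule = Callable[[Example], bool]  # rule body: True if the example satisfies its conditions

def wracc(rule: Rule, examples: Sequence[Example], weights: Sequence[float], target: str) -> float:
    """p(Cond) * (p(Class|Cond) - p(Class)), with all probabilities estimated from example weights."""
    total = sum(weights)
    cov = sum(w for ex, w in zip(examples, weights) if rule(ex))
    if cov == 0:
        return 0.0
    pos = sum(w for ex, w in zip(examples, weights) if ex["class"] == target)
    pos_cov = sum(w for ex, w in zip(examples, weights) if rule(ex) and ex["class"] == target)
    return (cov / total) * (pos_cov / cov - pos / total)

def weighted_covering(candidates: List[Rule], examples: Sequence[Example], target: str,
                      max_rules: int = 5) -> List[Rule]:
    """Greedily select rules; covered positives are down-weighted instead of removed."""
    times_covered = [0] * len(examples)
    remaining: List[Rule] = list(candidates)
    selected: List[Rule] = []
    while remaining and len(selected) < max_rules:
        weights = [1.0 / (i + 1) for i in times_covered]  # assumed additive weighting scheme
        best = max(remaining, key=lambda r: wracc(r, examples, weights, target))
        if wracc(best, examples, weights, target) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
        for k, ex in enumerate(examples):
            if best(ex) and ex["class"] == target:
                times_covered[k] += 1
    return selected

# Hypothetical data and candidate rules.
examples = [
    {"a": 1, "b": 0, "class": "+"}, {"a": 1, "b": 1, "class": "+"},
    {"a": 0, "b": 1, "class": "+"}, {"a": 0, "b": 0, "class": "-"},
    {"a": 1, "b": 0, "class": "-"}, {"a": 0, "b": 0, "class": "-"},
]
rule_a = lambda ex: ex["a"] == 1
rule_b = lambda ex: ex["b"] == 1
subgroups = weighted_covering([rule_a, rule_b], examples, "+")
```

Because covered positives keep a reduced weight rather than being removed, later rules may overlap earlier ones, which suits subgroup description better than strict sequential covering.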

3.
This paper presents ways to use subgroup discovery to generate actionable knowledge for decision support. Actionable knowledge is explicit symbolic knowledge, typically presented in the form of rules, that allows the decision maker to recognize important relations and to perform an appropriate action, such as targeting a direct marketing campaign or planning a population screening campaign aimed at detecting individuals with high disease risk. Different subgroup discovery approaches are outlined, and their advantages over standard classification rule learning are discussed. Three case studies, one medical and two in marketing, are used to present the lessons learned in solving problems that require actionable knowledge generation for decision support.

4.
The primary goal of the research reported in this paper is to identify which criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. To avoid biasing our study toward known functional families, we also investigate the potential of metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics; we also gain theoretical insights into the factors responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
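To illustrate the consistency/coverage trade-off, the sketch below compares plain precision with the m-estimate, one widely used parametrized rule learning heuristic; it does not claim to reproduce the paper's five heuristics or its tuned parameter values. Here p and n are the positive and negative examples covered by a rule and P and N are the totals. With m = 0 the m-estimate reduces to precision; larger m pulls the estimate toward the class prior P/(P+N), which favors rules with higher coverage. The two rules are hypothetical.

```python
def precision(p: int, n: int) -> float:
    """Pure consistency: fraction of covered examples that are positive."""
    return p / (p + n) if p + n else 0.0

def m_estimate(p: int, n: int, P: int, N: int, m: float) -> float:
    """(p + m * P/(P+N)) / (p + n + m): precision smoothed toward the class prior."""
    return (p + m * P / (P + N)) / (p + n + m)

# Two hypothetical rules on a data set with P = N = 100.
narrow = (10, 0)  # perfectly consistent, low coverage
broad = (50, 5)   # high coverage, a few errors

for m in (0, 50):
    scores = {name: m_estimate(p, n, 100, 100, m) for name, (p, n) in [("narrow", narrow), ("broad", broad)]}
    print(m, max(scores, key=scores.get))  # which rule the heuristic prefers at this m
```

With m = 0 the narrow rule wins (1.0 vs. about 0.91); with m = 50 the broad rule wins (about 0.71 vs. about 0.58), showing how a single parameter shifts the balance toward coverage.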

5.
Packet classification is one of the most challenging functions in Internet routers because it involves a multi-dimensional search that must be performed at wire speed. Hierarchical packet classification is an effective solution that significantly reduces the search space each time a field search is completed. However, the hierarchical approach using binary tries has two intrinsic problems: back-tracking and empty internal nodes. To avoid back-tracking, the hierarchical set-pruning trie applies rule copying, and the grid-of-tries uses pre-computed switch pointers. However, none of the known hierarchical algorithms avoids both empty internal nodes and back-tracking. This paper describes various packet classification algorithms and proposes a new, efficient packet classification algorithm based on the hierarchical approach. In the proposed algorithm, a hierarchical binary search tree, which has no empty internal nodes, is constructed for the pruned set of rules. Hence, the proposed algorithm avoids both back-tracking and empty internal nodes. Two refinement techniques are also proposed: one that reduces the rule copying caused by set pruning and another that avoids rule copying. Simulation results show that, compared with other existing hierarchical algorithms, the proposed algorithm improves search performance without increasing memory requirements.
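To make the back-tracking problem concrete, the sketch below implements a plain two-field hierarchical trie, i.e. the baseline structure criticized in the abstract, not the proposed hierarchical binary search tree. Assumptions: rules are (source prefix, destination prefix) bit strings, the empty string is a wildcard, and a lower rule index means higher priority. Because a matching rule may hang off any source prefix on the search path, the lookup has to search the destination trie of every such ancestor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)  # '0' / '1' -> child node
    rule_ids: List[int] = field(default_factory=list)          # rules whose prefix ends here
    dst_trie: Optional["Node"] = None                           # second-field trie (source trie only)

def _insert_path(root: Node, prefix: str) -> Node:
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, Node())
    return node

def build(rules: List[Tuple[str, str]]) -> Node:
    """rules[i] = (src_prefix, dst_prefix); lower index = higher priority; '' = wildcard."""
    root = Node()
    for rid, (src, dst) in enumerate(rules):
        src_node = _insert_path(root, src)
        if src_node.dst_trie is None:
            src_node.dst_trie = Node()
        _insert_path(src_node.dst_trie, dst).rule_ids.append(rid)
    return root

def _dst_matches(root: Node, dst_bits: str) -> List[int]:
    """All rule ids whose destination prefix is a prefix of dst_bits."""
    node, found = root, list(root.rule_ids)
    for bit in dst_bits:
        node = node.children.get(bit)
        if node is None:
            break
        found.extend(node.rule_ids)
    return found

def classify(root: Node, src_bits: str, dst_bits: str) -> Optional[int]:
    """Return the highest-priority matching rule id, or None."""
    best: Optional[int] = None
    node: Optional[Node] = root
    depth = 0
    while node is not None:
        # Back-tracking: every source prefix on the path may own a destination trie to search.
        if node.dst_trie is not None:
            for rid in _dst_matches(node.dst_trie, dst_bits):
                if best is None or rid < best:
                    best = rid
        if depth == len(src_bits):
            break
        node = node.children.get(src_bits[depth])
        depth += 1
    return best

# Hypothetical rule set: (src prefix, dst prefix).
rules = [("00", "1"), ("0", "0"), ("", "")]
root = build(rules)
```

Set pruning would instead copy ancestor rules into each descendant's destination trie so that only one destination trie is searched per lookup, which is the rule copying the abstract refers to.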

6.
The most challenging problem in developing fuzzy rule-based classification systems is constructing a fuzzy rule base for the target problem. In many practical applications, fuzzy sets with particular linguistic meanings are often predefined by domain experts and must be maintained to ensure the interpretability of any subsequent inference results. However, learning fuzzy rules over a fixed fuzzy quantity space without any qualification restricts the accuracy of the resulting rules. Fortunately, adjusting the weights of fuzzy rules can help improve classification accuracy without degrading interpretability. Various heuristics have been proposed for tuning fuzzy rule weights, with limited success. This paper proposes an alternative approach that uses Particle Swarm Optimisation to search for a set of optimal rule weights, yielding high classification accuracy. Systematic experiments are carried out on common benchmark data sets, in comparison with popular rule-based learning classifiers. The results demonstrate that the proposed approach can boost classification performance, especially when the initially built rule base is relatively small, and that it is competitive with popular rule-based learning classifiers.
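The idea of keeping expert-defined fuzzy sets fixed while tuning only rule weights can be sketched as follows. This is a minimal illustration under assumptions, not the paper's system: the three linguistic terms on [0, 1], single-winner inference (the rule with the largest compatibility × weight decides), product compatibility, and the PSO settings (inertia 0.7, c1 = c2 = 1.5, weights clipped to [0, 1]) are all illustrative choices.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical fixed linguistic terms on [0, 1]: (left foot, peak, right foot).
TERMS: Dict[str, Tuple[float, float, float]] = {
    "small": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "large": (0.5, 1.0, 1.0),
}
Rule = Tuple[Tuple[str, ...], str]  # (one term per attribute, class label)

def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with peak at b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify(x: Sequence[float], rules: List[Rule], weights: Sequence[float]) -> Optional[str]:
    """Single-winner inference: argmax over rules of compatibility * weight."""
    best_cls, best_score = None, 0.0
    for (terms, cls), w in zip(rules, weights):
        mu = 1.0
        for xi, term in zip(x, terms):
            mu *= tri(xi, *TERMS[term])
        if mu * w > best_score:
            best_cls, best_score = cls, mu * w
    return best_cls

def accuracy(data, rules: List[Rule], weights: Sequence[float]) -> float:
    return sum(classify(x, rules, weights) == y for x, y in data) / len(data)

def pso_tune(data, rules: List[Rule], n_particles: int = 8, iters: int = 30, seed: int = 0,
             inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5):
    """Global-best PSO over rule weights; the fuzzy terms themselves never change."""
    rng = random.Random(seed)
    dim = len(rules)
    # Particle 0 starts at all-ones weights, so the result is never worse than unweighted rules.
    pos = [[1.0] * dim] + [[rng.random() for _ in range(dim)] for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy(data, rules, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy(data, rules, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical two-rule base and three training points.
rules = [(("small", "small"), "A"), (("medium", "medium"), "B")]
data = [((0.3, 0.3), "A"), ((0.5, 0.5), "B"), ((0.1, 0.2), "A")]
weights, acc = pso_tune(data, rules)
```

With equal weights the point (0.3, 0.3) is assigned to class B (0.36 vs. 0.16); lowering the second rule's weight to 0.4 flips it to A without touching any membership function, which is the interpretability argument made above.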

7.
8.
Mining fuzzy association rules for classification problems   (total citations: 3; self-citations: 0; citations by others: 3)
Effective data mining techniques for discovering knowledge from training samples in industrial-engineering classification problems are needed in applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy association rules for classification problems. The consequent part of each rule is one class label. The proposed learning algorithm consists of two phases: the first generates large fuzzy grids from the training samples by fuzzy partitioning of each attribute, and the second generates fuzzy association rules for classification from those large fuzzy grids. The algorithm scans the training samples stored in a database only once and applies a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can easily be extended to discover other types of fuzzy association rules. Simulation results on the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy association rules for classification problems.
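The two-phase idea (find large fuzzy grids, then read rules off them) can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the published algorithm: attributes are assumed scaled to [0, 1] and partitioned into k evenly spaced triangular fuzzy sets, a sample's membership in a grid is taken as the product of its per-attribute memberships, and grids are enumerated by brute force rather than with the paper's single database scan and Boolean operations.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)
Sample = Tuple[Sequence[float], str]   # (attribute values scaled to [0, 1], class label)

def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with peak at b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def partition(k: int) -> List[Triangle]:
    """k evenly spaced triangular fuzzy sets on [0, 1] (k >= 2)."""
    step = 1.0 / (k - 1)
    return [(max(0.0, (i - 1) * step), i * step, min(1.0, (i + 1) * step)) for i in range(k)]

def grid_membership(x: Sequence[float], grid: Dict[int, Triangle]) -> float:
    """Product t-norm over the attributes the grid constrains (an assumption)."""
    mu = 1.0
    for attr, (a, b, c) in grid.items():
        mu *= tri(x[attr], a, b, c)
    return mu

def fuzzy_support(samples: List[Sample], grid: Dict[int, Triangle]) -> float:
    return sum(grid_membership(x, grid) for x, _ in samples) / len(samples)

def fuzzy_confidence(samples: List[Sample], grid: Dict[int, Triangle], label: str) -> float:
    """Confidence of the rule 'grid => label'."""
    total = sum(grid_membership(x, grid) for x, _ in samples)
    hits = sum(grid_membership(x, grid) for x, y in samples if y == label)
    return hits / total if total else 0.0

def large_2d_grids(samples: List[Sample], k: int, min_support: float):
    """Enumerate all two-attribute grids whose fuzzy support reaches min_support."""
    sets = partition(k)
    n_attrs = len(samples[0][0])
    found = []
    for i, j in combinations(range(n_attrs), 2):
        for si, sj in product(range(k), repeat=2):
            s = fuzzy_support(samples, {i: sets[si], j: sets[sj]})
            if s >= min_support:
                found.append(((i, si), (j, sj), s))  # ((attr, fuzzy set index), ..., support)
    return found
```

For example, with samples `[((0.0, 0.0), "A"), ((0.0, 1.0), "B")]` and three fuzzy sets per attribute, the grid (small, small) has support 0.5 and confidence 1.0 for class A.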

9.
In this paper, we examine the classification performance of fuzzy if-then rules selected by a GA-based multi-objective rule selection method. This rule selection method can be applied to high-dimensional pattern classification problems with many continuous attributes by restricting the number of antecedent conditions of each candidate fuzzy if-then rule. As candidate rules, we only use fuzzy if-then rules with a small number of antecedent conditions, so human users can easily understand each rule selected by our method. Our rule selection method has two objectives: to minimize the number of selected fuzzy if-then rules and to maximize the number of correctly classified patterns. Because two conflicting objectives are considered, our multi-objective fuzzy rule selection problem has several solutions (i.e., several rule sets) called “non-dominated solutions”. We examine the performance of our GA-based rule selection method by computer simulations on a real-world pattern classification problem with many continuous attributes. First, we examine the classification performance of our method on training patterns; next, we examine its generalization ability on test patterns. We show that a fuzzy rule-based classification system with an appropriate number of rules has high generalization ability.
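The notion of non-dominated rule sets can be shown directly on the two objectives named above. In this minimal sketch each candidate rule set is represented only by its objective vector (number of selected rules, number of correctly classified training patterns); the GA that generates the candidates is not shown, and the numbers are made up.

```python
from typing import List, Tuple

Objectives = Tuple[int, int]  # (number of selected rules, number of correctly classified patterns)

def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b: no more rules, no fewer correct patterns, and not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def non_dominated(solutions: List[Objectives]) -> List[Objectives]:
    """Keep the rule sets that no other candidate dominates (the Pareto front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical candidate rule sets produced by some search procedure.
candidates = [(2, 80), (3, 85), (3, 82), (5, 85), (1, 60)]
front = non_dominated(candidates)
```

Here (3, 82) is dropped because (3, 85) is equally compact and more accurate, and (5, 85) is dropped because (3, 85) matches its accuracy with fewer rules; the three survivors represent different accuracy/interpretability trade-offs.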

10.
Fuzzy rule induction in a set covering framework   (total citations: 1; self-citations: 0; citations by others: 1)

11.
In this paper, we propose a software defect prediction model learning problem (SDPMLP), in which a classification model selects appropriate relevant inputs from the set of all available inputs and learns the classification function. We show that the SDPMLP is a combinatorial optimization problem with factorial complexity, and we propose two hybrid procedures to solve it: exhaustive search combined with a probabilistic neural network (PNN), and simulated annealing (SA) combined with a PNN. For small SDPMLP instances, the exhaustive search PNN works well and provides one (or all) optimal solutions. For large SDPMLP instances, however, the exhaustive search PNN approach is not practical, and only the SA–PNN allows us to solve the SDPMLP within a practical time limit. We compare the performance of our hybrid approaches with traditional classification algorithms and find that our hybrid approaches perform better.
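The contrast between the two hybrid procedures can be sketched with input subsets encoded as boolean masks. In this sketch the PNN is replaced by a hypothetical scoring callable (`toy_score`); in practice it would be something like the validation accuracy of a PNN trained on the selected inputs. The annealing schedule (initial temperature, geometric cooling, iteration count) and the single-bit-flip neighbourhood are likewise assumptions.

```python
import math
import random
from itertools import product
from typing import Callable, List, Optional, Tuple

Mask = List[bool]
Score = Callable[[Mask], float]  # stands in for a PNN's validation accuracy (hypothetical)

def exhaustive(n: int, score: Score) -> Tuple[Optional[Mask], float]:
    """Try every non-empty input subset: exact, but only feasible for small n."""
    best, best_s = None, -math.inf
    for bits in product([False, True], repeat=n):
        mask = list(bits)
        if any(mask):
            s = score(mask)
            if s > best_s:
                best, best_s = mask, s
    return best, best_s

def anneal(n: int, score: Score, iters: int = 500, t0: float = 1.0,
           cooling: float = 0.99, seed: int = 0) -> Tuple[Mask, float]:
    """Simulated annealing over input subsets with single-bit-flip moves."""
    rng = random.Random(seed)
    cur = [rng.random() < 0.5 for _ in range(n)]
    if not any(cur):
        cur[rng.randrange(n)] = True
    cur_s = score(cur)
    best, best_s = cur[:], cur_s
    t = t0
    for _ in range(iters):
        cand = cur[:]
        j = rng.randrange(n)
        cand[j] = not cand[j]  # neighbour: add or drop one input
        if any(cand):
            s = score(cand)
            # Always accept improvements; accept worse moves with probability exp(delta / t).
            if s >= cur_s or rng.random() < math.exp((s - cur_s) / t):
                cur, cur_s = cand, s
                if s > best_s:
                    best, best_s = cand[:], s
        t *= cooling  # geometric cooling schedule
    return best, best_s

def toy_score(mask: Mask) -> float:
    """Hypothetical objective: inputs 0 and 2 are informative, every other input costs 0.1."""
    return sum(1.0 if i in (0, 2) else -0.1 for i, keep in enumerate(mask) if keep)

exact_mask, exact_score = exhaustive(5, toy_score)
sa_mask, sa_score = anneal(5, toy_score)
```

The exhaustive search is guaranteed to find the optimum but its cost grows with the number of subsets; annealing evaluates a fixed budget of candidates and can only be checked against the exact answer on small instances like this one.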

12.
Using Rough Sets with Heuristics for Feature Selection   (total citations: 32; self-citations: 0; citations by others: 32)
Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (sometimes called attributes) that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Two typical ones are the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which selects an optimal feature subset from the space of possible feature subsets by using the induction algorithm itself as part of the evaluation function. Although the filter approach is faster, it is partly blind, because it does not take the performance of induction into account. The wrapper approach, on the other hand, can obtain optimal feature subsets, but it is hard to use because of its time and space complexity. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Feature selection proceeds as in the filter approach, but the evaluation criterion is related to the performance of induction; that is, we select the features that do not degrade the performance of induction.
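The rough-set criterion can be illustrated with the dependency degree gamma_B(D) = |POS_B(D)| / |U|, the fraction of objects whose values on the attribute set B determine the decision unambiguously. The greedy loop below (add the attribute that raises gamma most until it equals gamma for the full attribute set) is a generic QuickReduct-style heuristic used here as an assumption; the paper's own heuristic may select attributes differently. The decision table is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]

def dependency(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """gamma_B(D) = |POS_B(D)| / |U| for B = attrs."""
    blocks: Dict[tuple, List[Hashable]] = defaultdict(list)
    for row in rows:
        # Rows with identical values on attrs are indiscernible (same equivalence class).
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    # An equivalence class lies in the positive region if all its rows share one decision.
    positive = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return positive / len(rows)

def greedy_reduct(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> List[str]:
    """Add the attribute that raises gamma most until gamma equals that of all attributes."""
    target = dependency(rows, attrs, decision)
    selected: List[str] = []
    while dependency(rows, selected, decision) < target:
        remaining = [a for a in attrs if a not in selected]
        selected.append(max(remaining, key=lambda a: dependency(rows, selected + [a], decision)))
    return selected

# Hypothetical decision table: d is "no" only when a = 0 and b = 0.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
```

Every single attribute gives gamma = 0.5 on this table, so ties are broken by attribute order; a practical implementation would need an explicit tie-breaking rule.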

13.
Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms   (total citations: 5; self-citations: 0; citations by others: 5)
In rule discovery systems, learning often proceeds by first assessing the quality of the system's current rules and then modifying rules based on that assessment. This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards. The focus is on the kinds of rule assessment schemes which have been proposed for rule discovery systems that use genetic algorithms as the primary rule modification strategy. Two distinct approaches to rule learning with genetic algorithms have been previously reported, each approach offering a useful solution to a different level of the credit assignment problem. We describe a system, called RUDI, that exploits both approaches. We present analytic and experimental results that support the hypothesis that multiple levels of credit assignment can improve the performance of rule learning systems based on genetic algorithms.
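One classic rule-level answer to this credit assignment problem in classifier systems is Holland's bucket brigade: each firing rule pays a bid from its strength to the rule that fired just before it, so external reward gradually flows back along long rule chains. The sketch below is a simplified version under assumptions (fixed bid ratio, one linear chain per episode, no taxes, and the first rule's bid simply leaves the system); it is not a description of RUDI.

```python
from typing import Dict, List, Optional

def bucket_brigade(strength: Dict[str, float], chain: List[str], reward: float,
                   bid_ratio: float = 0.1) -> Dict[str, float]:
    """Run one episode in which the rules in `chain` fire in order and `reward` arrives at the end."""
    s = dict(strength)
    prev: Optional[str] = None
    for rule in chain:
        bid = bid_ratio * s[rule]
        s[rule] -= bid          # the firing rule pays its bid...
        if prev is not None:
            s[prev] += bid      # ...to the rule that fired just before it
        prev = rule             # (the first rule's bid has no recipient in this sketch)
    if prev is not None:
        s[prev] += reward       # only the last rule receives the external reward directly
    return s

chain = ["r1", "r2", "r3"]
episode1 = bucket_brigade({"r1": 10.0, "r2": 10.0, "r3": 10.0}, chain, reward=5.0)
episode2 = bucket_brigade(episode1, chain, reward=5.0)
```

After one episode only r3 has gained strength; after the second, part of that gain has reached r2, and further episodes would carry it on to r1. This slow backward propagation over long chains is exactly the difficulty the abstract describes.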

14.
K. S. Leung, M. L. Wong. Knowledge, 1991, 4(4): 231–246
The knowledge-acquisition bottleneck obstructs the development of expert systems. Refining existing knowledge bases is a subproblem of the knowledge-acquisition problem. The paper presents a HEuristic REfinement System (HERES), which automatically refines rules with mixed fuzzy and nonfuzzy concepts represented in a variant of the rule representation language Z-II. HERES employs heuristics and analytical methods to guide its generation of plausible refinements. The functionality and effectiveness of HERES are verified through various case studies, which show that HERES can successfully refine knowledge bases. The refinement methods can handle imprecise and uncertain examples and generate approximate rules. In this respect, they are better than other well-known learning algorithms such as ID3 [15–18], AQ11, and INDUCE [14, 19, 20], because HERES' methods are currently unique in processing inexact examples and creating approximate rules.

15.
This paper deals with learning first-order logic rules from data lacking an explicit classification predicate. Consequently, the learned rules are not restricted to predicate definitions as in supervised inductive logic programming. First-order logic offers the ability to deal with structured, multi-relational knowledge. Possible applications include first-order knowledge discovery, induction of integrity constraints in databases, multiple predicate learning, and learning mixed theories of predicate definitions and integrity constraints. One of the contributions of our work is a heuristic measure of confirmation, trading off novelty and satisfaction of the rule. The approach has been implemented in the Tertius system. The system performs an optimal best-first search, finding the k most confirmed hypotheses, and includes a non-redundant refinement operator to avoid duplicates in the search. Tertius can be adapted to many different domains by tuning its parameters, and it can deal either with individual-based representations by upgrading propositional representations to first-order, or with general logical rules. We describe a number of experiments demonstrating the feasibility and flexibility of our approach.

16.
Coronary artery disease (CAD) is one of the major causes of mortality worldwide. Knowledge about the risk factors that increase the probability of developing CAD can help in understanding the disease and assist in its treatment. Recently, computer-aided approaches have been used for the prediction and diagnosis of diseases. Swarm intelligence algorithms such as particle swarm optimization (PSO) have performed well on a variety of optimization problems. Because rule discovery can be modelled as an optimization problem, it can be solved by means of an evolutionary algorithm such as PSO. An approach for discovering classification rules for CAD is proposed. The work is based on a real-world CAD data set and aims to detect the disease by producing accurate and effective rules. The proposed algorithm is a hybrid binary-real PSO, which combines categorical and numerical encoding of a particle with a different approach to calculating particle velocities. The rules were developed from randomly generated particles, which take random values in the range of each attribute in the rule. Two feature selection methods, based on multi-objective evolutionary search and on PSO, were applied to the data set, and the algorithms selected the most relevant features. The accuracy of the two resulting rule sets was evaluated: the rule set with 11 features was more accurate than the rule set with 13 features. Our results show that the proposed approach can produce effective rules with the highest accuracy for detecting CAD.

17.
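To address three problems of the whale optimization algorithm (WOA), namely insufficient diversity, inefficient information exchange between its two search phases, and imbalance between those phases, this paper draws on the coordinated-combat mechanism of armed forces and proposes a new WOA for community detection. To remedy the lack of diversity in the encircling-prey phase, a "neighbor potential" learning model is introduced, which improves the global search ability and learning breadth of WOA. To make information exchange between the two predation phases more efficient, bubble-net predation led by a whale commander is proposed, ensuring that search information is used effectively. To resolve the imbalance between the two predation mechanisms, an improved learning automaton guides the whale population toward promising regions. In addition, since community detection in complex networks is a discrete problem, a new encoding and discrete evolution rule based on topological properties is proposed. Finally, tests on real-world data sets and comparisons with other algorithms show that the proposed algorithm has better optimization ability than the compared algorithms, verifying its effectiveness.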
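The abstract does not state which objective the discrete WOA optimizes. A common choice in community detection, used here purely as an assumption, is Newman's modularity evaluated on a label encoding in which each node stores its community ID: Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of c's nodes. The whale position updates themselves are not reproduced.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def modularity(edges: List[Tuple[Hashable, Hashable]], label: Dict[Hashable, int]) -> float:
    """Newman modularity of a label encoding for an undirected, unweighted graph."""
    m = len(edges)
    internal: Dict[int, int] = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree: Dict[int, int] = defaultdict(int)    # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_communities = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_community = {v: 0 for v in range(6)}
```

Splitting the graph at the bridge gives Q = 2 × (3/7 − 1/4) = 5/14, whereas putting every node in one community gives Q = 0, so a search that maximizes Q prefers the natural split.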

18.
19.
Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. Because data reduction in knowledge discovery in databases (KDD) can be viewed as a search problem, it could be solved using evolutionary algorithms (EAs). In this paper, we carry out an empirical study of the performance of four representative EA models, taking into account two instance selection perspectives for data reduction in KDD: prototype selection and training set selection. The paper includes a comparison between these algorithms and other, non-evolutionary instance selection algorithms. The results show that the evolutionary instance selection algorithms consistently outperform the non-evolutionary ones, with the main advantages being better instance reduction rates, higher classification accuracy, and models that are easier to interpret.
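To show what an evolutionary instance selection method evaluates, the sketch below scores a candidate subset (a boolean mask over training instances) as a weighted sum of leave-one-out 1-NN accuracy on the full training set and the reduction rate. This fitness form and the weight alpha = 0.5 are assumptions for illustration; any EA would then search over masks to maximize it. The data are hypothetical.

```python
from typing import List, Sequence

def loo_1nn_accuracy(X: List[Sequence[float]], y: List[int], mask: List[bool]) -> float:
    """Classify every training instance by its nearest selected instance, excluding itself."""
    selected = [j for j, keep in enumerate(mask) if keep]
    correct = 0
    for i, xi in enumerate(X):
        candidates = [j for j in selected if j != i]
        if not candidates:
            continue  # nothing to compare against: counted as an error
        nearest = min(candidates, key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])))
        correct += y[nearest] == y[i]
    return correct / len(X)

def fitness(X: List[Sequence[float]], y: List[int], mask: List[bool], alpha: float = 0.5) -> float:
    """alpha * accuracy + (1 - alpha) * reduction rate (assumed fitness form)."""
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * loo_1nn_accuracy(X, y, mask) + (1.0 - alpha) * reduction

# Hypothetical two-cluster data set.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
keep_all = [True] * 6
keep_four = [True, True, False, True, True, False]
```

Keeping four of the six instances preserves perfect 1-NN accuracy while reducing the data by a third, so its fitness (about 0.67) beats keeping everything (0.5); this is the kind of trade-off the compared EAs exploit.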

20.
Many studies have shown that rule-based classifiers perform well in classifying categorical and sparse high-dimensional databases. However, a fundamental limitation of many rule-based classifiers is that they find rules by using various heuristic methods to prune the search space and select rules based on the sequential database-covering paradigm. As a result, the final rule set may not contain the globally best rules for some instances in the training database. Worse, these algorithms fail to fully exploit some more effective search-space pruning methods that would let them scale to large databases. In this paper, we present a new classifier, HARMONY, which directly mines the final set of classification rules. HARMONY uses an instance-centric rule-generation approach and can ensure that, for each training instance, one of the highest-confidence rules covering that instance is included in the final rule set, which helps improve the overall accuracy of the classifier. By introducing several novel search strategies and pruning methods into the rule discovery process, HARMONY also achieves high efficiency and good scalability. Our thorough performance study on several large text and categorical databases shows that HARMONY outperforms many well-known classifiers in terms of both accuracy and computational efficiency, and scales well with database size.
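The instance-centric guarantee can be illustrated by brute force on a tiny transactional data set: for each training instance, enumerate the short itemsets it contains, score each as a rule predicting the instance's own class by confidence, and keep the best one. This is a deliberately naive sketch, assuming that a rule covers an instance when the instance contains the rule body and carries the rule's class; it has none of HARMONY's search strategies or pruning and enumerates every itemset up to `max_len` items.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Set, Tuple

Instance = Tuple[FrozenSet[str], str]      # (items, class label)
Rule = Tuple[FrozenSet[str], str, float]   # (rule body, predicted class, confidence)

def confidence(data: List[Instance], body: FrozenSet[str], label: str) -> float:
    """Fraction of instances containing `body` that carry `label`."""
    labels = [y for items, y in data if body <= items]
    return labels.count(label) / len(labels) if labels else 0.0

def instance_centric_rules(data: List[Instance], max_len: int = 2) -> Set[Rule]:
    """For every instance, keep one highest-confidence rule that covers it."""
    selected: Set[Rule] = set()
    for items, label in data:
        best: Optional[Rule] = None
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                conf = confidence(data, frozenset(body), label)
                if best is None or conf > best[2]:
                    best = (frozenset(body), label, conf)
        if best is not None:
            selected.add(best)
    return selected

# Hypothetical transactions.
data = [
    (frozenset({"a", "b"}), "X"), (frozenset({"a", "c"}), "X"),
    (frozenset({"b", "c"}), "Y"), (frozenset({"c"}), "Y"),
]
rules = instance_centric_rules(data)
```

Both X instances share the rule {a} → X, so the final set has three rules; the last instance keeps {c} → Y with confidence 2/3 because no better rule covers it, which is exactly the per-instance guarantee described above.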
