Similar literature
 20 similar documents found; search took 31 ms
1.
Data mining with an ant colony optimization algorithm   Total citations: 10, self-citations: 0, citations by others: 10
The paper proposes an algorithm for data mining called Ant-Miner (ant-colony-based data miner). The goal of Ant-Miner is to extract classification rules from data. The algorithm is inspired both by research on the behavior of real ant colonies and by data mining concepts and principles. We compare the performance of Ant-Miner with CN2, a well-known data mining algorithm for classification, on six public domain data sets. The results provide evidence that: 1) Ant-Miner is competitive with CN2 with respect to predictive accuracy, and 2) the rule lists discovered by Ant-Miner are considerably simpler (smaller) than those discovered by CN2.

2.
This paper discusses two continuous learning approaches for improving the classification accuracy of an intuitive reasoner algorithm. The reasoner predicted the value of a given target variable by multiple iterations of forward-chained, rule-based inference. Each rule in the reasoner’s rule set had associated with it a weight, referred to here as “Strength of Belief” (SB). The value of SB of a rule indicated the certainty level of that rule. In each iteration of reasoning, any instances of similar values for a given variable were replaced by a single consolidated datum and the SB associated with the consolidated datum was increased. At the end of the reasoning process, the class (value) of the target variable which had the highest SB was reported as the conclusion. The rule set for the reasoner was generated based on a training data set that contained 80% of the data in a weather database comprising 50 years’ worth of hourly measurements for 54 weather variables. Each rule was induced based on only a small subset of the weather data. The intuitive reasoner was tested by using the induced rules to predict a number of pre-selected target variables using 275 test cases created from the test data. The first continuous learning approach was to identify relevant input variables for the reasoner, and the second was to rebalance the rule set used by the reasoner by adjusting the SB associated with each of the rules. Because of the way the rules were induced, the resulting rules did not contain any information about the relevance of the 53 possible input variables to the task of predicting a given target variable for previously unseen cases. A method was developed to identify which input variables were most relevant to the task based on the induced rule set. For four of six target variables, this method resulted in higher prediction accuracy of the intuitive reasoner than using a set of randomly chosen input variables.
The second continuous learning approach was intended to address the class imbalance problem in the rule set. The intuitive reasoner appeared to over-fit classes (values) which had frequent representation in the rule set. To address this problem, a heuristic was developed that generated adjustment factors for the SB values of the rules. The use of this heuristic improved the classification accuracy of the intuitive reasoner for four of the six target variables.

3.
4.
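A classification rule mining algorithm based on the ant colony algorithm   Total citations: 5, self-citations: 0, citations by others: 5
This paper proposes a classification rule mining algorithm based on the ant colony algorithm. The algorithm is essentially a sequential covering algorithm: the ant colony searches for a rule, the examples that rule covers are removed, and the process repeats, yielding a set of rules that together cover the examples. To address the long computation time of ant colony algorithms, a mutation operator is proposed. Experiments on two public data sets, with comparisons against C4.5 and Ant-Miner, show that the algorithm discovers better classification rules: rule sets with stronger predictive power and fewer rules, and rules of simpler form. The experiments also show that the mutation operator effectively saves computation time.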
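The sequential covering loop described in this abstract (find a rule, remove the examples it covers, repeat) can be sketched as follows. This is only an illustration under stated assumptions: the rule search here is a hypothetical greedy single-condition search standing in for the ant colony search, which the abstract does not specify, and all function names are made up for this example.

```python
from collections import Counter

def covers(rule, example):
    """True if every condition of the rule matches the example's attributes."""
    conditions, _ = rule
    attrs, _ = example
    return all(attrs.get(a) == v for a, v in conditions.items())

def greedy_single_condition_rule(examples):
    """Hypothetical stand-in for the ant colony search: pick the single
    attribute=value test with the purest (then largest) coverage."""
    best, best_score = None, (-1.0, -1)
    candidates = sorted({(a, v) for attrs, _ in examples for a, v in attrs.items()})
    for a, v in candidates:
        labels = [y for attrs, y in examples if attrs.get(a) == v]
        label, hits = Counter(labels).most_common(1)[0]
        score = (hits / len(labels), hits)
        if score > best_score:
            best, best_score = ({a: v}, label), score
    return best

def sequential_covering(examples, find_rule, max_rules=50):
    """Learn rules one at a time, removing covered examples after each rule."""
    remaining = list(examples)
    rules = []
    while remaining and len(rules) < max_rules:
        rule = find_rule(remaining)
        if not any(covers(rule, e) for e in remaining):
            break  # the search made no progress; stop rather than loop forever
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules

examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
rules = sequential_covering(examples, greedy_single_condition_rule)
```

Any other rule search, including an ant colony search, could be passed in as `find_rule` without changing the outer loop.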

5.
The aim of this study was to use a machine learning approach combining fuzzy modeling with an immune algorithm to model sport training, in particular swimming. The proposed algorithm mines the available data and delivers the results in the form of a set of fuzzy rules “IF (fuzzy conditions) THEN (class)”. Fuzzy logic is a powerful method to cope with continuous data, to overcome the problem of overlapping class definitions, and to improve rule comprehensibility. Sport training is modeled at the level of microcycle and training unit by 12 independent attributes. The data was collected over two months (February-March 2008) among swimmers from swimming sections in Wrocław, Poland. The swimmers had a minimum of 7 years of training and reached the II class level in swimming classification from 2005 to 2008. The goal of the performed experiments was to find rules answering the question: how does the training unit influence the swimmer’s feelings while in the water the next day? The fuzzy rules were inferred for two different scales of the class to be predicted. The effectiveness of the learned set of rules reached 68.66%. The performance, in terms of classification accuracy, of the proposed approach was compared with traditional classifier schemes. The accuracy of the compared methods is significantly lower than the accuracy of the fuzzy rules obtained by the method presented in this study (paired t-test, P < 0.05).

6.
Recursive neural network rule extraction for data with mixed attributes   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we present a recursive algorithm for extracting classification rules from feedforward neural networks (NNs) that have been trained on data sets having both discrete and continuous attributes. The novelty of this algorithm lies in the conditions of the extracted rules: the rule conditions involving discrete attributes are disjoint from those involving continuous attributes. The algorithm starts by generating rules with discrete attributes only to explain the classification process of the NN. If the accuracy of a rule with only discrete attributes is not satisfactory, the algorithm refines this rule by recursively generating more rules with discrete attributes not already present in the rule condition, or by generating a hyperplane involving only the continuous attributes. We show that for three real-life credit scoring data sets, the algorithm generates rules that are not only more accurate but also more comprehensible than those generated by other NN rule extraction methods.

7.
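Ji Xia, Li Longshu. Control and Decision (《控制与决策》), 2013, 28(12): 1837-1842

This paper proposes a rule extraction algorithm for incomplete decision tables based on attribute discernibility degree; it is a method that proceeds in the specialization direction. Starting from the empty set, it repeatedly selects the currently most important condition attribute to partition the object set, extracts certain rules from the consistent blocks whose generalized decision value is unique, and extracts uncertain rules from the other consistent blocks. An attribute necessity check then removes the redundant attributes of each rule. Finally, a rule reduction step simplifies the rules obtained and strengthens their generalization ability. Experimental results show that the proposed algorithm is more efficient and that the rules obtained are concise and effective.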


8.
To learn a Bayesian network classifier, continuous attributes usually need to be discretized. But discretizing continuous attributes may cause information loss, introduce noise, and reduce sensitivity to changes of the attributes with respect to the class variable. In this paper, we use the Gaussian kernel function with a smoothing parameter to estimate the density of attributes. A Bayesian network classifier with continuous attributes is established by the dependency extension of Naive Bayes classifiers. We also analyze the information provided to a class by each attribute as a basis for the dependency extension of Naive Bayes classifiers. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function provide good classification accuracy compared to other approaches when dealing with continuous attributes.
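As a rough illustration of the density estimation step in this abstract, the sketch below estimates each class-conditional attribute density with Gaussian kernels and plugs the estimates into a plain Naive Bayes decision. It does not implement the dependency extension the paper describes; the function names, the shared smoothing parameter `h`, and the toy data are assumptions made for this example.

```python
import math

def gaussian_kde(samples, h):
    """Return a 1-D density estimate: an average of Gaussian kernels of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return density

def fit_kernel_naive_bayes(rows, labels, h=0.5):
    """Per class: store the prior and one kernel density per attribute."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        densities = [gaussian_kde(list(column), h) for column in zip(*members)]
        model[label] = (prior, densities)
    return model

def predict(model, row):
    """Pick the class with the highest log prior plus summed log densities."""
    def log_score(label):
        prior, densities = model[label]
        # The tiny constant guards against log(0) far from all samples.
        return math.log(prior) + sum(
            math.log(d(x) + 1e-300) for d, x in zip(densities, row))
    return max(model, key=log_score)

rows = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.8], [3.0, 9.0], [3.1, 9.4], [2.9, 8.7]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_kernel_naive_bayes(rows, labels, h=0.5)
```

A larger `h` gives smoother densities and a smaller `h` follows the samples more closely; in practice the smoothing parameter would be tuned per attribute rather than shared.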

9.
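A discretization algorithm based on maximizing class-attribute interdependence   Total citations: 2, self-citations: 0, citations by others: 2
To address the difficulty that existing discretization algorithms have in balancing computation speed and solution quality, a new supervised discretization algorithm based on maximizing class-attribute interdependence is proposed. The algorithm takes into account the spatial distribution of classes and attribute values and builds its discretization scheme from the intrinsic relationship between class and attribute, so that the interdependence between class and attribute is maximal after discretization. Experimental results show that, while preserving computation speed, the algorithm effectively improves classification accuracy and reduces the number of classification rules.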

10.
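For the rule set learning problem, an ant colony rule set learning algorithm (Ant-AQ) that follows the classic AQ covering algorithm framework is proposed. In Ant-AQ, the beam search specialization process of the AQ covering framework is replaced by an ant colony search specialization process, which to some extent reduces the chance of being trapped in local optima. In comparative tests, Ant-AQ was compared with the classic rule set learning algorithms CN2 and AQ-15, and with Ant-Miner, another rule learning algorithm based on ant colony optimization proposed by R. S. Parpinelli et al., on several typical rule learning data sets. The experimental results show, first, that Ant-AQ outperforms the classic rule learning algorithms in overall performance and, second, that Ant-AQ outperforms Ant-Miner on predictive accuracy, a key evaluation metric.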

11.
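Discretization of continuous attributes plays an important role in machine learning and data mining. Whether the discretization method is sound determines how accurately information is represented and extracted. When the Chi2 algorithm discretizes continuous attributes, it gives good results on conflict-free data, but its results on inconsistent and incomplete data are not satisfactory. Exploiting the property that the Bayesian model tolerates a certain degree of misclassification, the Chi2 algorithm is improved. The improved Chi2 algorithm is not only better suited to inconsistent and incomplete data but also merges intervals more reasonably. Experimental results confirm the effectiveness of the algorithm.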
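Chi2-style discretization merges adjacent intervals whose class distributions are statistically similar. The sketch below shows only the standard chi-square statistic for a pair of adjacent intervals and the choice of the most similar pair; it is not the Bayesian improvement described in the abstract, and the function names and the decision to skip cells with zero expected count are assumptions made for illustration.

```python
def chi_square(interval_a, interval_b):
    """Chi-square statistic of two adjacent intervals.

    Each interval is a dict mapping class label -> number of examples.
    """
    classes = sorted(set(interval_a) | set(interval_b))
    rows = [[iv.get(c, 0) for c in classes] for iv in (interval_a, interval_b)]
    n = sum(sum(row) for row in rows)
    if n == 0:
        return 0.0
    col_totals = [sum(col) for col in zip(*rows)]
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            if expected > 0:  # assumption: cells with zero expectation are skipped
                chi2 += (observed - expected) ** 2 / expected
    return chi2

def most_similar_pair(intervals):
    """Index i of the adjacent pair (i, i+1) with the smallest chi-square."""
    scores = [chi_square(a, b) for a, b in zip(intervals, intervals[1:])]
    return min(range(len(scores)), key=scores.__getitem__)

intervals = [{"a": 4}, {"a": 3, "b": 1}, {"b": 4}]
```

A merging loop would repeatedly merge the pair returned by `most_similar_pair` while its statistic stays below a significance threshold.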

12.
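A Bayesian classifier based on the “3σ” rule   Total citations: 1, self-citations: 0, citations by others: 1
In soft-sensor modeling, improving the estimation accuracy of the model usually requires partitioning the original data set into classes so that multiple sub-models can be built. The data classification exploits the simplicity and efficiency of the naive Bayes classifier: the continuous class variable is first divided into class ranges, and the continuous attribute variables are then discretized using the 3σ rule from probability theory. To remove the influence of disturbed data in the training samples, a genetic algorithm is used to select samples from the training set. The discretization of continuous variables and the sample selection serve as data preprocessing, and the Bayesian classifier is built from the preprocessed training samples. Simulation experiments on UCI data sets and on an online monitoring data set from a bisphenol A production process show that the 3σ-rule naive Bayes classification method with genetic-algorithm sample selection achieves higher classification accuracy than other methods.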
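The abstract does not say exactly how the 3σ rule is turned into intervals. One plausible reading, sketched here purely as an assumption, places cut points at the mean plus or minus one, two, and three standard deviations, so that values beyond 3σ fall into the two outermost bins. The function names are hypothetical.

```python
import bisect
import statistics

def three_sigma_cuts(values):
    """Cut points at mean +/- 1, 2 and 3 standard deviations (assumed layout)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [mu + k * sigma for k in (-3, -2, -1, 1, 2, 3)]

def discretize(x, cuts):
    """Map a continuous value to a bin index in 0..6; bins 0 and 6 lie beyond 3 sigma."""
    return bisect.bisect_right(cuts, x)

values = [1.0, 2.0, 3.0, 4.0, 5.0]
cuts = three_sigma_cuts(values)
bins = [discretize(v, cuts) for v in values]
```

With this layout, values close to the mean share the central bin and extreme readings are isolated in bins 0 and 6, which is one way such a scheme could limit the effect of disturbed samples.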

13.
The renowned k-nearest neighbor decision rule is widely used for classification tasks, where the label of any new sample is estimated based on a similarity criterion defined by an appropriate distance function. It has also been used successfully for regression problems where the purpose is to predict a continuous numeric label. However, some alternative neighborhood definitions, such as the surrounding neighborhood, have considered that the neighbors should fulfill not only the proximity property, but also a spatial location criterion. In this paper, we explore the use of the k-nearest centroid neighbor rule, which is based on the concept of surrounding neighborhood, for regression problems. Two support vector regression models were executed as reference. Experimentation over a wide collection of real-world data sets and using fifteen different odd values of k demonstrates that the regression algorithm based on the surrounding neighborhood significantly outperforms the traditional k-nearest neighbor method and also a support vector regression model with an RBF kernel.
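The k-nearest centroid neighbor idea can be sketched as a greedy selection: the first neighbor is the training point nearest to the query, and each further neighbor is the point that brings the centroid of all selected neighbors closest to the query. The sketch below follows that common definition and, as an assumption not stated in the abstract, predicts the plain mean of the selected targets. Function names and data are illustrative.

```python
import math

def nearest_centroid_neighbors(points, query, k):
    """Greedily pick k indices so the running centroid stays close to the query."""
    selected = []
    remaining = list(range(len(points)))
    sums = [0.0] * len(query)
    for _ in range(min(k, len(points))):
        best_i, best_d = None, math.inf
        for i in remaining:
            # Centroid of the already selected neighbors plus candidate i.
            centroid = [(s + x) / (len(selected) + 1) for s, x in zip(sums, points[i])]
            d = math.dist(centroid, query)
            if d < best_d:
                best_i, best_d = i, d
        selected.append(best_i)
        remaining.remove(best_i)
        sums = [s + x for s, x in zip(sums, points[best_i])]
    return selected

def kncn_regress(points, targets, query, k):
    """Predict the mean target of the k nearest centroid neighbors (assumed rule)."""
    idx = nearest_centroid_neighbors(points, query, k)
    return sum(targets[i] for i in idx) / len(idx)

points = [[1.0, 0.0], [1.5, 0.0], [-2.0, 0.0]]
targets = [10.0, 50.0, 30.0]
neighbors = nearest_centroid_neighbors(points, [0.0, 0.0], 2)
prediction = kncn_regress(points, targets, [0.0, 0.0], 2)
```

In this toy example plain 2-nearest-neighbor search would pick the two points on the right of the query, whereas the centroid criterion picks one point on each side, which is the "surrounding" property the abstract refers to.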

14.
Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows flexible learning from out-of-core data by varying the value of n (the number of super parents). Hence, AnDE is especially appropriate for learning from large quantities of data. The memory requirement of AnDE, however, increases combinatorially with the number of attributes and the parameter n. In large data learning, the number of attributes is often large and we also expect a high n to achieve low-bias classification. In order to achieve the lower bias of AnDE with higher n but with a smaller memory requirement, we propose a memory constrained selective AnDE algorithm, which involves two passes of learning through the training examples. The first pass performs attribute selection on super parents according to available memory, whereas the second one learns an AnDE model with parents only on the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to A\(n'\)DE, where \(n' = n-1\), while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data. Numerical data sets need to be discretized first.

15.
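A method for handling massive incomplete decision tables is proposed. Using mutual-information-based attribute significance as heuristic information, a genetic algorithm reduces the condition attributes of the incomplete original decision table, producing a decision table that still contains missing values, called the optimized decision table. Using the information in the original decision table itself, consistent decision rules are extracted from the optimized decision table through attribute extension, without computing the missing values. Experimental results on 8 UCI data sets are better than those of the EMAV method, showing that this is an effective method for extracting rules from massive incomplete decision tables.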

16.
17.
This paper presents some new approaches for computing graph prototypes in the context of the design of a structural nearest prototype classifier. Four kinds of prototypes are investigated and compared: set median graphs, generalized median graphs, set discriminative graphs and generalized discriminative graphs. They differ according to (i) the graph space where they are searched for and (ii) the objective function which is used for their computation. The first criterion distinguishes set prototypes, which are selected in the initial graph training set, from generalized prototypes, which are generated in an infinite set of graphs. The second criterion distinguishes median graphs, which minimize the sum of distances to all input graphs of a given class, from discriminative graphs, which are computed using classification performance as criterion, taking into account the inter-class distribution. For each kind of prototype, the proposed approach makes it possible to identify one or many prototypes per class, in order to manage the trade-off between the classification accuracy and the classification time. Each graph prototype generation/selection is performed through a genetic algorithm which can be specialized to each case by setting the appropriate encoding scheme, fitness and genetic operators. An experimental study performed on several graph databases shows the superiority of the generation approach over the selection one. On the other hand, discriminative prototypes outperform the generative ones. Moreover, we show that the classification rates improve as the number of prototypes increases. Finally, we show that discriminative prototypes give better results than the median graph based classifier.

18.
Artificial neural networks often achieve high classification accuracy rates, but they are considered black boxes due to their lack of explanation capability. This paper proposes the new rule extraction algorithm RxREN to overcome this drawback. Following a pedagogical approach, the proposed algorithm extracts rules from trained neural networks for datasets with mixed mode attributes. The algorithm relies on a reverse engineering technique to prune the insignificant input neurons and to discover the technological principles of each significant input neuron of the neural network in classification. The novelty of this algorithm lies in the simplicity of the extracted rules, and the rule conditions involve both discrete and continuous attributes. Experimentation using six different real datasets, namely iris, wbc, hepatitis, pid, ionosphere and creditg, shows that the proposed algorithm is quite efficient in extracting small rule sets with higher classification accuracy than those generated by other neural network rule extraction methods.

19.
Many approaches attempt to improve naive Bayes and have been broadly divided into five main categories: (1) structure extension; (2) attribute weighting; (3) attribute selection; (4) instance weighting; (5) instance selection, also called local learning. In this paper, we work on the approach of structure extension and single out a random Bayes model by augmenting the structure of naive Bayes. We call it random one-dependence estimators, or simply RODE. In RODE, each attribute has at most one parent from the other attributes, and this parent is randomly selected from the \(\log_2 m\) attributes (where \(m\) is the number of attributes) with the maximal conditional mutual information. Our work introduces randomness into Bayesian network classifiers. The experimental results on a large number of UCI data sets validate its effectiveness in terms of classification, class probability estimation, and ranking.

20.
Preprocessing methods for handling problems with features containing continuous attributes are discussed for learning a classification algorithm based on the JSM method. Discretization methods for continuous parameters that do not make use of class information on feature distribution are compared to entropy-based methods employing class labels in interval partitioning. An entropy-information-based method for selecting attributes is also discussed.
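Entropy-based discretization of the kind mentioned in this abstract typically chooses the cut point that minimizes the weighted class entropy of the two resulting intervals. The sketch below shows one such binary split; it is a generic illustration, not the specific procedure used with the JSM method, and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_entropy_cut(values, labels):
    """Return (cut, weighted_entropy) for the best binary split of one attribute."""
    pairs = sorted(zip(values, labels))
    best_cut, best_h = None, math.inf
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_h = h
    return best_cut, best_h

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
cut, h = best_entropy_cut(values, labels)
```

Applying the same split recursively to each side, with a stopping criterion, yields a multi-interval discretization; a class-blind method such as equal-width binning would ignore `labels` entirely.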
