Similar documents
 20 similar documents found (search time: 46 ms)
1.
Applied Soft Computing, 2007, 7(3): 1102-1111
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, a new associative classification technique, the Ranked Multilabel Rule (RMR) algorithm, is introduced, which generates rules with multiple labels. Rules derived by current associative classification algorithms overlap in their training objects, resulting in many redundant and useless rules. The proposed algorithm resolves the overlap between rules in the classifier by generating rules that do not share training objects during the training phase, resulting in a more accurate classifier. Results obtained from experiments on 20 binary, multi-class and multi-label data sets show that the proposed technique is able to produce classifiers that contain rules associated with multiple classes. Furthermore, the results reveal that removing the overlap of training objects between the derived rules produces highly competitive classifiers, with respect to error rate, when compared with those extracted by decision trees and other associative classification techniques.

2.
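Attribute reduction and rule-based classification learning are important topics in the research and application of rough set theory. Exploiting the speed-up offered by quantum computing and the efficient cooperative search of the shuffled frog leaping algorithm, this paper proposes a cascaded algorithm for quantum frog-leaping attribute reduction and classification learning based on dynamic crossover cooperation. The algorithm encodes individual frogs with quantum bits and achieves fast reduction of attribute chromosomes through a dynamic quantum rotation angle adjustment strategy. Within a rough entropy threshold classification criterion, it uses a hybrid crossover co-evolution mechanism of the quantum frog population to extract and reduce classification rules and to combine decision rule chains. Finally, a cascaded model with the dual functions of attribute reduction and classification learning is constructed. Simulation experiments verify that the algorithm not only has high global optimization performance but also exceeds comparable algorithms in the accuracy and efficiency of attribute reduction and rule classification learning.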

3.
Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms   Cited by: 5 (self-citations: 0, citations by others: 5)
In rule discovery systems, learning often proceeds by first assessing the quality of the system's current rules and then modifying rules based on that assessment. This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards. The focus is on the kinds of rule assessment schemes that have been proposed for rule discovery systems that use genetic algorithms as the primary rule modification strategy. Two distinct approaches to rule learning with genetic algorithms have been previously reported, each approach offering a useful solution to a different level of the credit assignment problem. We describe a system, called RUDI, that exploits both approaches. We present analytic and experimental results that support the hypothesis that multiple levels of credit assignment can improve the performance of rule learning systems based on genetic algorithms.

4.
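This paper describes the basic idea, principles and steps of the traditional genetic algorithm and its application to data mining (rule set discovery), and presents the basic idea and key issues of a genetic-algorithm-based knowledge rule mining algorithm, including knowledge rule representation and the definition of the fitness function. It then proposes a multi-population parallel evolution structure that adjusts the parallel genetic algorithm for data mining by means of an elite recombination strategy, a generated-pool evolution model and adaptive parameters. In the implementation, methods such as dynamic mutation and crossover probabilities are used, which effectively avoid premature convergence in the parallel genetic algorithm. Taking North American mushroom data as an example, the parallel genetic algorithm is used to mine classification rules, and the experiments demonstrate the effectiveness of the algorithm in discovering and evolving rules.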

5.
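An associative classification method based on attribute importance is proposed. Unlike traditional algorithms, it generates class association rules according to the degree of attribute importance, and when constructing the classifier it removes the randomness in the CBA algorithm's choice among rules with identical support and confidence. Experimental results show that, compared with traditional associative classification algorithms, the classification rules obtained by this method have lower complexity and effectively improve classification performance.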

6.
Ant colony optimization (ACO) algorithms have been successfully applied in data classification, where the aim is to discover a list of classification rules. However, because search in ACO algorithms is essentially random, the lists of classification rules constructed by ACO-based classification algorithms are not fixed and may be distinctly different even when the same training set is used. Those differences are generally ignored and some beneficial information cannot be extracted from the different data sets, which may lower the predictive accuracy. To overcome this shortcoming, this paper proposes a novel classification rule discovery algorithm based on ACO, named AntMinermbc, in which a new model of multiple rule sets is presented to produce multiple lists of rules. Multiple base classifiers are built in AntMinermbc, and each base classifier is expected to remedy the weaknesses of the other base classifiers, which can improve the predictive accuracy by exploiting the useful information from the various base classifiers. A new heuristic function for ACO is also designed in our algorithm, which considers both correlation and coverage in order to avoid deceptively high accuracy. The performance of our algorithm is studied experimentally on 19 publicly available data sets and further compared with several state-of-the-art classification approaches. The experimental results show that the predictive accuracy obtained by our algorithm is statistically higher than that of the compared approaches.

7.
The cAnt-Miner algorithm is an Ant Colony Optimization (ACO) based technique for classification rule discovery in problem domains that include continuous attributes. In this paper, we propose several extensions to cAnt-Miner. The main extension is based on the use of multiple pheromone types, one for each class value to be predicted. In the proposed μcAnt-Miner algorithm, an ant first selects a class value to be the consequent of a rule, and the terms in the antecedent are selected based on the pheromone levels of the selected class value; pheromone update occurs on the corresponding pheromone type of the class value. The pre-selection of a class value also allows the use of more precise measures for the heuristic function and the dynamic discretization of continuous attributes, and further allows for the use of a rule quality measure that directly takes into account the confidence of the rule. Experimental results on 20 benchmark datasets show that our proposed extension improves classification accuracy to a statistically significant extent compared to cAnt-Miner, and has classification accuracy similar to the well-known Ripper and PART rule induction algorithms.
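The idea of one pheromone type per class value can be illustrated with a minimal sketch. It is not the μcAnt-Miner implementation: the function names, the dictionary layout, the product-of-pheromone-and-heuristic selection rule and the multiplicative deposit are all assumptions borrowed from the general Ant-Miner family for illustration.

```python
def term_probabilities(pheromone, heuristic, class_value, candidate_terms):
    """Selection probability of each candidate term for the pre-selected class.

    Assumed rule: probability is proportional to pheromone * heuristic,
    both looked up in the table that belongs to the chosen class value.
    """
    weights = {
        term: pheromone[class_value][term] * heuristic[class_value][term]
        for term in candidate_terms
    }
    total = sum(weights.values())
    return {term: weight / total for term, weight in weights.items()}


def reinforce(pheromone, class_value, rule_terms, quality):
    """Deposit pheromone only on the pheromone type of the rule's class (assumed update form)."""
    for term in rule_terms:
        pheromone[class_value][term] += pheromone[class_value][term] * quality


# Hypothetical terms and class values, used only for this example.
terms = ["outlook=sunny", "windy=true"]
pheromone = {c: {t: 1.0 for t in terms} for c in ("yes", "no")}
heuristic = {c: {t: 1.0 for t in terms} for c in ("yes", "no")}

# A rule predicting "yes" that used "windy=true" is rewarded: 1.0 + 1.0 * 2.0 = 3.0
reinforce(pheromone, "yes", ["windy=true"], quality=2.0)

probs_yes = term_probabilities(pheromone, heuristic, "yes", terms)
probs_no = term_probabilities(pheromone, heuristic, "no", terms)
```

Because the deposit touches only the table of class "yes", the selection probabilities for class "no" stay uniform, which is the separation the abstract attributes to the multiple pheromone types.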

8.
9.
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. Users are often only interested in finding the strongest rules, and do not want to go through a large number of rules or wait for these rules to be generated. To address those needs, algorithms have been proposed to mine the top-k association rules in databases, where users can directly set a parameter k to obtain the k most frequent rules. However, a major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule based on its confidence, which reduces the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in terms of both runtime and memory usage.
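To make the top-k task concrete, the following is a brute-force sketch of what "the k most frequent rules" means, assuming the usual definitions of support and confidence and a minimum-confidence filter. It does not implement ETARM or its pruning properties; the function name `top_k_rules` and the tie-breaking order are illustrative assumptions.

```python
from itertools import combinations


def top_k_rules(transactions, k, min_conf):
    """Return the k rules with the highest support among those with confidence >= min_conf.

    Exhaustive enumeration, for illustration only; practical miners prune the search space.
    Each rule is (antecedent, consequent, support, confidence).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = support(itemset)
            if sup == 0:
                continue  # itemset never occurs, so no rule can be built from it
            for r in range(1, size):
                for ante in combinations(combo, r):
                    antecedent = frozenset(ante)
                    conf = sup / support(antecedent)
                    if conf >= min_conf:
                        consequent = itemset - antecedent
                        rules.append((sorted(antecedent), sorted(consequent), sup, conf))
    # Rank by support, then confidence; the final keys make the order deterministic.
    rules.sort(key=lambda rule: (-rule[2], -rule[3], rule[0], rule[1]))
    return rules[:k]


# Hypothetical toy database of four transactions.
tx = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
best = top_k_rules(tx, k=1, min_conf=0.9)
```

On this toy database the rule c → a appears in two of four transactions and holds whenever c occurs, so it ranks first; the cost of the exhaustive loop is what motivates pruning in practical top-k miners.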

10.
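To reduce human intervention in preference measurement while improving the efficiency and accuracy of preference measurement algorithms, a collaborative preference measurement framework based on a belief system is proposed. First, the concepts of the distance between rules and the internal distance of a rule set are introduced to make the relationships among rules concrete. On this basis, a rule set aggregation algorithm, PRA, based on the average internal distance of a rule set is proposed; it aims to select the most representative common preferences of all users, i.e. the consensus preferences, while losing as little information as possible. Next, the concept of Common belief and an improved belief system are proposed; consensus preferences serve as the evidence of the belief system, which takes user consistency into account while still allowing users to retain personalized information. Under the belief system, a belief-system-based interestingness measure is proposed, and the belief degree and deviation degree of a preference are quantified to describe how far a user preference agrees with or contradicts the belief system. User preferences are classified as generalized or personalized preferences, and interestingness is finally derived from the belief degree and deviation degree so as to find the most interesting rules. For computing interestingness, an extensible computational framework that can use different belief degree formulas is proposed. To further verify the accuracy and effectiveness of the measurement framework, the IMCos and IMCov algorithms are proposed, using the weighted cosine similarity formula and the correlation coefficient formula as examples. Experimental results show that the belief degree and deviation degree effectively reflect different characteristics of preferences, and that, compared with two recent algorithms, CONTENUM and TKO, the Top-K rules discovered by the measurement framework are better on metrics such as recall, precision and F1-Measure.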

11.
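To address the problem that the MLKNN algorithm handles labels only independently and ignores the correlations among labels in the real world, an association-rule-based MLKNN multi-label classification algorithm (FP-MLKNN) is proposed. The algorithm uses an association rule algorithm to mine high-order correlations among labels and uses the association rules among labels to improve MLKNN, with the aim of improving classification performance. First, MLKNN is used to compute the feature confidence of each sample; an association rule algorithm is then used to mine a series of strong association rules, and the two algorithms are fused to construct a multi-label classifier that predicts new labels. On this basis, the proposed algorithm is compared experimentally with three algorithms: MLKNN, AdaBoostMH and BPMLL. The experimental results show that the proposed algorithm outperforms these three algorithms in classification performance on the yeast, emotions and enron data sets.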

12.
Induction of descriptive fuzzy classifiers with the Logitboost algorithm   Cited by: 3 (self-citations: 3, citations by others: 0)
Recently, Adaboost has been compared to greedy backfitting of extended additive models in logistic regression problems, or "Logitboost". The Adaboost algorithm has been applied to learn fuzzy rules in classification problems, and other backfitting algorithms to learn fuzzy rules in modeling problems but, to our knowledge, no previous work extends the Logitboost algorithm to learn fuzzy rules in classification problems. In this work, Logitboost is applied to learn fuzzy rules in classification problems, and its results are compared with those of Adaboost and other fuzzy rule learning algorithms. Contrary to expectations, it is shown that the basic extension of the backfitting algorithm to learn classification rules may produce worse results than Adaboost does. We suggest that this is caused by the stricter requirements that Logitboost places on the weak learners, which are not fulfilled by fuzzy rules. Finally, a prefitting-based modification of the Logitboost algorithm that avoids this problem is proposed.

13.
14.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by the rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.

15.
Text associative classification based on adaptive weighting   Cited by: 1 (self-citations: 0, citations by others: 1)
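In research on text associative classification, the distribution of feature words in the training samples strongly affects classification results. Even the same associative classification algorithm may perform markedly differently on different sample sets. This paper therefore uses weighting to improve the stability of text associative classifiers, and designs and implements a rule-weighting-based associative classification algorithm (WARC) and a sample-weighting-based associative classification algorithm (SWARC). WARC adjusts classification rules of uneven strength through adaptive rule weighting, whereas SWARC adaptively adjusts the weights of training samples, fundamentally improving the uneven distribution of feature words across samples of different classes. Experimental results show that text classification quality improves markedly after weight adjustment with either WARC or SWARC, and that the improvement with SWARC is especially significant.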

16.
Coronary artery disease (CAD) is one of the major causes of mortality worldwide. Knowledge about risk factors that increase the probability of developing CAD can help to understand the disease better and assist in its treatment. Recently, modern computer-aided approaches have been used for the prediction and diagnosis of diseases. Swarm intelligence algorithms like particle swarm optimization (PSO) have demonstrated great performance in solving different optimization problems. As rule discovery can be modelled as an optimization problem, it can be solved by means of an evolutionary algorithm like PSO. An approach for discovering classification rules of CAD is proposed. The work is based on a real-world CAD data set and aims at the detection of this disease by producing accurate and effective rules. The proposed algorithm is a hybrid binary-real PSO, which combines categorical and numerical encoding of a particle with a different approach for calculating the velocity of particles. The rules were developed from randomly generated particles, which take random values in the range of each attribute in the rule. Two different feature selection methods based on multi-objective evolutionary search and PSO were applied to the data set, and the most relevant features were selected by the algorithms. The accuracy of two different rule sets was evaluated. The rule set with 11 features obtained more accurate results than the rule set with 13 features. Our results show that the proposed approach has the ability to produce effective rules with the highest accuracy for the detection of CAD.

17.
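The formalization of rules in traditional packet classification algorithms is improved. Building on a study of rule conversion methods in packet classification algorithms, a non-matching rule conversion algorithm based on set operations is proposed. The algorithm is compared with other range rule conversion algorithms in terms of performance, the time and space complexity of these algorithms is analysed, and simulations are carried out. Experimental results show that the proposed algorithm produces fewer rules than the other algorithms.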

18.
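In informatization evaluation, traditional associative classification algorithms cannot give priority to discovering short rules, and their classification accuracy depends strongly on rule order. To address this, an associative classification algorithm based on subset support and multi-rule classification is proposed: the training set is grouped by the attribute to be classified, association rules are mined using subset support, and the test set is classified by computing the class average support. Experimental results show that the algorithm is superior to traditional methods in both rule discovery ability and classification accuracy.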
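One way to read classification by class average support (an interpretation, since the abstract gives no formula) is that every rule matching a test instance votes for its class, and the class whose matching rules have the highest mean support wins, so the outcome does not depend on rule order. The rule layout `(antecedent, label, support)` and the function name `classify` are assumptions made for this sketch.

```python
from collections import defaultdict


def classify(instance, rules, default=None):
    """Predict the class whose matching rules have the highest average support.

    instance: set of attribute-value items describing one test record.
    rules: iterable of (antecedent frozenset, class label, support) tuples (assumed layout).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for antecedent, label, support in rules:
        if antecedent <= instance:  # the rule fires if all its conditions hold
            totals[label] += support
            counts[label] += 1
    if not counts:
        return default  # no rule matched
    return max(counts, key=lambda label: totals[label] / counts[label])


# Hypothetical rule set used only for this example.
rules = [
    (frozenset({"x=1"}), "A", 0.4),
    (frozenset({"y=2"}), "A", 0.2),
    (frozenset({"x=1", "y=2"}), "B", 0.5),
    (frozenset({"z=3"}), "B", 0.9),
]
predicted = classify({"x=1", "y=2"}, rules)
fallback = classify({"q=0"}, rules, default="A")
```

Here two rules for class A match with a mean support of about 0.3 and one rule for class B matches with support 0.5, so B is predicted; the rule on `z=3` does not fire and is ignored.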

19.
Liu Yang, Zhang Zhuo, Zhou Qinglei. 《计算机科学》 (Computer Science), 2014, 41(12): 164-167
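Medical and health data usually have many attributes and are mixed data in which continuous and discrete attributes coexist, which greatly limits the efficiency of knowledge discovery methods when mining such data. Based on fuzzy rough set theory, a method for mining classification rules from mixed data is studied. A generalization threshold is introduced into the rule acquisition algorithm to control the size and complexity of the acquired rule set, improving the classification efficiency of rough set knowledge discovery methods on medical and health data. Finally, comparative experiments verify the effectiveness of the algorithm in mining rules from medical decision tables.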

20.
Zheng Panli, Dai Muhong. 《计算机系统应用》 (Computer Systems & Applications), 2012, 21(11): 218-221, 193
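An automatic data mining algorithm based on grammar-guided genetic programming (GGP) is studied. Rule induction algorithms are a typical data classification method. Grammar-guided genetic programming is used to improve the rule induction algorithm, yielding an algorithm for automatic rule extraction. Finally, an example of automatic rule extraction based on grammar-guided genetic programming is given for a TV shopping project.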
