Similar Literature
20 similar records found (search time: 31 ms)
1.
Research on Association Rule Mining Based on an Improved Apriori Algorithm   (cited by 2; self-citations: 0, by others: 2)
朱其祥  徐勇  张林 《微机发展》2006,16(7):102-104
Association rule mining is an important branch of data mining research. The classic rule-extraction algorithm, Apriori, and its variants suffer from two weaknesses: they generate a large number of candidate itemsets, and they incur a heavy I/O load when scanning the database. Experimental analysis of the rule-generation process shows that extraction efficiency can be improved by pre-pruning the candidate k-itemsets Ck against the frequent (k-1)-itemsets Lk-1, and by skipping, during database scans, transactions that contribute nothing to the generation of frequent itemsets.
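The two improvements described in the abstract can be sketched as follows. This is a minimal illustration in Python; the function name and data layout are assumptions, not the authors' code.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Apriori with the two improvements described above: candidate
    k-itemsets are pre-pruned against L_{k-1}, and transactions that
    support no current candidate are dropped from later scans."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # join step, then prune: keep a candidate only if ALL of its
        # (k-1)-subsets are frequent (i.e., appear in L_{k-1})
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = {c: 0 for c in candidates}
        useful = []  # transaction reduction: drop transactions supporting nothing
        for t in transactions:
            hit = False
            for c in candidates:
                if c <= t:
                    counts[c] += 1
                    hit = True
            if hit:
                useful.append(t)
        transactions = useful
        current = {c for c, n in counts.items() if n >= min_count}
        frequent |= current
        k += 1
    return frequent
```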

2.
To address the heavy resource consumption, difficult rule pruning, and complex classification models of existing associative classification algorithms, an improved associative classification scheme based on classification pruning, ACCP, is proposed. Rule antecedents are mined in blocks according to the value of the class attribute, and both the frequent-itemset mining process and rule pruning are improved, effectively raising classification accuracy and runtime efficiency. Experimental results show that the improved scheme achieves higher classification accuracy than the traditional CBA algorithm and the C4.5 decision tree algorithm, with good practical results.

3.
When traditional association rule mining is applied to classification problems, infrequent rules are easily missed and prediction accuracy suffers. To obtain a correct, reasonable, and more complete rule set, an improved method, DT-AR (decision tree-association rule algorithm), is proposed that supplements the association rule set using a decision tree pruning strategy. The method mines an association rule set with the FP-Growth (frequent pattern growth) algorithm, builds a post-pruned decision tree with C4.5 and extracts its classification rules, merges the two rule sets after iterative confidence filtering, and classifies by voting with confidence as the weight. Experimental results show that, compared with traditional association rule mining and decision tree pruning methods, the resulting rules classify the datasets more accurately.
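The final confidence-weighted voting step of a DT-AR-style classifier might look like the sketch below. The rule representation (antecedent item set, class label, confidence) is an assumption for illustration, not the paper's exact format.

```python
def weighted_vote(rules, instance):
    """Confidence-weighted voting over a merged rule set.
    Each rule is (antecedent_items, class_label, confidence); a rule
    fires when its antecedent is a subset of the instance's items."""
    scores = {}
    for antecedent, label, conf in rules:
        if antecedent <= instance:          # rule matches the instance
            scores[label] = scores.get(label, 0.0) + conf  # confidence as weight
    return max(scores, key=scores.get) if scores else None
```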

4.
《Applied Soft Computing》2007,7(3):1102-1111
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, a new associative classification technique, the Ranked Multilabel Rule (RMR) algorithm, is introduced, which generates rules with multiple labels. Rules derived by current associative classification algorithms overlap in their training objects, resulting in many redundant and useless rules. The proposed algorithm resolves this overlap by generating rules that do not share training objects during the training phase, resulting in a more accurate classifier. Results obtained from experiments on 20 binary, multi-class and multi-label data sets show that the proposed technique is able to produce classifiers that contain rules associated with multiple classes. Furthermore, the results reveal that removing the overlap of training objects between the derived rules produces classifiers that are, with respect to error rate, highly competitive with those extracted by decision trees and other associative classification techniques.

5.
6.
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. However, users are often only interested in finding the strongest rules, and do not want to go through a large number of rules or wait for them to be generated. To address these needs, algorithms have been proposed to mine the top-k association rules in databases, where users directly set a parameter k to obtain the k most frequent rules. A major issue with these techniques, however, is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to more effectively reduce the search space. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule, based on its confidence, thereby reducing the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in both runtime and memory usage.
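The core top-k idea, replacing a fixed minsup with an internal threshold that rises as a heap of the k best rules fills, can be illustrated as below. This is a brute-force sketch over single-item rules only; ETARM's actual candidate-pruning properties are more refined.

```python
import heapq

def top_k_rules(transactions, k, min_conf=0.5):
    """Keep a min-heap of the k best rules found so far and use the
    heap's smallest support as a rising threshold to prune candidates."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    heap = []        # entries: (support, antecedent, consequent)
    min_sup = 1      # internal threshold, rises as the heap fills
    for a, c in ((x, y) for x in items for y in items if x != y):
        ante, cons = frozenset([a]), frozenset([c])
        sup = support(ante | cons)
        if sup < min_sup:
            continue  # pruned by the dynamic threshold
        if sup / support(ante) >= min_conf:
            heapq.heappush(heap, (sup, sorted(ante), sorted(cons)))
            if len(heap) > k:
                heapq.heappop(heap)       # drop the weakest rule
                min_sup = heap[0][0]      # raise the bar
    return sorted(heap, reverse=True)
```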

7.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can effectively improve classification accuracy, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by the rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.

8.
In recent years, a few sequential covering algorithms for classification rule discovery based on the ant colony optimization (ACO) meta-heuristic have been proposed. This paper proposes a new ACO-based classification algorithm called AntMiner-C. Its main feature is a heuristic function based on the correlation among the attributes. Other highlights include the manner in which class labels are assigned to the rules prior to their discovery, a strategy for dynamically stopping the addition of terms in a rule's antecedent part, and a strategy for pruning redundant rules from the rule set. We study the performance of our proposed approach on twelve commonly used data sets and compare it with the original AntMiner algorithm, the decision tree builder C4.5, Ripper, the logistic regression technique, and an SVM. Experimental results show that the accuracy rate obtained by AntMiner-C is better than that of the compared algorithms. However, the average number of rules and the average number of terms per rule are higher.

9.
孙蕾  李军怀 《计算机应用》2008,28(7):1692-1695
To address problems such as low classifier performance and poor efficiency in several typical classification algorithms, a classification algorithm, ERAC, based on an orthogonal method and an extended χ² test is proposed. The algorithm first generates all frequent itemsets and association rules via the orthogonal method, then uses an extended χ² test to rank and prune the rules, effectively reducing the number of rules in the classifier. Experimental results show that, compared with CBA and other algorithms, ERAC achieves higher classification accuracy and runtime efficiency.
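The abstract's ranking-and-pruning step rests on a χ² test of the association between a rule's antecedent and its class. The sketch below computes the ordinary Pearson χ² statistic for that 2x2 contingency table; note ERAC uses an *extended* χ² test, so this is only the standard statistic for illustration.

```python
def chi_square(n_ac, n_a, n_c, n):
    """Pearson chi-square statistic for the 2x2 table of rule antecedent
    A vs. class C, usable to rank/prune classification rules.
    n_ac: count(A and C), n_a: count(A), n_c: count(C), n: total."""
    observed = [
        [n_ac, n_a - n_ac],
        [n_c - n_ac, n - n_a - n_c + n_ac],
    ]
    row = [n_a, n - n_a]
    col = [n_c, n - n_c]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

A statistic near zero means the antecedent and class are independent, so the rule carries little information and is a pruning candidate.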

10.
The idea of the T-test is introduced into privacy-preserving data mining, and a privacy-preserving association rule mining algorithm based on influence degree is proposed. Influence degree serves as the rule-generation criterion, reducing redundant and irrelevant rules and improving mining efficiency; sensitive rules are hidden by adjusting the items of sensitive association rules across transactions. Experimental results show that the algorithm keeps both the rule loss rate and the rule gain rate below 6%.

11.
Ant colony optimization (ACO) algorithms have been successfully applied in data classification, where they aim to discover a list of classification rules. However, due to the essentially random search in ACO algorithms, the lists of classification rules constructed by ACO-based classification algorithms are not fixed and may differ markedly even on the same training set. Those differences are generally ignored, so useful information in the differing rule lists goes unexploited, which may lower the predictive accuracy. To overcome this shortcoming, this paper proposes a novel classification rule discovery algorithm based on ACO, named AntMinermbc, in which a new model of multiple rule sets is presented to produce multiple lists of rules. Multiple base classifiers are built in AntMinermbc, and each base classifier is expected to remedy the weaknesses of the other base classifiers, improving predictive accuracy by exploiting the useful information in the various base classifiers. A new heuristic function for ACO is also designed in our algorithm, which considers both correlation and coverage in order to avoid deceptively high accuracy. The performance of our algorithm is studied experimentally on 19 publicly available data sets and further compared to several state-of-the-art classification approaches. The experimental results show that the predictive accuracy obtained by our algorithm is statistically higher than that of the compared targets.

12.
Remote sensing image classification is an active research topic in the remote sensing field. A fuzzy associative remote sensing classification method with adaptive interval partitioning (FARSC) is proposed. Tailored to the characteristics of remote sensing image classification, the algorithm uses fuzzy C-means clustering to adaptively build fuzzy intervals for continuous attributes, applies a new pruning strategy to filter itemsets and thus avoid generating useless rules, and fuses multiple fuzzy classification rules with a new rule-importance measure, effectively improving classification efficiency and accuracy. Experiments on UCI data and remote sensing images show that the algorithm achieves high classification accuracy and is insensitive to changes in sample size; FARSC is a practical and effective method for solving remote sensing image classification problems.

13.
Cost Complexity-Based Pruning of Ensemble Classifiers   (cited by 1; self-citations: 0, by others: 1)
In this paper we study methods that combine multiple classification models learned over separate data sets. Numerous studies posit that such approaches provide the means to efficiently scale learning to large data sets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources: the final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory while also slowing down classification throughput. Here, we describe an algorithm for pruning the ensemble meta-classifier (i.e., discarding a subset of the available base classifiers) as a means to reduce its size while preserving its accuracy, and we present a technique for measuring the trade-off between predictive performance and available run-time system resources. The algorithm is independent of the method used initially to compute the meta-classifier. It is based on decision tree pruning methods and relies on mapping an arbitrary ensemble meta-classifier to a decision tree model. Through an extensive empirical study on meta-classifiers computed over two real data sets, we show our pruning algorithm to be a robust and competitive approach to discarding classification models without degrading the overall predictive performance of the smaller ensemble computed over those that remain after pruning. Received 30 August 2000 / Revised 7 March 2001 / Accepted in revised form 21 May 2001
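A much simplified greedy version of the goal, shrinking the ensemble while preserving validation accuracy, can be sketched as follows. Note the paper's actual method instead maps the meta-classifier to a decision tree and applies cost-complexity pruning; this stand-in only illustrates the size/accuracy trade-off.

```python
def prune_ensemble(classifiers, X_val, y_val, tolerance=0.0):
    """Greedy backward ensemble pruning: repeatedly drop the base
    classifier whose removal hurts majority-vote validation accuracy the
    least, while accuracy stays within `tolerance` of the full ensemble."""
    def accuracy(members):
        correct = 0
        for x, y in zip(X_val, y_val):
            votes = {}
            for clf in members:
                p = clf(x)
                votes[p] = votes.get(p, 0) + 1
            if max(votes, key=votes.get) == y:
                correct += 1
        return correct / len(y_val)

    kept = list(classifiers)
    baseline = accuracy(kept)
    while len(kept) > 1:
        best_acc, best_i = -1.0, None
        for i in range(len(kept)):
            trial = kept[:i] + kept[i + 1:]
            acc = accuracy(trial)
            if acc > best_acc:
                best_acc, best_i = acc, i
        if best_acc + tolerance < baseline:
            break  # any further removal costs too much accuracy
        kept.pop(best_i)
    return kept
```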

14.
The paper presents results of applying a rule induction and pruning algorithm to the classification of microseismic hazard states in coal mines. Due to the imbalanced distribution of examples describing the states "hazardous" and "safe", a special algorithm was used for rule induction and pruning. Based on training and tuning sets, the algorithm selects optimal values of the parameters influencing rule induction and pruning. Its basic parameter is a rule quality measure, which determines the form and classification abilities of the induced rules. The specificity and sensitivity of a classifier were used to evaluate its quality. The tests conducted show that the adopted method of rule induction and classifier quality evaluation gives better classification of microseismic hazards than the methods currently used in mining practice. Results obtained by the rule-based classifier were also compared with results obtained by a decision tree induction algorithm and by a neuro-fuzzy system.

15.
An Association Pattern Mining Algorithm Based on Multidimensional Sets   (cited by 2; self-citations: 0, by others: 2)
Most inter-dimensional association rule mining algorithms, such as those based on data cubes, assume that each object attribute takes only a single value. This work extends attributes to multiple values, and on that basis proposes the concept of the multidimensional set and the semantics of association rules over multidimensional sets. Under these semantics, a mining algorithm for multidimensional-set association rules is proposed. By exploiting the constraints of such rules, the algorithm performs triple pruning of candidate sets while shrinking the dataset, and therefore performs better than directly applying Apriori-style algorithms. The algorithm's performance, correctness, and completeness are analyzed, and its effectiveness is compared experimentally.

16.
Decision Tree Algorithms and Their Core Techniques   (cited by 1; self-citations: 0, by others: 1)
杨学兵  张俊 《微机发展》2007,17(1):43-45
Decision trees are an important method in inductive learning and data mining, commonly used to build classifiers and prediction models. This paper surveys decision tree classification algorithms and identifies their two core techniques: test-attribute selection and branch pruning. By analyzing and comparing representative classification algorithms in current data mining, it summarizes the characteristics of each, providing a basis for practitioners choosing an algorithm and for researchers improving one. Finally, an example illustrates the application of decision tree classification in practical production.
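The first core technique, test-attribute selection, is classically done with information gain (the ID3/C4.5 family); a textbook sketch, not tied to any particular surveyed system:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Information gain of splitting on `attr`: base entropy minus the
    weighted entropy of the partitions induced by the attribute's values.
    `rows` is a list of dicts mapping attribute names to values."""
    n = len(rows)
    base = entropy(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in by_value.values())
    return base - remainder
```

The attribute with the highest gain becomes the test attribute at the current node; C4.5 additionally normalizes by split information (gain ratio).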

17.
刘晓平 《计算机仿真》2005,22(12):76-79
用于知识发现的大部分数据挖掘工具均采用规则发现和决策树分类技术来发现数据模式和规则。该文通过采用基于仿真属性的离散化方法,基于概率统计的未知属性与噪声数据处理方法以及基于误差的剪枝算法,实现了用于自动生成决策树的通用算法模板。利用该模板,决策树算法的设计者可以快速验证为解决特定决策问题而设计的新算法。构造决策树的基本机制是算法的设计者利用其自己定义的公式来初始化通用算法模板。然后利用该系统提供的交互式图形环境,针对不同的决策问题测试该算法,从而找出适合特定问题的算法。  相似文献   

18.
A Fast and Effective Algorithm for Distributed Mining of Multilevel Association Rules   (cited by 6; self-citations: 0, by others: 6)
Association rules are an important topic in data mining, and building item hierarchies allows more meaningful rules to be discovered. This paper studies the problem of mining multilevel association rules in a distributed environment and proposes a fast and effective algorithm, MLFDM. Its techniques include effective pruning of the distributed encoded transaction table, candidate set generation and pruning, and a method for computing the global support counts of candidate itemsets. The paper describes the algorithm's principle, its concrete implementation, and several improved versions. Experimental results show that MLFDM is effective; several variants of the algorithm are also discussed.

19.
Knowledge-based systems such as expert systems are of particular interest in medical applications, as the extracted if-then rules provide interpretable results. Various rule induction algorithms have been proposed to effectively extract knowledge from data, and they can be combined with classification methods to form rule-based classifiers. However, most rule-based classifiers cannot directly handle numerical data such as blood pressure; a data preprocessing step called discretization is required to convert such numerical data into a categorical format. Existing discretization algorithms do not take into account the multimodal class densities of numerical variables in datasets, which may degrade the performance of rule-based classifiers. In this paper, a new Gaussian Mixture Model based Discretization algorithm (GMBD) is proposed that preserves the most frequent patterns of the original dataset by taking into account the multimodal distribution of the numerical variables. The effectiveness of the GMBD algorithm was verified using six publicly available medical datasets. According to the experimental results, the GMBD algorithm outperformed five other static discretization methods in terms of the number of generated rules and classification accuracy in the associative classification algorithm. Consequently, the proposed approach has the potential to enhance the performance of rule-based classifiers used in clinical expert systems.
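The idea behind mixture-based discretization can be illustrated with a tiny one-dimensional, two-component EM fit that places a cut point between the fitted component means. This is a toy sketch only: GMBD derives its boundaries from the full fitted densities, not from a simple midpoint, and the initialization below is an arbitrary assumption.

```python
import math

def gmm_cut_point(xs, iters=200):
    """Fit a two-component 1-D Gaussian mixture by EM, then place a
    discretization boundary at the midpoint of the fitted means."""
    mu1, mu2 = min(xs), max(xs)
    s1 = s2 = (max(xs) - min(xs)) / 4 or 1.0
    w1 = w2 = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in xs:
            p1 = w1 * math.exp(-((x - mu1) ** 2) / (2 * s1 ** 2)) / s1
            p2 = w2 * math.exp(-((x - mu2) ** 2) / (2 * s2 ** 2)) / s2
            r.append(p1 / (p1 + p2))
        # M-step: re-estimate weights, means and standard deviations
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1) or 1e-6
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2) or 1e-6
        w1, w2 = n1 / len(xs), n2 / len(xs)
    return (mu1 + mu2) / 2
```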

20.
CAIM discretization algorithm   (cited by 8; self-citations: 0, by others: 8)
The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.
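The CAIM criterion and its greedy boundary-selection loop can be sketched as follows. This is a simplified reading of the algorithm (midpoint candidates, greedy addition of the boundary that maximizes caim = (1/n) * sum_r max_r^2 / M_r, stopping when no boundary improves the score and at least as many intervals as classes exist); details differ from the published pseudocode.

```python
def caim_discretize(values, labels):
    """Greedy CAIM-style discretization of one numeric attribute.
    Returns the chosen interior cut points."""
    classes = sorted(set(labels))
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def caim(cuts):
        # CAIM value: mean over intervals of (largest class count)^2 / total
        edges = [float('-inf')] + cuts + [float('inf')]
        total = 0.0
        for lo, hi in zip(edges, edges[1:]):
            counts = {}
            for v, y in pairs:
                if lo < v <= hi:
                    counts[y] = counts.get(y, 0) + 1
            m = sum(counts.values())
            if m:
                total += max(counts.values()) ** 2 / m
        return total / (len(cuts) + 1)

    cuts, best = [], caim([])
    while candidates:
        gain, pick = best, None
        for c in candidates:
            score = caim(sorted(cuts + [c]))
            if score > gain:
                gain, pick = score, c
        if pick is None and len(cuts) + 1 >= len(classes):
            break  # no improvement and enough intervals: stop
        if pick is None:  # force a split until #intervals >= #classes
            pick = max(candidates, key=lambda c: caim(sorted(cuts + [c])))
            gain = caim(sorted(cuts + [pick]))
        cuts = sorted(cuts + [pick])
        best = gain
        candidates.remove(pick)
    return cuts
```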


Copyright©北京勤云科技发展有限公司  京ICP备09084417号