Similar Literature
20 similar documents found
1.
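A Classification Rule Mining Algorithm Based on the Ant Colony Algorithm   Total citations: 5, self-citations: 0, cited by others: 5
A classification rule mining algorithm based on the ant colony algorithm is proposed. The algorithm is essentially a sequential covering algorithm: the ant colony searches for one rule, removes the examples that rule covers, and repeats this process, yielding a set of rules that jointly cover the examples. To address the long computation time of ant colony algorithms, a mutation operator is proposed. Experiments on two public datasets, with comparisons against C4.5 and Ant-Miner, show that the algorithm discovers better classification rules: rule sets with stronger predictive power and fewer rules, and rules of simpler form. The experiments also show that the mutation operator effectively reduces computation time.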
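The sequential covering loop described above can be sketched in Python. This is a minimal illustration, not the authors' code. The ant colony rule search is replaced by a hypothetical `greedy_rule` helper that picks the single purest attribute=value condition, and the mutation operator is omitted. Examples are assumed to be dicts carrying a `"class"` key.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, str]           # attribute -> value, plus a "class" key
Rule = Tuple[Dict[str, str], str]  # (antecedent conditions, predicted class)


def covers(antecedent: Dict[str, str], example: Example) -> bool:
    """True if every condition of the antecedent holds for the example."""
    return all(example.get(attr) == val for attr, val in antecedent.items())


def greedy_rule(examples: List[Example]) -> Optional[Rule]:
    """Hypothetical stand-in for the ant colony search: choose the single
    attribute=value condition whose covered examples are purest (ties broken
    by coverage) and predict the majority class among those examples."""
    best: Optional[Rule] = None
    best_score = (-1.0, -1)
    attrs = sorted(a for a in examples[0] if a != "class")
    for attr in attrs:
        for val in sorted({e[attr] for e in examples}):
            covered = [e for e in examples if e[attr] == val]
            counts: Dict[str, int] = {}
            for e in covered:
                counts[e["class"]] = counts.get(e["class"], 0) + 1
            cls, hits = max(counts.items(), key=lambda kv: kv[1])
            score = (hits / len(covered), len(covered))
            if score > best_score:
                best_score, best = score, ({attr: val}, cls)
    return best


def sequential_covering(
    examples: List[Example],
    find_rule: Callable[[List[Example]], Optional[Rule]] = greedy_rule,
) -> List[Rule]:
    """Find a rule, remove the examples it covers, and repeat."""
    rules: List[Rule] = []
    remaining = list(examples)
    while remaining:
        rule = find_rule(remaining)
        if rule is None:
            break
        uncovered = [e for e in remaining if not covers(rule[0], e)]
        if len(uncovered) == len(remaining):  # the rule made no progress
            break
        rules.append(rule)
        remaining = uncovered
    return rules
```

Passing a different `find_rule` (for example an ant-based search) leaves the covering loop unchanged. That separation matches how the abstract frames the algorithm.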

2.
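For the rule-set learning problem, an ant colony rule-set learning algorithm (Ant-AQ) that follows the typical AQ Covering Algorithm framework is proposed. In Ant-AQ, the beam-search specialization process of the AQ covering framework is replaced by an ant colony search specialization process, which to some extent reduces the tendency to become trapped in local optima. In comparative tests, Ant-AQ was evaluated on several typical rule-learning datasets against established classical rule-set learning algorithms (CN2, AQ-15) and against Ant-Miner, another ACO-based rule-learning algorithm proposed by R. S. Parpinelli et al. The experimental results show, first, that Ant-AQ outperforms the classical rule-learning algorithms in overall performance and, second, that it outperforms Ant-Miner on predictive accuracy, a key evaluation metric.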

3.
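Within the Ant-Miner framework, the strategies for pheromone updating and path-selection probability were improved. Taking into account the characteristics of technical and tactical analysis in table tennis, a classification mining model for table tennis techniques and tactics was built on the improved ant colony algorithm and applied to a case study. Compared with association rule mining of table tennis techniques and tactics, the algorithm shows considerable advantages in mining effectiveness and rule quality. It has been applied to the matches and training of the national table tennis team with good results.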

4.
The cAnt-Miner algorithm is an Ant Colony Optimization (ACO) based technique for classification rule discovery in problem domains which include continuous attributes. In this paper, we propose several extensions to cAnt-Miner. The main extension is based on the use of multiple pheromone types, one for each class value to be predicted. In the proposed μcAnt-Miner algorithm, an ant first selects a class value to be the consequent of a rule and the terms in the antecedent are selected based on the pheromone levels of the selected class value; pheromone update occurs on the corresponding pheromone type of the class value. The pre-selection of a class value also allows the use of more precise measures for the heuristic function and the dynamic discretization of continuous attributes, and further allows for the use of a rule quality measure that directly takes into account the confidence of the rule. Experimental results on 20 benchmark datasets show that our proposed extension improves classification accuracy to a statistically significant extent compared to cAnt-Miner, and has classification accuracy similar to the well-known Ripper and PART rule induction algorithms.
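The core idea of μcAnt-Miner can be sketched as follows: one pheromone table per class value, with antecedent terms chosen only after the class is fixed. This is a simplified illustration under stated assumptions. The class is pre-selected by the caller, and the heuristic values `eta` are supplied rather than computed. At most one term per attribute is allowed, and the hypothetical `update` method uses a plain evaporate-and-deposit rule. cAnt-Miner's dynamic discretization of continuous attributes is not shown.

```python
import random
from typing import Dict, Iterable, List, Tuple

Term = Tuple[str, str]  # (attribute, value)


class ClassPheromones:
    """One pheromone table per class value (the multiple-pheromone idea)."""

    def __init__(self, classes: Iterable[str], terms: Iterable[Term], initial: float = 1.0):
        terms = list(terms)
        self.tau: Dict[str, Dict[Term, float]] = {
            c: {t: initial for t in terms} for c in classes
        }

    def choose_terms(self, cls: str, eta: Dict[Term, float], max_terms: int,
                     rng: random.Random) -> List[Term]:
        """Build an antecedent for a pre-selected class, picking each term with
        probability proportional to tau[cls][term] * eta[term]."""
        chosen: List[Term] = []
        used_attrs = set()
        for _ in range(max_terms):
            candidates = [t for t in self.tau[cls] if t[0] not in used_attrs]
            weights = [self.tau[cls][t] * eta.get(t, 0.0) for t in candidates]
            if not candidates or sum(weights) <= 0:
                break
            term = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.append(term)
            used_attrs.add(term[0])
        return chosen

    def update(self, cls: str, rule_terms: List[Term], quality: float,
               evaporation: float = 0.1) -> None:
        """Evaporate, then reinforce the rule's terms, in the rule's class table only."""
        table = self.tau[cls]
        for t in table:
            table[t] *= 1.0 - evaporation
        for t in rule_terms:
            table[t] += quality
```

Because each class has its own table, a good rule for one class never reinforces the terms used for another. That separation is the property the abstract attributes to the multiple pheromone types.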

5.
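A new condition selection strategy for the Ant-Miner algorithm, the double-condition selection strategy, is proposed. The strategy was applied within Ant-Miner and compared experimentally with the original Ant-Miner on two public datasets. The results show that the algorithm using the double-condition selection strategy not only runs faster than the original algorithm but also achieves higher predictive accuracy.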

6.
A New Algorithm for Mining Classification Rules Based on Decision Tables   Total citations: 2, self-citations: 0, cited by others: 2
The mining of classification rules is an important field in data mining. The decision table of rough set theory is an efficient tool for mining classification rules. This paper introduces the elementary concepts of decision tables in rough set theory. It then presents a new algorithm for mining classification rules based on decision tables, together with a discernibility function for reducing attribute values and a new principle for rule accuracy. An example applying the algorithm to a car classification problem is included, and the accuracy of the discovered rules is analyzed. Potential application fields in data mining are also discussed.

7.
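Ant colony optimization (ACO), one of the main algorithms of swarm intelligence, has been applied successfully to optimization problems in many research fields. In remote sensing data processing, however, it is still a new research topic. ACO offers intelligent properties such as self-organization, cooperation, and communication, and it needs no prior knowledge of the statistical distribution parameters of the data. This gives it considerable potential in remote sensing data processing. This paper introduces the theory and workflow of applying the ACO classification rule mining algorithm to remote sensing image classification. CBERS remote sensing data of the Beijing area served as experimental data. Classification rules were constructed with ACO, and classification experiments on the selected data were compared against maximum likelihood classification. The results show that the ACO classification rule mining algorithm offers a new approach to remote sensing image classification.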

8.
An extended Chi2 algorithm for discretization of real value attributes   Total citations: 11, self-citations: 0, cited by others: 11
The variable precision rough sets (VPRS) model is a powerful data mining tool that has been widely applied to knowledge acquisition. Despite its diverse applications in many domains, the VPRS model cannot be applied directly to real-world classification tasks involving continuous attributes. Such tasks require a discretization method to preprocess the data. Discretization is an effective technique for handling continuous attributes in data mining, especially for classification. The modified Chi2 algorithm is a variant of the Chi2 algorithm that replaces Chi2's inconsistency check with the quality of approximation from rough set theory (RST), and it takes into account the effect of degrees of freedom. However, classification with a controlled degree of uncertainty, or misclassification error, lies outside the realm of RST. The modified algorithm also ignores the effect of variance in the two intervals being merged. In this study, we propose a new algorithm, the extended Chi2 algorithm, to overcome these two drawbacks. Experiments with the See5 software show that the proposed algorithm performs better than both the original and the modified Chi2 algorithms.
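For context, the χ² test that Chi2-family discretizers apply to adjacent intervals can be sketched in the style of ChiMerge, which Chi2 builds on. This covers only the basic merge step under a fixed threshold. It does not implement the rough-set quality-of-approximation check, the degrees-of-freedom handling, or the variance term of the modified and extended Chi2 algorithms. Each interval is represented by its per-class counts. Skipping cells with zero expected count is an assumption, since ChiMerge implementations differ on this point.

```python
from typing import List, Sequence


def chi2_adjacent(a: Sequence[int], b: Sequence[int]) -> float:
    """Chi-square statistic for two adjacent intervals, given per-class counts."""
    n = sum(a) + sum(b)
    col = [x + y for x, y in zip(a, b)]  # per-class totals over both intervals
    stat = 0.0
    for row in (a, b):
        r = sum(row)
        for j, observed in enumerate(row):
            expected = r * col[j] / n
            if expected > 0:  # skip empty cells (assumption, see lead-in)
                stat += (observed - expected) ** 2 / expected
    return stat


def chimerge(counts: List[List[int]], threshold: float) -> List[List[int]]:
    """Repeatedly merge the adjacent pair with the smallest chi-square value
    while that value is below the threshold."""
    intervals = [list(c) for c in counts]
    while len(intervals) > 1:
        scores = [chi2_adjacent(intervals[i], intervals[i + 1])
                  for i in range(len(intervals) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break
        merged = [x + y for x, y in zip(intervals[i], intervals[i + 1])]
        intervals[i:i + 2] = [merged]
    return intervals
```

With two classes, a threshold such as 2.706 (the 90% critical value of χ² with one degree of freedom) merges intervals whose class distributions are statistically indistinguishable at that level.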

9.
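In the big data era, sharing and mining data carry the risk of privacy leakage. Using K-anonymity to protect privacy greatly degrades classification mining performance. To address this, a K-anonymity feature selection algorithm based on random forest feature importance (RFKA) is proposed for classification mining. Random forest feature importance measures each feature's contribution to classification. A forward sequential search then adds, at each step, the feature with the highest importance that does not break K-anonymity to the feature subset. Finally, the dataset restricted to the feature subset is used to build a model for classification experiments. Experimental results show that the algorithm balances K-anonymity and classification mining performance more effectively and runs more efficiently.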
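The forward search can be sketched under two assumptions. First, the random forest importances have already been computed (for example from scikit-learn's `RandomForestClassifier.feature_importances_`) and are passed in as a dict. Second, K-anonymity is checked on the selected columns treated as quasi-identifiers. Adding a column can only split equivalence classes further, so a feature rejected once can never become acceptable later. A single pass in descending-importance order therefore gives the same result as repeated forward steps. This is an illustration, not the RFKA implementation, and the classification experiment on the selected subset is not shown.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def is_k_anonymous(rows: Sequence[Dict[str, Hashable]], features: List[str], k: int) -> bool:
    """Every combination of values on `features` must occur at least k times."""
    if not rows:
        return True
    if not features:
        return len(rows) >= k  # all rows fall into a single group
    groups = Counter(tuple(row[f] for f in features) for row in rows)
    return min(groups.values()) >= k


def select_features_k_anonymous(rows: Sequence[Dict[str, Hashable]],
                                importance: Dict[str, float], k: int) -> List[str]:
    """Greedy forward selection: most important feature first, keeping a
    feature only if the selected columns stay k-anonymous."""
    selected: List[str] = []
    for feature in sorted(importance, key=importance.get, reverse=True):
        if is_k_anonymous(rows, selected + [feature], k):
            selected.append(feature)
    return selected
```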

10.
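Du Chao, Wang Zhihai, Jiang Jingjing, Sun Yange. Journal of Software, 2017, 28(11): 2891-2904
Pattern-based Bayesian classification models are an effective approach to classification problems in data mining. However, most pattern-based Bayesian classifiers consider only a pattern's support in the target-class dataset and ignore its support in the opposing-class datasets. Moreover, pattern-based Bayesian classifiers designed for static datasets cannot be applied to high-speed, dynamically changing, unbounded data streams. To address these problems, a Bayesian classification model for data streams based on emerging patterns, EPDS (Bayesian classifier algorithm based on emerging pattern for data stream), is proposed. The model uses a simple hybrid forest structure to maintain the itemsets of transactions in memory and adopts a fast pattern extraction mechanism to speed up the algorithm. EPDS uses a semi-lazy learning strategy to update emerging patterns continuously. For each transaction to be classified, it builds a local classification model under each class. Extensive experimental results show that the algorithm achieves higher accuracy than other data stream classification models.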
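The quantity behind emerging patterns can be computed as shown below: the growth rate of an itemset's support from the opposing class to the target class. This sketch illustrates only that standard definition. EPDS's hybrid forest structure, semi-lazy learning, and Bayesian combination are not reproduced, and the threshold `rho` is a user-chosen parameter. A growth rate of infinity marks a jumping emerging pattern, which has zero support in the opposing class. Strong jumping emerging patterns (SJEPs), mentioned in entry 12, are a refinement of this idea.

```python
import math
from typing import AbstractSet, Sequence


def support(itemset: AbstractSet[str], transactions: Sequence[AbstractSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def growth_rate(itemset: AbstractSet[str],
                target: Sequence[AbstractSet[str]],
                opposing: Sequence[AbstractSet[str]]) -> float:
    """supp_target / supp_opposing, with the usual conventions for zero support."""
    s_target = support(itemset, target)
    s_opposing = support(itemset, opposing)
    if s_opposing == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_opposing


def is_emerging(itemset: AbstractSet[str],
                target: Sequence[AbstractSet[str]],
                opposing: Sequence[AbstractSet[str]], rho: float) -> bool:
    """An itemset is an emerging pattern if its growth rate reaches rho."""
    return growth_rate(itemset, target, opposing) >= rho
```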

11.
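Classification Rule Mining Based on an Incremental Genetic Algorithm   Total citations: 12, self-citations: 1, cited by others: 11
Discovering classification knowledge is an important data mining task. Developing high-performance, highly scalable classification algorithms is currently one of the main problems facing data mining. This paper combines genetic algorithms with the classification rule mining problem and proposes an incremental classification rule mining method based on a genetic algorithm. Its effectiveness is demonstrated through an example. In addition, a classification rule reduction method is proposed that makes the mining results more concise and easier to understand.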

12.
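Classification is an important topic in data mining, yet research on classifying cancer data remains relatively scarce. The strong jumping emerging pattern (SJEP), proposed in recent years, is a new type of pattern with strong discriminative power and clear advantages for classifying cancer data. To improve classification accuracy on cancer data further, this paper introduces ensemble learning and makes several improvements to the original Boosting algorithm. It then combines the improved Boosting algorithm with the SP-tree classification algorithm to obtain SP_Boost, an algorithm that uses SP-tree classification as its base learner.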

13.
Our objective is a comparison of two data mining approaches to dealing with imbalanced data sets. The first approach is based on saving the original rule set, induced by the LEM2 (Learning from Example Module) algorithm, and changing the rule strength for all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase the sensitivity compared to the original LEM2. However, the difference in performance of both approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for a specific data set.
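The first approach raises the strength of the smaller concept's rules at classification time. It can be illustrated with a simplified strength-based vote. This is an assumption-laden sketch, not LERS or LEM2 itself. Only fully matching rules vote, and each vote equals the rule's strength times a hypothetical per-concept `multiplier`. Partial matching and rule specificity are ignored.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[Dict[str, str], str, float]  # (conditions, concept, strength)


def classify(example: Dict[str, str], rules: List[Rule],
             multiplier: Optional[Dict[str, float]] = None) -> Optional[str]:
    """Sum the (optionally scaled) strengths of fully matching rules per concept
    and return the concept with the largest total, or None if nothing matches."""
    multiplier = multiplier or {}
    votes: Dict[str, float] = {}
    for conditions, concept, strength in rules:
        if all(example.get(a) == v for a, v in conditions.items()):
            votes[concept] = votes.get(concept, 0.0) + strength * multiplier.get(concept, 1.0)
    if not votes:
        return None
    return max(votes, key=votes.get)
```

Scaling only the minority concept's strengths shifts borderline cases toward that concept. This raises sensitivity without re-inducing the rule set, which is the trade-off the first approach relies on.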

14.
Data mining is a method for extracting, from a database, useful information that a system needs. As the types of data that systems process have diversified, pattern mining techniques adapted to these types of data have been proposed. Unlike traditional pattern mining methods, erasable pattern mining finds patterns that can be removed at the cost of only a small loss of profit. Erasable pattern mining should process data by considering both the environment in which the data are generated and the characteristics of the data. An uncertain database is a database composed of uncertain data. Since erasable patterns discovered from uncertain data contain significant information, these patterns need to be extracted. In addition, databases grow continuously, because data from various fields are generated and accumulated over data streams. Data streams should be processed as intelligently as possible to provide useful data to the system in real time. In this paper, we propose an efficient erasable pattern mining algorithm that processes uncertain data generated over data streams. The uncertain erasable patterns discovered by the suggested technique are more meaningful because the technique considers both the probability of each item and the profit. Moreover, the proposed method mines efficiently by using both tree and list structures. The performance of the suggested algorithm is verified through tests against state-of-the-art algorithms on real and synthetic datasets.
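The erasability criterion underlying this line of work can be sketched for ordinary (certain) data using the standard definition. A pattern's gain is the total profit of all products that use at least one of its items. A pattern is erasable when that gain is at most a fraction `xi` of total profit. The probability-weighted, stream-based version proposed in this paper is not reproduced here, and the product layout below is an assumption.

```python
from typing import AbstractSet, FrozenSet, List, Tuple

Product = Tuple[FrozenSet[str], float]  # (component items, profit)


def gain(pattern: AbstractSet[str], products: List[Product]) -> float:
    """Total profit of products that use at least one item of the pattern."""
    return sum(profit for items, profit in products if pattern & items)


def is_erasable(pattern: AbstractSet[str], products: List[Product], xi: float) -> bool:
    """Erasable if dropping the pattern's items loses at most xi of total profit."""
    total = sum(profit for _, profit in products)
    return gain(pattern, products) <= xi * total
```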

15.
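The GAC-RDB classification algorithm can run only on a stand-alone data warehouse. To overcome this limitation and make data mining on cloud computing platforms more convenient and faster, a distributed GAC-RDB classification algorithm is proposed. It is built on the distributed data warehouse HBase, adapts the implementation mechanism of GAC-RDB into an execution strategy suited to distributed platforms, and is written in native HiveQL. Experiments show that the algorithm's running time decreases steadily as nodes are added to the cluster. The results indicate that, while preserving accuracy, the distributed data warehouse effectively improves the scalability and runtime efficiency of GAC-RDB. Compared with the MapReduce framework, HiveQL lowers the technical demands on data mining practitioners and substantially reduces development time, providing a new solution for mining massive data.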

16.
From a data mining perspective, sequence classification means building a classifier from frequent sequential patterns. However, mining the complete set of sequential patterns in a large dataset can be extremely time-consuming. The large number of patterns discovered also makes pattern selection and classifier building very time-consuming. In sequence classification, discovering discriminative patterns is in fact much more important than discovering a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns most strongly correlated with each target class, using an aggressive strategy to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns, and the patterns that pass the serial test form the sub-classifier at the first level of the final classifier. Third, the training samples that remain uncovered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is maintained. The proposed algorithm was tested in a real-world business application for debt prevention in the social security domain. It proved effective and efficient at predicting debt occurrences from customer activity sequence data.
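The coverage test in the second and third steps can be sketched under two assumptions. A sample counts as covered when a pattern occurs in it as a subsequence, not necessarily contiguous. Events are single symbols rather than itemsets. The uncovered samples returned here are what the third step would feed back to mining. Pattern mining, pruning, and the interestingness thresholds are not shown.

```python
from typing import Hashable, List, Sequence, Tuple


def contains_subsequence(sequence: Sequence[Hashable], pattern: Sequence[Hashable]) -> bool:
    """True if `pattern` occurs in `sequence` in order, with gaps allowed."""
    pos = 0
    for event in sequence:
        if pos < len(pattern) and event == pattern[pos]:
            pos += 1
    return pos == len(pattern)


def split_by_coverage(samples: List[Sequence[Hashable]],
                      patterns: List[Sequence[Hashable]]
                      ) -> Tuple[List[Sequence[Hashable]], List[Sequence[Hashable]]]:
    """Separate samples covered by at least one pattern from those that are not;
    the uncovered ones would go back to the next mining round."""
    covered, uncovered = [], []
    for s in samples:
        if any(contains_subsequence(s, p) for p in patterns):
            covered.append(s)
        else:
            uncovered.append(s)
    return covered, uncovered
```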

17.
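This work addresses pattern classification with SQL data mining for fault diagnosis in complex dynamical systems. Taking decision tree parameter optimization as an example, it studies how to optimize the parameters of SQL data mining classification algorithms. Currently, the parameters of data mining algorithms are usually set from empirical values, which yields low prediction accuracy. Optimizing parameters with a genetic algorithm alone tends to make the classification results oscillate or converge prematurely. An improved annealing genetic algorithm is therefore used to optimize the decision tree parameters in SQL data mining. This overcomes the low efficiency and accuracy of setting parameters manually from experience, while achieving global search and fast convergence to the global optimum.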

18.
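Qiu Guoyong, Zhang Jiao. Application Research of Computers, 2012, 29(10): 3685-3687
This paper analyzes and studies the application of adaptive dimensionality reduction algorithms to high-dimensional data mining. Because of the curse of dimensionality, existing data mining algorithms achieve low accuracy and clustering quality on high-dimensional data. To address this, bisecting K-means clustering is combined with an SVM decision tree algorithm in BKM-SVMDT, an adaptive method for clustering high-dimensional data. The algorithm ensures that bisecting K-means clustering runs in a low-dimensional data space, and its results in turn guide the SVM in the high-dimensional space. This process is repeated to achieve good classification accuracy and efficiency. Experimental results on standard datasets demonstrate the method's effectiveness.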

19.
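This paper discusses the characteristics of association rule mining in relations with multi-valued attributes. It proposes optimizing association rule mining algorithms through data preparation and numeric encoding. Target data attributes are divided according to their roles in the algorithm and are transformed and encoded separately. The data are then clustered, and frequent itemsets are mined within the clustering results. Finally, a fast post-clustering association rule update algorithm obtains the association rules. Algorithm analysis and experimental results show that the algorithm is more efficient than traditional association rule mining algorithms.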

20.
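An Incremental Association Rule Update Method Based on Transaction Time Partitioning   Total citations: 1, self-citations: 0, cited by others: 1
This article introduces an incremental method for updating association rules. Its core idea is to partition long transactions by time into a series of consecutive episodes. The information obtained during the current episode depends on the current transaction subset and on the information already discovered in previous episodes. Frequent itemsets are generated incrementally using only the updated transactions and the mining results of earlier stages. An Apriori-type algorithm serves as the local procedure for generating frequent itemsets, and a concrete dynamic mining algorithm is given.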
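The episode-by-episode idea can be illustrated with a deliberately simple counter. It folds each new episode's itemset counts into the stored totals, so only the new transactions are scanned. Unlike the paper's Apriori-style local procedure, this sketch counts every itemset up to a hypothetical `max_size`. That keeps merging exact at the cost of memory. It is an illustration, not the published algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


class IncrementalItemsetCounter:
    """Keeps itemset counts across episodes so that each update scans only
    the transactions of the newest episode."""

    def __init__(self, max_size: int = 2):
        self.max_size = max_size  # hypothetical cap on itemset size
        self.counts: Counter = Counter()
        self.n_transactions = 0

    def add_episode(self, transactions: Iterable[Iterable[str]]) -> None:
        """Count itemsets in the new episode and add them to the stored totals."""
        for t in transactions:
            items = sorted(set(t))
            for size in range(1, self.max_size + 1):
                for combo in combinations(items, size):
                    self.counts[combo] += 1
            self.n_transactions += 1

    def frequent(self, min_support: float) -> Dict[Tuple[str, ...], int]:
        """Itemsets whose support over all episodes seen so far is >= min_support."""
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}
```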


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417