Similar literature
1.
Researchers have recognized the importance of integrating fuzziness into association rule mining in databases with binary and quantitative attributes. However, most earlier algorithms for fuzzy association rule mining either assume that the fuzzy sets are given or employ a clustering algorithm, such as CURE, to determine them; in both cases the number of fuzzy sets is pre-specified. In this paper, we propose an automated method that decides on the number of fuzzy sets and autonomously mines both fuzzy sets and fuzzy association rules. We achieve this by developing an automated clustering method based on multi-objective Genetic Algorithms (GA); the aim of the proposed approach is to automatically cluster the values of a quantitative attribute in order to obtain a large number of large itemsets in less time. We compare the proposed multi-objective GA-based approach with two other approaches: 1) a CURE-based approach, CURE being known as one of the most efficient clustering algorithms; 2) the clustering approach of Chien et al., an automatic interval partition method based on variation of density. Experimental results on 100 K transactions extracted from the adult data of the USA census in year 2000 show that the proposed automated clustering method performs well compared with both the CURE-based approach and Chien et al.'s work in terms of runtime, number of large itemsets, and number of association rules.

2.
Association rule mining is one of the important data mining activities and has received substantial attention in the literature. It is a computationally and I/O-intensive task. In this paper, we propose an approach for mining optimized fuzzy association rules of different orders. We also propose an approach that uses clustering techniques to define membership functions for all the continuous attributes in a database. Although single-objective genetic algorithms are used extensively, they degrade the solution. In our approach, extraction and optimization of fuzzy association rules are done together using a multi-objective genetic algorithm that considers objectives such as fuzzy support, fuzzy confidence, and rule length. The effectiveness of the proposed approach is tested on a computer activity dataset, to analyze the performance of a multiprocessor system, and on network audit data, to detect anomaly-based intrusions. Experiments show that the proposed method is efficient in many scenarios.
V. S. Ananthanarayana
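The abstract above names fuzzy support and fuzzy confidence as objectives but does not define them. A minimal sketch of one common definition follows, assuming the minimum t-norm for combining memberships; the function names, the record layout, and the example items are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Assumption: each record maps a fuzzy item such as "cpu=high"
# to its membership degree in [0, 1].
Record = Dict[str, float]


def fuzzy_support(records: List[Record], items: List[str]) -> float:
    """Average, over all records, of the smallest membership among `items`."""
    if not items:
        raise ValueError("items must not be empty")
    if not records:
        return 0.0
    total = 0.0
    for rec in records:
        # Minimum t-norm: a record supports the itemset only as strongly
        # as its weakest item. Missing items count as membership 0.
        total += min(rec.get(item, 0.0) for item in items)
    return total / len(records)


def fuzzy_confidence(records: List[Record],
                     antecedent: List[str],
                     consequent: List[str]) -> float:
    """Fuzzy support of antecedent and consequent together, divided by
    the fuzzy support of the antecedent alone."""
    sup_a = fuzzy_support(records, antecedent)
    if sup_a == 0.0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent) / sup_a


if __name__ == "__main__":
    data = [
        {"cpu=high": 0.8, "load=high": 0.6},
        {"cpu=high": 0.2, "load=high": 0.9},
        {"cpu=high": 1.0, "load=high": 1.0},
    ]
    print(fuzzy_support(data, ["cpu=high", "load=high"]))
    print(fuzzy_confidence(data, ["cpu=high"], ["load=high"]))
```

A multi-objective search would treat these two values, together with rule length, as separate objectives rather than folding them into one weighted score.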

3.
In the domain of association rule mining (ARM), discovering rules for numerical attributes is still a challenging issue. Most of the popular approaches to numerical ARM require a priori data discretization to handle the numerical attributes. Moreover, in the process of discovering relations among data, more than one objective (quality measure) is often required, and in most cases such objectives include conflicting measures. In such a situation, it is recommended to obtain the optimal trade-off between objectives. This paper addresses the numerical ARM problem from a multi-objective perspective by proposing a multi-objective particle swarm optimization algorithm (MOPAR) that discovers numerical association rules (ARs) in a single step. To identify more efficient ARs, several objectives are defined in the proposed multi-objective optimization approach, including confidence, comprehensibility, and interestingness. Finally, the best ARs are extracted using Pareto optimality. To deal with numerical attributes, we use rough values containing lower and upper bounds to represent the intervals of attributes. In the experimental section of the paper, we analyze the effect of the operators used in this study, compare our method with the most popular evolutionary proposals for ARM, and present an analysis of the mined ARs. The results show that MOPAR extracts reliable (with confidence values close to 95%), comprehensible, and interesting numerical ARs when attaining the optimal trade-off between confidence, comprehensibility, and interestingness.
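The Pareto-optimality step mentioned in the abstract above can be illustrated with a small non-dominated filter. This is a generic sketch, not the MOPAR implementation: it assumes every objective is to be maximized, and the score tuples (confidence, comprehensibility, interestingness) are made-up values.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the score vectors that no other vector dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical rules scored as (confidence, comprehensibility, interestingness).
    rules = [
        (0.95, 0.4, 0.3),
        (0.90, 0.6, 0.5),
        (0.85, 0.5, 0.4),  # dominated by the second rule
        (0.95, 0.4, 0.2),  # dominated by the first rule
    ]
    print(pareto_front(rules))  # indices 0 and 1 are non-dominated
```

The quadratic pairwise comparison is adequate for a small rule set; larger populations typically use a sorting-based procedure instead.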

4.
In this paper, a genetic algorithm (GA) is proposed as a search strategy for mining both positive and negative quantitative association rules (ARs) in databases. Unlike conventional methods, ARs are mined directly without generating frequent itemsets. The proposed GA is database-independent: it does not rely on minimum support and minimum confidence thresholds, which are hard to determine for each database. Instead of a randomly generated initial population, a uniform population is used, which keeps the initial population close to the solutions and distributes it uniformly over the feasible region. An adaptive mutation probability, a new operator called the uniform operator that ensures genetic diversity, and an efficient adjusted fitness function are used to mine all interesting ARs from the last population in a single run of the GA. The efficiency of the proposed GA is validated on synthetic and real databases.

5.
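Based on an analysis of the mechanism that gives rise to redundancy in association rules, a penalty function for setting the support threshold and an improved genetic algorithm are proposed. The improved algorithm uses novel techniques such as frequent-item distribution, prime-factor encoding, mate selection, and a sharing function, so that chromosomes always mine in regions dense with frequent items, which effectively prunes the combinatorial search space. Transactions are also converted to numerical values, which effectively compresses the storage space of the transaction database and increases computation speed. Experimental results indicate that the improved mining method has certain advantages in the efficiency and precision of discovering valuable rules.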

6.
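An attack recognition system based on fuzzy association rule mining   (Total citations: 1; self-citations: 0; citations by others: 1)
Reducing the false negative rate and the false positive rate in attack recognition is a problem in urgent need of a solution. The paper analyzes the requirements of attack recognition and the relevant concepts of fuzzy association rule mining, and builds an attack recognition system on this basis. The system not only meets the requirements of attack recognition well, but can also recognize anomaly attacks and misuse attacks at the same time. It substantially reduces the false negative and false positive rates in attack recognition and greatly enhances the survivability of the information system.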

7.
This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM–Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets, and the Apriori approach is employed to identify the Fuzzy Association Rules (FARs). The proposed model extracts knowledge from a database for a Fuzzy Inference System (FIS) that can be used to predict a future value. The knowledge extraction process and the performance of the model are demonstrated through two case studies on road traffic data sets of different sizes. The experimental results show the merits and capability of the proposed KD model in FAR-based knowledge extraction. The second model (the FCM–MSapriori model) integrates FCM and a Multiple Support Apriori (MSapriori) approach to extract the FARs. These FARs provide the knowledge base used within the FIS for prediction evaluation. Experimental results show that the FCM–MSapriori model predicted the future values effectively and outperformed the FCM–Apriori model and other models reported in the literature.

8.
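An efficient sampling-based association rule mining algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)
Discovering association rules among items in a transaction data set is a classic data mining problem, but traditional association rule mining methods are relatively inefficient on large transaction data sets. Studies have shown that sampling techniques can effectively improve mining efficiency. Based on an analysis of existing sampling methods, a new efficient sampling-based association rule mining algorithm, ESMA, is proposed. The algorithm adopts a more effective bidirectional sampling strategy. Experimental analysis shows that the algorithm markedly speeds up sampling in large transaction databases, thereby reducing CPU time, and that it scales well.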

9.
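An artificial immune algorithm based on immune principles is proposed for mining fuzzy association rules. The algorithm performs optimization by borrowing the clonal selection principle of the biological immune system. Working directly from the given data, it uses its optimization mechanism to automatically determine the fuzzy sets for each attribute so that the number of derived fuzzy association rules satisfying the conditions is maximized. Performance is compared with related algorithms on real data sets, and the experimental results demonstrate the effectiveness of the proposed algorithm.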

10.
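A matrix-based incremental association rule mining algorithm   (Total citations: 1; self-citations: 1; citations by others: 0)
Association rules are one of the important research topics in data mining. To address the problem of updating and maintaining association rules when data are added to the database and the minimum support changes at the same time, a matrix-based incremental association rule mining algorithm, IUBM, is proposed. The algorithm uses simple arrays and bit operations. When updating association rules, it neither scans the database multiple times nor generates a huge set of candidate itemsets. Examples show that the time and space complexity of the algorithm are greatly reduced.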
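The abstract above (a matrix-based incremental algorithm, IUBM) mentions arrays and bit operations without giving details. The sketch below is not IUBM; it only illustrates, under assumed names and a made-up data layout, how per-item bitmaps allow support counting with bitwise AND and how newly added transactions can be appended without rescanning the old ones.

```python
from typing import Dict, Iterable, List


def append_transactions(bitmaps: Dict[str, int],
                        n_existing: int,
                        new_transactions: List[Iterable[str]]) -> int:
    """Set bit t of an item's bitmap when transaction t contains the item.

    Earlier bits are untouched, so the old transactions are never rescanned.
    Returns the new total number of transactions.
    """
    for offset, items in enumerate(new_transactions):
        bit = 1 << (n_existing + offset)
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | bit
    return n_existing + len(new_transactions)


def support_count(bitmaps: Dict[str, int], itemset: Iterable[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    mask = bitmaps.get(items[0], 0)
    for item in items[1:]:
        mask &= bitmaps.get(item, 0)  # keep transactions shared by all items
    return bin(mask).count("1")


if __name__ == "__main__":
    bitmaps: Dict[str, int] = {}
    n = append_transactions(bitmaps, 0, [["a", "b"], ["a", "c"], ["a", "b", "c"]])
    print(support_count(bitmaps, ["a", "b"]))  # 2
    # Incremental update: one more transaction arrives.
    n = append_transactions(bitmaps, n, [["b", "c"]])
    print(support_count(bitmaps, ["b", "c"]))  # 2
```

Because counts are recomputed from the bitmaps on demand, a changed minimum support threshold only changes which counts pass the test, not the stored structure.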

11.
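A distributed association rule mining algorithm based on the frequent pattern tree   (Total citations: 1; self-citations: 0; citations by others: 1)
He Bo (何波). 《控制与决策》 (Control and Decision), 2012, 27(4): 618-622
A distributed association rule mining algorithm based on the frequent pattern tree (DMARF) is proposed. DMARF sets up a central node and uses local frequent pattern trees so that each computer node can quickly obtain its local frequent itemsets; the nodes then interact with the central node to aggregate the data, and the global frequent itemsets are finally obtained. DMARF adopts top and bottom strategies, which greatly reduce the number of candidate itemsets and lower the communication volume. Both theoretical analysis and experimental results show that DMARF is fast and effective.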

12.
Fuzzy rule optimization is always a problem for a complex fuzzy model. Even for a simple 2-input, 1-output fuzzy model, the designer has to select the optimal set of fuzzy rules from more than 10 000 combinations. The authors have developed fuzzy models for machinability data selection (Int. J. Flexible Autom. Integrated Manuf. 5 (1 and 2) (1997) 79). There are more than 2×10^29 possible sets of rules for each model. The situation would be more complicated if the number of inputs and/or outputs were increased further. The fuzzy rules (Turning Handbook of High-Efficiency Metal Cutting, General Electric Co., Detroit) were selected based on trial and error and/or intuition. Genetic optimization is suggested in this paper to further optimize the fuzzy rules. The development of a Fuzzy Genetic Optimization algorithm is presented and discussed. An object-oriented library for optimizing fuzzy rules with genetic optimization has been developed. The effect of constraint rules is also presented and discussed. The results from the optimized models are compared with those in the literature.

13.
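Traditional association rule mining is one-directional: it cannot identify mutually dependent rules, and the rules it finds are not necessarily meaningful and may even be wrong. In view of this, and on the basis of this analysis, a bidirectional association rule mining algorithm is proposed, and meaningful rules are identified according to their correlation.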

14.
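A fast association rule mining algorithm based on half-spaces and GA   (Total citations: 2; self-citations: 0; citations by others: 2)
A method for fast mining of association rules using a half-space model and a genetic algorithm (GA) is proposed. Traditional association rule mining algorithms are often constrained by data types, the practical meaning of the association rules, and similar factors, which greatly limits their ability to acquire knowledge. The proposed method is free of these restrictions and can mine rules that interest the user; it also performs quite well on large-scale sample sets.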

15.
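A study of time validity in association mining   (Total citations: 1; self-citations: 0; citations by others: 1)
Traditional association mining algorithms use support and confidence as the criteria for judging whether a rule is valuable. However, this scheme cannot reflect the time-sensitive nature of data such as Web data and data accumulated over a long period. This paper establishes, for the first time, a new time-based model for re-estimating the value of rules in data, and introduces time validity as a new measure of rule value. Finally, a new parallel algorithm based on this time-based model is given. The algorithm supports incremental mining during the mining process and lets users optimize the mining process through interaction.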

16.
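A new FP-Tree-based incremental updating algorithm for association rules   (Total citations: 2; self-citations: 0; citations by others: 2)
Mining association rules is an important aspect of data mining research. Many algorithms have been proposed for efficiently discovering association rules in large databases, and maintaining the rules already discovered is equally important. This work studies how to update association rules when transactions are added to the database and the minimum support changes at the same time. A new incremental updating algorithm for association rules based on the frequent pattern tree is proposed, and the algorithm is analyzed and discussed.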

17.
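In mining generalized association rules that contain negative items, the introduction of negative items makes the search space of frequent itemsets much larger, and the number of mined association rules grows accordingly. Many of these rules are of no interest to the user and may include redundant or erroneous rules. The concept of maximum support is therefore proposed to constrain the mining of frequent itemsets; it excludes meaningless association rules and also improves mining efficiency. Different minimum supports are used for positive and negative items during mining, which makes the mining more flexible. Experiments show that the improvement is effective.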
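The abstract above describes a maximum support bound and separate minimum supports for positive and negative items, without stating how the thresholds are applied. The sketch below is one possible reading, with hypothetical function names and threshold values: an itemset with negative items is supported by transactions that contain all of its positive items and none of its negative ones, and it is kept only if its support lies between the applicable minimum and the maximum.

```python
from typing import FrozenSet, List, Set


def support(transactions: List[Set[str]],
            positive: FrozenSet[str],
            negative: FrozenSet[str]) -> float:
    """Fraction of transactions containing every positive item
    and none of the negative items."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions
               if positive <= t and not (negative & t))
    return hits / len(transactions)


def within_bounds(sup: float,
                  has_negative: bool,
                  min_sup_pos: float = 0.3,
                  min_sup_neg: float = 0.5,
                  max_sup: float = 0.9) -> bool:
    """Assumed rule: itemsets with negative items use the stricter minimum,
    and anything above `max_sup` is treated as trivially true and dropped."""
    min_sup = min_sup_neg if has_negative else min_sup_pos
    return min_sup <= sup <= max_sup


if __name__ == "__main__":
    db = [{"tea", "milk"}, {"tea"}, {"coffee", "milk"}, {"tea", "sugar"}]
    s = support(db, frozenset({"tea"}), frozenset({"milk"}))
    print(s, within_bounds(s, has_negative=True))
    # "No beer" holds in every transaction, so it exceeds the maximum support.
    s = support(db, frozenset(), frozenset({"beer"}))
    print(s, within_bounds(s, has_negative=True))
```

The second call shows why an upper bound helps: the absence of a rare item is true almost everywhere and carries little information.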

18.
In this research, a data clustering algorithm named non-dominated sorting genetic algorithm-fuzzy membership chromosome (NSGA-FMC) is proposed to improve clustering quality on categorical data. It is based on the K-modes method and combines a fuzzy genetic algorithm with multi-objective optimization. The proposed method uses fuzzy membership values as the chromosome. Thanks to this chromosome setting, the algorithm also integrates a more efficient solution selection technique, which selects a solution from the non-dominated Pareto front based on the largest fuzzy membership. Two objective functions, fuzzy compactness within a cluster (π) and separation among clusters (sep), are used to optimize the clustering quality. A series of experiments on three UCI categorical datasets was conducted to compare the clustering results of the proposed NSGA-FMC with two existing methods: genetic algorithm fuzzy K-modes (GA-FKM) and multi-objective genetic algorithm-based fuzzy clustering of categorical attributes (MOGA (π, sep)). Adjusted Rand index (ARI), π, sep, and computation time were used as performance indexes for comparison. The experimental results show that the proposed method obtains better clustering quality in terms of ARI, π, and sep simultaneously, with shorter computation time.

19.
Linguistic rules in natural language are useful and consistent with the human way of thinking. They are very important in multi-criteria decision making because of their interpretability. In this paper, our discussion concentrates on extracting linguistic rules from data sets. To this end, we first analyze how to extract complex linguistic data summaries based on fuzzy logic. We then formalize linguistic rules based on complex linguistic data summaries, in which the degree of confidence of a linguistic rule from a data set can be explained by linguistic quantifiers and its linguistic truth from the fuzzy logical point of view. To obtain a linguistic rule with a higher degree of linguistic truth, a genetic algorithm is used to optimize the number and parameters of the membership functions of linguistic values. Computational results show that the proposed method is an alternative method for extracting linguistic rules with linguistic truth from data sets.

20.
This article presents a multi-objective genetic algorithm for the problem of data clustering. A given dataset is automatically assigned to a number of groups in appropriate fuzzy partitions through the fuzzy c-means method. This work tries to exploit the advantage of fuzzy properties, which provide the capability to handle overlapping clusters. However, most fuzzy methods are based on compactness and/or separation measures that use only centroid information. Calculations based on centroid information alone may not be sufficient to differentiate the geometric structures of clusters. The overlap-separation measure, which uses an aggregation operation on fuzzy membership degrees, is better equipped to handle this drawback. As another key consideration, we need a mechanism to identify appropriate fuzzy clusters without prior knowledge of the number of clusters. Given this requirement, optimization with a single criterion may not be feasible for different cluster shapes. A multi-objective genetic algorithm is therefore appropriate for searching for fuzzy partitions in this situation. Apart from the overlap-separation measure, the well-known fuzzy Jm index is also optimized through genetic operations. The algorithm optimizes the two criteria simultaneously to search for optimal clustering solutions. A string of real-coded values is encoded to represent cluster centers. Strings of different lengths, varied over a range, correspond to variable numbers of clusters. These real-coded values are optimized, and the Pareto solutions corresponding to a tradeoff between the two objectives are finally produced. As shown in the experiments, the approach provides promising solutions for well-separated, hyperspherical, and overlapping clusters in synthetic and real-life data sets. This is demonstrated by comparison with existing single-objective and multi-objective clustering techniques.
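The fuzzy Jm index named in the abstract above is the standard fuzzy c-means objective: the sum, over points and clusters, of the membership raised to the power m times the squared distance to the cluster center. A minimal sketch follows, with the usual membership formula for fixed centers; the function names are hypothetical, Euclidean distance and m = 2 are assumptions, and the overlap-separation measure is not reproduced here.

```python
from typing import List, Sequence


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm_memberships(points: List[Sequence[float]],
                    centers: List[Sequence[float]],
                    m: float = 2.0) -> List[List[float]]:
    """u[k][i]: membership of point k in cluster i for fixed centers (m > 1)."""
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [sq_dist(x, v) for v in centers]
        if any(di == 0.0 for di in d):
            # The point sits on one or more centers: share full membership
            # among them to avoid dividing by zero.
            n_zero = sum(1 for di in d if di == 0.0)
            memberships.append([1.0 / n_zero if di == 0.0 else 0.0 for di in d])
            continue
        inv = [(1.0 / di) ** exponent for di in d]
        total = sum(inv)
        memberships.append([w / total for w in inv])
    return memberships


def jm_index(points: List[Sequence[float]],
             centers: List[Sequence[float]],
             u: List[List[float]],
             m: float = 2.0) -> float:
    """Sum over points k and clusters i of u[k][i]**m * ||x_k - v_i||^2."""
    return sum((u[k][i] ** m) * sq_dist(x, v)
               for k, x in enumerate(points)
               for i, v in enumerate(centers))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
    ctrs = [(0.0, 0.0), (4.0, 0.0)]
    u = fcm_memberships(pts, ctrs)
    print(u)
    print(jm_index(pts, ctrs, u))
```

In a genetic setting like the one described, each real-coded string would decode to a list of centers, and a lower Jm value would count as a better score on that objective.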
