Similar Literature
 Found 20 similar documents (search time: 578 ms)
1.
Most existing association rule mining algorithms extract knowledge from databases whose attributes take binary values. In real-world applications, however, databases usually contain continuous values such as height, length or weight. When the attributes are continuous, the algorithms are commonly combined with a discretization method that transforms them into discrete attributes. Discretization transforms a continuous attribute into a finite number of intervals and assigns each interval a discrete numerical value. However, the user usually has to specify the number of intervals, or provide heuristic rules to be used during discretization, and it is then difficult to obtain the highest attribute interdependency and the lowest number of intervals at the same time. In this paper we present an association rule mining algorithm suited to the continuous-valued attributes commonly found in scientific and statistical databases. We propose a method using a new graph-based evolutionary algorithm named 'genetic network programming (GNP)' that can deal with continuous values directly, that is, without any discretization method as a preprocessing step. GNP represents its individuals as graph structures and evolves them in order to find a solution; this feature contributes to creating very compact programs and to implicitly memorizing past action sequences. In the proposed method, the significance of the extracted association rules is measured with the χ² test, and only important association rules are accumulated in a pool across generations. Results of experiments conducted on a real-life database suggest that the proposed method provides an effective technique for handling continuous attributes. Copyright © 2008 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
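
The abstract above says that rule significance is measured with a χ² test but gives no formula. As a rough illustration only, the sketch below computes the standard χ² statistic of the 2x2 contingency table for a rule X => Y from transaction counts. The function names, the argument layout and the use of 3.84 (the 5% critical value for one degree of freedom) as a cutoff are assumptions made for this example, not details taken from the paper.

```python
def chi_squared(n, n_x, n_y, n_xy):
    """Chi-squared statistic of the 2x2 contingency table for a rule X => Y.

    n    -- total number of transactions
    n_x  -- transactions satisfying the antecedent X
    n_y  -- transactions satisfying the consequent Y
    n_xy -- transactions satisfying both X and Y
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Relative supports of X, Y and X-and-Y.
    x, y, z = n_x / n, n_y / n, n_xy / n
    denom = x * y * (1.0 - x) * (1.0 - y)
    if denom == 0.0:
        raise ValueError("X and Y must each hold in some but not all transactions")
    # (z - x*y) is the gap between observed and independence-expected support.
    return n * (z - x * y) ** 2 / denom


# Assumed cutoff: 5% critical value of chi-squared with 1 degree of freedom.
CHI2_CRITICAL_5PCT = 3.84


def is_significant(n, n_x, n_y, n_xy, threshold=CHI2_CRITICAL_5PCT):
    """Return True when the rule's chi-squared value exceeds the threshold."""
    return chi_squared(n, n_x, n_y, n_xy) > threshold
```

For example, with 100 transactions of which 50 satisfy X, 50 satisfy Y and 40 satisfy both, the observed joint support (0.40) is well above the 0.25 expected under independence, so the rule would pass this assumed cutoff.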

2.
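Holiday power load forecasting based on a fuzzy multi-objective genetic optimization algorithm   (Total citations: 10; self-citations: 1; citations by others: 10)
One advantage of multi-objective genetic optimization algorithms is that they can find multiple non-inferior optimal solutions to a problem in a single iterative run. This paper applies a multi-objective genetic algorithm and an association rule algorithm to build a fuzzy-rule-based classification system for power load patterns. In this system, the multi-objective genetic optimization algorithm automatically selects, from a large number of fuzzy classification rules, those with good recognition performance and interpretability, and fuzzy association rule mining is used for heuristic rule selection to improve the search performance of the genetic algorithm. Simulation results show that the classification system has good classification performance and can provide more adequate historical data for holiday load forecasting, thereby improving forecasting performance.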

3.
In recent years, several association rule-based classification methods have been proposed, and these algorithms can quickly generate accurate rules. However, the generated rule sets are often very large, and the rules themselves are usually complex and hard for users to understand. Among all the rules generated by the algorithms, only some are likely to be of any interest to the domain expert analyzing the data; most are redundant, irrelevant or obvious. In this paper, a new method for selecting interesting class association rules is proposed, based on an evolutionary method named the genetic relation algorithm. The algorithm evaluates the relevance and interestingness of the discovered association rules through the relationships between the rules in each generation, using a specific measure of distance among them, and returns a reduced set of rules in the final generation. This small rule set has the following properties: (i) it is accurate, as it has at least the same classification accuracy as the complete association rule set; (ii) it is interesting, because of the diversity of its rules; and (iii) it is comprehensible, because the number of attributes involved in the rules is also small, which makes it easier for users to understand. The proposed method is compared with conventional methods, including genetic network programming-based mining, on ten databases, and the experimental results show that it outperforms the others while keeping a good balance between classification accuracy and rule comprehensibility. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

4.
Because of the expansion of the Internet in recent years, computer systems are exposed to an increasing number and variety of security threats. Detecting network intrusions effectively has therefore become an important problem. This paper proposes a class association rule mining approach based on genetic network programming (GNP) for detecting network intrusions. The approach can deal with both discrete and continuous attributes in network-related data, and it can be flexibly applied to both misuse detection and anomaly detection. Experimental results on the KDD99Cup and DARPA98 databases from MIT Lincoln Laboratory show that the proposed method provides a competitively high detection rate (DR) compared with other machine learning techniques. © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

5.
Genetic network programming (GNP)-based class association rule mining has been demonstrated to be efficient for misuse and anomaly detection. However, misuse detection is weak at detecting brand-new attacks, while anomaly detection suffers from a high false positive rate. In this paper, a unified detection method is proposed that integrates misuse detection and anomaly detection to overcome their respective disadvantages. In addition, the GNP-based class association rule mining method extracts an overwhelming number of rules, which contain much redundant and irrelevant information. Therefore, an efficient class association rule-pruning method based on matching degree and a genetic algorithm (GA) is also proposed. In the first stage, a matching degree-based method is applied to preprune the rules in order to improve the efficiency of the GA. In the second stage, the GA selects the effective rules from among those remaining after the first stage. Simulations on KDDCup99 show the high performance of the proposed method. © 2012 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

6.
Intertransaction class association rules (interCARs) can capture relationships among attributes from different transactions, and they have proved effective for stock market prediction. A crisp interCAR mining method based on genetic network programming (GNP) was studied in our previous work. However, the crisp method loses much useful information during discretization, and many unstable factors influence its prediction results, so more information is needed to make the prediction safer and more efficient. In this paper, a fuzzy interCAR mining method is proposed that keeps as much information as possible during the data transformation. In addition, the proposed method is able to produce trading actions that bring large profits. The proposed method is applied to the Tokyo Stock Exchange, where it is compared with the crisp method as well as with some other methods. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

7.
Quantitative attributes are partitioned into several fuzzy sets using the fuzzy c-means algorithm. The fuzzy c-means algorithm reflects the actual distribution of the data, and fuzzy sets soften the partition boundaries. We then improve the search technique of the Apriori algorithm and present an algorithm for mining fuzzy association rules. As databases grow larger and larger, a better approach is to mine fuzzy association rules in parallel. In the parallel mining algorithm, quantitative attributes are partitioned into several fuzzy sets using a parallel fuzzy c-means algorithm. A Boolean parallel algorithm is improved to discover frequent fuzzy attribute sets, and the fuzzy association rules that reach at least a minimum confidence are generated on all processors. Experimental results obtained on distributed, networked PCs/workstations show that the parallel mining algorithm has good scaleup, sizeup and speedup. Finally, we discuss the application of fuzzy association rules to classification. The example shows that a classification system built on fuzzy association rules is more accurate than two popular classification methods, C4.5 and CBA. Translated from Journal of Southeast University (Natural Science Edition), 2005, 35(2): 165-170 (in Chinese)
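
To make the idea of "softened" partition boundaries concrete, the sketch below computes fuzzy c-means membership degrees of a single quantitative value with respect to a set of cluster centers, using the standard membership formula with fuzzifier m. It covers only the membership step, not the full iterative clustering or the parallel variant described in the abstract, and the function name and the default m = 2 are assumptions chosen for illustration.

```python
def fcm_memberships(value, centers, m=2.0):
    """Fuzzy c-means membership of a 1-D value in each cluster.

    Uses the standard formula u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the distance from the value to center i.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [abs(value - c) for c in centers]
    # A value sitting exactly on one or more centers belongs fully to them.
    zero_hits = sum(1 for d in dists if d == 0.0)
    if zero_hits:
        return [1.0 / zero_hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
```

With centers at 0 and 4, the value 1 gets membership 0.9 in the first fuzzy set and 0.1 in the second, rather than being assigned crisply to a single interval.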

8.
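Rough set theory is a relatively new mathematical tool for handling vague and uncertain knowledge, and attribute reduction is its core. The main idea of rough set theory is to derive decision or classification rules for a problem through attribute reduction while keeping the classification ability unchanged. Building on an in-depth study of rough set theory, heuristic information is added to the attribute reduction process, which greatly improves mining efficiency and yields a new decision rule mining algorithm. A case study shows that the algorithm can discover good decision rules.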

9.
The genetic network programming (GNP)-based time-related association rule mining method provides a useful means of investigating the future traffic volume of road networks and hence helps in developing traffic navigation systems. This paper proposes further improvements to time-related association rule mining using a generalized GNP with a multibranches and full-paths (MBFP) algorithm. To fully utilize the potential of the GNP structure, the mechanism of the generalized GNP with MBFP is studied. The aim of the algorithm is to extract association rules from databases more efficiently in a variety of time-related applications, especially traffic volume prediction problems. The generalized algorithm, which can find the important time-related association rules, is described, and experimental results on a traffic prediction problem are presented. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

10.
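Short-term load forecasting based on least squares support vector machines   (Total citations: 9; self-citations: 4; citations by others: 5)
A short-term load forecasting model and algorithm is proposed that combines rough set (RS) theory and a genetic algorithm (GA) with least squares support vector machines (LS-SVM). Because many factors affect load forecasting accuracy, the model uses RS theory to preprocess the historical data and performs reduction analysis on the condition attributes. The attribute reduction is optimized with the GA to determine the factors closely related to the load, which serve as the effective input variables of the LS-SVM. During forecasting, the GA adaptively optimizes the LS-SVM model parameters, which improves forecasting accuracy and avoids both the LS-SVM's dependence on experience and the blind selection of model parameters. The method was applied to load forecasting for the Shandong power grid, and the results demonstrate its effectiveness.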

11.
In this paper, new evolutionary computation methods named the genetic relation algorithm (GRA) and genetic network programming (GNP) are applied to the portfolio selection problem. The number of brands in the stock market is generally very large; therefore, techniques for selecting an effective portfolio are likely to be of interest in the financial field. In order to select the most efficient portfolio, the proposed model treats the correlation coefficient between stock brands as the strength that indicates the relation between nodes in GRA. The algorithm evaluates the relationships between stock brands using a specific measure of strength and generates the optimal portfolio in the final generation. The selected portfolio is then further optimized by the stock trading model of GNP. In this sense, the proposed model is an integrated intelligent model. A comprehensive analysis of the results is provided, and the results show that the proposed model can obtain much higher profits than other traditional methods. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

12.
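Rough set theory and its application in short-term power load forecasting   (Total citations: 2; self-citations: 0; citations by others: 2)
Many factors affect load forecasting accuracy. To find the relationship between load values and various external factors, rough set theory is used to perform attribute reduction analysis on the condition attributes, with a genetic algorithm carrying out the optimization within the attribute reduction algorithm, so as to identify the factors directly related to the load. These factors are then used as the input vector of a fuzzy neural network for load forecasting. Simulation analysis shows that both forecasting accuracy and speed are improved.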

13.
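The analysis and mitigation of massive numbers of transient disturbances in power systems must be based on efficient and accurate disturbance classification. Existing disturbance recognition methods lack a sound feature selection step and their classifiers are overly complex, so they cannot meet the need for efficient classification. A new feature selection method for power quality disturbances is proposed. First, the original signal is preprocessed with the S-transform, and 25 representative disturbance signal features are extracted to build the original feature set. Then, a genetic algorithm fitness function for disturbance feature selection is constructed from the recognition accuracy of an extreme learning machine. Finally, the genetic algorithm is run iteratively to determine the optimal feature set. Experiments show that the new method effectively removes redundant features and, while maintaining classification accuracy, effectively reduces classifier complexity and improves classification efficiency.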

14.
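A power quality classification method based on rough sets and the wavelet transform   (Total citations: 6; self-citations: 0; citations by others: 6)
To address the complex decision process and heavy computation of the methods currently used for classifying power quality problems, a method that combines the wavelet transform with rough set theory is proposed. First, the wavelet transform is used to extract feature vector samples from the disturbance signals. Then, fuzzy C-means clustering is applied to discretize the extracted continuous feature vector samples, yielding a discretized table of classification knowledge rules. Finally, the attribute and attribute-value reduction algorithms of rough set theory are used to obtain the core rule knowledge for determining the power quality class. Simulation experiments on signal data simulated in Matlab show that the method determines the power quality type of a signal quickly and accurately, directly from the signal data, and that it is simple and easy to implement.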

15.
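Application of association rule mining to power plant equipment fault monitoring   (Total citations: 5; self-citations: 0; citations by others: 5)
Association rule mining is an important branch of data mining. By describing the latent relationship rules among different data attributes in a database, it finds dependencies among multiple fields that satisfy given support and confidence thresholds. When faults occur during the operation of power plant equipment, the parameters at the condition monitoring points change accordingly. Association rule mining algorithms can be used to find the associations between fault symptoms and fault categories when a fault occurs, enabling better equipment fault monitoring and diagnosis. The main concepts of association rule mining are described, the Apriori algorithm, the most commonly used mining algorithm, is discussed, a typical fault of a steam turbine condenser is used as an example to illustrate the execution of the algorithm, and the mining results are interpreted. The results verify the feasibility and correctness of the method.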
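
The support and confidence thresholds mentioned above can be illustrated with a minimal sketch. It only shows how the two measures are computed over a list of transactions (each a set of items); it is not the Apriori candidate-generation procedure itself, and the function names and the threshold defaults are assumptions chosen for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0.0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


def passes_thresholds(transactions, antecedent, consequent,
                      min_support=0.3, min_confidence=0.6):
    """Check a rule against assumed minimum support and confidence values."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)
```

In a fault-monitoring setting, each transaction could hold the symptoms and the fault category observed in one event, so that a rule such as {symptom} => {fault category} is kept only if it clears both thresholds.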

16.
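Based on data from power distribution and utilization information systems and an association rule algorithm, a method is proposed for diagnosing ungrounded branch-line disconnection faults in medium-voltage distribution networks. By analyzing the interrelated data of the distribution and utilization information systems, an association rule mining method based on data feature selection is proposed. A chi-square splitting algorithm converts continuous features into Boolean features, and the MSApriori algorithm is adopted to handle the rare-item problem in fault information. On this basis, the kulc criterion is applied to eliminate redundant rules and form a reduced family of representative rules. A case study based on historical data from the distribution and utilization information system of a region in East China shows that the proposed method greatly reduces unproductive mining, significantly improves efficiency and accuracy, and is suitable for online diagnosis of disconnection faults in medium-voltage distribution networks.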
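
The abstract does not spell out the kulc criterion. Assuming it refers to the Kulczynski measure commonly used in association analysis, the value for a pair of itemsets A and B is the average of the two conditional probabilities P(A|B) and P(B|A). The sketch below computes it from transaction counts; the function name and arguments are hypothetical, and how the paper applies the measure to prune redundant rules is not shown here.

```python
def kulczynski(n_a, n_b, n_ab):
    """Kulczynski measure: 0.5 * (P(B|A) + P(A|B)), computed from counts.

    n_a  -- transactions containing A
    n_b  -- transactions containing B
    n_ab -- transactions containing both A and B
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("A and B must each occur in at least one transaction")
    return 0.5 * (n_ab / n_a + n_ab / n_b)
```

The measure does not depend on the total number of transactions, so transactions that contain neither A nor B leave it unchanged, which is one reason it is often paired with methods aimed at rare items.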

17.
Genetic programming for knowledge discovery in chest-pain diagnosis   (Total citations: 10; self-citations: 0; citations by others: 10)
Explores a promising data mining approach. Despite the small number of examples available in the authors' application domain (taking into account the large number of attributes), the results of their experiments can be considered very promising. The discovered rules performed well in terms of predictive accuracy, considering both the rule set as a whole and each individual rule. Furthermore, and more importantly from a data mining viewpoint, the system discovered some comprehensible rules. It is interesting to note that the system achieved very consistent results by working from "tabula rasa," without any background knowledge, and with a small number of examples. The authors emphasize that their system is still experimental and at the research stage of development. Therefore, the results presented here should not be used alone for real-world diagnoses without consulting a physician. Future research includes a careful selection of attributes in a preprocessing step, so as to reduce the number of attributes (and the corresponding search space) given to the GP. Attribute selection is a very active research area in data mining. Given the results obtained so far, GP has been shown to be a useful data mining tool, but future work should also apply the proposed GP system to other data sets to further validate the results reported in this article.

18.
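Based on rough set theory, an improvement to association rule mining algorithms is proposed. The main idea is to use the approximation quality of the set as the iteration criterion: the initial reduct is the full set of condition attributes, and the reduct is obtained by shrinking it step by step while keeping the approximation quality unchanged, which ensures that the resulting reduct does not weaken the ability to make classification decisions. The reduction yields a new decision table, on which the Apriori algorithm, which is based on a greedy strategy, is applied to mine association rules. The main advantage of the algorithm is that it generates decision rules with fewer attributes and candidate itemsets and a limited number of scans, without affecting the ability to make classification decisions. An application example and experimental analysis verify the effectiveness of the algorithm.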
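
The shrinking procedure described above can be sketched as follows, under stated assumptions. Approximation quality is taken to be the usual rough set dependency degree, that is, the fraction of objects whose condition-attribute equivalence class maps to a single decision value. The decision table is assumed to be a list of dicts, and the order in which attributes are tried is simply the order given; the paper's actual data layout and ordering heuristic are not known from the abstract.

```python
from collections import defaultdict


def approximation_quality(rows, condition_attrs, decision_attr):
    """Fraction of rows in the positive region of the decision attribute."""
    if not rows:
        raise ValueError("decision table must not be empty")
    decisions = defaultdict(set)   # equivalence class -> decision values seen
    sizes = defaultdict(int)       # equivalence class -> number of rows
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        decisions[key].add(row[decision_attr])
        sizes[key] += 1
    # A class is consistent when all of its rows share one decision value.
    positive = sum(sizes[k] for k, seen in decisions.items() if len(seen) == 1)
    return positive / len(rows)


def shrink_to_reduct(rows, condition_attrs, decision_attr):
    """Drop attributes one at a time while approximation quality is unchanged."""
    target = approximation_quality(rows, condition_attrs, decision_attr)
    reduct = list(condition_attrs)
    for attr in list(condition_attrs):
        trial = [a for a in reduct if a != attr]
        if approximation_quality(rows, trial, decision_attr) == target:
            reduct = trial  # attr is dispensable for classification
    return reduct
```

The result depends on the order in which attributes are tried, so this yields one reduct rather than necessarily the smallest one.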

19.
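Among methods for building an accurate background model for moving-object detection, k-means clustering is a fast, simple and effective partitioning method; for large data sets, scalable and efficient k-means clustering algorithms are widely used. However, the algorithm is sensitive to changes in the initial cluster centers, and such changes often lead to large errors. This paper introduces an improved method for selecting the initial cluster centers: it exploits the ability of genetic algorithms to search efficiently and globally for the optimal solution, overcoming the tendency of k-means clustering to fall into local optima. The improved genetic algorithm, MAGA, can quickly extract the optimal initial cluster centers. Simulation experiments show that k-means clustering modeling based on MAGA achieves relatively high accuracy and has a great advantage in detecting small and numerous moving objects.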

20.
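To address the risk that power grid companies currently face in collecting electricity charges, an arrears-risk analysis and early-warning method for electricity customers based on an improved random forest is proposed. First, to deal with the imbalanced class distribution between customers in arrears and customers who pay normally, the SMOTE algorithm is used to rebalance the original customer sample distribution. Next, the information value is used to compute the correlation between each attribute and the target class attribute, which in turn optimizes the selection of node attributes. Then, for the main parameters affecting the classification accuracy and performance of the random forest, namely the number of trees nTree, the minimum number of samples per leaf node minLeaf and the number of attributes in the subset K, a simulated annealing algorithm with reheating is used to search for the optimal parameter combination. Finally, the improved random forest algorithm is used to predict whether customers will fall into arrears, identifying potential high-risk customers. The method is compared with common classification algorithms such as logistic regression and decision trees, and the results verify its effectiveness.
Keywords: electricity customers; arrears risk prediction; random forest algorithm; SMOTE; information value; parameter combination; simulated annealing with reheating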


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号