Similar literature
20 similar documents found.
1.
Developing rule extraction algorithms for machine learning techniques such as artificial neural networks and support vector machines (SVMs), which are considered incomprehensible black-box models, is an important topic in current research. This study proposes a rule extraction algorithm for SVMs that uses a kernel-based clustering algorithm to integrate all support vectors and genetic algorithms into extracted rule sets. The proposed method is evaluated on public credit screening data sets using accuracy, sensitivity, specificity, coverage, fidelity and comprehensibility. Results indicate that the proposed method performs better than other rule extraction algorithms. Thus, the proposed algorithm is an essential analysis tool that can be used effectively in data mining.
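The evaluation measures named in this abstract can be illustrated with a small sketch. The abstract does not define them, so the definitions below are assumptions based on common usage: fidelity is the share of examples on which the rule set agrees with the black-box SVM, coverage is the share of examples on which at least one rule fires, and the rule count stands in as a crude proxy for comprehensibility. The function name `evaluate_rules` and the rule representation are hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Sequence

# A rule returns a class label (0 or 1) when it fires, or None when it does not.
Rule = Callable[[Sequence[float]], Optional[int]]


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


def evaluate_rules(
    rules: Sequence[Rule],
    X: Sequence[Sequence[float]],
    y_true: Sequence[int],
    y_blackbox: Sequence[int],
    default: int = 0,
) -> Dict[str, float]:
    """Score an ordered rule list against true labels and black-box predictions."""
    tp = tn = fp = fn = covered = agree = 0
    for x, yt, yb in zip(X, y_true, y_blackbox):
        # The first rule that fires decides the prediction.
        pred = next((p for p in (r(x) for r in rules) if p is not None), None)
        if pred is None:
            pred = default          # uncovered example: fall back to default class
        else:
            covered += 1
        agree += int(pred == yb)    # agreement with the black-box model
        if pred == 1 and yt == 1:
            tp += 1
        elif pred == 0 and yt == 0:
            tn += 1
        elif pred == 1 and yt == 0:
            fp += 1
        else:
            fn += 1
    n = len(X)
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "coverage": _ratio(covered, n),
        "fidelity": _ratio(agree, n),
        "n_rules": float(len(rules)),  # assumed proxy for comprehensibility
    }
```

Reporting fidelity separately from accuracy shows whether the rules imitate the SVM or merely fit the labels.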

2.
In this paper, we propose a novel algorithm for rule extraction from support vector machines (SVMs), termed SQRex-SVM. The proposed method extracts rules directly from the support vectors (SVs) of a trained SVM using a modified sequential covering algorithm. Rules are generated based on an ordered search of the most discriminative features, as measured by interclass separation. Rule performance is then evaluated using measured rates of true and false positives and the area under the receiver operating characteristic (ROC) curve (AUC). Results on a number of commonly used data sets show that the rules produced by SQRex-SVM exhibit both improved generalization performance and smaller, more comprehensible rule sets than other SVM rule extraction techniques and direct rule learning techniques.

4.
5.
Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drift. The genetic algorithm is a strong rule-based classification algorithm, but it has been used only for mining small static data sets. If the genetic algorithm can be made scalable and adaptable by reducing its I/O intensity, it will become an efficient and effective tool for mining large data sets such as data streams. In this paper, a scalable and adaptable online genetic algorithm is proposed to mine classification rules for data streams with concept drift. Since data streams are generated continuously at a rapid rate, the proposed method does not use a fixed static data set for fitness calculation. Instead, it extracts a small snapshot of training examples from the current part of the data stream whenever data is required for the fitness calculation. The proposed method also builds rules for all classes separately in a parallel, independent, iterative manner. This makes the method scalable to data streams and lets it adapt to concept drift quickly and naturally, without storing the whole stream or part of the stream in compressed form as other rule-based algorithms do. The results of the proposed method are comparable with those of other standard methods used for mining data streams.

6.
The artificial neural network (ANN) is one of the most widely used techniques in classification data mining. Although ANNs can achieve very high classification accuracies, their explanation capability is very limited. Therefore, one of the main challenges in using ANNs in data mining applications is to extract explicit knowledge from them. Based on this motivation, a novel approach is proposed in this paper for generating classification rules from feed-forward ANNs. Although there are several approaches in the literature for classification rule extraction from ANNs, the present approach is fundamentally different from them. In previous studies, ANN training and rule extraction are generally performed independently in a sequential (hierarchical) manner. In the present study, however, the training and rule extraction phases are integrated within a multiple-objective evaluation framework for generating accurate classification rules directly. The proposed approach uses the differential evolution algorithm for training and the touring ant colony optimization algorithm for rule extraction. The proposed algorithm is named DIFACONN-miner. An experimental study on benchmark data sets and comparisons with other classical and state-of-the-art rule extraction algorithms have shown that the proposed approach has considerable potential to discover more accurate and concise classification rules.

7.
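This work improves the formalization of rules in traditional packet classification algorithms. Building on a study of rule conversion methods for packet classification, it proposes a non-matching rule conversion algorithm based on set operations. The algorithm is compared with other range rule conversion algorithms in terms of performance, the time and space complexity of these algorithms is analyzed, and simulations are carried out. Experimental results show that the proposed algorithm generates fewer rules than the other algorithms.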

8.
In this paper, a novel automatic image annotation system is proposed, which integrates two sets of support vector machines (SVMs), namely the multiple instance learning (MIL)-based and global-feature-based SVMs, for annotation. The MIL-based bag features are obtained by applying MIL on the image blocks, where the enhanced diversity density (DD) algorithm and a faster searching algorithm are applied to improve efficiency and accuracy. These features are then input to a set of SVMs to find the optimum hyperplanes for annotating training images. Similarly, global color and texture features, including a color histogram and a modified edge histogram, are fed into another set of SVMs for categorizing training images. Consequently, two sets of image features are constructed for each test image and are sent to the respective sets of SVMs, whose outputs are combined by an automatic weight estimation method to obtain the final annotation results. The proposed annotation approach demonstrates promising performance on an image database of 12 000 general-purpose images from COREL, as compared with some current peer systems in the literature.

9.
Sparse kernel SVMs via cutting-plane training   Total citations: 1 (self-citations: 0, citations by others: 1)
We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.

10.
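A fast algorithm for acquiring certain rules based on divide and conquer   Total citations: 2 (self-citations: 0, citations by others: 2)
Value reduction is an important topic in rough set theory, and existing algorithms have difficulty processing large data sets quickly. By decomposing the objects of the universe over the attribute space, this paper proposes a fast divide-and-conquer algorithm for acquiring certain rules and illustrates it with an example. The algorithm obtains certain decision rules directly from a discrete decision table. When the data are uniformly distributed, its time complexity is below n^2, which makes it suitable for acquiring certain rules from large data sets. Experimental results demonstrate the efficiency of the algorithm.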

11.
An online incremental learning support vector machine for large-scale data   Total citations: 1 (self-citations: 1, citations by others: 0)
Support vector machines (SVMs) have achieved outstanding generalization in many fields. However, the standard SVM and most modified SVMs are in essence batch learners, which makes them unable to handle incremental or online learning well. Such SVMs also cannot handle large-scale data effectively because they are costly in memory and computation. In some situations, a large number of support vectors (SVs) is produced, which generally means a long testing time. In this paper, we propose an online incremental learning SVM for large data sets. The proposed method consists of two main components: learning prototypes (LPs) and learning support vectors (LSVs). LPs learn the prototypes and continuously adjust them to the data concept. LSVs obtain a new SVM by combining the learned prototypes with the trained SVs. The proposed method has been compared with other popular SVM algorithms, and experimental results demonstrate that it is effective for incremental learning problems and large-scale problems.

12.
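Continuity and approximation properties of two classes of fuzzy inference algorithms   Total citations: 9 (self-citations: 0, citations by others: 9)
Xu Weihong, Xie Zhongke, Yang Jingyu, Ye Youpei. Journal of Software (《软件学报》), 2004, 15(10): 1485-1492
This paper examines in detail whether Zadeh's compositional rule of inference (the CRI algorithm) and the full implication triple I algorithm (the triple I algorithm) satisfy continuity and approximation, and further discusses how these two classes of algorithms propagate approximation errors. To this end, a fuzzy inference algorithm is regarded as a mapping from fuzzy sets to fuzzy sets, and the Hamming distance is chosen as the distance between two fuzzy sets. It is proved that both classes of algorithms are continuous for fuzzy modus ponens and fuzzy modus tollens. The triple I algorithm always satisfies the approximation property when the antecedent and consequent of the known rule are normal sets, whereas the CRI algorithm has the approximation property only when it satisfies the reductive property. When the approximation property holds, neither class of algorithms amplifies the approximation error. The results offer guidance for selecting and analyzing fuzzy inference algorithms when building fuzzy control systems and fuzzy expert systems.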

13.
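Implementation and improvement of an association rule algorithm   Total citations: 11 (self-citations: 0, citations by others: 11)
As a data mining tool, association rules can discover interesting associations among sets of data items. Among association rule algorithms, Apriori is one of the key algorithms. When facing large and complex data sets, the choice of data structures and the optimization of processing are very important to the performance of this algorithm. This paper first introduces the principle of association rules and the implementation of the Apriori algorithm, and then proposes several improvements to it, such as storing and accessing frequent itemsets in a tree structure and using three cache optimization methods. These optimizations improve the overall efficiency of the algorithm. Experiments show that, for large data items, the improvements implement the Apriori algorithm correctly, effectively and quickly.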
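For orientation, the baseline Apriori level-wise search that this abstract builds on can be sketched as follows. This is a minimal textbook version using plain Python sets and dictionaries. It does not reproduce the tree structure or the cache optimizations proposed in the paper, and the function name `apriori` and its parameters are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable


def apriori(
    transactions: Iterable[Iterable[Hashable]], min_support: int
) -> Dict[FrozenSet, int]:
    """Return every itemset whose support count is at least min_support."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts: Dict[FrozenSet, int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one scan over the transactions per level.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

The count step rescans every transaction for every candidate, which is the kind of cost that data structure and caching improvements aim to reduce.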

14.
A new relational learning system using novel rule selection strategies   Total citations: 1 (self-citations: 0, citations by others: 1)
Mahmut Uludag, Mehmet R. Tolun. 《Knowledge》, 2006, 19(8): 765-771
This paper describes a new rule induction system, rila, which can extract frequent patterns from multiple connected relations. The system supports two different rule selection strategies, namely the select early and select late strategies. Pruning heuristics are used to control the number of hypotheses generated during the learning process. Experimental results are provided on the mutagenesis and the segmentation data sets. The rule induction algorithm is also compared with similar relational learning algorithms, and the results show that its performance is comparable to theirs.

15.
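Research on fast multi-class classification with support vector machines   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper studies the speed advantage of the multi-class SVM algorithm DAGSVM (Directed Acyclic Graph SVM) and proposes a fast multi-class SVM classification method that combines DAGSVM with a reduced support vector technique. The method reduces both the number of two-class SVMs needed for one classification and the number of support vectors. Experiments use multi-class data from the UCI and Statlog repositories and compare the method with four multi-class methods; the results show that it effectively speeds up classification.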
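The speed advantage of DAGSVM comes from its evaluation order: for k classes it consults only k - 1 of the k(k - 1)/2 pairwise classifiers, eliminating one class per step. The sketch below shows that elimination loop only. The pairwise deciders are passed in as plain callables and are stand-ins, not trained SVMs, and the name `dag_predict` and the `pairwise` dictionary layout are assumptions made for illustration. The reduced support vector technique from the abstract is not shown.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def dag_predict(
    x,
    classes: Sequence[Hashable],
    pairwise: Dict[Tuple[Hashable, Hashable], Callable],
) -> Tuple[Hashable, int]:
    """Classify x by DAG-style elimination.

    pairwise[(a, b)](x) must return a or b, for every pair where a comes
    before b in `classes`. Returns (predicted class, evaluations used).
    """
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = pairwise[(first, last)](x)  # one binary decision per step
        evaluations += 1
        if winner == first:
            remaining.pop()       # the last class is eliminated
        else:
            remaining.pop(0)      # the first class is eliminated
    return remaining[0], evaluations
```

Each step removes exactly one candidate class, so the loop always runs k - 1 times regardless of the input.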

16.
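Traditional support vector machines usually focus on samples at the margins of the data distribution, and support vectors usually arise from these marginal samples. This paper proposes a new support vector algorithm whose support vectors are generated from the global data distribution; on most data sets its sparsity is far better than that of the classical SVM algorithm. On multi-class problems, the time complexity of the algorithm is only equivalent to that of the original SVM algorithm on a binary problem, which resolves the problem of a very large number of variables or too many binary sub-classifiers when designing multi-class algorithms.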

17.
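Wang Ming, Song Shunlin. Journal of Computer Applications (《计算机应用》), 2010, 30(9): 2332-2334
Discovering frequent itemsets is the main route to association rule mining and the focus of research on association rule mining algorithms. The classical Apriori algorithm and its improved variants can be roughly divided into SQL-based and memory-based algorithms. To improve mining efficiency, and based on a careful analysis of the efficiency bottlenecks of memory-based algorithms, an improved algorithm for discovering frequent itemsets is proposed. The algorithm uses a fast method for generating and verifying candidate itemsets, which speeds up itemset generation. Experimental results show that the algorithm effectively improves mining efficiency.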

18.
Machine learning support for medical decision making is truly helpful only when it meets two conditions: high prediction accuracy and a good explanation of how the diagnosis was reached. Support vector machines (SVMs) achieve the first target thanks to their kernel-based engine; evolutionary algorithms (EAs) can accomplish the second owing to their adaptable nature. In this context, the current paper puts forward a two-step hybridized methodology, in which learning is performed accurately by the SVMs and a comprehensible emulation of the resulting decision model is generated by EAs in the form of propositional rules that refer only to those indicators that strongly influence the class separation. An individual highlighting of the medical attributes that trigger a specific diagnosis for a given patient record is additionally obtained; this feature increases the physician's confidence in the resulting automated diagnosis. Without loss of generality, we aim to model three breast cancer instances, because of both the high incidence of the disease and the wide application of state-of-the-art artificial intelligence methods to this medical task. As such, the prediction of a benign/malignant condition as well as the recurrence/nonrecurrence of a cancer event are studied on the corresponding Wisconsin data sets from the UCI Machine Learning Repository. The proposed hybridization reached its goals. Rule prototypes evolve against SVM-consistent training data, while diversity among the different classes is implicitly preserved. Feature selection eventually leads to a rule set in which only the significant medical indicators, together with the discriminating threshold values, are referred to, while the individual relevance of attributes can additionally be obtained for each patient. The gain is thus dual: the EA benefits from noise-free, SVM-preprocessed data, and the resulting SVM model is able to output rules in a comprehensible, concise format for the physician.

19.
Artificial neural networks (ANNs) are mathematical models inspired by the biological nervous system. They are able to predict, learn from experience and generalize from previous examples. An important drawback of ANNs is their very limited explanation capability, mainly because the knowledge embedded within ANNs is distributed over the activations and the connection weights. Therefore, one of the main challenges in recent decades has been to extract classification rules from ANNs. This paper presents a novel approach to extracting fuzzy classification rules (FCR) from ANNs, motivated by the fact that fuzzy rules are more interpretable than crisp rules and cope better with pervasive uncertainty and vagueness. A soft-computing-based algorithm is developed to generate fuzzy rules based on a data mining tool (DIFACONN-miner), which was recently developed by the authors. The fuzzy DIFACONN-miner algorithm can extract fuzzy classification rules from data sets containing both categorical and continuous attributes. Experiments on benchmark data sets and comparisons with other fuzzy rule-based classification (FRBC) algorithms have shown that the proposed algorithm yields high classification accuracies and comprehensible rule sets.

20.
Fuzzy rule interpolation is an important research topic in sparse fuzzy rule-based systems. In this paper, we present a new method for fuzzy rule interpolation in sparse fuzzy rule-based systems based on the principal membership functions and uncertainty grade functions of interval type-2 fuzzy sets. The method can deal with polygonal interval type-2 fuzzy sets and can handle fuzzy rule interpolation with multiple antecedent variables. We also use some examples to compare the fuzzy interpolative reasoning results of the proposed method with those of an existing method. The experimental results show that the proposed method gives more reasonable results than the existing method for fuzzy rule interpolation based on interval type-2 fuzzy sets.
