Similar literature (20 results)
1.
A hybrid coevolutionary algorithm for designing fuzzy classifiers (total citations: 1; self-citations: 0; citations by others: 1)
Rule learning is one of the most common tasks in knowledge discovery. In this paper, we investigate the induction of fuzzy classification rules for data mining purposes, and propose a hybrid genetic algorithm for learning approximate fuzzy rules. A novel niching method is employed to promote coevolution within the population, which enables the algorithm to discover multiple rules by means of a coevolutionary scheme in a single run. In order to improve the quality of the learned rules, a local search method is devised to fine-tune the offspring generated by genetic operators in each generation. After the GA terminates, a fuzzy classifier is built by extracting a rule set from the final population. The proposed algorithm was tested on datasets from the UCI repository, and the experimental results verify its validity in learning rule sets and its comparative advantage over conventional methods.

2.
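刘晓平. Computer Simulation (《计算机仿真》), 2006, 23(4): 103-105, 113
Data mining is the process of extracting hidden knowledge from large volumes of raw data. Most data mining tools use rule discovery and decision tree classification techniques to find patterns and rules in data, and at their core are induction algorithms. Compared with traditional statistical methods, classification results obtained with machine learning techniques are more interpretable. When mining a particular dataset without the relevant domain knowledge, it is difficult for a user or decision maker to determine which induction algorithm to choose, so a variety of algorithms need to be tried. With MLC++, decision makers can easily compare the effectiveness of different classification algorithms on a particular dataset and thereby select a suitable one. System developers can also use MLC++ to design various hybrid algorithms.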

3.
As a broad subfield of artificial intelligence, machine learning is concerned with the development of algorithms and techniques that allow computers to learn. Methods such as fuzzy logic, neural networks, support vector machines, decision trees and Bayesian learning have been applied to learn meaningful rules; however, their drawback is that they often get trapped in a local optimum. In contrast with these machine learning methods, a genetic algorithm (GA) is well placed to obtain better results because of its natural evolution and global search. GAs have given rise to two new fields of research where global optimization is of crucial importance: genetic-based machine learning (GBML) and genetic programming (GP). This article adopts the GBML technique to provide a three-phase knowledge extraction methodology, which supports continuous and instant learning while integrating multiple rule sets into a centralized knowledge base. Moreover, both the proposed system and GP are applied in theoretical and empirical experiments. Results for both approaches are presented and compared. This paper makes two important contributions: (1) it uses three criteria (accuracy, coverage, and fitness) in the knowledge extraction process, which is very effective in selecting an optimal set of rules from a large population; (2) the experiments show that the rule sets derived by the proposed approach are more accurate than those derived by GP.

4.
The comprehensibility aspect of rule discovery is of emerging interest in the realm of knowledge discovery in databases. Of the many cognitive and psychological factors relating to the comprehensibility of knowledge, we focus on the use of human-amenable concepts as a representation language for expressing classification rules. Existing work in neural logic networks (or neulonets) provides impetus for our research; its strength lies in its ability to learn and represent complex human logic in decision-making using symbolic-interpretable net rules. A novel technique is developed for neulonet learning by composing net rules using genetic programming. Coupled with a sequential covering approach for generating a list of neulonets, the straightforward extraction of human-like logic rules from each neulonet provides an alternate perspective on the greater extent of knowledge that can potentially be expressed and discovered, while the entire list of neulonets together constitutes an effective classifier. We show how the sequential covering approach is analogous to association-based classification, leading to the development of an association-based neulonet classifier. An empirical study shows that associative classification integrated with the genetic construction of neulonets performs better than general association-based classifiers in terms of higher accuracies and smaller rule sets. This is due to the richness in logic expression inherent in the neulonet learning paradigm.

5.
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame and mutagenicity prediction) and two real-life problems (analysis of telephone calls and traffic accident analysis). Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .

6.
Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms (total citations: 5; self-citations: 0; citations by others: 5)
In rule discovery systems, learning often proceeds by first assessing the quality of the system's current rules and then modifying rules based on that assessment. This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards. The focus is on the kinds of rule assessment schemes which have been proposed for rule discovery systems that use genetic algorithms as the primary rule modification strategy. Two distinct approaches to rule learning with genetic algorithms have been previously reported, each approach offering a useful solution to a different level of the credit assignment problem. We describe a system, called RUDI, that exploits both approaches. We present analytic and experimental results that support the hypothesis that multiple levels of credit assignment can improve the performance of rule learning systems based on genetic algorithms.

7.
Knowledge discovery refers to identifying hidden and valid patterns in data, and it can be used to build knowledge inference systems. The decision tree is one such successful technique for supervised learning and for extracting knowledge or rules. This paper aims to develop a decision tree model to predict the occurrence of diabetes. Traditional decision tree algorithms have a problem with crisp boundaries. Much better decision rules can be identified from clinical data sets by using fuzzy decision boundaries. The key step in constructing a decision tree is the identification of split points, and in this work the best split points are identified using the Gini index. The authors propose a method that minimizes the calculation of Gini indices by identifying false split points, and they use the Gaussian fuzzy function because the clinical data sets are not crisp. As the efficiency of a decision tree depends on many factors, such as the number of nodes and the length of the tree, pruning plays a key role. The modified Gini index-Gaussian fuzzy decision tree algorithm is proposed and is tested for accuracy on the Pima Indian Diabetes (PID) clinical data set. This algorithm outperforms other decision tree algorithms.
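
The split-point search described in this abstract can be illustrated with a minimal Python sketch. It shows only the plain Gini-index search over midpoints of consecutive values for one numeric attribute; it is not the authors' modified algorithm (no false-split-point filtering, no Gaussian fuzzification), and the function names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) with the lowest weighted Gini impurity.

    Candidate thresholds are the midpoints between consecutive distinct values.
    Returns (None, inf) when all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data (made up for illustration): one attribute, two classes.
threshold, score = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

On the toy data the two classes separate cleanly, so the best threshold is the midpoint between 3 and 10 and the weighted impurity is zero.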

8.
Data mining usually refers to methodologies and tools for efficiently discovering new knowledge from databases. In this paper, a genetic algorithm (GA) based approach to assessing breast cancer patterns is proposed. It extracts decision rules, including the predictors and the corresponding inequality and threshold values, simultaneously, so as to build a decision-making model with maximum prediction accuracy. Many early studies of breast cancer diagnosis used statistical techniques. As the diagnosis of breast cancer is highly nonlinear in nature, it is hard to develop a comprehensive model that takes all the independent variables into account using conventional statistical approaches. Recently, numerous studies have demonstrated that neural networks (NNs) are more reliable than the traditional statistical approaches and the dynamic stress method. The usefulness of NNs has been reported in the literature, but the main obstacle lies in building and using a model whose classification rules are hard to make explicit. We compared our results against a commercial data mining package, and we show experimentally that the proposed rule extraction approach is promising for improving prediction accuracy and enhancing modeling simplicity. In particular, our approach is capable of extracting rules that can be developed into a computer model, like an expert system, for predicting or classifying breast cancer potential.

9.
Generating a Condensed Representation for Association Rules (total citations: 1; self-citations: 0; citations by others: 1)
Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets (such as Apriori) is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.
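
The last step of this abstract, deriving closed itemsets and generators from the frequent itemsets alone, can be sketched in Python under the standard definitions: an itemset is closed if no proper superset has the same support, and it is a generator if no proper subset has the same support. This is a brute-force illustration, not the paper's algorithm; the function name and the toy supports are assumptions, and the empty itemset is left out of the toy input.

```python
def closed_and_generators(supports):
    """Split frequent itemsets into closed itemsets and generators.

    supports: dict mapping frozenset(itemset) -> support count; it must
    contain every frequent itemset (the set is downward closed).
    """
    closed, generators = [], []
    for itemset, supp in supports.items():
        # `<` on frozensets tests for a proper subset.
        equal_superset = any(
            itemset < other and s == supp for other, s in supports.items()
        )
        equal_subset = any(
            other < itemset and s == supp for other, s in supports.items()
        )
        if not equal_superset:
            closed.append(itemset)
        if not equal_subset:
            generators.append(itemset)
    return closed, generators


# Hypothetical supports for the transactions {a,b}, {a,b}, {a,b,c}
# with a minimum support of 1 (empty itemset omitted).
supports = {
    frozenset("a"): 3, frozenset("b"): 3, frozenset("c"): 1,
    frozenset("ab"): 3, frozenset("ac"): 1, frozenset("bc"): 1,
    frozenset("abc"): 1,
}
closed, generators = closed_and_generators(supports)
```

No transaction data is read: only the support table is consulted, which is the property the abstract highlights.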

10.
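杜柏阳, 孔祥玉, 罗家宇. Acta Automatica Sinica (《自动化学报》), 2021, 47(12): 2815-2822
Parallel principal component extraction algorithms play a very important role in signal feature extraction. Using a weighting rule to convert a principal subspace (PS) extraction algorithm into a parallel principal component extraction algorithm is an effective approach, but there has been very little theoretical analysis of how the weighting rule affects the state matrix. Such an analysis not only gives a detailed understanding of the dynamics of principal component extraction algorithms under a weighting rule, but also provides conditions for judging whether other subspace tracking algorithms can be converted into parallel principal component extraction algorithms. This paper compares Oja's principal subspace tracking algorithm with the weighted Oja parallel principal component extraction algorithm, and uses the differences between the two to analyze how the weighting rule affects the direction of the matrix extracted by the algorithm. First, for two-dimensional input signals, it studies how the information criterion of the weighting rule acts on the direction of the state matrix when two principal components are extracted. Then, for input signals of more than two dimensions, it discusses how the weighting rule affects the extraction of multiple principal components. Finally, MATLAB simulations verify the validity of the proposed theory.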

11.
The artificial neural network (ANN) is one of the most widely used techniques in classification data mining. Although ANNs can achieve very high classification accuracies, their explanation capability is very limited. Therefore, one of the main challenges in using ANNs in data mining applications is to extract explicit knowledge from them. Based on this motivation, a novel approach is proposed in this paper for generating classification rules from feed-forward ANNs. Although there are several approaches in the literature for classification rule extraction from ANNs, the present approach is fundamentally different from them. In previous studies, ANN training and rule extraction are generally performed independently in a sequential (hierarchical) manner. In the present study, however, the training and rule extraction phases are integrated within a multiple objective evaluation framework for generating accurate classification rules directly. The proposed approach makes use of the differential evolution algorithm for training and the touring ant colony optimization algorithm for rule extraction. The proposed algorithm is named DIFACONN-miner. An experimental study on benchmark data sets and comparisons with other classical and state-of-the-art rule extraction algorithms have shown that the proposed approach has considerable potential to discover more accurate and concise classification rules.

12.
An efficient algorithm for discovering frequent subgraphs (total citations: 8; self-citations: 0; citations by others: 8)
Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the data sets in these domains. An alternate way of modeling the objects in these data sets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is that of discovering subgraphs that occur frequently over the entire set of graphs. We present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph data sets. We experimentally evaluate the performance of FSG using a variety of real and synthetic data sets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in data sets containing more than 200,000 graph transactions and scales linearly with respect to the size of the data set.

13.
This paper presents a genetic fuzzy system for the data mining task of subgroup discovery, the subgroup discovery iterative genetic algorithm (SDIGA), which obtains fuzzy rules for subgroup discovery in disjunctive normal form. This kind of fuzzy rule allows us to represent knowledge about patterns of interest in an explanatory and understandable form that can be used by the expert. Experimental evaluation of the algorithm and a comparison with other subgroup discovery algorithms show the validity of the proposal. SDIGA is applied to a market problem studied at the University of Mondragon, Spain, in which it is necessary to automatically extract relevant and interesting information that helps to improve fair planning policies. The application of SDIGA to this problem allows us to obtain novel and valuable knowledge for experts.

14.
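To improve the prediction accuracy and interpretability of knowledge graph representations, this work improves the IterE framework, which consists of three modules (representation learning, rule learning, and rule fusion), and proposes a knowledge graph representation learning method that incorporates FOL rules and is applicable to a variety of representation learning algorithms. For the rule learning and fusion modules, the rule confidence computation is improved on the basis of the triple scoring function to broaden applicability, and the soft label computation is improved to relax the fusion requirements and enlarge the amount of data added by fusion, so that representations update rules and rules enhance representations iteratively. Link prediction and explanation generation experiments show that, with logic rules added, the method improves the prediction accuracy and interpretability of the base model, and that the sparser the dataset, the more it helps to improve the representations of sparse entities.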

15.
Knowledge, 2002, 15(1-2): 85-94
Lists of if-then rules (i.e. ordered rule sets) are among the most expressive and intelligible representations for inductive learning algorithms. Two extreme strategies searching for such a list of rules can be distinguished: (i) local strategies primarily based on a step-by-step search for the optimal list of rules, and (ii) global strategies primarily based on a one-strike search for the optimal list of rules. Both approaches have their disadvantages. In this paper we present an intermediate strategy. A sequential covering strategy is combined with a one-strike genetic search for the next most promising rule. To achieve this, a new rule-fitness function is introduced. Experimental results on benchmark problems are presented and the performance of our intermediate approach is compared with other rule learning algorithms. Finally, GeSeCo's performance is compared to a more local strategy on a set of tasks in which the information value of individual attributes is varied.
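
The sequential covering loop that this abstract builds on can be sketched in Python as follows. The rule search is pluggable: where the paper uses a one-strike genetic search with its own rule-fitness function, this sketch substitutes a simple exhaustive search over single `attribute == value` tests scored by precision and then coverage. All names, the scoring, and the toy data are assumptions made for illustration.

```python
def learn_rule_list(examples, find_rule):
    """Sequential covering: learn one rule, drop the examples it covers, repeat.

    examples: list of (features: dict, label) pairs.
    find_rule: callable returning ((attribute, value), label) or None.
    """
    remaining = list(examples)
    rules = []
    while remaining:
        rule = find_rule(remaining)
        if rule is None:
            break
        (attr, value), _label = rule
        rules.append(rule)
        # Keep only the examples the new rule does not cover.
        remaining = [ex for ex in remaining if ex[0].get(attr) != value]
    return rules


def best_single_condition(examples):
    """Stand-in for the genetic search: score every attribute == value test."""
    best, best_score = None, (-1.0, -1)
    candidates = sorted({(a, v) for feats, _ in examples for a, v in feats.items()})
    for attr, value in candidates:
        labels = [lab for feats, lab in examples if feats.get(attr) == value]
        majority = max(sorted(set(labels)), key=labels.count)
        correct = labels.count(majority)
        score = (correct / len(labels), correct)  # precision, then coverage
        if score > best_score:
            best, best_score = ((attr, value), majority), score
    return best


# Toy data (made up for illustration).
examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
]
rules = learn_rule_list(examples, best_single_condition)
```

Because the result is an ordered list, a new example would be classified by the first rule whose condition it satisfies; swapping `best_single_condition` for a genetic search changes only how each rule is found, not the covering loop.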

16.
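In a big data environment of interconnected knowledge, an initially constructed domain knowledge graph can present the structured information of the domain's knowledge, but latent relations implicit between entities are not fully expressed in the graph. To enrich and extend the entity relations in a domain knowledge graph, a relation discovery method based on association rule analysis between entities and topic analysis is proposed. Using data related to the domain entities, latent relations between domain entities are obtained through association rule analysis between entities and analysis of the similarity of topic distributions between the entities' related datasets; the newly discovered relations are then merged into the initially constructed knowledge graph, extending it with latent relations. Experimental results show that the method can find commonalities between department entities and uncover relations hidden between domain entities, so it can be applied effectively to relation discovery between domain entities and to enriching domain knowledge graphs.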

17.
18.
Developing rule extraction algorithms from machine learning techniques such as artificial neural networks and support vector machines (SVMs), which are considered incomprehensible black-box models, is an important topic in current research. This study proposes a rule extraction algorithm from SVMs that uses a kernel-based clustering algorithm to integrate all support vectors and genetic algorithms into extracted rule sets. This study uses measurements of accuracy, sensitivity, specificity, coverage, fidelity and comprehensibility to evaluate the performance of the proposed method on the public credit screening data sets. Results indicate that the proposed method performs better than other rule extraction algorithms. Thus, the proposed algorithm is an essential analysis tool that can be effectively used in data mining fields.

19.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also carried out to verify the performance of the proposed algorithm.
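
The idea of replacing binary item occurrence with fuzzy membership can be illustrated with a small Python sketch that computes fuzzy support and confidence from quantitative records. It is an illustration under stated assumptions, not the paper's algorithm: the ramp-shaped "high" membership function and its breakpoints (70 and 90), the use of the minimum to combine memberships, and the toy grades are all made up here, and the Apriori-style candidate generation is omitted.

```python
def ramp_up(x, start, end):
    """Membership rising linearly from 0 at `start` to 1 at `end`."""
    return min(1.0, max(0.0, (x - start) / (end - start)))


def high(x):
    """Hypothetical fuzzy region 'high grade' (breakpoints are assumptions)."""
    return ramp_up(x, 70, 90)


def fuzzy_support(records, attributes, membership=high):
    """Mean over records of the minimum membership across the attributes."""
    total = sum(min(membership(rec[a]) for a in attributes) for rec in records)
    return total / len(records)


def fuzzy_confidence(records, antecedent, consequent, membership=high):
    """Fuzzy support of antecedent and consequent together over antecedent."""
    both = fuzzy_support(records, antecedent + consequent, membership)
    return both / fuzzy_support(records, antecedent, membership)


# Toy grades (made up for illustration).
records = [
    {"math": 90, "physics": 85},
    {"math": 80, "physics": 70},
    {"math": 60, "physics": 95},
    {"math": 95, "physics": 90},
]
support_math = fuzzy_support(records, ["math"])
confidence = fuzzy_confidence(records, ["math"], ["physics"])
```

Here the rule "math is high, so physics is high" gets a fuzzy support of 0.4375 and a confidence of 0.7; a mining algorithm would keep the rule only if both values clear user-chosen thresholds.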

20.
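Research on Web data mining based on an improved neural network (total citations: 2; self-citations: 1; citations by others: 1)
An artificial neural network is a network system, developed on the basis of modern neurobiological research, that simulates the information processing mechanisms of the human brain. It has not only the general computational ability to process numerical data but also the capacities for thinking, learning, and memory needed to process knowledge. A neural-network-based data mining process consists of three stages: data preparation, rule extraction, and rule evaluation. This paper studies decompositional rule extraction algorithms. After analyzing the decompositional algorithm, a correlation method is used to compute the correlation between input and output neurons; once the neurons are sorted by degree of correlation, the neural network is used to select nodes, which can greatly reduce the number of input nodes of the network. Validation on data from a dataset demonstrates the effectiveness of the method.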
