Similar Literature
1.
Some of the most influential factors in the quality of the solutions found by an evolutionary algorithm (EA) are a correct coding of the search space and an appropriate evaluation function for the potential solutions. EAs are often used to learn decision rules from datasets, with the rules encoded as individuals in the genetic population. This paper addresses the coding of the search space for obtaining such decision rules, i.e., the representation of the individuals of the genetic population together with the design of specific genetic operators. Our approach, called "natural coding," uses one gene per feature in the dataset (continuous or discrete). The examples from the datasets are also encoded into the search space where the genetic population evolves, which substantially improves the evaluation process. Genetic operators for the natural coding are formally defined as algebraic expressions. Experiments with several datasets from the University of California at Irvine (UCI) machine learning repository show that, as the genetic operators are better guided through the search space, the number of rules decreases considerably while accuracy remains similar to that of hybrid coding, which combines the well-known binary and real representations to encode discrete and continuous attributes, respectively. The computational cost of the natural coding is also lower than that of the hybrid representation. Our algorithm, HIDER*, has been statistically tested against C4.5 and C4.5 Rules and performed well. The knowledge models obtained are simpler, with very few decision rules, and are therefore easier to understand, which is an advantage in many domains. Experiments with high-dimensional datasets showed the same good behavior, maintaining the quality of the knowledge model with respect to prediction accuracy.
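The abstract does not spell out the natural coding itself, so the following Python sketch only illustrates the general idea it builds on: a decision rule with one gene per attribute (an interval for a continuous feature, a set of allowed values for a discrete one), evaluated by counting the examples it covers. The names `Interval`, `ValueSet`, `Rule` and `rule_stats` are hypothetical, and this is a plain hybrid-style representation, not the paper's natural coding.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass(frozen=True)
class Interval:
    """Hypothetical gene for a continuous attribute: a closed interval."""
    low: float
    high: float

    def matches(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class ValueSet:
    """Hypothetical gene for a discrete attribute: the allowed values."""
    values: frozenset

    def matches(self, value) -> bool:
        return value in self.values


@dataclass(frozen=True)
class Rule:
    genes: Tuple[Union[Interval, ValueSet], ...]  # one gene per attribute, column order
    label: str                                     # class predicted by the rule

    def covers(self, example: Sequence) -> bool:
        return all(g.matches(v) for g, v in zip(self.genes, example))


def rule_stats(rule: Rule, data, labels) -> Tuple[int, int]:
    """Return (covered, correctly classified) counts on a labelled dataset."""
    covered = [y for x, y in zip(data, labels) if rule.covers(x)]
    return len(covered), sum(1 for y in covered if y == rule.label)
```

For example, the rule `Rule((Interval(4.5, 6.0), ValueSet(frozenset({"red"}))), "A")` covers `(5.1, "red")` but not `(7.0, "red")`. An EA fitness function could combine the two counts returned by `rule_stats`; how HIDER* actually weighs them is not stated here.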

2.
3.
Alzheimer’s disease is a complex, progressive neurodegenerative brain disorder whose prevalence is expected to rise over the coming decades. Because of its polygenic nature, unconventional strategies are needed to elucidate its genetic mechanisms. This work draws on five input information sources: a public DNA microarray that measures expression levels in control and patient samples, repositories of genes known to be associated with Alzheimer’s disease, additional data, the Gene Ontology, and finally a literature review or expert knowledge to validate the results. To identify genes highly related to the disease, we integrate three machine learning techniques: decision trees, quantitative association rules and hierarchical clustering are used to analyze Alzheimer’s disease gene expression profiles and identify genes strongly linked to this neurodegenerative disease through changes in their expression levels between control and patient samples. We propose an ensemble of decision trees and quantitative association rules to find the most suitable configurations of the multi-objective evolutionary algorithm GarNet, in order to overcome the complex parametrization intrinsic to this type of algorithm. To this end, GarNet is executed with multiple configuration settings, and the well-known C4.5 algorithm is used to establish the minimum accuracy to be satisfied. GarNet is then rerun, using only the configurations that exceed the minimum accuracy threshold defined by C4.5, to identify dependencies between genes and their expression levels that distinguish healthy individuals from Alzheimer’s patients. Finally, a hierarchical cluster analysis is used to validate the gene-Alzheimer’s disease associations provided by GarNet. The results show that the obtained rules successfully characterize the underlying information, grouping genes relevant to Alzheimer’s disease.
The genes reported by our approach formed two well-defined groups that perfectly separated the samples into healthy individuals and Alzheimer’s disease patients. To assess the relevance of these results, a statistical test and gene expression fold change were used. This relevance is summarized in a volcano plot showing two clearly separated, significant groups of genes that are up- or down-regulated in Alzheimer’s disease patients. A biological knowledge integration phase was then performed by fusing the information from a systematic literature review with Gene Ontology term enrichment for the described genes found in the hippocampus of patients. Finally, a validation phase with additional data and a permutation test was carried out, with results consistent with previous studies.
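The volcano plot mentioned above places each gene at (log2 fold change, -log10 p-value). A minimal Python sketch of those two coordinates follows, assuming raw (non-log) positive expression values and using a two-sided permutation test on the difference of group means as the p-value; the study's exact statistical test is not specified in the abstract, and the function names are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Sequence, Tuple


def log2_fold_change(patients: Sequence[float], controls: Sequence[float]) -> float:
    """log2 ratio of mean expression; assumes positive, non-log-scale values."""
    return math.log2(mean(patients) / mean(controls))


def permutation_p_value(patients, controls, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(patients) - mean(controls))
    pooled = list(patients) + list(controls)
    n = len(patients)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples
        if abs(mean(pooled[:n]) - mean(pooled[n:])) >= observed:
            extreme += 1
    # add-one correction so the estimate is never exactly zero
    return (extreme + 1) / (n_perm + 1)


def volcano_point(patients, controls) -> Tuple[float, float]:
    """(x, y) for one gene: log2 fold change and -log10 p-value."""
    return (log2_fold_change(patients, controls),
            -math.log10(permutation_p_value(patients, controls)))
```

Genes far to the left or right (large |log2 fold change|) and high on the y-axis (small p-value) are the down- or up-regulated candidates the abstract refers to.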

4.
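Associative classification offers high classification accuracy and strong adaptability; however, because the classifier consists of a set of high-confidence rules, it sometimes suffers from overfitting. This paper proposes associative classification based on rule interestingness (ACIR). It extends the TD-FP-growth algorithm to mine the training set efficiently and generate interesting rules that satisfy minimum support and minimum confidence. A small rule set is then selected by pruning to construct the classifier. During rule pruning, rule interestingness is used to evaluate rule quality, taking into account both the predictive accuracy of a rule and the interestingness of the items in it. Experimental results show that the method outperforms See5, CBA and CMAR in classification accuracy and has good comprehensibility and scalability.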
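ACIR's rule-interestingness measure is not defined in the abstract, so this Python sketch swaps in the well-known CBA-style database-coverage pruning to illustrate how a small classifier can be selected from many mined rules. Rules are assumed to be tuples `(antecedent, class, support, confidence)`, and the function name `prune_by_coverage` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], str]            # (items, class label)
CAR = Tuple[FrozenSet[str], str, float, float]  # (antecedent, class, support, confidence)


def prune_by_coverage(rules: List[CAR], data: List[Example]) -> List[CAR]:
    """CBA-style database-coverage pruning (a stand-in, not ACIR's own criterion)."""
    # rank: higher confidence, then higher support, then shorter antecedent
    ranked = sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0])))
    remaining = list(data)
    kept: List[CAR] = []
    for rule in ranked:
        antecedent, label = rule[0], rule[1]
        covered = [cls for items, cls in remaining if antecedent <= items]
        # keep the rule only if it correctly classifies some still-uncovered example
        if label in covered:
            kept.append(rule)
            remaining = [ex for ex in remaining if not antecedent <= ex[0]]
        if not remaining:
            break
    return kept
```

Replacing the ranking key with an interestingness score that also weighs the items in each rule would move this sketch closer to what the abstract describes.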

5.
Hierarchical unsupervised fuzzy clustering   Cited by: 5 (self-citations: 0, citations by others: 5)
A recursive algorithm for hierarchical fuzzy partitioning is presented. The algorithm has the advantages of hierarchical clustering while maintaining fuzzy clustering rules: each pattern can have a nonzero membership in more than one subset of the data in the hierarchy. Optimal feature extraction and reduction are optionally reapplied to each subset. Combining hierarchical and fuzzy concepts is suggested as a natural, feasible solution to the cluster validity problem of real data. The convergence and membership conservation of the algorithm are proven. The algorithm is shown to be effective for a variety of data sets with a wide dynamic range of both covariance matrices and number of members in each class.
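The recursive hierarchical algorithm is not detailed in the abstract. As a hedged illustration of the fuzzy partitioning step that such schemes build on, here is the standard fuzzy c-means membership and center update in Python, in which a pattern can belong partly to several clusters and each pattern's memberships sum to 1. The fuzzifier `m=2.0` is an assumed default, and this flat step is not the paper's method.

```python
import math
from typing import List, Sequence


def fcm_memberships(points: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[List[float]]:
    """Standard fuzzy c-means membership update; u[i][j] is point j in cluster i."""
    c, n = len(centers), len(points)
    u = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(points):
        d = [math.dist(x, v) for v in centers]
        zero = [i for i, di in enumerate(d) if di == 0.0]
        if zero:
            # point coincides with one or more centers: split membership among them
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return u


def fcm_centers(points: Sequence[Sequence[float]],
                u: List[List[float]], m: float = 2.0) -> List[List[float]]:
    """Center update: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [uij ** m for uij in row]
        total = sum(w)
        centers.append([sum(wj * p[k] for wj, p in zip(w, points)) / total
                        for k in range(dim)])
    return centers
```

Alternating the two updates until memberships stop changing gives one fuzzy partition; a hierarchical scheme would then recurse into each fuzzy subset.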

6.
Unifying Instance-Based and Rule-Based Induction   Cited by: 13 (self-citations: 0, citations by others: 13)
Pedro Domingos. Machine Learning, 1996, 24(2): 141-168
Several well-developed approaches to inductive learning now exist, but each has specific limitations that are hard to overcome. Multi-strategy learning attempts to tackle this problem by combining multiple methods in one algorithm. This article describes a unification of two widely used empirical approaches: rule induction and instance-based learning. In the new algorithm, instances are treated as maximally specific rules, and classification is performed using a best-match strategy. Rules are learned by gradually generalizing instances until no improvement in apparent accuracy is obtained. Theoretical analysis shows this approach to be efficient. It is implemented in the RISE 3.1 system. In an extensive empirical study, RISE consistently achieves higher accuracies than state-of-the-art representatives of both its parent approaches (PEBLS and CN2), as well as a decision tree learner (C4.5). Lesion studies show that each of RISE's components is essential to this performance. Most significantly, in 14 of the 30 domains studied, RISE is more accurate than the best of PEBLS and CN2, showing that a significant synergy can be obtained by combining multiple empirical methods.
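The core RISE ideas summarized above (instances as maximally specific rules, best-match classification, and minimal generalization of a rule so that it covers an example) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: numeric conditions are intervals scaled by an assumed attribute range, and symbolic attributes use a plain overlap distance instead of the value difference metric RISE actually uses. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Rule:
    # Per attribute: a (low, high) interval, a required symbol,
    # or None when the condition has been dropped by generalization.
    conditions: List[Optional[object]]
    label: str


def from_instance(x: Sequence, label: str) -> Rule:
    """An instance viewed as a maximally specific rule."""
    return Rule([(v, v) if isinstance(v, (int, float)) else v for v in x], label)


def distance(rule: Rule, x: Sequence, ranges: Sequence[float]) -> float:
    """Squared distance; an attribute that satisfies its condition contributes 0."""
    total = 0.0
    for cond, v, r in zip(rule.conditions, x, ranges):
        if cond is None:
            continue
        if isinstance(cond, tuple):
            low, high = cond
            gap = max(low - v, 0.0, v - high)  # distance to the nearest interval end
            total += (gap / r) ** 2 if r else 0.0
        elif cond != v:
            total += 1.0  # overlap metric (simplification; RISE uses SVDM here)
    return total


def classify(rules: List[Rule], x: Sequence, ranges: Sequence[float]) -> str:
    """Best-match strategy: predict the class of the nearest rule."""
    return min(rules, key=lambda rule: distance(rule, x, ranges)).label


def generalize(rule: Rule, x: Sequence) -> Rule:
    """Minimally extend a rule so that it covers example x."""
    new: List[Optional[object]] = []
    for cond, v in zip(rule.conditions, x):
        if cond is None:
            new.append(None)
        elif isinstance(cond, tuple):
            new.append((min(cond[0], v), max(cond[1], v)))
        else:
            new.append(cond if cond == v else None)  # drop a mismatching symbol
    return Rule(new, rule.label)
```

The full learning loop, which repeatedly generalizes each rule toward its nearest same-class example and decides whether to keep the change by apparent accuracy, is omitted for brevity.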

7.
This work presents an evolutionary approach for finding relationships among several variables of a multidimensional time series. The proposed model for discovering these relationships is based on quantitative association rules. The algorithm, called QARGA (Quantitative Association Rules by Genetic Algorithm), uses a particular codification of the individuals that solves two basic problems: first, it does not require a prior discretization of the attributes and, second, it is not necessary to specify which variables belong to the antecedent or the consequent. It can therefore discover all underlying dependencies among the variables. Three experiments were carried out to evaluate the proposed algorithm. First, several public datasets were analyzed in order to compare QARGA with other existing evolutionary approaches. Second, the algorithm was applied to synthetic time series (where the relationships are known) to analyze its potential for discovering rules in time series. Finally, a real-world multidimensional time series composed of several climatological variables was considered. All the results show a remarkable performance of QARGA.
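QARGA's encoding is not given in the abstract; the Python sketch below only shows how a quantitative association rule over interval conditions might be evaluated, with support, confidence and lift computed directly on raw numeric records so that no prior discretization is needed. The attribute names and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Record = Dict[str, float]


def satisfies(record: Record, conds: Dict[str, Interval]) -> bool:
    """True if every attribute in conds falls inside its closed interval."""
    return all(lo <= record[a] <= hi for a, (lo, hi) in conds.items())


def rule_measures(antecedent: Dict[str, Interval],
                  consequent: Dict[str, Interval],
                  data: List[Record]) -> Tuple[float, float, float]:
    """Support, confidence and lift of antecedent => consequent."""
    n = len(data)
    n_a = sum(satisfies(r, antecedent) for r in data)
    n_c = sum(satisfies(r, consequent) for r in data)
    n_ac = sum(satisfies(r, antecedent) and satisfies(r, consequent) for r in data)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

For time series, each record could hold the variables at time t (and lagged copies) so that the same measures apply; an EA would then evolve the interval bounds and the antecedent/consequent assignment.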

8.
A Model for Describing Real-Time System Requirements   Cited by: 4 (self-citations: 0, citations by others: 4)
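This paper proposes a model for describing the requirements of real-time systems. In the model, hierarchical finite state machines are expressed in the form of rules and templates, with each template corresponding to one state machine. Because the rules and information associated with a state machine can be written into its template, a requirements specification written with this model can consist of multiple templates and is easy to read and understand. Finally, the paper discusses the characteristics of the model.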

9.
Knowledge, 2006, 19(6): 388-395
The objective of this study is to present a new algorithm, REX-1, developed for automatic knowledge acquisition in inductive learning. It aims to eliminate the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm uses a direct rule extraction approach rather than a decision tree, inducing general rules from a set of examples. Using several widely used example sets, such as IRIS, Balance and Balloons, Monk, Splice, Promoter, Lenses, Zoo, and Vote, our algorithm is compared with other well-known algorithms such as ID3, C4.5, ILA, and Rules Family.

10.
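The C4.5 algorithm is first introduced, and several optimizations of the algorithm are then proposed for the case project. C4.5 is used to model and analyze the bill-of-quantities costs of a municipal road engineering case project and to extract classification rules; finally, the general applicability of the classification rules is checked by random validation. The random validation shows that the classification rules are generally applicable and can help construction cost professionals make rapid predictions and improve the efficiency of decision analysis.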
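C4.5 chooses splits by gain ratio (information gain divided by split information). A minimal Python sketch of that criterion for a discrete attribute follows; the optimizations this study applies for the case project are not described in the abstract and are not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a class-label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting on a discrete attribute."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # expected entropy after the split, weighted by branch size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # split information penalizes attributes with many small branches
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

At each node C4.5 evaluates candidate attributes this way (continuous ones via binary threshold splits) and picks the best; the classification rules are then read off the root-to-leaf paths.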

11.
Building a high-accuracy classifier is a key problem in real applications. One high-accuracy classifier used for this purpose is based on association rules. Previous research has shown that classification based on association rules (class association rules, CARs) achieves higher accuracy than other rule-based methods such as ILA and C4.5. However, mining CARs is time-consuming because a complete rule set must be mined, so reducing the execution time of CAR mining is one of the main problems of this method that remains to be solved. In this paper, we propose a new method for mining class association rules. First, we design a tree structure for storing the frequent itemsets of a dataset. We then develop several theorems for pruning nodes and computing information in the tree and, based on these theorems, propose an efficient algorithm for mining CARs. Experimental results show that our approach is more efficient than previous ones.
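To make the notion of a CAR concrete, here is a naive Python enumerator that checks every itemset up to an assumed maximum length against minimum support and confidence thresholds. It is a baseline illustration only, exponential in the number of items, and does not implement the tree structure or pruning theorems proposed in the paper; all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Example = Tuple[FrozenSet[str], str]            # (items, class label)
CAR = Tuple[FrozenSet[str], str, float, float]  # (antecedent, class, support, confidence)


def mine_cars(data: List[Example], min_sup: float, min_conf: float,
              max_len: int = 3) -> List[CAR]:
    """Enumerate all rules itemset -> class meeting both thresholds."""
    n = len(data)
    items = sorted(set().union(*(x for x, _ in data)))
    classes = sorted({y for _, y in data})
    rules: List[CAR] = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            ante = frozenset(combo)
            matched = [y for x, y in data if ante <= x]  # classes of covered examples
            if not matched:
                continue
            for c in classes:
                hits = matched.count(c)
                sup, conf = hits / n, hits / len(matched)
                if sup >= min_sup and conf >= min_conf:
                    rules.append((ante, c, sup, conf))
    return rules
```

Efficient CAR miners avoid this full enumeration by sharing counts in a tree and pruning candidates whose support cannot reach the threshold.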

12.
Coronary artery disease (CAD) is one of the major causes of mortality worldwide. Knowledge about the risk factors that increase the probability of developing CAD can help to understand the disease better and assist in its treatment. Recently, modern computer-aided approaches have been used for the prediction and diagnosis of diseases. Swarm intelligence algorithms such as particle swarm optimization (PSO) have performed well on a variety of optimization problems. Since rule discovery can be modelled as an optimization problem, it can be solved by means of an evolutionary algorithm such as PSO. An approach for discovering classification rules for CAD is proposed. The work is based on a real-world CAD data set and aims to detect the disease by producing accurate and effective rules. The proposed algorithm is a hybrid binary-real PSO that combines categorical and numerical encoding of a particle with a different approach for calculating particle velocities. The rules were developed from randomly generated particles, which take random values within the range of each attribute in the rule. Two feature selection methods, based on multi-objective evolutionary search and on PSO, were applied to the data set to select the most relevant features. The accuracy of the two resulting rule sets was evaluated: the rule set with 11 features obtained more accurate results than the rule set with 13 features. Our results show that the proposed approach can produce effective rules with the highest accuracy for the detection of CAD.
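For reference, the standard inertia-weight PSO update (velocity pulled toward each particle's personal best and the swarm's global best, then position clamped to the attribute range) looks like this in Python. The parameter values are common assumed defaults, and the paper's hybrid binary-real encoding and modified velocity rule are not reproduced; in a rule-discovery setting `f` would score a candidate rule decoded from a particle (lower is better here).

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, iters: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Standard inertia-weight PSO on a box-bounded continuous space."""
    rng = random.Random(seed)
    dim = len(bounds)
    # random initial positions within each attribute's range, zero velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social pull
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to range
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For instance, `pso_minimize(lambda x: sum(v * v for v in x), [(-5, 5), (-5, 5)])` should return a point close to the origin.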

13.
Yang Ping (杨萍), Yang Ming (杨明), Sun Zhihui (孙志挥). Computer Engineering and Applications (计算机工程与应用), 2003, 39(13): 204-205, 211
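Rough set theory provides a new mathematical approach for handling imprecise, incomplete and inconsistent knowledge. Mining default rules quickly and effectively from inconsistent decision tables is a hot topic in decision rule mining research. The MDRBR algorithm mines default rules with a single rule-support threshold, which hinders the effective mining of the default rules that interest users. To address this, this paper improves the MDRBR algorithm and proposes MSMDRBR, a default rule mining algorithm based on multiple supports. MSMDRBR can reasonably accept or reject decision rules according to multiple support thresholds and therefore has practical value.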

14.
Inspired by the work of Brooks, many researchers involved in programming robots have turned to the behaviour-based approach. At present, the behaviours are designed by hand and hard-wired into the architecture. The work presented in this paper looks at using an evolutionary algorithm approach (based on the genetic algorithm) to construct behaviours. Building from well-defined primitive behaviours, hierarchies can be evolved to produce more complex behaviour. The behaviours in the evolutionary system are tested in simulation, but the best are then tested on a mobile robot for grounding in the real world. This allows the evolutionary process to rapidly drive the development of the behaviours using simulation while also ensuring their suitability in the real world. In the paper we show how this evolutionary process evolves practical hierarchical behaviours for the detection of a goal object in a series of mazes.

15.
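This paper presents a method for building compact fuzzy classification systems directly from training data with an evolutionary approach, without prior knowledge of the data distribution. The VISIT algorithm is used to obtain each individual fuzzy system, and a genetic algorithm then searches among them for the optimal fuzzy system. Rules and membership functions are created and optimized automatically during evolution. To evaluate both the accuracy and the compactness of a system effectively, a fuzzy expert system is used as the fitness function. Experimental results on two benchmark classification problems demonstrate the effectiveness of the new method.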

16.
A genetic algorithm aimed at the optimal design of composite structures with non-linear behaviour is presented. The approach treats the optimal material/stacking sequence in laminate construction and the material distribution topology in composite structures as a multimodal optimization problem. The proposed evolutionary process is based on a sequential hierarchical relation between subpopulations that evolve in separate isolation stages followed by migration. Improvements based on the species conservation paradigm are made to avoid the genetic tendencies caused by the elitist strategies used in the hierarchical subpopulations. The concept of species is associated with the material distribution topology in composite structures, and an enlarged master population with an age structure is considered concurrently with the hierarchical topology. Rules based on the species concept are imposed on either the isolation or the migration stages to prevent the predominance of a single species and to guarantee diversity. A mutation process controlled by the stress field is implemented, improving the local genetic search. The proposed model allows multiple solutions to the optimal design problem.
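The isolation-then-migration scheme described above resembles an island-model GA. A hedged Python sketch of one ring-migration step follows, in which each subpopulation's best individuals replace the worst of the next one under an assumed maximization fitness; the paper's hierarchical ordering, species rules and age structure are not modeled, and the name `ring_migration` is hypothetical.

```python
from typing import Callable, List

Individual = List[int]


def ring_migration(islands: List[List[Individual]],
                   fitness: Callable[[Individual], float],
                   n_migrants: int = 1) -> None:
    """One in-place migration step: the best of island k-1 replace the worst of island k."""
    # choose all emigrants before any island is modified
    emigrants = [sorted(pop, key=fitness, reverse=True)[:n_migrants] for pop in islands]
    for k, pop in enumerate(islands):
        incoming = emigrants[k - 1]   # for k == 0 this wraps around to the last island
        pop.sort(key=fitness)         # worst first, since fitness is maximized
        pop[:n_migrants] = [ind[:] for ind in incoming]  # copy to avoid aliasing
```

Between migration steps each island would run ordinary selection, crossover and mutation in isolation; a species-conservation rule could, for example, refuse migrants whose topology already dominates the receiving island.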

17.
Intelligent Data Analysis, 1998, 2(1-4): 165-185
Classification, which involves finding rules that partition a given dataset into disjoint groups, is one class of data mining problems. Approaches proposed so far for mining classification rules from databases are mainly decision tree methods based on symbolic learning. In this paper, we combine an artificial neural network and a genetic algorithm to mine classification rules. Experiments demonstrate that our method generates rules that perform better than those of the decision tree approach, and that it extracts fewer rules than C4.5.

18.
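Because the automatic acquisition of fuzzy rules has long been a bottleneck for fuzzy systems, a new learning algorithm for fuzzy weighted neural networks is proposed that combines a hierarchically structured, hybrid-coded genetic algorithm with evolutionary programming. The algorithm simultaneously optimizes the structure and the parameters of the fuzzy weighted neural network. Finally, a method for extracting fuzzy rules from the network is described, so that optimal fuzzy rules are obtained automatically. Analysis and experimental results show that the proposed method outperforms other methods in rule extraction and classification accuracy.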

19.
One of the major challenges in content-based information retrieval and machine learning is to build a so-called “semantic classifier” that can effectively and efficiently classify semantic concepts in a large database. This paper deals with semantic image classification based on mining hierarchical Fuzzy Association Rules (FARs) in an image database. Intuitively, an association rule is a unique and significant combination of image features and a semantic concept, which determines the degree of correlation between the features and the concept. The main idea behind this approach is that any visual concept in an image has some associated features, so there are strong correlations between concepts and their corresponding features. Regardless of the semantic gap, an image concept appears when the corresponding features emerge in an image, and vice versa. Specifically, this paper’s contribution is a novel Fuzzy Association Rule that improves on traditional association rules. Moreover, the approach establishes a hierarchical fuzzy rule base in the training phase and sets up a corresponding fuzzy inference engine to classify images in the testing phase. The presented approach is independent of image segmentation and can be applied to multi-label images. Experimental results on a database of 6000 general-purpose images demonstrated the superiority of the proposed algorithm.
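The abstract does not define the proposed Fuzzy Association Rule, so this Python sketch uses a common textbook formulation instead: triangular membership functions for features, the min t-norm for the antecedent, and fuzzy support and confidence for a feature-set to concept rule. All names and membership parameters are assumptions.

```python
from typing import List, Sequence, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a and c and peak b (a < b < c assumed)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzy_rule_measures(feature_degrees: List[Sequence[float]],
                        concept: Sequence[bool]) -> Tuple[float, float]:
    """Fuzzy support and confidence of 'all antecedent fuzzy sets -> concept'.

    feature_degrees[j] lists image j's membership in each antecedent fuzzy set;
    the antecedent degree is their minimum (min t-norm), and concept[j] is a
    crisp flag saying whether image j is annotated with the concept.
    """
    antecedent = [min(degs) for degs in feature_degrees]
    joint = sum(a for a, has_concept in zip(antecedent, concept) if has_concept)
    total = sum(antecedent)
    support = joint / len(antecedent)
    confidence = joint / total if total > 0 else 0.0
    return support, confidence
```

A feature such as mean brightness could be mapped to "dark", "medium" and "bright" fuzzy sets with `triangular`, and rules whose fuzzy support and confidence exceed chosen thresholds would enter the rule base.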

20.
