Similar Documents
20 similar documents found.
1.
孙娟  王熙照 《计算机工程》2006,32(12):210-211,231
Decision tree induction is one of the most effective tools for solving classification problems in machine learning. Because of inherent shortcomings of the decision tree algorithm, appropriate simplification is needed to improve prediction accuracy. The fuzzy decision tree algorithm is an improvement on the crisp decision tree that is closer to human reasoning. Through experiments, this paper analyses the similarities and differences between fuzzy decision trees, rule simplification and fuzzy rule simplification, and between fuzzy decision trees and fuzzy pre-pruning, comparing tree size, training accuracy and test accuracy. The resulting analysis of fuzzy decision tree performance offers some useful clues for improving the algorithm.

2.
Dominance-relation rough sets and knowledge reduction in incomplete fuzzy systems (cited 1 time: 0 self-citations, 1 by others)
Taking incomplete fuzzy decision systems as the object of study, a rough fuzzy set model is constructed on the basis of an expanded dominance relation in order to extract "at least" and "at most" decision rules from such systems. To obtain simplified "at least" and "at most" rules, two kinds of relative reduct (the relative lower-approximation reduct and the relative upper-approximation reduct) are proposed for incomplete fuzzy decision systems; judgement theorems and discernibility functions for computing the two kinds of reduct are given, and an illustrative example is analysed.
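For reference, the standard dominance-based approximations that constructions of this kind build on can be written as follows; the notation (U for the universe, D for the dominance relation, Cl_t^≥ for an upward union of decision classes) is generic rather than the paper's, and the fuzzy extension is omitted.

```latex
% Dominating and dominated sets of an object x under a dominance relation D:
%   D^{+}(x) = \{ y \in U : y \,D\, x \}, \qquad D^{-}(x) = \{ y \in U : x \,D\, y \}
% Lower and upper approximations of the upward union Cl_t^{\geq}:
\underline{D}\bigl(Cl_t^{\geq}\bigr) = \bigl\{\, x \in U : D^{+}(x) \subseteq Cl_t^{\geq} \,\bigr\},
\qquad
\overline{D}\bigl(Cl_t^{\geq}\bigr) = \bigl\{\, x \in U : D^{-}(x) \cap Cl_t^{\geq} \neq \emptyset \,\bigr\}
```

"At least" rules are read off the lower approximations of the upward unions Cl_t^≥, and "at most" rules dually off the downward unions Cl_t^≤.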

3.
Multi-label learning originated from the investigation of text categorization problems, where each document may belong to several predefined topics simultaneously. In multi-label learning, the training set is composed of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances through analyzing training instances with known label sets. In this paper, a multi-label lazy learning approach named ML-KNN is presented, which is derived from the traditional K-nearest neighbor (KNN) algorithm. In detail, for each unseen instance, its K nearest neighbors in the training set are first identified. After that, based on statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, the maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance. Experiments on three different real-world multi-label learning problems, i.e. yeast gene functional analysis, natural scene classification and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
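A compact sketch may make the procedure concrete: for each label, count how many of the K nearest training neighbours carry that label, then apply a per-label MAP rule. The class name, the Laplace smoothing constant and the use of scikit-learn's NearestNeighbors are implementation choices for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

class MLkNNSketch:
    """Simplified ML-KNN: per-label MAP decision from neighbour label counts."""

    def __init__(self, k=10, smooth=1.0):
        self.k, self.s = k, smooth          # Laplace smoothing constant

    def fit(self, X, Y):                    # Y: (n_samples, n_labels) 0/1 matrix
        X, Y = np.asarray(X), np.asarray(Y)
        self.Y = Y
        self.nn = NearestNeighbors(n_neighbors=self.k + 1).fit(X)
        n, q = Y.shape
        self.prior = (self.s + Y.sum(axis=0)) / (2 * self.s + n)   # P(label l present)
        # c1[l, j] ~ P(j neighbours carry l | l present); c0[l, j]: l absent
        self.c1 = np.full((q, self.k + 1), self.s)
        self.c0 = np.full((q, self.k + 1), self.s)
        idx = self.nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self
        counts = Y[idx].sum(axis=1)          # (n, q) neighbour label counts
        for i in range(n):
            for l in range(q):
                (self.c1 if Y[i, l] else self.c0)[l, counts[i, l]] += 1
        self.c1 /= self.c1.sum(axis=1, keepdims=True)
        self.c0 /= self.c0.sum(axis=1, keepdims=True)
        return self

    def predict(self, X_new):
        idx = self.nn.kneighbors(np.asarray(X_new), n_neighbors=self.k,
                                 return_distance=False)
        counts = self.Y[idx].sum(axis=1)     # (m, q)
        out = np.zeros_like(counts)
        for l in range(self.Y.shape[1]):
            p1 = self.prior[l] * self.c1[l, counts[:, l]]
            p0 = (1 - self.prior[l]) * self.c0[l, counts[:, l]]
            out[:, l] = p1 > p0              # MAP decision per label
        return out
```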

4.
Instance-Based Learning Algorithms (cited 46 times: 1 self-citation, 45 by others)
Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
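A minimal sketch of the storage-reducing idea, in the spirit of the paper's misclassification-driven variant: an instance is stored only if the instances retained so far would misclassify it under 1-NN. The plain Euclidean distance and the omission of the noise-filtering significance test are simplifications for illustration.

```python
import numpy as np

def ib2_fit(X, y):
    """Return the subset of (X, y) retained by the misclassification rule."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    stored_X, stored_y = [X[0]], [y[0]]          # seed with the first instance
    for xi, yi in zip(X[1:], y[1:]):
        d = np.linalg.norm(np.asarray(stored_X) - xi, axis=1)
        if stored_y[int(np.argmin(d))] != yi:    # misclassified -> keep it
            stored_X.append(xi)
            stored_y.append(yi)
    return np.asarray(stored_X), np.asarray(stored_y)

def ib2_predict(stored_X, stored_y, X_new):
    """1-NN prediction against the reduced instance store."""
    d = np.linalg.norm(stored_X[None, :, :] - np.asarray(X_new)[:, None, :], axis=2)
    return stored_y[d.argmin(axis=1)]
```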

5.
Feature selection for multi-label naive Bayes classification (cited 4 times: 0 self-citations, 4 by others)
In multi-label learning, the training set is made up of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by a method called Mlnb, which adapts traditional naive Bayes classifiers to deal with multi-label instances. Feature selection mechanisms are incorporated into Mlnb to improve its performance. First, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. After that, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that Mlnb achieves comparable performance to other well-established multi-label learning algorithms.
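A rough sketch of the two-stage feature selection pipeline described above, assuming a binary label matrix: PCA first, then a very small genetic search over the PCA features scored by the Hamming accuracy of a binary-relevance Gaussian naive Bayes on a hold-out split. Population size, mutation rate, crossover scheme and the fitness function are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

def hamming_fitness(mask, Xtr, Ytr, Xva, Yva):
    """1 - Hamming loss of a binary-relevance Gaussian NB on the masked features."""
    if not mask.any():
        return 0.0
    pred = np.column_stack([
        GaussianNB().fit(Xtr[:, mask], Ytr[:, l]).predict(Xva[:, mask])
        for l in range(Ytr.shape[1])])
    return 1.0 - np.mean(pred != Yva)

def mlnb_style_select(X, Y, n_components=10, pop=20, gens=30, rng=None):
    rng = np.random.default_rng(rng)
    Z = PCA(n_components=n_components).fit_transform(X)       # stage 1: PCA
    Xtr, Xva, Ytr, Yva = train_test_split(Z, Y, test_size=0.3, random_state=0)
    P = rng.random((pop, n_components)) < 0.5                  # stage 2: tiny GA
    for _ in range(gens):
        fit = np.array([hamming_fitness(m, Xtr, Ytr, Xva, Yva) for m in P])
        parents = P[np.argsort(-fit)[:pop // 2]]               # truncation selection
        children = parents.copy()
        cut = rng.integers(1, n_components, size=len(children))
        for c, k, mate in zip(children, cut,
                              parents[rng.permutation(len(parents))]):
            c[k:] = mate[k:]                                   # one-point crossover
        children ^= rng.random(children.shape) < 0.05          # bit-flip mutation
        P = np.vstack([parents, children])
    best = P[np.argmax([hamming_fitness(m, Xtr, Ytr, Xva, Yva) for m in P])]
    return best                                                # boolean feature mask
```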

6.
Formal Concept Analysis of real set formal contexts is a generalization of classical formal contexts. By dividing the attributes into condition attributes and decision attributes, the notion of real decision formal contexts is introduced. Based on an implication mapping, the problems of rule acquisition and attribute reduction in real decision formal contexts are examined. The extraction of “if–then” rules from real decision formal contexts and an approach to their attribute reduction are discussed. With the proposed approach, attributes that are non-essential to the maximal s rules or l rules (defined later in the text) can be removed. Furthermore, discernibility matrices and discernibility functions for computing the attribute reducts of real decision formal contexts are constructed, so that all attribute reducts of the real set formal contexts can be determined without affecting the acquired maximal s rules or l rules.

7.
Multi-label learning deals with the problem where each instance is associated with multiple labels simultaneously. The task of this learning paradigm is to predict the label set for each unseen instance by analyzing training instances with known label sets. In this paper, a neural network based multi-label learning algorithm named Ml-rbf is proposed, which is derived from traditional radial basis function (RBF) methods. Briefly, the first layer of an Ml-rbf neural network is formed by conducting clustering analysis on the instances of each possible class, where the centroid of each clustered group is regarded as the prototype vector of a basis function. After that, the second-layer weights of the Ml-rbf neural network are learned by minimizing a sum-of-squares error function. Specifically, the information encoded in the prototype vectors corresponding to all classes is fully exploited to optimize the weights corresponding to each specific class. Experiments on three real-world multi-label data sets show that Ml-rbf achieves highly competitive performance compared to other well-established multi-label learning algorithms.
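A minimal sketch of the two-layer construction described above, assuming a binary label matrix: k-means on the positive instances of each label supplies the prototype vectors (RBF centres), and the second-layer weights are then fitted by least squares on the sum-of-squares error. The number of clusters per label, the kernel width and the 0.5 decision threshold are illustrative defaults, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def ml_rbf_fit(X, Y, clusters_per_label=2, sigma=1.0):
    """Assumes every label has at least `clusters_per_label` positive instances."""
    # layer 1: prototype vectors from per-label clustering
    centres = np.vstack([
        KMeans(n_clusters=clusters_per_label, n_init=10, random_state=0)
        .fit(X[Y[:, l] == 1]).cluster_centers_
        for l in range(Y.shape[1])])
    # RBF design matrix plus a bias column
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((len(X), 1))])
    # layer 2: minimise the sum-of-squares error over all labels jointly
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return centres, W

def ml_rbf_predict(centres, W, X_new, sigma=1.0, threshold=0.5):
    d2 = ((X_new[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((len(X_new), 1))])
    return (Phi @ W >= threshold).astype(int)
```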

8.
Since preference order is a crucial feature of data concerning decision situations, the classical rough set model has been generalized by replacing the indiscernibility relation with a dominance relation. The purpose of this paper is to further investigate the dominance-based rough set in incomplete interval-valued information systems, which contain both incomplete and imprecise evaluations of objects. By considering three types of unknown values in the incomplete interval-valued information system, a data complement method is used to transform the incomplete interval-valued information system into a traditional one. To generate the optimal decision rules from the incomplete interval-valued decision system, six types of relative reducts are proposed. Both the relationships between these reducts and practical approaches to computing them are then investigated. Some numerical examples are employed to substantiate the conceptual arguments.

9.
A self-optimizing algorithm for attribute reduction (cited 25 times: 1 self-citation, 24 by others)
Attribute reduction is one of the key problems in knowledge acquisition. To obtain good attribute reducts effectively, a relative-difference comparison table is first constructed on the basis of rough set theory. Combining it with heuristic knowledge, three algorithms are then designed: an improved attribute reduction algorithm (AR1), a complete algorithm for deciding attribute reducts (RJ), and a further improved and strengthened attribute reduction algorithm (AR2). Next, using these as sub-algorithms and absorbing the basic ideas of genetic algorithms and the concrete operations of simulated annealing, a self-optimizing algorithm for attribute reduction (ADSOA) is designed. Finally, the algorithm is applied to the reduction of a decision table for diagnosing rheumatoid arthritis in traditional Chinese medicine. Experimental results show that the self-optimizing algorithm can obtain good attribute reducts with high probability and high efficiency, and for some specific problems it even obtains the optimal reduct; this also shows that the relative-difference comparison table is of practical significance for constructing still more efficient attribute reduction algorithms.

10.
Rough set theory (RST) has been the subject of much study and numerous applications in many areas. However, most previous studies on rough sets have focused on finding rules where the decision attribute has a flat, rather than hierarchical, structure. In practical applications, attributes are often organized hierarchically to represent general/specific meanings. This paper (1) determines the optimal decision attribute in a hierarchical level-search procedure, level by level, (2) merges the two stages of generating reducts and inducing decision rules into a one-shot solution that reduces the need for memory space and the computational complexity, and (3) uses a revised strength index to identify meaningful reducts and to improve their accuracy. The selection of a green fleet is used to validate the superiority of the proposed approach and its potential benefits to decision-making processes in the transportation industry.

11.
12.
Incremental Induction of Decision Trees (cited 36 times: 11 self-citations, 25 by others)
This article presents an incremental algorithm for inducing decision trees equivalent to those formed by Quinlan's nonincremental ID3 algorithm, given the same training instances. The new algorithm, named ID5R, lets one apply the ID3 induction process to learning tasks in which training instances are presented serially. Although the basic tree-building algorithms differ only in how the decision trees are constructed, experiments show that incremental training makes it possible to select training instances more carefully, which can result in smaller decision trees. The ID3 algorithm and its variants are compared in terms of theoretical complexity and empirical behavior.

13.
Attribute selection with fuzzy decision reducts (cited 2 times: 0 self-citations, 2 by others)
Rough set theory provides a methodology for data analysis based on the approximation of concepts in information systems. It revolves around the notion of discernibility: the ability to distinguish between objects based on their attribute values. This makes it possible to infer data dependencies that are useful in the fields of feature selection and decision model construction. In many cases, however, it is more natural, and more effective, to consider a gradual notion of discernibility. Therefore, within the context of fuzzy rough set theory, we present a generalization of the classical rough set framework for data-based attribute selection and reduction using fuzzy tolerance relations. The paper unifies existing work in this direction and introduces the concept of fuzzy decision reducts, dependent on an increasing attribute subset measure. Experimental results demonstrate the potential of fuzzy decision reducts to discover shorter attribute subsets, leading to decision models with better coverage and with comparable, or even higher, accuracy.
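A toy illustration of the gradual notion of discernibility: a fuzzy tolerance relation built from normalised attribute differences, and a dependency-style measure of how well an attribute subset approximates a crisp decision. The particular relation, implicator and measure are common textbook choices, not the exact definitions used in the paper.

```python
import numpy as np

def fuzzy_tolerance(X, subset):
    """R[i, j] in [0, 1]: degree to which objects i and j are indiscernible
    on the listed attribute columns (minimum over per-attribute similarities)."""
    Xs = np.asarray(X, dtype=float)[:, subset]
    rng = Xs.max(axis=0) - Xs.min(axis=0) + 1e-12
    per_attr = 1.0 - np.abs(Xs[:, None, :] - Xs[None, :, :]) / rng
    return per_attr.min(axis=2)

def dependency(X, y, subset):
    """Fuzzy-rough positive-region style dependency of decision y on the subset."""
    R = fuzzy_tolerance(X, subset)
    y = np.asarray(y)
    same_class = (y[:, None] == y[None, :]).astype(float)
    # lower-approximation membership of each object to its own decision class
    lower = np.min(np.maximum(1.0 - R, same_class), axis=1)
    return lower.mean()    # in [0, 1]; 1 means the subset fully determines y
```

A greedy search that adds the attribute giving the largest increase in this measure is one simple way to hunt for short reducts under such a gradual measure.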

14.
This paper tackles the difficult but important task of objective algorithm performance assessment for optimization. Rather than reporting average performance of algorithms across a set of chosen instances, which may bias conclusions, we propose a methodology to enable the strengths and weaknesses of different optimization algorithms to be compared across a broader instance space. The results reported in a recent Computers and Operations Research paper comparing the performance of graph coloring heuristics are revisited with this new methodology to demonstrate (i) how pockets of the instance space can be found where algorithm performance varies significantly from the average performance of an algorithm; (ii) how the properties of the instances can be used to predict algorithm performance on previously unseen instances with high accuracy; and (iii) how the relative strengths and weaknesses of each algorithm can be visualized and measured objectively.

15.
In a pattern classification problem, one trains a classifier to recognize future unseen samples using a training dataset. In practice, one should not expect the trained classifier to correctly recognize samples dissimilar to the training dataset. Therefore, studying the generalization capability of a classifier on such unseen samples may not help in improving the classifier's accuracy. The localized generalization error model was proposed to bound from above the generalization mean square error for only those unseen samples that are similar to the training dataset. This error model is derived from the stochastic sensitivity measure (ST-SM) of the classifiers. In this paper, we present the ST-SMs for Gaussian-based classifiers: radial basis function neural networks and support vector machines. At the end of this work, we compare visualizations of the decision boundaries obtained using the training samples that yield the largest sensitivity measures with those obtained using the support vectors in the input space.

16.
Steganography algorithm recognition is a sub-problem of steganalysis. Analysis shows that when a steganalysis detector trained on one cover source is applied to images from an unseen source, detection performance generally decreases. To tackle this problem, this paper proposes a steganalytic scheme for recognizing steganography algorithms. For a given testing image, a match image is first obtained by applying Gaussian filtering to the testing image to remove any possible stego signal. The match image is then embedded with each of the steganography algorithms to be recognized. A CNN model trained on a training set is used to extract deep features from the testing image and the match images. Similarity between features is computed with an inner product operation or weighted χ2, and the final decision is made according to the similarity between the testing feature and each class of match features. The proposed scheme can also detect steganography algorithms not present in the training set. Experiments show that, compared with a directly used CNN model, the proposed scheme achieves a considerable improvement in testing accuracy when the detected images come from an unseen source.
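A schematic sketch of the recognition pipeline described above: Gaussian-filter the test image to obtain the match image, re-embed the match image with each candidate steganography algorithm, extract features from all images, and assign the class whose match features are most similar under an inner product. The toy_features function is a trivial stand-in for the paper's CNN, and the embedders dictionary of embedding routines is user-supplied; both are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_features(img):
    """Stand-in for CNN features: histogram of high-pass residuals."""
    img = np.asarray(img, dtype=float)
    residual = img - gaussian_filter(img, sigma=1.0)
    hist, _ = np.histogram(residual, bins=32, range=(-16, 16), density=True)
    return hist

def recognise(test_img, embedders, payload=0.4, extract=toy_features):
    """embedders: dict mapping algorithm name -> f(cover_img, payload) -> stego image."""
    test_img = np.asarray(test_img, dtype=float)
    match = gaussian_filter(test_img, sigma=1.0)     # remove the possible stego signal
    f_test = extract(test_img)
    scores = {name: float(np.dot(f_test, extract(embed(match, payload))))
              for name, embed in embedders.items()}
    return max(scores, key=scores.get)               # most similar candidate class
```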

17.
This paper presents a novel host-based combinatorial method based on the k-Means clustering and ID3 decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition them into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster. Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted, and a dedicated algorithm combines the results of the two algorithms to obtain final anomaly score values. A threshold rule is then applied to decide whether a test instance is normal. Experiments are performed on captured network ARP traffic; anomaly criteria have been defined and applied to the captured ARP traffic to generate normal training instances. Performance of the proposed approach is evaluated using five defined measures and empirically compared with the performance of the individual k-Means clustering and ID3 decision tree classification algorithms and with other proposed approaches based on Markov chains and stochastic learning automata. Experimental results show that the proposed approach has a specificity and positive predictive value of as high as 96% and 98%, respectively.
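A rough sketch of how a k-Means plus decision-tree combination of this kind might be wired up: k-Means is fit on normal traffic, an entropy-based tree (ID3-like) is trained inside each cluster, and the two scores are merged by a simple weighted rule before thresholding. The labelling of the per-cluster training data, the use of scikit-learn's DecisionTreeClassifier and the combination weights are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def fit(normal_X, anomalous_X, k=3):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(normal_X)
    trees = []
    for c in range(k):
        # each tree separates this cluster's normal traffic from known anomalies
        Xc = np.vstack([normal_X[km.labels_ == c], anomalous_X])
        yc = np.r_[np.zeros((km.labels_ == c).sum()), np.ones(len(anomalous_X))]
        trees.append(DecisionTreeClassifier(criterion="entropy",
                                            random_state=0).fit(Xc, yc))
    return km, trees

def anomaly_score(km, trees, X, w=0.5):
    d = km.transform(X)                                   # distances to all centroids
    nearest = d.argmin(axis=1)
    km_score = d.min(axis=1) / (d.min(axis=1).max() + 1e-12)   # normalised distance
    tree_score = np.array([trees[c].predict_proba(x[None, :])[0, -1]
                           for x, c in zip(X, nearest)])       # anomaly probability
    return w * km_score + (1 - w) * tree_score

# usage: scores = anomaly_score(km, trees, X_test); flags = scores > 0.7
```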

18.

In an ordered decision information system, an interval is defined as the set of all objects that dominate one given object and are simultaneously dominated by another. Taking intervals as the basic knowledge granules, a new dominance-relation rough set model is established, from which interval decision rules whose decision values fall within a specified interval range are acquired. The notion of a reduct of an interval is proposed, a discernibility function is constructed to compute such reducts, and optimized interval decision rules are derived from them. This method is more adaptable than the original dominance-relation rough set approach, and the resulting interval decision rules can be applied directly to classification problems in ordered information systems.

19.
Extension of the rough set model based on database systems (cited 1 time: 0 self-citations, 1 by others)
刘启和  陈雷霆  闵帆  蔡洪斌 《控制与决策》2006,21(12):1374-1378
To address the limitation that the knowledge reduction algorithm in the database-system-based rough set model applies to consistent decision tables but not to inconsistent ones, an algorithm is given for converting an inconsistent decision table into a consistent one. It is proved that the algorithm keeps the core and the set of reducts unchanged, and its time complexity is analysed. On this basis, the conversion algorithm is described using database set operations and SQL, and the knowledge reduction algorithm of the database-system-based rough set model is thereby extended to inconsistent decision tables. Theoretical analysis and experimental results show that the extended algorithm remains efficient.
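For orientation, a small sketch of one common way to make a decision table consistent: objects whose condition values coincide but whose decisions conflict receive a shared "generalised decision" label. This is a standard transformation in the rough-set literature and is shown only as an illustration; the paper's SQL-based conversion algorithm and its proof that the core and reduct set are preserved are not reproduced here. The attribute names in the usage comment are hypothetical.

```python
import pandas as pd

def make_consistent(table: pd.DataFrame, condition_cols, decision_col):
    """Replace conflicting decisions within each condition class by one shared label."""
    out = table.copy()
    groups = out.groupby(list(condition_cols))[decision_col]
    # one generalised-decision label per set of conflicting decision values
    out[decision_col] = groups.transform(
        lambda s: "|".join(sorted(map(str, set(s)))))
    return out

# usage (hypothetical attribute names):
# consistent = make_consistent(df, condition_cols=["a1", "a2", "a3"], decision_col="d")
```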

20.
刘晓平 《计算机仿真》2006,23(4):103-105,113
Data mining is the process of extracting hidden knowledge from large amounts of raw data. Most data mining tools use rule discovery and decision tree classification techniques to discover data patterns and rules, and their core is an induction algorithm. Compared with traditional statistical methods, classification results obtained with machine learning techniques are easier to interpret. When mining a specific data set, users or decision makers who lack the relevant domain knowledge find it difficult to decide which induction algorithm to choose, and therefore have to try various algorithms. With MLC++, decision makers can easily compare the effectiveness of different classification algorithms on a specific data set and choose a suitable one; system developers can also use MLC++ to design various hybrid algorithms.
