首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.

It is known that Naive Bayesian classifier (NB) works very well on some domains, and poorly on others. The performance of NB suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naive Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that simply uses only those features that C4.5 would use in its decision tree when learning a small example of a training set, a combination of the two different natures of classifiers. Experiments conducted on ten data sets indicate that SBC performs markedly better than NB on all domains, and SBC outperforms C4.5 on many data sets of which C4.5 outperform NB. Augmented Bayesian classifier (ABC) is also tested on the same data, and SBC appears to perform as well as ABC. SBC also can eliminate, in most cases, more than half of the original attributes, which can greatly reduce the size of the training and test data as well as the running time. Further, the SBC algorithm typically learns faster than both C4.5 and NB, needing fewer training examples to reach a high accuracy of classifications.  相似文献   

2.
属性加权的朴素贝叶斯集成分类器   总被引:2,自引:1,他引:1  
为提高朴素贝叶斯分类器的分类精度和泛化能力,提出了基于属性相关性的加权贝叶斯集成方法(WEBNC)。根据每个条件属性与决策属性的相关度对其赋以相应的权值,然后用AdaBoost训练属性加权后的BNC。该分类方法在16个UCI标准数据集上进行了测试,并与BNC、贝叶斯网和由AdaBoost训练出的BNC进行比较,实验结果表明,该分类器具有更高的分类精度与泛化能力。  相似文献   

3.
Lazy Learning of Bayesian Rules   总被引:19,自引:0,他引:19  
The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute inter-dependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR. This algorithm can be justified by a variant of Bayes theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of LBR are reasonable in a wide cross-section of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statistically significant, a selective naive Bayesian classifier.  相似文献   

4.
Ying Yu  Hao Huang 《Expert Systems》2022,39(1):e12821
With the objective to automatically detect diseases from symptoms in free-text data, a methodology to extract symptom-diagnosis knowledge from online medical textual data in Q&A domain is proposed in this paper: (1) a term frequency-inverse document frequency and PRECISION method is adopted to retrieve symptom words from unstructured text; (2) a variable precision rough set based genetic algorithm is applied to reduce redundant symptom words, and a rough set based rule is utilized for adding discriminative symptom words assisting to discriminate diseases sharing similar symptoms; (3) by employing fuzzy linguistic variables to express the risk level of disease or severity level of symptoms, a knowledge base with fuzzy belief structure is generated. Using data extracted from a Chinese medical Q&A forum for training and testing, some classical gastrointestinal diseases serve as a case study to evaluate the efficiency of the proposed methodology. Subsequently performance comparisons are made between the proposed methodology and some other classifiers, such as the decision tree algorithms including ID3 and J45, and the Bayesian network classifier. The comparative results demonstrate that the proposed methodology outperforms the decision tree algorithms and the Bayesian network classifier.  相似文献   

5.
Bayesian networks are important knowledge representation tools for handling uncertain pieces of information. The success of these models is strongly related to their capacity to represent and handle dependence relations. Some forms of Bayesian networks have been successfully applied in many classification tasks. In particular, naive Bayes classifiers have been used for intrusion detection and alerts correlation. This paper analyses the advantage of adding expert knowledge to probabilistic classifiers in the context of intrusion detection and alerts correlation. As examples of probabilistic classifiers, we will consider the well-known Naive Bayes, Tree Augmented Naïve Bayes (TAN), Hidden Naive Bayes (HNB) and decision tree classifiers. Our approach can be applied for any classifier where the outcome is a probability distribution over a set of classes (or decisions). In particular, we study how additional expert knowledge such as “it is expected that 80 % of traffic will be normal” can be integrated in classification tasks. Our aim is to revise probabilistic classifiers’ outputs in order to fit expert knowledge. Experimental results show that our approach improves existing results on different benchmarks from intrusion detection and alert correlation areas.  相似文献   

6.
针对朴素贝叶斯分类的属性独立性假设的不足,讨论了相关性及多变量相关的概念,给出词间相关度的定义。在TAN分类器的词间相关性分析基础上,提出一种文档特征词相关度估计公式及其在改进朴素贝叶斯分类模型中应用的算法,在Reuters-21578文本数据集上的实验表明,改进算法简单易行,能有效改进贝叶斯分类性能。  相似文献   

7.
为了提高Stacking集成算法的分类性能,充分利用Stacking学习机制产生的先验信息和贝叶斯网络丰富的概率表达能力,提出一种基于属性值加权朴素贝叶斯算法的Stacking集成分类算法AVWNB-Stacking(Stac-king based Attribute Value Weight Naive Bayes)...  相似文献   

8.
阐明决策树分类器在用于分类的数据挖掘技术中依然重要,论述基于决策树归纳分类的ID3、C4.5算法,并且对决策属性的选取法则进行说明。通过实例解析ID3、C4.5算法实现过程,结果表明C4.5算法相比较于ID3算法的优越性.尤其在处理具有多属性值的数据时的更加合理和正确。  相似文献   

9.
This paper presents a novel algorithm named ID6NB for extending decision tree induced by Quinlan’s non-incremental ID3 algorithm. The presented approach is aimed at suggesting the solutions for few unhandled exceptions of the Decision tree induction algorithms such as (i) the situation in which the majority voting makes incorrect decision (generating two different types of rules for same data), and (ii) in case of dimensionality reduction by decision tree induction algorithms, the determination of appropriate attribute at a node where two or more attributes have equal highest information gain. Exception due to majority voting is handled with the help of Naive Bayes algorithm and also novel solutions are given for dimensionality reduction. As a result, the classification accuracy has drastically improved. An extensive experimental evaluation on a number of real and synthetic databases shows that ID6NB is a state-of-the-art classification algorithm that outperforms well than other methods of decision tree learning.  相似文献   

10.
针对数据集中无关的、干扰的属性会降低决策树算法性能的问题,提出了一个新的决策树算法,此算法根据对测试属性进行约简选择,提出以测试属性和决策属性的相似性作为决策树的启发规则来构建决策树,同时使用了分类阈值设定方法简化决策树的生成过程.实验证明,该算法运行效率和预测精度都优于传统的ID3算法.  相似文献   

11.
针对数据集中无关的、干扰的属性会降低决策树算法性能的问题,提出了一个新的决策树算法,此算法根据对测试属性进行约简选择,提出以测试属性和决策属性的相似性作为决策树的启发规则来构建决策树,同时使用了分类阈值设定方法简化决策树的生成过程。实验证明,该算法运行效率和预测精度都优于传统的ID3算法。  相似文献   

12.
沈思倩  毛宇光  江冠儒 《计算机科学》2017,44(6):139-143, 149
主要研究在对不完全数据集进行决策树分析时,如何加入差分隐私保护技术。首先简单介绍了差分隐私ID3算法和差分隐私随机森林决策树算法;然后针对上述算法存在的缺陷和不足进行了修改,提出指数机制的差分隐私随机森林决策树算法;最后对于不完全数据集提出了一种新的WP(Weight Partition)缺失值处理方法,能够在不需要插值的情况下,使决策树分析算法既能满足差分隐私保护,也能拥有更高的预测准确率和适应性。实验证明,无论是Laplace机制还是指数机制,无论是ID3算法还是随机森林决策树算法,都能适用于所提方法。  相似文献   

13.
一种限定性的双层贝叶斯分类模型   总被引:28,自引:1,他引:28  
朴素贝叶斯分类模型是一种简单而有效的分类方法,但它的属性独立性假设使其无法表达属性变量间存在的依赖关系,影响了它的分类性能.通过分析贝叶斯分类模型的分类原则以及贝叶斯定理的变异形式,提出了一种基于贝叶斯定理的新的分类模型DLBAN(double-level Bayesian network augmented naive Bayes).该模型通过选择关键属性建立属性之间的依赖关系.将该分类方法与朴素贝叶斯分类器和TAN(tree augmented naive Bayes)分类器进行实验比较.实验结果表明,在大多数数据集上,DLBAN分类方法具有较高的分类正确率.  相似文献   

14.
基于朴素贝叶斯与ID3算法的决策树分类   总被引:2,自引:0,他引:2       下载免费PDF全文
v在朴素贝叶斯算法和ID3算法的基础上,提出一种改进的决策树分类算法。引入客观属性重要度参数,给出弱化的朴素贝叶斯条件独立性假设,并采用加权独立信息熵作为分类属性的选取标准。理论分析和实验结果表明,改进算法能在一定程度上克服ID3算法的多值偏向问题,并且具有较高的执行效率和分类准确度。  相似文献   

15.
Abstract: The cost of a future exploitation of a decision support system plays a key role. The paper deals with the problem of feature value acquisition cost for such systems. We present a modification of a cost‐sensitive learning method for decision‐tree induction with fixed attribute acquisition cost limit. Properties of the concept are established during computer experiments conducted on chosen benchmark databases from the UCI Machine Learning Repository and a real medical decision task. The results of experiments confirm that, for some decision problems, our proposition allows us to obtain a classifier with the same quality as a classifier obtained without cost limit but its exploitation is cheaper.  相似文献   

16.
决策树是数据挖掘的一种重要方法,通常用来形成分类器和预测模型。ID3算法作为决策树的核心算法,由于它的简单与高效而得到了广泛的应用,然而它倾向于选择属性值较多的属性作为分支属性,从而可能错过分类能力强的属性。对ID3算法的分支策略进行改进,增加了对属性的类区分度的考量。经实验比较,新方法能提高决策树的精度,简化决策树。  相似文献   

17.
Abstract

Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.  相似文献   

18.
Three naive Bayes approaches for discrimination-free classification   总被引:2,自引:1,他引:1  
In this paper, we investigate how to modify the naive Bayes classifier in order to perform classification that is restricted to be independent with respect to a given sensitive attribute. Such independency restrictions occur naturally when the decision process leading to the labels in the data-set was biased; e.g., due to gender or racial discrimination. This setting is motivated by many cases in which there exist laws that disallow a decision that is partly based on discrimination. Naive application of machine learning techniques would result in huge fines for companies. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.  相似文献   

19.
决策树算法是数据挖掘中重要的分类算法。目前,已有许多构建决策树的算法,其中,ID3算法是核心算法。本文首先对ID3算法进行研究与分析,针对计算属性的信息熵十分复杂的缺点,提出了一种新的启发式算法SID3,它是基于属性对分类的敏感度的。文章最后通过实例对两种算法进行比较分析,结果表明,SID3算法能够生成正确的决策树,并且使建树过程更简便,更快速。  相似文献   

20.
一种新的基于属性—值对的决策树归纳算法   总被引:6,自引:1,他引:5  
决策树归纳算法ID3是实例学习中具有代表性的学习方法。文中针对ID3易偏向于值数较多属性的缺陷,提出一种新的基于属性-值对的决策树归纳算法AVPI,它所产生的决策树大小及测试速度均优于ID3。该算法应用于色彩匹配系统,取得了较好效果。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号