Similar Literature
20 matching documents retrieved (search time: 343 ms)
1.
Research on Constrained Gaussian Classification Networks   Cited: 1 (self-citations: 0, others: 1)
王双成  高瑞  杜瑞杰 《自动化学报》2015,41(12):2164-2176
The naive Bayes classifier, which estimates attribute marginal densities with univariate Gaussian functions, cannot effectively exploit dependence information among attributes, while the full Bayesian classifier, which estimates the joint attribute density with multivariate Gaussian functions, tends to overfit the data and requires costly computation of high-order covariance matrices. To address both problems, this paper establishes a decomposition-and-combination theorem for the joint attribute density and a theorem for computing attribute conditional densities, and on that basis combines attribute selection for the naive Bayes classifier, a classification-accuracy criterion, and greedy selection of attribute parent nodes to learn and optimize a constrained Gaussian classification network. Drawing on Bayesian network theory, the composition of the information that attributes provide about the class in Bayes-derived classifiers is also analyzed. Experiments on continuous-attribute classification data from the UCI repository show that the optimized constrained Gaussian classification network achieves good classification accuracy.

2.

The naive Bayes classifier cannot effectively exploit dependence information among attributes, and existing dependence extensions emphasize efficiency, so the classification accuracy of the extended classifiers still leaves room for improvement. To address this, attribute densities are estimated with Gaussian kernel functions carrying a smoothing parameter, and a classification-accuracy criterion is combined with greedy selection of attribute parent nodes to perform a network dependence extension of the naive Bayes classifier. Experiments on continuous-attribute classification data from the UCI repository show that the dependence-extended classifier achieves good classification accuracy.


3.
The naive Bayes classifier offers high learning and classification efficiency but cannot fully exploit the dependence information among attribute variables. Bayesian network classifiers have strong classification ability, but learning them is comparatively complex. This paper builds a generalized naive Bayes classifier with flexible choices of classification ability, efficiency, and learning style; it compensates for the shortcomings of both the naive Bayes classifier and Bayesian network classifiers while inheriting their advantages.

4.
The naive Bayes classifier can be applied to lithology identification. The algorithm usually fits the probability distribution of continuous attributes with a Gaussian distribution, but for complex well-logging data a single Gaussian fits poorly. To address this, a Gaussian mixture probability density estimate fitted with the EM algorithm is proposed. The experiments take well-logging data from Lower Paleozoic gas wells in block Sudong 41-33 as training samples and data from wells 44-45 as test samples. A Gaussian mixture model fitted by EM estimates the probability density of the logging variables and is plugged into a naive Bayes classifier for lithology identification, with a single Gaussian fit as the baseline. The results show that the mixture model fits better and noticeably improves the lithology-identification performance of the naive Bayes classifier.
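The EM-fitted mixture-density idea above can be sketched in a few lines. The following is a minimal pure-Python illustration under stated assumptions, not the paper's implementation: the toy data, class labels, and all function names are invented stand-ins for the well-log samples, and k ≥ 2 components are assumed.

```python
import math

def _std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def normal_pdf(x, mu, sigma):
    sigma = max(sigma, 1e-3)  # floor to avoid degenerate components
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(xs, k=2, iters=100):
    """EM for a univariate k-component Gaussian mixture (k >= 2).
    Returns (weights, means, stds)."""
    srt = sorted(xs)
    mus = [srt[i * (len(xs) - 1) // (k - 1)] for i in range(k)]  # spread initial means
    sigmas = [max(_std(xs), 1e-3)] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            ps = [ws[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            s = sum(ps) or 1e-300
            resp.append([p / s for p in ps])
        # M-step: refit weights, means and stds from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            ws[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(1e-3, math.sqrt(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj))
    return ws, mus, sigmas

def gmm_pdf(x, params):
    ws, mus, sigmas = params
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(ws, mus, sigmas))

def fit_nb(X, y, k=2):
    """Naive Bayes with a per-class GMM density for each attribute."""
    model = {}
    for c in set(y):
        rows = [x for x, yc in zip(X, y) if yc == c]
        dens = [fit_gmm_1d([r[a] for r in rows], k) for a in range(len(X[0]))]
        model[c] = (len(rows) / len(X), dens)
    return model

def predict(model, x):
    return max(model, key=lambda c: math.log(model[c][0]) + sum(
        math.log(gmm_pdf(v, d) + 1e-300) for v, d in zip(x, model[c][1])))

# Invented stand-in data: a bimodal attribute for "shale", unimodal for "sand".
X = [[1.8], [2.1], [2.0], [8.1], [7.9], [8.2],
     [4.9], [5.1], [5.0], [5.2], [4.8], [5.1]]
y = ["shale"] * 6 + ["sand"] * 6
model = fit_nb(X, y)
```

A single Gaussian fitted to the bimodal "shale" attribute would center its mass around 5, exactly where the "sand" values lie; the two-component mixture keeps the classes separable, which is the effect the abstract reports.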

5.
The Bayesian classifier is a fundamental classification technique. In this work, we focus on programming Bayesian classifiers in SQL. We introduce two classifiers: Naive Bayes and a classifier based on class decomposition using K-means clustering. We consider two complementary tasks: model computation and scoring a data set. We study several layouts for tables and several indexing alternatives. We analyze how to transform equations into efficient SQL queries and introduce several query optimizations. We conduct experiments with real and synthetic data sets to evaluate classification accuracy, query optimizations, and scalability. Our Bayesian classifier is more accurate than Naive Bayes and decision trees. Distance computation is significantly accelerated with horizontal layout for tables, denormalization, and pivoting. We also compare Naive Bayes implementations in SQL and C++: SQL is about four times slower. Our Bayesian classifier in SQL achieves high classification accuracy, can efficiently analyze large data sets, and has linear scalability.
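The model-computation/scoring split described above can be sketched with Python's bundled sqlite3 standing in for a full DBMS. This is a hedged illustration, not the authors' queries: the tiny schema, the toy data, and the `gauss_ll` user-defined function are all assumptions made for the example.

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.create_function("ln", 1, math.log)
# Gaussian log-likelihood as a UDF, so the query works even when SQLite
# was built without its optional math functions.
con.create_function("gauss_ll", 3, lambda x, mu, var:
                    -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var))

# Hypothetical tiny schema: train(cls, a, b) and test(id, a, b).
con.executescript("""
CREATE TABLE train(cls TEXT, a REAL, b REAL);
CREATE TABLE test(id INTEGER, a REAL, b REAL);
""")
con.executemany("INSERT INTO train VALUES (?,?,?)", [
    ("p", 0.0, 0.2), ("p", 0.3, -0.1), ("p", -0.2, 0.1),
    ("q", 9.8, 10.1), ("q", 10.2, 9.9), ("q", 10.0, 10.3)])
con.executemany("INSERT INTO test VALUES (?,?,?)",
                [(1, 0.1, 0.0), (2, 10.0, 10.0)])

# Model computation: one aggregation query yields priors, means and variances
# (a small variance floor guards against zero-variance attributes).
con.execute("""
CREATE TABLE model AS
SELECT cls,
       COUNT(*) * 1.0 / (SELECT COUNT(*) FROM train)   AS prior,
       AVG(a) AS mu_a, AVG(a*a) - AVG(a)*AVG(a) + 1e-6 AS var_a,
       AVG(b) AS mu_b, AVG(b*b) - AVG(b)*AVG(b) + 1e-6 AS var_b
FROM train GROUP BY cls""")

# Scoring: cross join test rows with the per-class model rows; SQLite's
# bare-column-with-MAX rule keeps, per id, the row where score is maximal.
preds = {row[0]: row[1] for row in con.execute("""
SELECT id, cls, MAX(score) FROM (
    SELECT t.id AS id, m.cls AS cls,
           ln(m.prior) + gauss_ll(t.a, m.mu_a, m.var_a)
                       + gauss_ll(t.b, m.mu_b, m.var_b) AS score
    FROM test t CROSS JOIN model m)
GROUP BY id""")}
```

The design point the abstract makes carries over even at this scale: the model is one aggregation over the training table, and scoring is one join against the small model table, so both steps stay inside the database engine.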

6.
A Restrictive Double-Level Bayesian Classification Model   Cited: 29 (self-citations: 1, others: 28)
The naive Bayes model is a simple and effective classification method, but its attribute-independence assumption prevents it from expressing dependences among attribute variables and hurts its classification performance. By analysing the classification principle of Bayesian models and a variant form of Bayes' theorem, a new classification model based on Bayes' theorem, DLBAN (double-level Bayesian network augmented naive Bayes), is proposed. The model establishes dependences among attributes by selecting key attributes. The method is compared experimentally with the naive Bayes classifier and the TAN (tree augmented naive Bayes) classifier; the results show that DLBAN attains higher classification accuracy on most data sets.

7.
The naive Bayes classifier is simple and efficient, but its attribute-independence assumption limits its application to real data. A new algorithm is proposed: because noise and the size of the training set can make attribute reduction during preprocessing ineffective and thereby degrade classification, several attribute subsets are generated from the training set by random attribute selection, a Bayesian classifier is built on each subset, and a genetic algorithm then selects the best ones. Experiments show that, compared with traditional naive Bayes, the method achieves better classification accuracy.

8.
A Bayesian Classifier Based on Hypothesis Testing   Cited: 1 (self-citations: 0, others: 1)
Classification is an important branch of data mining, and Bayesian methods, as a key classification technique, have been studied and applied ever more widely. Restricted Bayesian networks, which simplify the network structure without sacrificing much accuracy, have been a research focus in recent years. This paper uses volume testing, a statistically mature hypothesis-testing method, to find dependences among attributes, and combines the idea of hypothesis testing with the strengths of the naive Bayes algorithm to construct a restricted Bayesian network, yielding a hypothesis-testing-based Bayesian classification algorithm named the volume-test Bayesian classifier. Experiments in the Weka system show that the method outperforms naive Bayes, the TAN algorithm, and others, and performs especially well on large data sets.

9.
Building Bayesian Classifiers in Matlab   Cited: 2 (self-citations: 1, others: 2)
Text classification is the foundation and core of text mining, and building the classifier is its key step; Bayesian networks can produce classifiers with good classification performance. This paper constructs two classifiers in Matlab: a naive Bayes classifier (NBC), and a TAN classifier (TANC) built with mutual-information and conditional mutual-information measures. The classifiers are validated on standard data sets downloaded from UCI; the experiments show that their overall performance is somewhat higher than figures reported in the literature, confirming their validity and correctness. The classifiers are then optimized and applied to text classification.

10.
The naive Bayes algorithm (NB) usually assumes that continuous numeric attributes of the training samples follow a normal distribution, and its accuracy also suffers when the training data are incomplete, yet real sampled data rarely satisfy these requirements. For missing data, the naive Bayes classifier's parameters are learned from the available incomplete data with the expectation-maximization (EM) algorithm; for non-normally distributed continuous attributes, kernel density estimation supplies the distribution density, and a new analytical computation finds the maximum posterior distribution. Classification experiments on standard data sets verify the effectiveness of the improvements. The improved algorithm, EM-DNB, is applied to predicting protein purification processes in bioengineering, and the experimental results show improved prediction accuracy.

11.
Within the framework of Bayesian networks (BNs), most classifiers assume that the variables involved are of a discrete nature, but this assumption rarely holds in real problems. Despite the loss of information discretization entails, it is a direct easy-to-use mechanism that can offer some benefits: sometimes discretization improves the run time for certain algorithms; it provides a reduction in the value set and then a reduction in the noise which might be present in the data; in other cases, there are some Bayesian methods that can only deal with discrete variables. Hence, even though there are many ways to deal with continuous variables other than discretization, it is still commonly used. This paper presents a study of the impact of using different discretization strategies on a set of representative BN classifiers, with a significant sample consisting of 26 datasets. For this comparison, we have chosen Naive Bayes (NB) together with several other semi-Naive Bayes classifiers: Tree-Augmented Naive Bayes (TAN), k-Dependence Bayesian (KDB), Aggregating One-Dependence Estimators (AODE) and Hybrid AODE (HAODE). Also, we have included an augmented Bayesian network created by using a hill climbing algorithm (BNHC). With this comparison we analyse to what extent the type of discretization method affects classifier performance in terms of accuracy and bias-variance. Our main conclusion is that even if a discretization method produces different results for a particular dataset, it does not really have an effect when classifiers are being compared. That is, given a set of datasets, accuracy values might vary but the classifier ranking is generally maintained. This is a very useful outcome: given that the type of discretization applied is not decisive, future experiments can be d times faster, d being the number of discretization methods considered.
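To make concrete why different discretization methods can produce different per-dataset results, here is a sketch of two common strategies. The helper names and the toy data are invented for illustration and are not tied to the paper's experimental setup.

```python
def equal_width_bins(xs, k):
    """Assign each value to one of k equal-width intervals over [min, max]."""
    lo, hi = min(xs), max(xs)
    w = (hi - lo) / k or 1.0  # guard against a constant attribute
    return [min(int((x - lo) / w), k - 1) for x in xs]

def equal_freq_bins(xs, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    bins = [0] * len(xs)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // len(xs), k - 1)
    return bins

# A skewed attribute: the outlier drags most values into one equal-width bin,
# while equal-frequency spreads them out -- different boundaries mean different
# conditional probability tables for the same BN classifier.
xs = [1, 2, 3, 4, 5, 100]
ew = equal_width_bins(xs, 3)
ef = equal_freq_bins(xs, 3)
```

On this attribute the two methods disagree on nearly every value, yet the paper's finding is that such per-dataset differences largely wash out when classifiers are ranked against each other.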

12.
By analysing the naive Bayes classifier (NBC) and the traditional tree-augmented naive Bayes (TAN) classifier, an improved TAN classifier, CTAN, is proposed. Naive Bayes assumes complete independence among the non-class attributes, while traditional TAN weakens the independence of all attributes; CTAN instead weakens it selectively, for only part of the correlated attributes, by partially exploiting the attribute relationship tree retained from TAN. Experiments show that CTAN outperforms the traditional TAN classifier.

13.
Accurately classifying financial customers is a prerequisite for offering them personalized services. For the sales requirements of a financial product, customer samples were collected through online promotion tests and labelled according to user feedback. Bayesian classifiers were built in two ways: by constructing probability distribution functions and by discretizing continuous data. Training and testing the algorithms with cross-validation showed that naive Bayes outperforms the Gaussian Bayes and logistic regression algorithms. When discretizing continuous data, the data were filtered in line with the classification preference; experiments show that the abnormal-data filtering rate significantly affects the accuracy of the customer classification algorithm, and setting this parameter appropriately adjusts the algorithm's classification preference. The method is of practical value for raising the sales efficiency of financial products and lowering marketing costs.

14.
Bayesian networks are important knowledge representation tools for handling uncertain pieces of information. The success of these models is strongly related to their capacity to represent and handle dependence relations. Some forms of Bayesian networks have been successfully applied in many classification tasks. In particular, naive Bayes classifiers have been used for intrusion detection and alerts correlation. This paper analyses the advantage of adding expert knowledge to probabilistic classifiers in the context of intrusion detection and alerts correlation. As examples of probabilistic classifiers, we will consider the well-known Naive Bayes, Tree Augmented Naïve Bayes (TAN), Hidden Naive Bayes (HNB) and decision tree classifiers. Our approach can be applied for any classifier where the outcome is a probability distribution over a set of classes (or decisions). In particular, we study how additional expert knowledge such as “it is expected that 80 % of traffic will be normal” can be integrated in classification tasks. Our aim is to revise probabilistic classifiers’ outputs in order to fit expert knowledge. Experimental results show that our approach improves existing results on different benchmarks from intrusion detection and alert correlation areas.

15.
This paper studies the principle and method of unsupervised Naïve Bayes classification and applies it to text data, namely network security audit data. To improve classification accuracy, the attribute set is selected according to the classification results, and the attributes that improve accuracy are used as the basis for classification. Experiments on the KDD CUP99 data set with different attribute sets identified the attributes relevant to the classification results, with good classification performance.

18.
Bayesian Network Classifiers   Cited: 154 (self-citations: 0, others: 154)
Friedman  Nir  Geiger  Dan  Goldszmidt  Moises 《Machine Learning》1997,29(2-3):131-163
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.

19.
Since naïve Bayesian classifiers are suitable for processing discrete attributes, many methods have been proposed for discretizing continuous ones. However, none of the previous studies apply more than one discretization method to the continuous attributes in a data set for naïve Bayesian classifiers. Different approaches employ different information embedded in continuous attributes to determine the boundaries for discretization. It is likely that discretizing the continuous attributes in a data set using different methods can utilize the information embedded in the attributes more thoroughly and thus improve the performance of naïve Bayesian classifiers. In this study, we propose a nonparametric measure to evaluate the dependence level between a continuous attribute and the class. The nonparametric measure is then used to develop a hybrid method for discretizing continuous attributes so that the accuracy of the naïve Bayesian classifier can be enhanced. This hybrid method is tested on 20 data sets, and the results demonstrate that discretizing the continuous attributes in a data set by various methods can generally have a higher prediction accuracy.

20.
The Naive Bayes classifier is a popular classification technique for data mining and machine learning. It has been shown to be very effective on a variety of data classification problems. However, the strong assumption that all attributes are conditionally independent given the class is often violated in real-world applications. Numerous methods have been proposed in order to improve the performance of the Naive Bayes classifier by alleviating the attribute independence assumption. However, violation of the independence assumption can increase the expected error. Another alternative is assigning weights to attributes. In this paper, we propose a novel attribute-weighted Naive Bayes classifier that attaches weights to the conditional probabilities. An objective function is modeled and taken into account, which is based on the structure of the Naive Bayes classifier and the attribute weights. The optimal weights are determined by a local optimization method using the quasisecant method. In the proposed approach, the Naive Bayes classifier is taken as a starting point. We report the results of numerical experiments on several real-world data sets in binary classification, which show the efficiency of the proposed method.
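The attribute-weighting idea can be sketched as follows. This is a hedged illustration only: the objective-function modelling and quasisecant optimization from the abstract are omitted, and the probabilities and weights below are hand-picked to show the mechanism.

```python
import math

def weighted_nb_score(prior, cond_probs, weights):
    """log P(c) + sum_i w_i * log P(x_i | c): weights temper each attribute's vote."""
    return math.log(prior) + sum(w * math.log(p) for w, p in zip(weights, cond_probs))

# Invented two-class example: attribute 1 is informative, attribute 2 is noisy.
# With uniform weights the noisy attribute flips the decision toward class B;
# down-weighting it (w2 = 0.2) restores class A.
plain_a = weighted_nb_score(0.5, [0.8, 0.1], [1.0, 1.0])
plain_b = weighted_nb_score(0.5, [0.2, 0.6], [1.0, 1.0])
tuned_a = weighted_nb_score(0.5, [0.8, 0.1], [1.0, 0.2])
tuned_b = weighted_nb_score(0.5, [0.2, 0.6], [1.0, 0.2])
```

Setting all weights to 1 recovers standard Naive Bayes, which is why the abstract can take the unweighted classifier as the starting point of the optimization.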


Copyright©北京勤云科技发展有限公司  京ICP备09084417号