Similar Articles
1.
Due to its simplicity, efficiency and efficacy, naive Bayes (NB) continues to be one of the top 10 data mining algorithms. Many improved approaches to NB have been proposed to weaken its conditional independence assumption. However, little work has been done to date on instance weighting filter approaches to NB. In this paper, we propose a simple, efficient, and effective instance weighting filter approach to NB. We call it attribute (feature) value frequency-based instance weighting and denote the resulting improved model as attribute value frequency weighted naive Bayes (AVFWNB). In AVFWNB, the weight of each training instance is defined as the inner product of its attribute value frequency vector and the attribute value number vector. Experimental results on 36 widely used classification problems show that AVFWNB significantly outperforms NB while maintaining the computational simplicity that characterizes NB.
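A minimal sketch of the weighting step, assuming categorical attributes, that each entry of the attribute value frequency vector is the relative frequency of the instance's value in the training set, and that each entry of the attribute value number vector is the number of distinct values of that attribute. The function names are hypothetical and the paper's exact normalization may differ; the weights simply replace unit counts in a Laplace-smoothed naive Bayes.

```python
import math
from collections import Counter


def avf_instance_weights(X):
    """Weight = inner product of the attribute value frequency vector
    and the attribute value number vector (assumed definitions)."""
    n, m = len(X), len(X[0])
    counts = [Counter(row[j] for row in X) for j in range(m)]
    num_values = [len(c) for c in counts]  # distinct values per attribute
    weights = []
    for row in X:
        # Relative frequency of this instance's value for each attribute.
        freq = [counts[j][row[j]] / n for j in range(m)]
        weights.append(sum(f * k for f, k in zip(freq, num_values)))
    return weights


def train_weighted_nb(X, y, w):
    """Accumulate instance weights instead of unit counts."""
    model = {"classes": sorted(set(y)),
             "num_values": [len({row[j] for row in X}) for j in range(len(X[0]))],
             "class_w": Counter(), "cond_w": Counter()}
    for row, label, wi in zip(X, y, w):
        model["class_w"][label] += wi
        for j, v in enumerate(row):
            model["cond_w"][(label, j, v)] += wi
    model["total"] = sum(model["class_w"].values())
    return model


def predict(model, x):
    """Return the class with the highest Laplace-smoothed weighted log posterior."""
    k = len(model["classes"])
    best, best_score = None, -math.inf
    for c in model["classes"]:
        cw = model["class_w"][c]
        score = math.log((cw + 1) / (model["total"] + k))
        for j, v in enumerate(x):
            score += math.log((model["cond_w"][(c, j, v)] + 1) / (cw + model["num_values"][j]))
        if score > best_score:
            best, best_score = c, score
    return best
```

Because the weights are computed once from the training data before NB is fitted, training remains a single counting pass, which is consistent with the "filter" framing in the abstract.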

2.
Traditional classification algorithms require a large number of labelled examples from all the predefined classes, which are generally difficult and time-consuming to obtain. Furthermore, data uncertainty is prevalent in many real-world applications, such as sensor networks, market analysis and medical diagnosis. In this article, we explore classification on uncertain data when only positive and unlabelled examples are available. We propose an algorithm to build a naive Bayes classifier from positive and unlabelled examples with uncertainty. However, the algorithm requires the prior probability of the positive class, and it is generally difficult for users to provide this parameter in practice. Two approaches are proposed to avoid this user-specified parameter: one uses a validation set to search for an appropriate value, and the other estimates it directly. Our extensive experiments show that both approaches achieve generally satisfactory classification performance on uncertain data. In addition, by exploiting the uncertainty in the dataset, our algorithm can potentially achieve better classification performance than traditional naive Bayes, which ignores uncertainty when handling uncertain data.
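One standard way to build a naive Bayes classifier from positive and unlabelled examples when the positive-class prior p is known is to treat the unlabelled set as a mixture, P_U(v) = p·P(v|+) + (1 - p)·P(v|-), and solve for the negative-class conditionals. The sketch below assumes certain (not uncertain) categorical data, so it does not model the uncertainty handled in the article; the function names are hypothetical, and clipping negative estimates to a small constant before renormalizing is a choice made for this sketch.

```python
import math
from collections import Counter


def value_dist(rows, j, values, alpha=1.0):
    """Laplace-smoothed distribution of attribute j over the given values."""
    c = Counter(r[j] for r in rows)
    total = len(rows) + alpha * len(values)
    return {v: (c[v] + alpha) / total for v in values}


def train_pu_nb(P, U, prior_pos, eps=1e-6):
    """Estimate P(v|+) from positives and P(v|-) from the unlabelled mixture."""
    model = []
    for j in range(len(P[0])):
        values = sorted({r[j] for r in P} | {r[j] for r in U})
        pos = value_dist(P, j, values)
        unl = value_dist(U, j, values)
        # P_U(v) = p * P(v|+) + (1 - p) * P(v|-), solved for P(v|-) and clipped at eps.
        raw = {v: max(unl[v] - prior_pos * pos[v], eps) / (1 - prior_pos) for v in values}
        z = sum(raw.values())
        model.append((pos, {v: q / z for v, q in raw.items()}))
    return model


def predict_pu(model, prior_pos, x, eps=1e-6):
    """Return 1 (positive) or 0 (negative) by comparing log posteriors."""
    lp, ln = math.log(prior_pos), math.log(1 - prior_pos)
    for (pos, neg), v in zip(model, x):
        lp += math.log(pos.get(v, eps))
        ln += math.log(neg.get(v, eps))
    return 1 if lp > ln else 0
```

The prior p enters both the mixture inversion and the final decision, which is why the abstract treats obtaining it (by validation search or direct estimation) as a separate problem.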

3.
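The naive Bayes classifier is a simple and effective machine learning tool. Using the principle of the naive Bayes classifier, this paper derives a "naive Bayes combination" formula and constructs a corresponding classifier. Tests show that the classifier has good classification performance and practicality, overcomes the poor accuracy of the naive Bayes classifier, and is faster than other classifiers without a significant loss of accuracy.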

4.
Because they are fast, easy to implement and relatively effective, several state-of-the-art naive Bayes text classifiers that make the strong assumption of conditional independence among attributes, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, have received a great deal of attention from researchers in text classification. In this article, we revisit these naive Bayes text classifiers and empirically compare their classification performance on a large number of widely used text classification benchmark datasets. We then propose a locally weighted learning approach for these classifiers, which we call locally weighted naive Bayes text classifiers (LWNBTC). LWNBTC weakens the attribute conditional independence assumption made by these naive Bayes text classifiers by applying locally weighted learning. Experimental results show that our locally weighted versions significantly outperform the state-of-the-art naive Bayes text classifiers in terms of classification accuracy.
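Locally weighted learning for multinomial naive Bayes can be sketched as follows: for each test document, select its k most similar training documents, weight each by its similarity, and fit a multinomial NB on the weighted term counts. This is an illustration under stated assumptions (cosine similarity on raw term counts, the similarity itself as the weight, a hypothetical function name); the article's neighbourhood size and weighting function may differ.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lw_mnb_predict(docs, labels, query, k=3, alpha=1.0):
    """Fit a multinomial NB on the k training docs most similar to `query`,
    weighting each doc by its cosine similarity, then classify `query`."""
    q = Counter(query)
    vecs = [Counter(d) for d in docs]
    neighbours = sorted(((cosine(q, v), i) for i, v in enumerate(vecs)), reverse=True)[:k]
    vocab = {t for v in vecs for t in v}
    class_w = defaultdict(float)
    term_w = defaultdict(lambda: defaultdict(float))
    for sim, i in neighbours:
        w = sim + 1e-9  # tiny floor so zero-overlap neighbours keep a nonzero weight
        class_w[labels[i]] += w
        for t, c in vecs[i].items():
            term_w[labels[i]][t] += w * c
    total = sum(class_w.values())
    best, best_score = None, -math.inf
    for c in list(class_w):
        score = math.log(class_w[c] / total)
        denom = sum(term_w[c].values()) + alpha * len(vocab)
        for t, n in q.items():
            score += n * math.log((term_w[c][t] + alpha) / denom)
        if score > best_score:
            best, best_score = c, score
    return best
```

Because the model is refitted per query on a small weighted neighbourhood, terms that co-occur locally are treated differently than in a single global model, which is how local weighting relaxes the independence assumption in practice.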

5.
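Because naive Bayes assumes feature independence, and traditional TFIDF weighting considers only the distribution of a feature over the whole training set while ignoring the relationships between features and classes and between features and documents, the weights that traditional methods assign to features do not accurately reflect their importance. To address these problems, a naive Bayes classification algorithm weighted by two-dimensional information gain is proposed. It additionally considers the effect on classification of two kinds of information gain for each feature: feature-class information gain and feature-document information gain. In experiments against traditional weighted naive Bayes, the algorithm improves precision, recall, and F1 by about 6%.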
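The feature-class half of this weighting can be illustrated with the standard information gain of a term's presence with respect to the class, IG(t) = H(C) - [P(t)H(C|t) + P(not t)H(C|not t)]. The abstract does not say how the feature-document information gain is defined or how the two are combined, so that part is omitted; the function name is hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def feature_class_ig(docs, labels, term):
    """Information gain (bits) of the event 'term occurs in the document' w.r.t. the class."""
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_t.values())
    n_without = n - n_with
    cond = 0.0  # expected class entropy after observing presence/absence of the term
    if n_with:
        cond += n_with / n * entropy(list(with_t.values()))
    if n_without:
        cond += n_without / n * entropy(list(without_t.values()))
    return entropy(list(Counter(labels).values())) - cond
```

A term found only in one class gets the maximal gain, while a term spread evenly across classes gets zero, which is the kind of class-aware signal that plain TFIDF lacks.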

6.
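A Restricted Double-Level Bayesian Classification Model   Total citations: 28, self-citations: 1, citations by others: 28
The naive Bayes classification model is a simple and effective classification method, but its attribute independence assumption prevents it from expressing dependencies among attribute variables, which degrades its classification performance. By analyzing the classification rule of Bayesian classification models and a variant form of Bayes' theorem, a new classification model based on Bayes' theorem, DLBAN (double-level Bayesian network augmented naive Bayes), is proposed. The model establishes dependencies among attributes by selecting key attributes. DLBAN is compared experimentally with the naive Bayes classifier and the TAN (tree augmented naive Bayes) classifier. The results show that DLBAN achieves higher classification accuracy on most datasets.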

7.
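Personal location information is a form of physical privacy information: an adversary can use background knowledge to obtain a user's real identity from it. To analyze user privacy in location-based services (LBS), the process by which an adversary carries out an identity inference attack is modeled, and a Bayesian inference method for measuring identity leakage from personal location information is proposed. By comparing how well the observed location information matches a background-knowledge database, the method can re-identify a user's real identity. Experiments on a dataset from a real road network show that an untrusted LBS can determine a user's real identity with high probability by collecting query requests. The study shows that leaking high-precision personal location information leads to a high identity privacy risk.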
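The Bayesian re-identification idea can be sketched as computing a posterior over candidate users from how well the observed locations match each user's background record: P(u | obs) ∝ P(u) ∏ P(loc | u). This sketch assumes a uniform prior over users, discretized location cells, Laplace-smoothed per-user location frequencies, and independence between observations; it is not the article's exact model, and the function name is hypothetical.

```python
import math
from collections import Counter


def reidentify(background, observed, alpha=1.0):
    """Posterior P(user | observed cells) for each candidate user.

    background: user -> list of location cells known for that user
    observed:   list of location cells collected from anonymous queries
    """
    cells = {c for locs in background.values() for c in locs} | set(observed)
    log_post = {}
    for user, locs in background.items():
        counts = Counter(locs)
        denom = len(locs) + alpha * len(cells)
        # Uniform prior, so only the log-likelihood of each observation is summed.
        log_post[user] = sum(math.log((counts[c] + alpha) / denom) for c in observed)
    mx = max(log_post.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - mx) for v in log_post.values())
    return {u: math.exp(v - mx) / z for u, v in log_post.items()}
```

With finer location cells each user's record becomes more distinctive, so the posterior concentrates on one user more quickly, in line with the abstract's conclusion about high-precision locations.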

8.
The RELIEF algorithm is a popular approach to feature weighting. Many extensions of RELIEF have been developed, and I-RELIEF is one of the best known. In this paper, I-RELIEF is generalized for supervised distance metric learning to yield a Mahalanobis distance function. The proposed approach is justified by showing that the objective function of the generalized I-RELIEF is closely related to the expected leave-one-out nearest-neighbor classification rate. In addition, the relationships among the generalized I-RELIEF, neighbourhood components analysis, and graph embedding are pointed out. Experimental results on various data sets demonstrate the superiority of the proposed approach.
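For reference, the classic RELIEF weighting that I-RELIEF extends rewards features that differ on an instance's nearest miss (closest instance of another class) and penalizes features that differ on its nearest hit (closest instance of the same class). The sketch below loops over every instance instead of sampling, scales numeric features by their range, and assumes each class has at least two instances; it does not implement I-RELIEF or its generalization to Mahalanobis metric learning.

```python
def relief(X, y):
    """Basic RELIEF feature weights, iterating over all instances."""
    n, d = len(X), len(X[0])
    lo = [min(r[f] for r in X) for f in range(d)]
    hi = [max(r[f] for r in X) for f in range(d)]
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]

    def diff(f, a, b):
        # Per-feature difference normalized to [0, 1].
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i, x in enumerate(X):
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        near_hit = min(hits, key=lambda r: dist(x, r))
        near_miss = min(misses, key=lambda r: dist(x, r))
        for f in range(d):
            w[f] += (diff(f, x, near_miss) - diff(f, x, near_hit)) / n
    return w
```

I-RELIEF replaces the hard nearest hit and miss with probability-weighted ones and iterates, and the article's generalization learns a full Mahalanobis matrix instead of one weight per feature.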

9.
The term positive unlabeled learning refers to binary classification in the absence of negative examples. When only positive and unlabeled instances are available, semi-supervised classification algorithms cannot be applied directly, so new algorithms are required. One such algorithm is positive naive Bayes (PNB), an adaptation of the naive Bayes induction algorithm that does not require negative instances. In this work we propose two ways of enhancing this algorithm. On the one hand, we take the concept behind PNB one step further, proposing a procedure to build more complex Bayesian classifiers in the absence of negative instances. We present a new algorithm, positive tree augmented naive Bayes (PTAN), to obtain tree augmented naive Bayes models in the positive unlabeled domain. On the other hand, we propose a new Bayesian approach to the a priori probability of the positive class that models the uncertainty over this parameter by means of a Beta distribution. This approach is applied to both PNB and PTAN, resulting in two new algorithms. The four algorithms are compared empirically on positive unlabeled learning problems based on real and synthetic databases. The results suggest that, when the predictive variables are not conditionally independent given the class, extending PNB to more complex networks improves classification performance. They also show that our Bayesian approach to the a priori probability of the positive class can improve the results obtained by PNB and PTAN.
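The tree augmented naive Bayes structure that PTAN targets is normally learned by weighting every attribute pair with the conditional mutual information I(Xi; Xj | C) and keeping a maximum-weight spanning tree over the attributes. This sketch estimates the weights from fully labelled data with plain frequency counts; PTAN's estimation of these quantities without negative examples, and the Beta-distributed prior, are not shown. Rooting the tree at attribute 0 is an arbitrary choice.

```python
import math
from collections import Counter
from itertools import combinations


def cond_mutual_info(X, y, i, j):
    """Empirical I(X_i; X_j | C) in nats."""
    n = len(X)
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(X, y))
    n_ic = Counter((r[i], c) for r, c in zip(X, y))
    n_jc = Counter((r[j], c) for r, c in zip(X, y))
    n_c = Counter(y)
    # Sum of P(a,b,c) * log[P(a,b|c) / (P(a|c) P(b|c))], written with raw counts.
    return sum(k / n * math.log(k * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
               for (a, b, c), k in n_ijc.items())


def tan_structure(X, y):
    """Return parent[j] for each attribute (None for the root) via Prim's algorithm."""
    d = len(X[0])
    weight = {(i, j): cond_mutual_info(X, y, i, j) for i, j in combinations(range(d), 2)}
    in_tree, parent = {0}, {0: None}
    while len(in_tree) < d:
        # Add the heaviest edge that leaves the current tree.
        u, v = max(((u, v) for u in in_tree for v in range(d) if v not in in_tree),
                   key=lambda e: weight[tuple(sorted(e))])
        parent[v] = u
        in_tree.add(v)
    return parent
```

In the final classifier each attribute has the class plus its tree parent as parents, which is the "more complex network" the abstract contrasts with PNB.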

10.
In this article, we introduce a personalized counseling system based on context mining. As a context mining technique, we have developed an algorithm called CANSY. It uses trained neural networks for feature weighting and a value difference metric to measure distances between all possible values of symbolic features. CANSY plays a core role in classifying and presenting the most similar cases from a case base. Experimental results show that CANSY, together with a rule base, can provide personalized information with a relatively high level of accuracy and can recommend appropriate products or services. An erratum to this article can be found at
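The value difference metric treats two symbolic values as close when they induce similar class distributions: δ(a, b) = Σ_c |P(c|a) - P(c|b)|^q. A minimal sketch for a single feature, with q = 1 by default; the neural-network feature weights that CANSY combines with this metric are not shown, and the function name is hypothetical.

```python
from collections import Counter


def vdm_table(values, labels, q=1):
    """Value difference metric between every pair of observed values of one feature."""
    n_v = Counter(values)
    n_vc = Counter(zip(values, labels))
    classes = set(labels)
    table = {}
    for a in n_v:
        for b in n_v:
            # Compare the class distributions P(c|a) and P(c|b).
            table[(a, b)] = sum(abs(n_vc[(a, c)] / n_v[a] - n_vc[(b, c)] / n_v[b]) ** q
                                for c in classes)
    return table
```

For example, two colours that are always associated with the same class are at distance 0 even though they are different symbols, which plain overlap distance cannot express.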

11.
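Wang Jun. Microcomputer Development (微机发展), 2007, 17(2): 205-207
The naive Bayes classification model is a simple and efficient classification model, but its conditional independence assumption prevents it from expressing dependencies between attributes, which affects its classification accuracy. Dependencies between attributes are related to the characteristics of the attributes themselves: some attributes have characteristics that force other attributes to depend on them, and these are called strong attributes. By analyzing measures of attribute correlation and a variant of Bayes' theorem, this paper introduces a method for selecting strong attributes and extends the structure of the naive Bayes classification model by adding augmenting arcs between strong and weak attributes, which weakens the independence assumption of naive Bayes. On this basis, a Bayesian classification model restricted by strong attributes, SANBC, is proposed. Experimental results show that, compared with the naive Bayes classification model, SANBC achieves higher classification accuracy.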
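One way to sketch the strong-attribute idea: measure attribute correlation with conditional mutual information given the class, take as the strong attribute the one with the largest total correlation to all other attributes, then add an arc from it to every other attribute so that each weak attribute is conditioned on both the class and the strong attribute. The selection rule, the use of a single strong attribute, and Laplace smoothing are assumptions of this sketch, not details confirmed by the abstract; the function names are hypothetical.

```python
import math
from collections import Counter


def cmi(X, y, i, j):
    """Empirical conditional mutual information I(A_i; A_j | C)."""
    n = len(X)
    n_ijc = Counter((r[i], r[j], c) for r, c in zip(X, y))
    n_ic = Counter((r[i], c) for r, c in zip(X, y))
    n_jc = Counter((r[j], c) for r, c in zip(X, y))
    n_c = Counter(y)
    return sum(k / n * math.log(k * n_c[c] / (n_ic[(a, c)] * n_jc[(b, c)]))
               for (a, b, c), k in n_ijc.items())


def pick_strong_attribute(X, y):
    d = len(X[0])
    return max(range(d), key=lambda s: sum(cmi(X, y, s, j) for j in range(d) if j != s))


def spode_predict(X, y, s, x):
    """Naive Bayes plus an augmenting arc from strong attribute s to every other attribute."""
    d = len(X[0])
    n_vals = [len({r[j] for r in X}) for j in range(d)]
    classes = sorted(set(y))
    best, best_score = None, -math.inf
    for c in classes:
        rows = [r for r, lab in zip(X, y) if lab == c]
        score = math.log((len(rows) + 1) / (len(X) + len(classes)))
        sub = [r for r in rows if r[s] == x[s]]  # class-c rows sharing the strong value
        score += math.log((len(sub) + 1) / (len(rows) + n_vals[s]))
        for j in range(d):
            if j != s:
                match = sum(1 for r in sub if r[j] == x[j])
                score += math.log((match + 1) / (len(sub) + n_vals[j]))
        if score > best_score:
            best, best_score = c, score
    return best
```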

12.
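Supervised learning algorithms are an important class of machine learning algorithms that require externally provided samples carrying supervision signals as training data. Although the machine learning community provides many benchmark datasets, in many cases one must generate training samples oneself. This paper presents an interactive method for acquiring training samples: one or more random transformations, possibly combined, are applied to original images, and the user selects the samples that the human eye can still recognize and saves them as valid samples. Experimental results show that the images produced by this method can simulate images captured by a camera in various complex scenes, with different angles, poses, lighting, and occlusion. A naive Bayes (NB) classifier trained on the samples generated by the system reaches a recognition accuracy of 95.042%, better than the 88.4875% obtained when the same NB classifier is trained on the UCI artificial character dataset.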

13.
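Improvement of an Email Classification System Based on the Naive Bayes Classifier   Total citations: 1, self-citations: 0, citations by others: 1
Naive Bayes classification currently works well for email classification, but it cannot distinguish spam from legitimate mail with 100% accuracy, and in commercial applications no important email may be missed. This paper first briefly introduces the Bayes method and then proposes an idea and a method for improving current Bayes classification.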

14.
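A Two-Layer nBayes Filtering Model Based on Structural Features   Total citations: 7, self-citations: 0, citations by others: 7
Wang Bin, Xu Hongbo, Wang Shen. Journal of Computer Applications (计算机应用), 2006, 26(1): 191-194
Because the algorithm is simple and performs well, Naive Bayes is widely used in spam filtering. Theoretical and experimental analysis shows that email collections whose structures differ more also differ more in feature distribution, and this difference in feature distribution affects the performance of Naive Bayes. On this basis, the paper proposes a two-layer filtering model based on structural features, in which emails with different structures are trained and learned separately by different Naive Bayes classifiers. Experimental analysis shows that Naive Bayes performs markedly better with this model, coming very close to SVM.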

15.
Abstract: This research focused on investigating and benchmarking several high-performance classifiers, namely J48, random forests, naive Bayes, KStar and artificial immune recognition systems, for software fault prediction with limited fault data. We also studied a recent semi-supervised classification algorithm called YATSI (Yet Another Two Stage Idea), using each classifier in its first stage. YATSI is a meta-algorithm that allows different classifiers to be applied in the first stage. Furthermore, we proposed a semi-supervised classification algorithm that applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to our experiments, naive Bayes is the best choice for building a semi-supervised fault prediction model for small data sets, and YATSI may improve the performance of naive Bayes for large data sets. In addition, YATSI improved the performance of all the classifiers except naive Bayes on all the data sets.

16.
Hardware security remains a major concern in the circuit design flow. Logic block-based encryption has been widely adopted as a simple but effective protection method. In this paper, the potential threat arising from a rapidly developing field, machine learning, is investigated. To illustrate the challenge, this work presents a standard attack paradigm in which a three-layer neural network and a naive Bayes classifier are used to exemplify a key-guessing attack on logic encryption. Backed by validation results on both combinational and sequential benchmarks, the presented attack scheme can specifically accelerate the decryption of partial keys, which may offer a new perspective for revealing potential vulnerabilities in current anti-attack designs.

17.
Total citations: 12, self-citations: 0, citations by others: 12
Of the numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and Super-Parent TAN have demonstrated remarkably low error. However, both techniques achieve this at considerable computational cost. We present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of classifiers. In extensive experiments, this technique delivers prediction accuracy comparable to LBR and Super-Parent TAN with substantially better computational efficiency at test time relative to the former and at training time relative to the latter. The new algorithm is shown to have low variance and is suited to incremental learning.
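The abstract does not name the constrained class of classifiers; one well-known instance of the idea averages one-dependence estimators (AODE-style), in which each attribute in turn acts as a super-parent of all the others: P(y, x) ≈ (1/|S|) Σ_{i∈S} P(y, x_i) ∏_j P(x_j | y, x_i). The sketch assumes categorical data, Laplace smoothing, and a minimum-frequency threshold for super-parents (set to 1 here only so the toy data works; a larger threshold is typical). The function name is hypothetical.

```python
import math


def aode_predict(X, y, x, min_freq=1):
    """Average the one-dependence estimates over all qualifying super-parents."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    n_vals = [len({r[j] for r in X}) for j in range(d)]
    # Only attribute values seen at least min_freq times may act as super-parents.
    parents = [i for i in range(d) if sum(1 for r in X if r[i] == x[i]) >= min_freq]
    parents = parents or list(range(d))
    scores = {}
    for c in classes:
        total = 0.0
        for i in parents:
            sub = [r for r, lab in zip(X, y) if lab == c and r[i] == x[i]]
            # Smoothed joint estimate P(c, x_i).
            p = (len(sub) + 1) / (n + len(classes) * n_vals[i])
            for j in range(d):
                if j != i:
                    p *= (sum(1 for r in sub if r[j] == x[j]) + 1) / (len(sub) + n_vals[j])
            total += p
        scores[c] = total / len(parents)
    return max(scores, key=scores.get), scores
```

No structure search is involved: every estimate comes from frequency counts (recounted at prediction time here only for brevity), which is where the training-time advantage over Super-Parent TAN comes from.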

18.
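To obtain an efficient hypertext classification algorithm, a new coordinated hypertext classification algorithm is proposed. k-NN, Bayes, and document similarity are introduced into hypertext classification, and the hypertext classification performance of these three classifiers is compared experimentally, yielding an efficient hypertext classifier. The method has been applied in two newly developed experimental systems: the intelligent search engine system WebSearch and the intelligent software assistant WebSoft.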

19.
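Research on and applications of cloud computing have broad prospects. This paper studies and analyzes in detail the architecture and core principles of the Hadoop platform and examines the typical existing Hadoop job scheduling algorithms. To address the fact that these algorithms must be configured in advance, it proposes a job scheduling algorithm based on naive Bayes classification. Simulation experiments show that the improved algorithm learns well and performs well, and that it can reduce the administrator's workload, improve management efficiency, and reduce the likelihood of human error.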

20.
Email spam has become a major problem for Internet users and providers. One major obstacle to its eradication is that potential solutions need to ensure a very low false-positive rate (FPR), which tends to be difficult in practice. We address the problem of low-FPR classification in the context of naive Bayes, one of the most popular machine learning models applied in the spam filtering domain. Drawing on recent extensions, we propose a new term weight aggregation function, which leads to markedly better results than the standard alternatives. We identify short instances as ones with disproportionately poor performance and counter this behavior with collaborative filtering-based feature augmentation. Finally, we propose a tree-based classifier cascade in which the decision thresholds of the leaf nodes are jointly optimized for the best overall performance. These improvements, both individually and in aggregate, lead to a substantially better detection rate of precision when compared with some of the best variants of naive Bayes proposed to date. Copyright © 2009 John Wiley & Sons, Ltd.
