20 similar documents found (search time: 0 ms)
1.
Due to its simplicity, efficiency and efficacy, naive Bayes (NB) continues to be one of the top 10 data mining algorithms. Many improved approaches to NB have been proposed to weaken its conditional independence assumption. However, there has been little work to date on instance weighting filter approaches to NB. In this paper, we propose a simple, efficient, and effective instance weighting filter approach to NB. We call it attribute (feature) value frequency-based instance weighting and denote the resulting improved model as attribute value frequency weighted naive Bayes (AVFWNB). In AVFWNB, the weight of each training instance is defined as the inner product of its attribute value frequency vector and the attribute value number vector. Experimental results on 36 widely used classification problems show that AVFWNB significantly outperforms NB while maintaining the computational simplicity that characterizes NB.
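The weighting rule stated in this abstract is concrete enough to sketch. Below is a minimal, hypothetical Python rendering of attribute-value-frequency instance weighting: each instance's weight is the inner product of its attribute value frequency vector and the attribute value number vector. The exact normalization used in the AVFWNB paper may differ.

```python
from collections import Counter

def avf_instance_weights(X):
    """Sketch of attribute-value-frequency instance weighting (AVFWNB-style).

    X: list of training instances, each a tuple of discrete attribute values.
    Returns one weight per instance: the inner product of that instance's
    attribute value frequency vector and the attribute value number vector.
    """
    n_attrs = len(X[0])
    # Frequency of each value, per attribute, over the training set.
    value_counts = [Counter(x[j] for x in X) for j in range(n_attrs)]
    # Number of distinct values observed per attribute.
    value_numbers = [len(c) for c in value_counts]
    weights = []
    for x in X:
        w = sum(value_counts[j][x[j]] * value_numbers[j] for j in range(n_attrs))
        weights.append(w)
    return weights
```

Instances whose attribute values are common in the training data receive larger weights, so "typical" instances count more when the NB probability tables are estimated from the weighted data.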
2.
Traditional classification algorithms require a large number of labelled examples from all the predefined classes, which are generally difficult and time-consuming to obtain. Furthermore, data uncertainty is prevalent in many real-world applications, such as sensor networks, market analysis and medical diagnosis. In this article, we explore the issue of classification on uncertain data when only positive and unlabelled examples are available. We propose an algorithm to build a naive Bayes classifier from positive and unlabelled examples with uncertainty. However, the algorithm requires the prior probability of the positive class, which is generally difficult for the user to provide in practice. Two approaches are proposed to avoid this user-specified parameter: one uses a validation set to search for an appropriate value, and the other estimates it directly. Our extensive experiments show that both approaches can achieve satisfactory classification performance on uncertain data. In addition, by exploiting the uncertainty in the dataset, our algorithm can potentially achieve better classification performance compared with traditional naive Bayes, which ignores uncertainty when handling uncertain data.
3.
The naive Bayes classifier is a simple and effective machine learning tool. Starting from the principles of the naive Bayes classifier, this paper derives a "naive Bayes combination" formula and constructs the corresponding classifier. Testing shows that the classifier has good classification performance and practicality: it overcomes the poor accuracy of the plain naive Bayes classifier, and it is faster than other classifiers without significantly sacrificing accuracy.
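The abstract does not give the "naive Bayes combination" formula itself. For background, the plain naive Bayes rule it builds on can be sketched as follows: a minimal discrete implementation with Laplace smoothing, not the paper's combined classifier.

```python
import math
from collections import Counter

def train_nb(X, y):
    """Train a discrete naive Bayes model with Laplace smoothing.

    X: list of instances (tuples of discrete attribute values); y: labels.
    """
    classes = Counter(y)
    n_attrs = len(X[0])
    # counts[c][j][v] = number of class-c instances whose attribute j equals v
    counts = {c: [Counter() for _ in range(n_attrs)] for c in classes}
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            counts[c][j][v] += 1
    values = [{x[j] for x in X} for j in range(n_attrs)]
    return classes, counts, values, len(y)

def predict_nb(model, x):
    """Return the class maximizing the smoothed log-posterior."""
    classes, counts, values, n = model
    def log_post(c):
        lp = math.log((classes[c] + 1) / (n + len(classes)))
        for j, v in enumerate(x):
            lp += math.log((counts[c][j][v] + 1) / (classes[c] + len(values[j])))
        return lp
    return max(classes, key=log_post)
```

The conditional independence assumption appears in `predict_nb`: the per-attribute log-likelihoods are simply summed.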
4.
Due to being fast, easy to implement and relatively effective, some state-of-the-art naive Bayes text classifiers with the strong assumption of conditional independence among attributes, such as multinomial naive Bayes, complement naive Bayes and the one-versus-all-but-one model, have received a great deal of attention from researchers in the domain of text classification. In this article, we revisit these naive Bayes text classifiers and empirically compare their classification performance on a large number of widely used text classification benchmark datasets. Then, we propose a locally weighted learning approach to these naive Bayes text classifiers. We call our new approach locally weighted naive Bayes text classifiers (LWNBTC). LWNBTC weakens the attribute conditional independence assumption made by these naive Bayes text classifiers by applying the locally weighted learning approach. The experimental results show that our locally weighted versions significantly outperform these state-of-the-art naive Bayes text classifiers in terms of classification accuracy.
5.
6.
The naive Bayes classification model is a simple and effective classification method, but its attribute independence assumption prevents it from expressing the dependencies that exist among attribute variables, which harms its classification performance. By analysing the classification principle of Bayesian classification models and a variant form of Bayes' theorem, this paper proposes a new classification model based on Bayes' theorem, DLBAN (double-level Bayesian network augmented naive Bayes), which establishes dependencies among attributes by selecting key attributes. The method is experimentally compared with the naive Bayes classifier and the TAN (tree augmented naive Bayes) classifier. The results show that DLBAN achieves higher classification accuracy on most datasets.
7.
8.
Chin-Chun Chang 《Pattern Recognition》2010,43(8):2971-2981
The RELIEF algorithm is a popular approach to feature weighting. Many extensions of the RELIEF algorithm have been developed, and I-RELIEF is one of the best known. In this paper, I-RELIEF is generalized for supervised distance metric learning to yield a Mahalanobis distance function. The proposed approach is justified by showing that the objective function of the generalized I-RELIEF is closely related to the expected leave-one-out nearest-neighbour classification rate. In addition, the relationships among the generalized I-RELIEF, neighbourhood components analysis, and graph embedding are pointed out. Experimental results on various data sets demonstrate the superiority of the proposed approach.
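The generalized I-RELIEF of this paper is considerably more elaborate, but the basic RELIEF weighting scheme it extends is simple: each feature's weight grows with that feature's distance to the nearest instance of a different class (the "miss") and shrinks with its distance to the nearest instance of the same class (the "hit"). A minimal sketch for numeric features, assuming two classes and whole-dataset iteration rather than random sampling:

```python
import numpy as np

def relief_weights(X, y):
    """Basic RELIEF feature weighting for numeric features.

    For each instance, feature weights are increased by the scaled
    per-feature distance to the nearest miss and decreased by the
    distance to the nearest hit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    # Scale per-feature differences to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i]) / span          # (n, d) scaled differences
        dist = diff.sum(axis=1)                 # Manhattan distance
        dist[i] = np.inf                        # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += diff[miss] - diff[hit]
    return w / n
```

Features that separate the classes end up with large positive weights, while irrelevant features hover near zero; I-RELIEF replaces the hard nearest-hit/nearest-miss choice with probabilistic weighting over all neighbours.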
9.
The positive unlabeled learning term refers to the binary classification problem in the absence of negative examples. When only positive and unlabeled instances are available, semi-supervised classification algorithms cannot be directly applied, and thus new algorithms are required. One of these positive unlabeled learning algorithms is the positive naive Bayes (PNB), which is an adaptation of the naive Bayes induction algorithm that does not require negative instances. In this work we propose two ways of enhancing this algorithm. On one hand, we have taken the concept behind PNB one step further, proposing a procedure to build more complex Bayesian classifiers in the absence of negative instances. We present a new algorithm (named positive tree augmented naive Bayes, PTAN) to obtain tree augmented naive Bayes models in the positive unlabeled domain. On the other hand, we propose a new Bayesian approach to deal with the a priori probability of the positive class that models the uncertainty over this parameter by means of a Beta distribution. This approach is applied to both PNB and PTAN, resulting in two new algorithms. The four algorithms are empirically compared in positive unlabeled learning problems based on real and synthetic databases. The results obtained in these comparisons suggest that, when the predicting variables are not conditionally independent given the class, the extension of PNB to more complex networks increases the classification performance. They also show that our Bayesian approach to the a priori probability of the positive class can improve the results obtained by PNB and PTAN.
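The core trick behind PNB-style learners can be illustrated for a single discrete attribute: since P(v) = p·P(v|pos) + (1−p)·P(v|neg), the negative-class conditional can be recovered from the positive set, the unlabeled set, and the positive prior p alone. The sketch below is a hypothetical illustration of that identity, not the full PNB algorithm; the clipping of negative estimates to zero is our simplification.

```python
from collections import Counter

def pnb_conditionals(pos_vals, unl_vals, p_pos):
    """Estimate P(v | negative) without any negative examples.

    pos_vals: attribute values observed in positive examples.
    unl_vals: attribute values observed in unlabeled examples,
              used to approximate the marginal P(v).
    p_pos:    prior probability of the positive class (user-supplied).
    """
    values = set(pos_vals) | set(unl_vals)
    cp, cu = Counter(pos_vals), Counter(unl_vals)
    p_v_pos = {v: cp[v] / len(pos_vals) for v in values}
    p_v = {v: cu[v] / len(unl_vals) for v in values}
    p_v_neg = {}
    for v in values:
        # Invert P(v) = p*P(v|pos) + (1-p)*P(v|neg) for P(v|neg).
        est = (p_v[v] - p_pos * p_v_pos[v]) / (1.0 - p_pos)
        p_v_neg[v] = max(est, 0.0)  # sampling noise can push estimates below 0
    return p_v_pos, p_v_neg
```

With conditionals for every attribute in hand, classification proceeds exactly as in ordinary naive Bayes. The abstract's Beta-distribution approach replaces the fixed `p_pos` with a posterior over this parameter.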
10.
In this article, we introduce a personalized counseling system based on context mining. As a technique for context mining, we have developed an algorithm called CANSY. It adopts trained neural networks for feature weighting and a value difference metric in order to measure distances between all possible values of symbolic features. CANSY plays a core role in classifying and presenting the most similar cases from a case base. Experimental results show that CANSY along with a rule base can provide personalized information with a relatively high level of accuracy, and it is capable of recommending appropriate products or services. (An erratum to this article has been published.)
11.
The naive Bayes classification model is a simple and efficient classification model, but its conditional independence assumption prevents it from expressing dependencies among attributes, which harms its classification accuracy. Dependencies among attributes are related to the characteristics of the attributes themselves: the characteristics of some attributes dictate that other attributes necessarily depend on them, and such attributes are called strong attributes. By analysing measures of attribute correlation and a variant form of Bayes' theorem, this paper introduces a method for selecting strong attributes and extends the structure of the naive Bayes classification model by adding augmenting arcs between strong and weak attributes, thereby relaxing the independence assumption of naive Bayes. On this basis, a Bayesian classification model restricted by strong attributes, SANBC, is proposed. Experimental results show that, compared with the naive Bayes classification model, SANBC achieves higher classification accuracy.
12.
Supervised learning algorithms form an important class of machine learning algorithms and require externally supplied training samples carrying supervision signals. Although the machine learning community provides many benchmark datasets, in many cases training samples must be generated by the user. This paper presents an interactive method for obtaining training samples: the original image is subjected to one or more randomly mixed transformations, and the user selects those samples that remain recognizable to the human eye to be saved as valid samples. Experimental results show that the images produced by this method can simulate photographs taken by a camera under various complex conditions, such as different angles, poses, illumination and occlusion. A naive Bayes (NB) classifier trained with samples generated by the system reaches a recognition accuracy of 95.042%, better than the 88.4875% achieved by the same NB classifier trained on the UCI artificial character dataset.
13.
Improving an email classification system based on the naive Bayes classifier. Total citations: 1 (self-citations: 0, citations by others: 1)
The naive Bayes classification method currently performs well in email classification, but it cannot separate spam from legitimate mail with 100% accuracy; in commercial applications, however, no important email may be missed. This paper first gives a brief introduction to the Bayes method, and then proposes an idea and a method for improving the current Bayes classification approach.
14.
15.
Abstract: This research focused on investigating and benchmarking several high performance classifiers called J48, random forests, naive Bayes, KStar and artificial immune recognition systems for software fault prediction with limited fault data. We also studied a recent semi-supervised classification algorithm called YATSI (Yet Another Two Stage Idea) and each classifier has been used in the first stage of YATSI. YATSI is a meta algorithm which allows different classifiers to be applied in the first stage. Furthermore, we proposed a semi-supervised classification algorithm which applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to experiments we performed, the naive Bayes algorithm is the best choice to build a semi-supervised fault prediction model for small data sets and YATSI may improve the performance of naive Bayes for large data sets. In addition, the YATSI algorithm improved the performance of all the classifiers except naive Bayes on all the data sets.
16.
Hardware security remains a major concern in the circuit design flow. Logic-block-based encryption has been widely adopted as a simple but effective protection method. This paper investigates the potential threat arising from the rapidly developing field of machine learning. To illustrate the challenge, the work presents a standard attack paradigm in which a three-layer neural network and a naive Bayes classifier are used to exemplify a key-guessing attack on logic encryption. Backed by validation results obtained from both combinational and sequential benchmarks, the presented attack scheme can specifically accelerate the decryption of partial keys, which may serve as a new perspective for revealing potential vulnerabilities in current anti-attack designs.
17.
Geoffrey I. Webb, Janice R. Boughton, Zhihai Wang 《Machine Learning》2005,58(1):5-24
Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and Super-Parent TAN have demonstrated remarkable error performance. However, both techniques obtain this outcome at a considerable computational cost. We present a new approach to weakening the attribute independence assumption by averaging all of a constrained class of classifiers. In extensive experiments this technique delivers comparable prediction accuracy to LBR and Super-Parent TAN with substantially improved computational efficiency at test time relative to the former and at training time relative to the latter. The new algorithm is shown to have low variance and is suited to incremental learning.
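The "averaging all of a constrained class of classifiers" described here is the AODE scheme: each attribute value in the test instance takes a turn as a "super-parent" on which all other attributes depend, and the resulting one-dependence estimates are averaged. A compact sketch, with Laplace-style smoothing chosen here for simplicity (the paper's exact smoothing may differ):

```python
from collections import Counter

def aode_predict(X, y, x):
    """Sketch of AODE: average over one-dependence estimators.

    X, y: training instances (tuples of discrete values) and labels.
    x:    test instance. Returns the highest-scoring class.
    """
    n, d = len(X), len(x)
    classes = sorted(set(y))
    values = [sorted({row[j] for row in X}) for j in range(d)]
    pair = Counter()    # (class, i, v_i) joint counts
    triple = Counter()  # (class, i, v_i, j, v_j) joint counts
    for row, c in zip(X, y):
        for i in range(d):
            pair[(c, i, row[i])] += 1
            for j in range(d):
                triple[(c, i, row[i], j, row[j])] += 1
    def score(c):
        s = 0.0
        for i in range(d):  # attribute i acts as the super-parent
            # P(c, x_i), smoothed
            p = (pair[(c, i, x[i])] + 1) / (n + len(classes) * len(values[i]))
            for j in range(d):
                # P(x_j | c, x_i), smoothed
                p *= (triple[(c, i, x[i], j, x[j])] + 1) / (
                    pair[(c, i, x[i])] + len(values[j]))
            s += p
        return s / d
    return max(classes, key=score)
```

Because all counts are collected in a single pass and no structure search is performed, training is one sweep over the data, which is what gives the approach its efficiency advantage over LBR (lazy, expensive at test time) and Super-Parent TAN (expensive structure learning at training time).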
18.
To obtain an efficient hypertext classification algorithm, a new coordinated hypertext classification algorithm is proposed, introducing k-NN, Bayes and document similarity into the hypertext classification domain. The hypertext classification performance of the three classifiers is compared experimentally, yielding an efficient hypertext classifier. The method has been applied in two newly developed experimental systems: the intelligent search engine WebSearch and the intelligent software assistant WebSoft.
19.
20.
Email spam has become a major problem for Internet users and providers. One major obstacle to its eradication is that the potential solutions need to ensure a very low false-positive rate, which tends to be difficult in practice. We address the problem of low-FPR classification in the context of naive Bayes, which represents one of the most popular machine learning models applied in the spam filtering domain. Drawing from recent extensions, we propose a new term weight aggregation function, which leads to markedly better results than the standard alternatives. We identify short instances as ones with disproportionately poor performance and counter this behavior with a collaborative filtering-based feature augmentation. Finally, we propose a tree-based classifier cascade for which the decision thresholds of the leaf nodes are jointly optimized for the best overall performance. These improvements, both individually and in aggregate, lead to substantially better detection rates when compared with some of the best variants of naive Bayes proposed to date. Copyright © 2009 John Wiley & Sons, Ltd.
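The abstract does not specify the proposed aggregation function. As background, a common transform-based term weighting for multinomial naive Bayes replaces raw term counts tf with a dampened weight such as log(1 + tf), both when accumulating class statistics and when scoring a document. The sketch below illustrates that standard variant, not the paper's function; the data and identifiers are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_weighted_mnb(docs, labels):
    """Multinomial naive Bayes with log(1+tf) term weighting."""
    class_term = defaultdict(Counter)   # class -> term -> aggregated weight
    class_docs = Counter(labels)
    vocab = set()
    for doc, c in zip(docs, labels):
        for term, tf in Counter(doc).items():
            class_term[c][term] += math.log(1 + tf)  # dampened count
            vocab.add(term)
    return class_term, class_docs, vocab, len(labels)

def predict_weighted_mnb(model, doc):
    """Score classes with term-weighted log-likelihoods; return the best."""
    class_term, class_docs, vocab, n = model
    def log_post(c):
        total = sum(class_term[c].values())
        lp = math.log(class_docs[c] / n)
        for term, tf in Counter(doc).items():
            if term not in vocab:
                continue
            p = (class_term[c][term] + 1) / (total + len(vocab))
            lp += math.log(1 + tf) * math.log(p)
        return lp
    return max(class_docs, key=log_post)
```

Dampening term frequencies reduces the influence of terms repeated many times in one message, a known weakness of plain multinomial NB on spam, which often contains deliberately repeated tokens.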