Similar literature
1.
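Anti-spam technology has become a focus of attention. Spam filtering based on Bayesian theory has distinctive advantages, and the naive Bayes model, being simple, effective, and easy to implement, has become the most commonly used model. This paper systematically introduces the core ideas of naive Bayes and its extended models and offers bold predictions about the future development of the naive Bayes model, which is of theoretical and practical significance for Bayesian spam filtering.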

2.
A two-layer Naive Bayes filtering model based on structural features   Total citations: 7, self-citations: 0, citations by others: 7
王斌  许洪波  王申 《计算机应用》2006,26(1):191-194
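Because the algorithm is simple and performs well, Naive Bayes is widely used in spam filtering. Theoretical and experimental analysis shows that mail sets with markedly different structures also differ markedly in feature distribution, and that this difference degrades the performance of Naive Bayes. On this basis, the paper proposes a two-layer filtering model based on structural features, in which messages with different structures are trained and classified by separate Naive Bayes classifiers. Experiments show that Naive Bayes performs noticeably better with this model, coming very close to SVM.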

3.
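To address the difficulty of personalizing classification models in spam filtering and of adapting them to dynamic changes in user interests, a mail classification algorithm based on user behavior is proposed. The principle of the naive Bayes (NB) classification algorithm is analyzed, and the algorithm is modified to give it a capacity for dynamic adjustment. The mail server automatically classifies each new message on arrival. As users browse and act on their mail, any message whose misclassification the user corrects is automatically added to the training set, and the statistical probabilities of the corresponding features are updated dynamically, so that the classification model adapts to users' actions on different messages and filters spam effectively. Experimental comparison with commonly used Bayesian classification algorithms shows that, when trained on a small sample set, the new algorithm's spam recognition rate is 10% higher than that of traditional naive Bayes and cost-sensitive naive Bayes methods, giving good classification performance.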
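
The feedback loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the class name `IncrementalNaiveBayes`, the `correct` method, the multinomial word-count model, and the Laplace smoothing constant `alpha` are all assumptions made for the example.

```python
import math
from collections import defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes whose counts are updated one message at a time.

    Hypothetical sketch: names and smoothing are assumptions, not the paper's.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing (assumed)
        self.doc_counts = defaultdict(int)    # messages seen per class
        self.word_counts = defaultdict(dict)  # class -> word -> count
        self.total_words = defaultdict(int)   # word occurrences per class
        self.vocab = set()

    def update(self, words, label):
        """Add one labelled message to the running statistics."""
        self.doc_counts[label] += 1
        counts = self.word_counts[label]
        for w in words:
            counts[w] = counts.get(w, 0) + 1
            self.total_words[label] += 1
            self.vocab.add(w)

    def predict(self, words):
        """Return the class with the highest smoothed log-posterior."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            counts = self.word_counts[label]
            for w in words:
                score += math.log((counts.get(w, 0) + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def correct(self, words, true_label):
        """User moved a misclassified message: fold it into the training data."""
        self.update(words, true_label)
```

Because only counts are stored, a user correction costs one pass over the message's words and takes effect on the next prediction, with no full retraining.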

4.
肖明  刘乃琦 《福建电脑》2004,(11):37-38
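With the rapid development of the Internet, e-mail has become an important means of exchanging information, while spam has become a blight on the Internet that seriously hinders its normal development. In response, this paper presents a mail content filtering model that combines simple rules with support vector machine techniques, and describes the algorithms used in the system, including mail vector representation, dimensionality reduction, and training-set pruning. Preliminary experiments show that the model filters well.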

5.
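Bayesian filtering and Fisher filtering both use statistics to filter spam, and both filter well. This paper applies a weighted calculation to the occurrence probability of a phrase (word), which reduces the misjudgments that the naive Bayes and naive Fisher mail filtering algorithms make on rarely occurring words and raises the system's spam detection accuracy. The design allows personalized spam filtering schemes. It supports downloading mail from a mail server via mail retrieval protocols (POP3, IMAP), parsing mail according to MIME, and sending mail for the user via SMTP.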
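
The abstract does not give the weighting formula. One common way to damp estimates for rare words, shown here purely as an assumed illustration, is to blend the observed spam ratio of a word with a neutral prior in proportion to how often the word has been seen. The function name and the `prior` and `weight` parameters are hypothetical.

```python
def weighted_probability(spam_count, ham_count, prior=0.5, weight=1.0):
    """Blend a word's observed spam ratio with a prior, weighted by evidence.

    Assumed formula (not taken from the paper):
        (weight * prior + n * observed) / (weight + n)
    where n is the number of times the word has been seen.
    """
    total = spam_count + ham_count
    observed = spam_count / total if total else 0.0
    return (weight * prior + total * observed) / (weight + total)
```

A word seen once in spam scores 0.75 rather than 1.0, so a single occurrence cannot dominate the decision, while a word seen many times converges to its observed ratio.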

6.
A dynamic spam filtering model based on danger theory   Total citations: 1, self-citations: 1, citations by others: 0
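Based on danger theory, a spam filtering model, DTDEF, is established that offers dynamic behavior, adaptive learning, and effective classification in filtering spam. The basic architecture of the model and its implementation algorithm are given, and a comparison with the Bayes algorithm shows that the model is more dynamic and more effective than the Bayes method in mail filtering.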

7.
Application of rough sets and decision trees to e-mail classification and filtering   Total citations: 1, self-citations: 0, citations by others: 1
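Identifying and filtering spam is currently an active research topic. Rough set theory is a relatively new data analysis tool for handling vague and uncertain knowledge and has been applied successfully in many classification-related fields. Combining rough sets with decision trees, a mail classification scheme and model based on RS-DT is proposed, and experiments and an analysis of the results are presented. Comparison with the naive Bayes model and SVM shows that the RS-DT-based model reduces the rate at which legitimate mail is misclassified as spam and improves the filtering system's self-learning ability.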

8.
惠孛  吴跃 《计算机应用》2009,29(3):903-904
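Because the naive Bayes classification model is simple and efficient, it achieves good results in spam classification. However, its conditional independence assumption ignores the relationships between attributes, which hurts classification accuracy. By relaxing the conditional independence assumption between attributes, a new spam classification model based on an incomplete naive Bayes classifier is introduced: the N-averaged one-dependence mail filtering model. The average probability of N one-dependence classifiers is used as the predicted probability for classification. Experiments show that the model remains simple and efficient while lowering the spam classification error rate.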
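
For orientation, the usual averaged one-dependence form is written out below. It is an assumption that the paper's model follows this form; the abstract does not state how the N parent attributes are chosen. Here $y$ is the class (spam or legitimate), $x_1,\dots,x_n$ are the attributes of a message, and the $i$-th one-dependence classifier lets every other attribute depend on $y$ and on the parent attribute $x_i$.

```latex
\[
\hat{P}(y \mid x_1,\dots,x_n) \;\propto\;
\frac{1}{N} \sum_{i=1}^{N} P(y, x_i) \prod_{\substack{j=1 \\ j \neq i}}^{n} P(x_j \mid y, x_i)
\]
```

Standard naive Bayes is the special case in which no attribute has a parent other than the class, that is, $P(y)\prod_{j} P(x_j \mid y)$.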

9.
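A naive Bayes mail filtering model based on straight-line geometric partitioning, LGDNBF, is analyzed, and a more precise cost factor is used to describe the cost of classifier misjudgment. A high-risk decision region is defined, messages falling in that region are reclassified with an SVM, and a two-layer mail filtering model based on the precise cost factor is proposed. Experiments on a Chinese mail corpus show that the two-layer model classifies markedly better than the naive Bayes mail filtering model.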

10.
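As networks continue to develop, e-mail has become a widespread means of communication, and spam has correspondingly become a major problem for e-mail users, so research into better ways of curbing spam is increasingly urgent. Building on the naive Bayes algorithm, a minimum-risk Bayes algorithm with a loss factor k is proposed. By adjusting k, it reduces the misclassification of legitimate mail and minimizes the user's losses. Experimental results show that the minimum-risk Bayes algorithm filters spam more effectively.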
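
A minimum-risk decision with a loss factor can be sketched as below. The abstract does not spell out the decision rule, so this assumes the textbook form in which k is the cost of blocking a legitimate message relative to the cost of letting a spam message through. The function name `classify_min_risk` is hypothetical.

```python
def classify_min_risk(p_spam, k=1.0):
    """Label a message 'spam' only if doing so has the lower expected loss.

    p_spam: posterior P(spam | message), e.g. from a naive Bayes model.
    k:      assumed cost of blocking legitimate mail, relative to a cost
            of 1 for delivering spam (an assumption, not the paper's definition).
    """
    p_ham = 1.0 - p_spam
    risk_if_labelled_spam = k * p_ham     # loss if the message is really legitimate
    risk_if_labelled_ham = 1.0 * p_spam   # loss if the message is really spam
    return "spam" if risk_if_labelled_spam < risk_if_labelled_ham else "ham"
```

Under this rule a message is blocked only when P(spam | message) exceeds k / (1 + k), so raising k from 1 to 9 moves the threshold from 0.5 to 0.9 and trades missed spam for fewer blocked legitimate messages.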

11.
王丽侠 《微机发展》2005,15(9):42-44,47
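The main methods of mail filtering are studied, and a mail filtering and personalized classification model is proposed that combines Agent technology, rough sets, and minimum-risk Bayes classification. The model first uses rough set methods to reduce the vector space of mail samples, then trains a minimum-risk Bayes classifier on known samples to obtain an intelligent mail classifier, which filters out mail the user is not interested in. An Agent learns the user's personalized knowledge, and the learned knowledge is finally used to reclassify the mail the user is interested in.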

12.
《Knowledge》2007,20(2):120-126
The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness – the assumption that attributes are independent given the class. All of them improve the performance of naive Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes are its run-time complexity and the fact that it maintains the simplicity of the final model.
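
How attribute weights enter the naive Bayes score can be sketched as follows. The abstract does not say how the filter method computes the weights, so they are simply passed in here. The function name and the nested-dictionary layout of `likelihoods` are assumptions made for the example.

```python
import math


def weighted_nb_scores(priors, likelihoods, weights, instance):
    """Score each class as log P(c) + sum_i w_i * log P(a_i | c).

    priors:      {class: P(class)}
    likelihoods: {class: {attribute: {value: P(value | class)}}}
    weights:     {attribute: weight}; a missing attribute defaults to 1.0
    instance:    {attribute: value}
    (Hypothetical data layout; the weights themselves are supplied by the caller.)
    """
    scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for attr, value in instance.items():
            score += weights.get(attr, 1.0) * math.log(likelihoods[cls][attr][value])
        scores[cls] = score
    return scores
```

With every weight equal to 1 this reduces to standard naive Bayes, and a weight of 0 removes an attribute from the decision, which is why the final model stays as simple as the unweighted one.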

13.
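Application of Bayesian classification algorithms to the clinical diagnosis of TCM syndrome types in coronary heart disease   Total citations: 2, self-citations: 0, citations by others: 2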
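Built on an information platform for individualized clinical diagnosis and treatment in traditional Chinese medicine (TCM), this work uses factors relevant to TCM syndrome differentiation, selects differentiation attributes with an information gain algorithm, and builds clinical syndrome-type diagnosis models for coronary heart disease using both naive Bayes and strong-attribute-set Bayesian network algorithms. Experimental results show that these classification algorithms perform well in the TCM clinical diagnosis model for coronary heart disease.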

14.
This paper considers a Bayesian model-averaging (MA) approach to learning an unsupervised naive Bayes classification model. By using the expectation model-averaging (EMA) algorithm, which is proposed in this paper, a unique naive Bayes model that approximates an MA over selective naive Bayes structures is obtained. This algorithm allows the parameters of the approximate MA clustering model to be obtained with the same time complexity needed to learn the maximum-likelihood model with the expectation-maximization algorithm. The proposed method can also be regarded as an approach to unsupervised feature subset selection, because the model obtained by the EMA algorithm incorporates information on how dependent every predictive variable is on the cluster variable.

15.
Abstract: This research focused on investigating and benchmarking several high-performance classifiers, namely J48, random forests, naive Bayes, KStar and artificial immune recognition systems, for software fault prediction with limited fault data. We also studied a recent semi-supervised classification algorithm called YATSI (Yet Another Two Stage Idea), a meta-algorithm that allows different classifiers to be applied in its first stage, and used each classifier in that first stage. Furthermore, we proposed a semi-supervised classification algorithm which applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to the experiments we performed, the naive Bayes algorithm is the best choice for building a semi-supervised fault prediction model for small data sets, and YATSI may improve the performance of naive Bayes for large data sets. In addition, the YATSI algorithm improved the performance of all the classifiers except naive Bayes on all the data sets.

16.
Feng Zeng  Lan Yao  Baoling Wu  Wenjia Li  Lin Meng 《Software》2020,50(11):2031-2045
Human contact prediction is a challenging task in mobile social networks. Existing prediction methods are based on static network structure, and applying them directly to dynamic networks is bound to reduce prediction accuracy. In this paper, we extract some important features for predicting human contacts and propose a novel human contact prediction method based on the naive Bayes algorithm that is suitable for dynamic networks. The proposed method takes the ever-changing structure of mobile social networks into account. First, the past is partitioned into periods of equal length, and each period has a feature matrix covering all node pairs. Then the feature matrices are used to train naive Bayes classifiers, yielding one classifier per time period. Finally, weights are assigned to the classifiers according to their importance for contact prediction, and the classifiers are combined by weighting into the final prediction classifier. Extensive experiments are conducted to verify the effectiveness and superiority of the proposed method, and the results show that it improves prediction accuracy and TP rate to a large extent. In addition, we find that the length of the time interval has some impact on the clustering coefficient of mobile social networks, which in turn affects prediction accuracy.
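
The final combination step can be sketched as a weighted average of the per-period classifiers' outputs. The abstract does not say how the weights are derived or exactly how the classifiers are combined, so this is only one plausible reading. The function name and the normalisation are assumptions.

```python
def combine_predictions(probabilities, weights):
    """Weighted average of per-period contact probabilities.

    probabilities: one P(contact) per time-period classifier.
    weights:       one non-negative weight per classifier (supplied by the
                   caller; how the paper sets them is not stated).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total
```

Giving later periods larger weights would let recent network structure count for more, which is one natural choice in a dynamic network.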

17.
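Text classification algorithms typified by naive Bayes commonly suffer from uniform feature weights and reliance on a single indicator. To address this, an improved naive Bayes algorithm based on TF-IDF, the TF-IDF-DL naive Bayes algorithm, is proposed. Building on TF-IDF, it introduces a decentralized term-frequency factor and a feature-word position factor to make feature weights more accurate. The algorithm was evaluated on the Sogou news dataset from Sogou Labs. The results show that introducing TF-IDF-DL into naive Bayes classification yields good accuracy, recall, and F1 in text classification. Compared with the TF-IDF-dist Bayes scheme, a comparable domestic study, classification accuracy improves by 8.6%, recall by 11.7%, and F1 by 7.4%. The algorithm therefore improves classification performance and also achieves reasonably good results on categories that are hard to distinguish.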
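
For reference, the plain TF-IDF weighting that the proposed method builds on can be sketched as below. The decentralized term-frequency factor and the position factor of TF-IDF-DL are not defined in the abstract and are not reproduced here. The function name `tf_idf` and the choice of an unsmoothed logarithmic IDF are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists (already segmented into words).
    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    (Unsmoothed variant, chosen for the example.)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets weight 0, so it contributes nothing to distinguishing one category from another.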

18.
Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance even though its underlying assumptions (of independence and a normal distribution of the variables) may be violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be normally distributed but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-normal distributions are observed.

19.
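Automatic text classification is achieved by combining CHI-value feature selection with the covering algorithm for feedforward neural networks, after preprocessing the text by word segmentation. The method uses CHI values for feature selection, that is, feature dimensionality reduction, and applies the covering algorithm for classification. Combining the two increases classification speed while preserving accuracy. The method was tested on texts from a standard dataset, and its results at different dimensionalities were compared with those of SVM and naive Bayes. The results show that the covering algorithm is more accurate than SVM and naive Bayes, and that the choice of dimensionality strongly affects classification accuracy.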
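
The CHI-value feature selection step can be sketched with the standard chi-square statistic for a term and a class. The abstract gives no formula, so the 2x2 contingency form below is assumed. The function names and the layout of `table` are hypothetical, and the covering algorithm itself is not shown.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term with respect to a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_top_terms(table, k):
    """Keep the k terms with the highest CHI value.

    table: {term: (a, b, c, d)} (hypothetical layout for this example)
    """
    ranked = sorted(table, key=lambda t: chi_square(*table[t]), reverse=True)
    return ranked[:k]
```

The parameter k is the dimensionality retained after selection, the quantity the abstract reports as having a strong effect on accuracy.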
