Similar Documents
19 similar documents found.
1.
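Support vector machines (SVMs) achieve high classification accuracy in spam filtering, but in practice misclassifying a legitimate email as spam costs the user more than the reverse error. This paper proposes a spam filtering scheme based on a cost-sensitive SVM. The classifier is trained with different error penalty coefficients for the positive and negative training samples, so that the misclassification rate of legitimate email (the false-positive rate) is kept as low as possible while spam recall remains high. Experimental results show that the scheme effectively improves the filter's overall performance and better meets the practical requirements of spam filtering.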
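The class-dependent penalty idea can be sketched as a linear SVM trained by stochastic subgradient descent on an L2-regularized hinge loss, with each sample's hinge term scaled by a per-class coefficient. This is a minimal illustration under stated assumptions and not the paper's implementation. The function names, the penalty values `c_spam`/`c_ham`, the learning rate and the toy data are all hypothetical. Spam is labeled +1 and legitimate mail -1, so giving legitimate mail the larger penalty makes false positives more expensive during training.

```python
import random

def train_cost_sensitive_svm(X, y, c_spam=1.0, c_ham=5.0, lam=0.01,
                             epochs=200, lr=0.01, seed=0):
    """Stochastic subgradient descent on an L2-regularized hinge loss in
    which each sample's hinge term is scaled by C(y_i).

    Labels: +1 = spam, -1 = legitimate. C(y_i) is c_spam or c_ham, so
    c_ham > c_spam penalizes errors on legitimate mail more heavily.
    (Hypothetical parameter names; the paper's values are not given.)
    """
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            c = c_spam if yi > 0 else c_ham
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1  # inside the margin or misclassified
            for j in range(d):
                # Regularizer gradient plus the class-weighted hinge subgradient.
                grad = lam * w[j] - (c * yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * c * yi
    return w, b

def predict(w, b, x):
    """Return +1 (spam) or -1 (legitimate)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
```

In practice the feature vectors would be derived from the email text. The ratio `c_ham / c_spam` is the knob that trades spam recall against the false-positive rate.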

2.
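Research on Intelligent Spam Filtering Based on Multiple Bayesian Networks   Total citations: 2 (self-citations: 0; citations by others: 2)
After analyzing several problems that arise when naive Bayes is used for automatic spam filtering, this paper proposes a new automatic spam filtering method based on multiple Bayesian networks. Several classifiers, each built from a Bayesian network, classify an email simultaneously, and the email is labeled spam if and only if every classifier judges it to be spam. Running several classifiers together and applying a classification threshold reduce, to some extent, the chance of misclassifying useful email as spam. The method also introduces a dynamic learning mechanism that adds training samples during classification, so that it can adapt to different users' criteria for classifying email.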
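The unanimous decision rule described above can be sketched as follows. This assumes each Bayesian-network classifier can be treated as a callable that returns an estimated P(spam | email). The callable interface, the function name and the 0.9 threshold are illustrative assumptions, not details given in the abstract.

```python
from typing import Callable, Sequence

# A classifier maps an email to an estimated P(spam | email).
# In the paper each one is built from a Bayesian network; here they are
# plain callables (an assumption made for illustration).
Classifier = Callable[[str], float]

def unanimous_spam(email: str, classifiers: Sequence[Classifier],
                   threshold: float = 0.9) -> bool:
    """Label email as spam only if every classifier's spam probability
    reaches the threshold; a single dissent keeps it out of the spam folder."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    return all(clf(email) >= threshold for clf in classifiers)
```

Because a single dissenting classifier is enough to keep a message out of the spam folder, the rule trades some spam recall for fewer false positives.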

3.
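Wang Qingxing (王庆幸), Xu Congfu (徐从富), He Jun (何俊). Computer Science (《计算机科学》), 2008, 35(10): 197-199
This paper studies how to apply the logistic regression model to Chinese spam filtering, presents the key techniques, and applies the model to the SEWM2007 spam corpus, where it achieves good filtering results. It also analyzes the factors that affect the legitimate-email misclassification rate, the spam misclassification rate and precision. Comparative experiments show that, for Chinese spam filtering, the logistic regression model achieves better ROC performance and runs faster than SVM.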
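A minimal sketch of a logistic regression spam scorer trained by batch gradient descent on the mean log-loss. The binary term-presence features, learning rate, epoch count and toy data are assumptions for illustration. The abstract does not describe the authors' feature extraction or optimizer.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=500, l2=0.0):
    """Batch gradient descent on the mean log-loss.

    X: feature vectors (e.g. binary term-presence features, an assumption),
    y: 1 for spam, 0 for legitimate mail.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def spam_probability(w, b, x):
    """Estimated P(spam | x) under the trained model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Because the model outputs a probability, sweeping a decision threshold over `spam_probability` yields the ROC curve that the paper uses for comparison with SVM.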

4.
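Email Filtering Based on a Dual-Membership Fuzzy Support Vector Machine   Total citations: 2 (self-citations: 0; citations by others: 2)
To address the fuzziness of the information in email and the asymmetric costs of misclassifying legitimate email and spam, this paper proposes an email filtering method based on a dual-membership fuzzy support vector machine. Each sample is assigned its own pair of membership degrees to obtain an optimal classifier, which improves filtering accuracy. Simulation experiments show that the method effectively reduces the misclassification of legitimate email as spam while maintaining high accuracy.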

5.
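In spam filtering, feature words contribute differently to classifying legitimate email and spam. By defining a classification contribution ratio coefficient, this paper applies the idea of feature-word classification contribution to both feature selection and the design of a naive Bayes filter. Experiments on an English corpus show that the method improves the filter's ability to recognize both legitimate email and spam and lowers its misclassification rate for both.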

6.
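Gong Wei (龚伟), Li Liubai (李柳柏). Microcomputer Development (《微机发展》), 2007, 17(3): 163-165
Building on the architecture of an intelligent decision support system, this paper proposes a new email filtering model. It also discusses key issues in Chinese spam filtering, such as Chinese word segmentation and updating the knowledge base of spam features. The authors developed an Intelligent Email Filtering System (IEFS) that keeps the spam misclassification rate under a degree of control and effectively curbs the spread of spam.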

7.
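To address filters misclassifying legitimate email, this paper proposes an improved spam filtering algorithm. The algorithm improves how the conditional entropy in information gain is estimated and combines this with minimum-risk Bayesian decision making. It was tested on an English corpus and evaluated by recall and precision. The results show that the improved method strengthens the filter's ability to recognize legitimate email, reduces misclassification of legitimate email and lowers user losses.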
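For reference, the standard (unimproved) information gain of a term can be computed from document counts as IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. The sketch below implements only this baseline. The paper's improved conditional-entropy estimate is not described in the abstract, and the function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(spam_with, ham_with, spam_total, ham_total):
    """Baseline IG of a term from document counts.

    spam_with / ham_with: spam / legitimate documents containing the term;
    spam_total / ham_total: total spam / legitimate documents.
    """
    n = spam_total + ham_total
    with_t = spam_with + ham_with
    without_t = n - with_t
    h_cond = 0.0  # conditional entropy H(C | term presence)
    if with_t:
        h_cond += with_t / n * entropy([spam_with, ham_with])
    if without_t:
        h_cond += without_t / n * entropy([spam_total - spam_with,
                                           ham_total - ham_with])
    return entropy([spam_total, ham_total]) - h_cond
```

A term that appears in every spam message and in no legitimate one has the maximum gain of 1 bit for balanced classes. A term spread evenly across both classes has a gain of 0.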

8.
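Building on the architecture of an intelligent decision support system, this paper proposes a new email filtering model. It also discusses key issues in Chinese spam filtering, such as Chinese word segmentation and updating the knowledge base of spam features. The authors developed an Intelligent Email Filtering System (IEFS) that keeps the spam misclassification rate under a degree of control and effectively curbs the spread of spam.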

9.
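Traditional Bayesian spam filtering systems classify accurately, but they process email inefficiently and consume substantial resources. This paper studies the Bayesian spam filtering algorithm under Hadoop MapReduce and optimizes the threshold used to decide the class. Experiments show that the proposed algorithm lowers the misclassification rate of legitimate email, improves the precision and F-measure of spam detection, and makes spam filtering more efficient.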
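The abstract does not say how the decision threshold is optimized. One common approach, assumed here purely for illustration, is to choose the cutoff that maximizes F1 on held-out scores. When two thresholds tie, the sketch prefers the higher one so that fewer legitimate messages are flagged. The function names and data are hypothetical.

```python
def f1_at(threshold, scores, labels):
    """F1 of the rule 'spam if score >= threshold' (labels: 1 spam, 0 ham)."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels):
    """Pick the observed score that maximizes F1; ties go to the higher
    threshold, which flags fewer messages as spam."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: (f1_at(t, scores, labels), t))
```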

10.
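This paper analyzes the application of Bayesian classification to Chinese spam filtering and proposes a spam filtering technique based on Bayesian minimum risk. By choosing an appropriate loss function, the technique minimizes the misclassification of legitimate email. Experimental results show that the method is practical and effective.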
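A minimal sketch of the minimum-risk Bayes decision for two classes. It assumes correct decisions cost 0, so each label's conditional risk is the cost of its error times the posterior probability of the other class. The cost values (9 for flagging legitimate mail, 1 for missing spam) are illustrative assumptions, not the paper's loss function.

```python
def min_risk_label(p_spam, cost_ham_as_spam=9.0, cost_spam_as_ham=1.0):
    """Choose the label with the smaller conditional risk.

    R(spam | x) = cost_ham_as_spam * P(ham | x)
    R(ham | x)  = cost_spam_as_ham * P(spam | x)
    Correct decisions are assumed to cost 0; ties go to 'ham'.
    """
    p_ham = 1.0 - p_spam
    risk_spam = cost_ham_as_spam * p_ham
    risk_ham = cost_spam_as_ham * p_spam
    return "spam" if risk_spam < risk_ham else "ham"
```

With these example costs the rule labels a message spam only when 9(1 - p) < p, that is, when P(spam | x) > 0.9. Raising the cost of flagging legitimate mail pushes this cutoff closer to 1.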

11.
A new technique for managing and disseminating Web-based email prefetches messages and generates dynamic pages, displaying them at the network edge. Compared with other popular Web-based email servers, the prefetching and caching emails (PACE) prototype shows improved performance with respect to user-perceived latency. Additionally, PACE's centralized, neural-network-based personalized spam filter removes spam and viruses at the origin server, thus saving bandwidth. Another major concern for users is email accounts being clogged with spam. Spam filters can be classified as server-side or client-side. Server-side filters are integrated with email servers and filter out spam at the server end.

12.
As email becomes more important, the volume of malicious email also grows, and so does the need to filter it. It is more economical to combine commodity hardware (mid-range servers or PCs) with a virtual environment into a single server resource and to filter malicious email with machine learning. We therefore used the Hadoop MapReduce framework with Naïve Bayes for malicious email filtering. Naïve Bayes was chosen because, among the leading machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree), it ranks near the top in both execution time and accuracy. Malicious email was filtered in two ways. The first used Naïve Bayes, a supervised machine learning method, implemented as MapReduce programs in a performance-optimized Hadoop framework. The second used a Python implementation of Naïve Bayes on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naïve Bayes method.
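The MapReduce structure of Naïve Bayes training can be illustrated without Hadoop. In the sketch below, a mapper emits ((label, word), 1) pairs, a sort stands in for the shuffle phase, and a reducer sums the counts. A multinomial classifier with Laplace smoothing then uses those counts. Everything runs in a single process and all names are hypothetical. In a real Hadoop Streaming job, the mapper and reducer would be separate programs exchanging tab-separated lines over stdin/stdout.

```python
import math
from itertools import groupby

DOC = "__DOC__"  # marker key for per-label document counts (uppercase, so it
                 # cannot collide with the lowercased tokens)

def mapper(label, text):
    """Emit a document count for the label and one count per token."""
    yield (label, DOC), 1
    for word in text.lower().split():
        yield (label, word), 1

def reducer(key, values):
    yield key, sum(values)

def run_mapreduce(records):
    """Map, shuffle (sort by key), then reduce; returns {(label, word): count}."""
    pairs = [kv for label, text in records for kv in mapper(label, text)]
    pairs.sort(key=lambda kv: kv[0])
    counts = {}
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        for k, total in reducer(key, (v for _, v in group)):
            counts[k] = total
    return counts

def classify(counts, text):
    """Multinomial naive Bayes with Laplace smoothing over the reduced counts."""
    labels = sorted({label for label, _ in counts})
    vocab = {w for (_, w) in counts if w != DOC}
    n_docs = sum(counts[(l, DOC)] for l in labels)
    best, best_score = None, -math.inf
    for l in labels:
        word_total = sum(c for (lab, w), c in counts.items()
                         if lab == l and w != DOC)
        score = math.log(counts[(l, DOC)] / n_docs)  # log prior
        for w in text.lower().split():
            score += math.log((counts.get((l, w), 0) + 1)
                              / (word_total + len(vocab)))
        if score > best_score:
            best, best_score = l, score
    return best
```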

13.
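Spam handling is a very important function of email services. This paper first represents a standard email collection in the vector space model and reduces its dimensionality. It then builds an email classifier from a neural network ensemble to filter email. Experiments on a spam corpus show that the method filters spam effectively.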
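The pipeline above (vector space representation, dimensionality reduction, then an ensemble vote) can be sketched as follows. Several choices here are assumptions, because the abstract does not specify them: term-frequency vectors, reduction by keeping the top-k terms by document frequency, and a simple averaging ensemble. The trained neural networks are replaced by arbitrary scoring callables, and all names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

def build_vocab(docs: Sequence[str], k: int) -> List[str]:
    """Reduce dimensionality by keeping the k terms with the highest
    document frequency (ties broken alphabetically)."""
    df = Counter(w for d in docs for w in set(d.lower().split()))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]

def to_vector(doc: str, vocab: Sequence[str]) -> List[int]:
    """Term-frequency vector of doc over the reduced vocabulary."""
    tf = Counter(doc.lower().split())
    return [tf[w] for w in vocab]

# Stand-in for a trained network: maps a vector to a spam score in [0, 1].
Member = Callable[[List[int]], float]

def ensemble_is_spam(vec: List[int], members: Sequence[Member],
                     threshold: float = 0.5) -> bool:
    """Simple averaging ensemble over the members' spam scores."""
    return sum(m(vec) for m in members) / len(members) >= threshold
```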

14.
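Weighted Naive Bayes Email Filtering Based on Rough Sets   Total citations: 5 (self-citations: 3; citations by others: 2)
Email filtering involves two key problems: selecting an effective email feature set, and designing a good filtering algorithm. Based on an analysis of email characteristics, this paper presents a new method for extracting an email feature set that combines the main characteristic features of the email header and body. The importance of each attribute is measured from the information-theoretic view of rough sets and used as a weight in weighted naive Bayes spam filtering. This effectively addresses the problem of conditional dependence among attributes in naive Bayes classification. Tests on Chinese and English email collections demonstrate the effectiveness of the proposed filtering method.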
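The sketch below assumes one common information-view definition of attribute significance, sig(a) = H(D | C - {a}) - H(D | C). Here H(D | B) is the conditional entropy of the decision attribute D given the condition attributes B. Each significance is then used as an exponent weight in naive Bayes scoring, log P(c) + sum_a w_a log P(x_a | c). The exact definition used in the paper, the function names, the Laplace smoothing and the toy decision table are assumptions.

```python
import math
from collections import Counter

def cond_entropy(rows, labels, attrs):
    """H(D | attrs) over a decision table; rows hold discrete attribute values."""
    n = len(rows)
    blocks = Counter(tuple(r[a] for a in attrs) for r in rows)
    joint = Counter((tuple(r[a] for a in attrs), d) for r, d in zip(rows, labels))
    h = 0.0
    for (key, _), c in joint.items():
        h -= c / n * math.log2(c / blocks[key])  # p(b, d) * log p(d | b)
    return h

def significance(rows, labels, a):
    """Information-view significance of attribute a (assumed definition)."""
    all_attrs = list(range(len(rows[0])))
    rest = [b for b in all_attrs if b != a]
    return cond_entropy(rows, labels, rest) - cond_entropy(rows, labels, all_attrs)

def weighted_nb_predict(rows, labels, x, weights, alpha=1.0):
    """Naive Bayes where each attribute's log-likelihood is scaled by its weight."""
    best, best_score = None, -math.inf
    for c in sorted(set(labels)):
        idx = [i for i, d in enumerate(labels) if d == c]
        score = math.log(len(idx) / len(rows))
        for a, v in enumerate(x):
            n_values = len({r[a] for r in rows})
            match = sum(1 for i in idx if rows[i][a] == v)
            p = (match + alpha) / (len(idx) + alpha * n_values)  # Laplace smoothing
            score += weights[a] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best
```

In the toy test data, attribute 0 determines the class and attribute 1 is noise. The significance measure therefore gives the first attribute weight 1 and the second weight 0.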

15.
Traditional classification methods assume that the training and the test data arise from the same underlying distribution. However, in several adversarial settings, the test set is deliberately constructed to increase the classifier's error rates. A prominent example is spam email, where words are transformed to get around the word-based features embedded in a spam filter.

16.
Email spam filtering is typically treated as a binary classification problem that can be solved by machine learning algorithms. We argue that a three-way decision approach gives users a more meaningful, precautionary way to handle their incoming email. A three-way spam filtering system produces three email folders instead of two: a suspected folder is added so that users can examine suspicious emails further, which reduces the chance of misclassification. Unlike existing ternary email spam filtering systems, we focus on two less-studied issues: computing the thresholds that define the three email categories, and interpreting the cost-sensitive characteristics of spam filtering. Instead of supplying thresholds based on intuitive judgments of how much error can be tolerated, we calculate them systematically with the decision-theoretic rough set model. A loss function is interpreted as the cost of making classification decisions, and the decision with the minimum overall cost is chosen. Experimental results show that the new approach reduces the rate of misclassifying legitimate email as spam and performs better on the cost-sensitivity aspect.
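In the decision-theoretic rough set model, the two thresholds follow from comparing the expected cost of the three actions. The actions are: move to the spam folder (P), move to the suspected folder (B), and keep in the inbox (N). Assuming the usual ordering of costs, this gives α = (λ_PN - λ_BN) / ((λ_PN - λ_BN) + (λ_BP - λ_PP)) and β = (λ_BN - λ_NN) / ((λ_BN - λ_NN) + (λ_NP - λ_BP)). A message goes to the spam folder when P(spam | x) ≥ α, to the inbox when P(spam | x) ≤ β, and to the suspected folder otherwise. The cost values and names below are illustrative assumptions, not the paper's loss function.

```python
from dataclasses import dataclass

@dataclass
class Losses:
    """lambda_{action, state}; actions P (spam folder), B (suspected),
    N (inbox); state 'spam' uses the *p fields, 'legitimate' the *n fields."""
    pp: float   # spam -> spam folder
    bp: float   # spam -> suspected
    np_: float  # spam -> inbox
    pn: float   # legitimate -> spam folder
    bn: float   # legitimate -> suspected
    nn: float   # legitimate -> inbox

def thresholds(l: Losses):
    """Return (alpha, beta) from the minimum-expected-cost conditions."""
    alpha = (l.pn - l.bn) / ((l.pn - l.bn) + (l.bp - l.pp))
    beta = (l.bn - l.nn) / ((l.bn - l.nn) + (l.np_ - l.bp))
    return alpha, beta

def three_way(p_spam: float, alpha: float, beta: float) -> str:
    if p_spam >= alpha:
        return "spam"
    if p_spam <= beta:
        return "inbox"
    return "suspected"
```

With example costs λ_PP = 0, λ_BP = 1, λ_NP = 4, λ_PN = 10, λ_BN = 1 and λ_NN = 0, the thresholds work out to α = 9/10 and β = 1/4. Messages whose spam probability falls between these two values are deferred to the user.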

17.
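Email has brought people convenience as the Internet has grown, but the spam that came with it causes endless annoyance. Starting from the development and current state of anti-spam technology, this paper analyzes spam filtering techniques that are already in use or under research. The analysis serves as preliminary work for the project team's next step of improving spam filtering methods.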

18.
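Email has brought people convenience as the Internet has grown, but the spam that came with it causes endless annoyance. Starting from the development and current state of anti-spam technology, this paper analyzes spam filtering techniques that are already in use or under research. The analysis serves as preliminary work for the project team's next step of improving spam filtering methods.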

19.
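Design and Implementation of a Spam Filtering System Based on an Improved Bayesian Method   Total citations: 10 (self-citations: 3; citations by others: 7)
This paper designs and implements a spam filtering system based on an improved Bayesian method. The traditional Bayesian method treats an email as a vector space of unordered keywords, which discards the relationships between words and between sentences. This paper instead treats an email as a partially ordered set: sentences keep their order, while the keywords within each sentence are unordered but related. This reduces the information lost by the traditional method, and the experimental results are better than those of the traditional method.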