首页 | 本学科首页   官方微博 | 高级检索  
     


New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes
Authors:Keungyeup Ji  Youngmi Kwon
Affiliation:Department of Radio and Information Communications Engineering, Chungnam National University, Daejeon, 34134, Korea
Abstract:As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware consisting of a medium server or PC with a virtual environment to use as a single server resource and filter malicious email using machine learning techniques, we used a Hadoop MapReduce framework and Naïve Bayes among machine learning methods for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods(Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor(KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered with MapReduce programming using the Naïve Bayes technique, which is a supervised machine learning method, in a Hadoop framework with optimized performance and also with the Python program technique with the Naïve Bayes technique applied in a bare metal server environment with the Hadoop environment not applied. According to the results of a comparison of the accuracy and predictive error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification 1.11 times and the prediction error rate 14.13 times compared to the non-Hadoop Python Naïve Bayes method.
Keywords:Hadoop  hadoop distributed file system(HDFS)  MapReduce  configuration parameter  malicious email filtering  Naïve Bayes
点击此处可从《计算机系统科学与工程》浏览原始摘要信息
点击此处可从《计算机系统科学与工程》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号