New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes |
| |
Authors: | Keungyeup Ji, Youngmi Kwon |
| |
Affiliation: | Department of Radio and Information Communications Engineering, Chungnam National University, Daejeon, 34134, Korea |
| |
Abstract: | As the importance of email increases, the volume of malicious email is also growing, and with it the need for malicious email filtering. Because it is more economical to combine commodity hardware (mid-range servers or PCs) with a virtual environment, use it as a single server resource, and filter malicious email with machine learning techniques, we used the Hadoop MapReduce framework together with Naïve Bayes. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with a MapReduce program applying Naïve Bayes, a supervised machine learning method, on a performance-tuned Hadoop framework; and with a Python program applying Naïve Bayes on a bare-metal server without Hadoop. In a comparison of the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naïve Bayes method. |
| |
Keywords: | Hadoop; Hadoop distributed file system (HDFS); MapReduce; configuration parameter; malicious email filtering; Naïve Bayes |
|
Journal: | Computer Systems Science and Engineering |
|
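The abstract describes Naïve Bayes training expressed as a MapReduce job. The following is a minimal illustrative sketch of that idea, not the authors' implementation: it simulates the map, shuffle, and reduce steps in a single Python process instead of calling any Hadoop API. The toy training messages, the whitespace tokenizer, the key layout, and the use of Laplace (add-one) smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus; the paper's dataset is not given in the abstract.
TRAINING = [
    ("spam", "win free money now"),
    ("spam", "free prize claim now"),
    ("ham", "meeting schedule for monday"),
    ("ham", "project report attached"),
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def map_phase(record):
    """Emit (key, 1) pairs: one per document label and one per (label, word)."""
    label, text = record
    yield ("doc", label), 1
    for word in tokenize(text):
        yield ("word", label, word), 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def train(records):
    """Run map, shuffle, and reduce over all records and return the count table."""
    pairs = (pair for record in records for pair in map_phase(record))
    return reduce_phase(shuffle(pairs))


def classify(counts, text, labels=("spam", "ham")):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    vocab = {key[2] for key in counts if key[0] == "word"}
    total_docs = sum(counts.get(("doc", label), 0) for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        n_docs = counts.get(("doc", label), 0)
        if n_docs == 0:
            continue  # label never seen in training
        n_words = sum(
            value for key, value in counts.items()
            if key[0] == "word" and key[1] == label
        )
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            word_count = counts.get(("word", label, word), 0)
            score += math.log((word_count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify(model, "free money"))
    print(classify(model, "monday meeting report"))
```

Because the training step reduces to summing independent counts, it parallelizes naturally across mappers and reducers, which is one plausible reason the technique pairs well with MapReduce.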