New Spam Filtering Method with Hadoop Tuning-Based MapReduce Naïve Bayes

Authors:	Keungyeup Ji, Youngmi Kwon

Affiliation:	Department of Radio and Information Communications Engineering, Chungnam National University, Daejeon, 34134, Korea

Abstract:	As the importance of email increases, the amount of malicious email is also increasing, so the need for malicious email filtering is growing. Since it is more economical to combine commodity hardware, such as mid-range servers or PCs, with a virtual environment into a single server resource and to filter malicious email using machine learning techniques, we used the Hadoop MapReduce framework and, among machine learning methods, Naïve Bayes for malicious email filtering. Naïve Bayes was selected because it is one of the top machine learning methods (Support Vector Machine (SVM), Naïve Bayes, K-Nearest Neighbor (KNN), and Decision Tree) in terms of execution time and accuracy. Malicious email was filtered in two ways: with MapReduce programming using Naïve Bayes, a supervised machine learning method, in a performance-tuned Hadoop framework; and with a Python program applying Naïve Bayes on a bare-metal server without Hadoop. Comparing the accuracy and prediction error rates of the two methods, the Hadoop MapReduce Naïve Bayes method improved the accuracy of spam and ham email identification by a factor of 1.11 and the prediction error rate by a factor of 14.13 relative to the non-Hadoop Python Naïve Bayes method.
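To make the idea in the abstract concrete, the following is a minimal sketch of how Naïve Bayes training can be phrased as a map step and a reduce step. It is not the authors' implementation and does not use the Hadoop API: the shuffle is simulated in-process with a dictionary, and the function names (`map_phase`, `reduce_phase`, `train`, `classify`), the toy `RECORDS` data, whitespace tokenization, and add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy training set of (label, message text) pairs.
RECORDS = [
    ("spam", "win free money now"),
    ("spam", "free prize claim now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]

DOC_KEY = "__doc__"  # marker used for per-label document counts


def map_phase(record):
    """Mapper: emit one document count per message and one count per token."""
    label, text = record
    yield ((DOC_KEY, label), 1)
    for word in text.lower().split():
        yield ((label, word), 1)


def reduce_phase(pairs):
    """Reducer: sum the values for each key (the shuffle is simulated here)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(records):
    """Run the map step over all records, then reduce the emitted pairs."""
    pairs = (pair for record in records for pair in map_phase(record))
    return reduce_phase(pairs)


def classify(counts, text, labels=("spam", "ham")):
    """Pick the label with the highest log posterior, using add-one smoothing."""
    vocab = {word for (lbl, word) in counts if lbl != DOC_KEY}
    n_docs = sum(counts.get((DOC_KEY, lbl), 0) for lbl in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        n_words = sum(c for (lbl, _), c in counts.items() if lbl == label)
        # Smoothed class prior, so a label with no documents is still valid.
        prior = (counts.get((DOC_KEY, label), 0) + 1) / (n_docs + len(labels))
        score = math.log(prior)
        for word in text.lower().split():
            word_count = counts.get((label, word), 0)
            score += math.log((word_count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


counts = train(RECORDS)
print(classify(counts, "free money"))
print(classify(counts, "monday meeting"))
```

In a real Hadoop job the mapper and reducer would run as separate tasks over data stored in HDFS, and the summed counts would be written out for a later classification step. The sketch only shows why the training counts decompose naturally into per-record emissions followed by per-key sums.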

Keywords:	Hadoop; Hadoop distributed file system (HDFS); MapReduce; configuration parameter; malicious email filtering; Naïve Bayes

Journal:	Computer Systems Science and Engineering
