Using differential evolution for fine tuning naïve Bayesian classifiers and its application for text classification |
| |
Affiliation: | 1. College of Technological Innovation, Zayed University, P.O. Box 144534 Abu Dhabi, United Arab Emirates; 2. Computer Science Department, Taiz University, Yemen |
| |
Abstract: | The Naïve Bayes (NB) learning algorithm is simple and effective in many domains, including text classification. However, its performance depends on the accuracy of the estimated conditional probability terms, which can be hard to estimate accurately, especially when training data is scarce. This work transforms the probability estimation problem into an optimization problem and applies three metaheuristic approaches to solve it: Genetic Algorithms (GA), Simulated Annealing (SA), and Differential Evolution (DE). We also propose a novel DE algorithm that uses multi-parent mutation and crossover operations (MPDE) and three different methods to select the final solution. We create the initial population by manipulating the solution generated by a method used for fine-tuning NB. We evaluate the proposed methods by using their resulting solutions to build NB classifiers, and compare their results with those obtained from classical NB and the Fine-Tuning Naïve Bayesian (FTNB) algorithm on 53 UCI benchmark data sets. We name the resulting classifiers NBGA, NBSA, NBDE, and NB-MPDE, respectively. We also evaluate the performance of NB-MPDE for text classification using 18 text-classification data sets, and compare its results with those obtained from FTNB, Bernoulli NB (BNB), and Multinomial NB (MNB). The experimental results show that DE in general, and the proposed MPDE algorithm in particular, is more suitable for fine-tuning NB than all other methods, including the other two metaheuristic methods (GA and SA). They also indicate that NB-MPDE outperforms classical NB, FTNB, NBDE, NBGA, NBSA, MNB, and BNB. |
| |
Keywords: | Fine tuning Naïve Bayes; Differential evolution; Text classification; Improving estimated probabilities; Multi-parent mutation; Multi-parent crossover; Genetic algorithm; Simulated annealing; Bernoulli NB; Multinomial NB |
This article is indexed in ScienceDirect and other databases. |
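
The abstract's central idea, treating the NB conditional probabilities as a vector to be optimized, can be illustrated with a minimal sketch. This is not the paper's MPDE algorithm: it uses the classic DE/rand/1/bin scheme, seeds the population by perturbing a plain Laplace-smoothed NB estimate rather than an FTNB solution, and scores candidates by accuracy on the training data itself. All of these choices are assumptions, and every function name (`nb_estimate`, `nb_predict`, `accuracy`, `de_fine_tune`) is hypothetical.

```python
import numpy as np

def nb_estimate(X, y, n_classes, n_values, alpha=1.0):
    """Laplace-smoothed class priors and conditional probability table."""
    n, d = X.shape
    prior = np.array([(y == c).sum() + alpha for c in range(n_classes)], dtype=float)
    prior /= prior.sum()
    cond = np.full((n_classes, d, n_values), alpha)
    for xi, c in zip(X, y):
        cond[c, np.arange(d), xi] += 1.0
    cond /= cond.sum(axis=2, keepdims=True)
    return prior, cond

def nb_predict(prior, cond, X):
    d = X.shape[1]
    # log P(c) + sum_j log P(x_j | c) for every row and class
    log_post = np.log(prior)[None, :] + np.log(cond[:, np.arange(d), X]).sum(axis=2).T
    return log_post.argmax(axis=1)

def accuracy(prior, cond, X, y):
    return float((nb_predict(prior, cond, X) == y).mean())

def de_fine_tune(prior, cond, X, y, pop_size=20, gens=50, F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin over the flattened conditional probabilities."""
    rng = np.random.default_rng(seed)
    shape, base = cond.shape, cond.ravel()
    fitness = lambda v: accuracy(prior, v.reshape(shape), X, y)
    # Assumption: the initial population is noisy copies of the NB estimate;
    # member 0 is the unperturbed estimate, so the result is never worse on (X, y).
    pop = np.clip(base * rng.uniform(0.8, 1.2, (pop_size, base.size)), 1e-6, 1.0)
    pop[0] = base
    fit = np.array([fitness(p) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 1e-6, 1.0)  # values are not renormalized
            cross = rng.random(base.size) < CR
            cross[rng.integers(base.size)] = True  # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f = fitness(trial)
            if f >= fit[i]:  # greedy selection: keep the trial only if it is no worse
                pop[i], fit[i] = trial, f
    best = int(fit.argmax())
    return pop[best].reshape(shape), float(fit[best])

# Synthetic binary data, for illustration only.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 60)
X = (rng.random((60, 5)) < np.where(y[:, None] == 1, 0.7, 0.3)).astype(int)
prior, cond = nb_estimate(X, y, n_classes=2, n_values=2)
baseline = accuracy(prior, cond, X, y)
tuned_cond, tuned_acc = de_fine_tune(prior, cond, X, y)
print(baseline, tuned_acc)
```

Scoring on the training data keeps the sketch short but invites overfitting; a held-out validation split would be the more careful choice for the fitness function.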
|