Abstract: In recent years, e-mail spam has become an increasingly serious problem with a significant economic impact on society. Fortunately, several approaches can automatically detect and remove most of these messages, and the best-known ones are based on Bayesian decision theory. However, most of these probabilistic approaches share the same difficulty: the high dimensionality of the feature space. Many term selection methods have been proposed in the literature to address this problem. In this paper, we review the most popular term selection methods in combination with seven different versions of Naive Bayes spam filters.
Keywords: Dimensionality reduction; Spam filtering; Machine learning