摘要:Support Vector Machine (SVM) is a powerful classification and regression tool. Varying approaches including SVM based techniques are proposed for email classification. Automated email classification according to messages or user-specific folders and information extraction from chronologically ordered email streams have become interesting areas in text machine learning research. This paper presents a parallel SVM based on MapReduce (PSMR) algorithm for email classification. We discuss the challenges that arise from differences between email foldering and traditional document classification. We show experimental results from an array of automated classification methods and evaluation methodologies, including Naive Bayes, SVM and PSMR method of foldering results on the Enron datasets based on the timeline. By distributing, processing and optimizing the subsets of the training data across multiple participating nodes, the parallel SVM based on MapReduce algorithm reduces the training time significantly