Journal: International Journal of Computer Science & Information Technology (IJCSIT)
Print ISSN: 0975-4660
Online ISSN: 0975-3826
Year: 2017
Volume: 9
Issue: 2
Page: 131
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: In this paper we present gender and authorship categorisation using the Prediction by Partial Matching (PPM) compression scheme for text from Twitter written in Arabic. The PPMD variant of the compression scheme with different orders was used to perform the categorisation. We also applied different machine learning algorithms such as Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and an implementation of Support Vector Machine (LIBSVM), applying the same processing steps for all the algorithms. PPMD shows significantly better accuracy in comparison to all the other machine learning algorithms, with order 11 PPMD working best, achieving 90% and 96% accuracy for gender and authorship respectively.
Keywords: Arabic text categorisation; Data compression; Machine learning algorithms
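
Note: The abstract describes categorising text by compression. The general idea can be sketched as follows: train one character model per class, then assign a new text to the class whose model would encode it in the fewest bits. This is a minimal illustration only, not the paper's method. It assumes a fixed-order character model with add-one smoothing as a simplified stand-in for PPMD (which uses escape probabilities and context blending), and the names `CharModel` and `classify` and the toy training strings are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing.

    Assumption: a simplified stand-in for PPMD, not an implementation of it.
    """

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)  # context -> next-character counts
        self.alphabet = set()

    def train(self, text):
        # Pad the start so the first characters also have a full-length context.
        padded = "\x00" * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            self.counts[context][padded[i]] += 1
            self.alphabet.add(padded[i])

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode `text`."""
        if not text:
            return 0.0
        padded = "\x00" * self.order + text
        vocab = len(self.alphabet) + 1  # +1 reserves mass for unseen characters
        bits = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            seen = self.counts.get(context)
            count = seen[padded[i]] if seen else 0
            total = sum(seen.values()) if seen else 0
            bits -= math.log2((count + 1) / (total + vocab))
        return bits / len(text)


def classify(models, text):
    """Return the label whose model compresses `text` best (fewest bits)."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Toy data, invented for illustration; real use would train on labelled tweets.
models = {"class_a": CharModel(order=2), "class_b": CharModel(order=2)}
models["class_a"].train("abababababab")
models["class_b"].train("xyzxyzxyzxyz")
predicted = classify(models, "ababab")
```

In this sketch the `order` parameter plays the role of the context length that the abstract refers to as the PPMD order; the value 2 is chosen only to keep the toy example small.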