文章基本信息

标题：Machine Learning Algorithms for Document Classification: Comparative Analysis
本地全文：下载
作者：Faizur Rashid ; Suleiman M. A. Gargaare ; Abdulkadir H. Aden 等
期刊名称：International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN：2158-107X
电子版ISSN：2156-5570
出版年度：2022
卷号：13
期号：4
DOI：10.14569/IJACSA.2022.0130430
语种：English
出版社：Science and Information Society (SAI)
摘要：Automated document classification is the machine learning fundamental that refers to assigning automatic categories among scanned images of the documents. It reached the state-of-art stage but it needs to verify the performance and efficiency of the algorithm by comparing. The objective was to get the most efficient classification algorithms according to the usage of the fundamentals of science. Experimental methods were used by collecting data from a sum of 1080 students and researchers from Ethiopian universities and a meta-data set of Banknotes, Crowdsourced Mapping, and VxHeaven provided by UC Irvine. 25% of the respondents felt that KNN is better than the other models. The overall analysis of performance accuracies through various parameters namely accuracy percentage of 99.85%, the precision performance of 0.996, recall ratio of 100%, F-Score 0.997, classification time, and running time of KNN, SVM, Perceptron and Gaussian NB was observed. KNN performed better than the other classification algorithms with a fewer error rate of 0.0002 including the efficiency of the least classification time and running time with ~413 and 3.6978 microseconds consecutively. It is concluded by looking at all the parameters that KNN classifiers have been recognized as the best algorithm.
关键词：Document classification; machine learning algorithms; text classification; analysis