Article Information

Title: Advantages of Using a Spell Checker in Text Mining Pre-Processes
Authors: Jhonathan Quillo-Espino; Rosa María Romero-González; Alberto Lara-Guevara et al.
Journal: Journal of Computer and Communications
Print ISSN: 2327-5219
Online ISSN: 2327-5227
Publication year: 2018
Volume: 06
Issue: 11
Pages: 43-54
DOI: 10.4236/jcc.2018.611004
Publisher: Scientific Research Publishing
Abstract: The aim of this work was to analyze the behavior of the text mining process when a spell checker is integrated as an extra pre-process during its first stage. Different models were analyzed, and the most complete one was chosen, treating the pre-processes as the initial part of the text mining process. Algorithms were developed and adapted for the Spanish language, and the methodology was tested through the analysis of 2363 words. A notation capable of removing special and unwanted characters was created. Execution times of each algorithm were analyzed to test the efficiency of the text mining pre-process with and without orthographic revision. The total time was shorter with the spell checker than without it. The key difference between this work and existing related studies is that this is the first time a spell checker has been used in text mining pre-processes.
Keywords: Spell Checker; Text Mining; Stemming; Tokenization; Porter Algorithm; Snowball Algorithm
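
Illustrative sketch: the pipeline summarized in this record (character removal, tokenization, an optional spell-checking step, stemming, and timing of each variant) might look roughly like the Python below. This is not the authors' implementation. The vocabulary, the function names, the regular expression, and the similarity cutoff are all hypothetical; the spell checker is a simple closest-match lookup from the standard library, and the stemmer is a crude suffix stripper standing in for the Porter or Snowball algorithms named in the keywords.

```python
import re
import time
from difflib import get_close_matches

# Hypothetical toy vocabulary; a real checker would load a full Spanish lexicon.
VOCABULARY = ["minería", "texto", "corrector", "palabras", "análisis"]

def clean(text):
    """Lowercase, then replace anything that is not a letter or whitespace with a space."""
    return re.sub(r"[^a-záéíóúüñ\s]", " ", text.lower())

def tokenize(text):
    """Split on whitespace."""
    return text.split()

def correct(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close enough."""
    matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token

def crude_stem(token):
    """Strip a plural suffix. A stand-in only: this is NOT Porter or Snowball."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, spell_check=False):
    """Clean, tokenize, optionally spell-check, then stem."""
    tokens = tokenize(clean(text))
    if spell_check:
        tokens = [correct(t) for t in tokens]
    return [crude_stem(t) for t in tokens]

def timed(text, spell_check):
    """Run the pipeline and return (tokens, elapsed seconds)."""
    start = time.perf_counter()
    result = preprocess(text, spell_check)
    return result, time.perf_counter() - start
```

For example, `preprocess("¡Mineria de textto!", spell_check=True)` yields `['minería', 'de', 'texto']`, while the same call with `spell_check=False` leaves the misspellings as `['mineria', 'de', 'textto']`. Calling `timed` on both variants gives the kind of with/without comparison the abstract describes, although timings from this toy sketch say nothing about the published results.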