Article Information

  • Title: Advantages of Using a Spell Checker in Text Mining Pre-Processes
  • Authors: Jhonathan Quillo-Espino; Rosa María Romero-González; Alberto Lara-Guevara
  • Journal: Journal of Computer and Communications
  • Print ISSN: 2327-5219
  • Online ISSN: 2327-5227
  • Year: 2018
  • Volume: 6
  • Issue: 11
  • Pages: 43-54
  • DOI: 10.4236/jcc.2018.611004
  • Language: English
  • Publisher: Scientific Research Publishing
  • Abstract: The aim of this work was to analyze how the text mining process behaves when a spell checker is integrated as an extra pre-process during its first stage. Different models were analyzed, and the most complete one was chosen, treating the pre-processes as the initial part of the text mining process. Algorithms for the Spanish language were developed and adapted, and the methodology was tested through the analysis of 2363 words. A notation capable of removing special and unwanted characters was created. The execution time of each algorithm was measured to test the efficiency of the text mining pre-process with and without orthographic revision. The total time was shorter with the spell checker than without it. What distinguishes this work from existing related studies is that it is the first to use a spell checker in the text mining pre-processes.
  • Keywords: Spell Checker; Text Mining; Stemming; Tokenization; Porter Algorithm; Snowball Algorithm
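The abstract describes a pre-processing chain (removal of special characters, tokenization, an added spell-checking step, stemming) whose running time was compared with and without orthographic revision. What follows is a minimal, stdlib-only Python sketch of such a chain under stated assumptions; it is not the authors' implementation, which this record does not include. The lexicon `LEXICON`, the suffix list `SUFFIXES`, and all function names are hypothetical. Spelling correction here uses `difflib.get_close_matches` as a simple similarity-based stand-in, and `stem` is a crude suffix stripper, not the Porter or Snowball algorithm named in the keywords.

```python
import difflib
import re
import time
import unicodedata

# Hypothetical toy lexicon standing in for a full Spanish dictionary.
LEXICON = ("minería", "de", "texto", "corrector", "ortográfico",
           "palabras", "análisis", "preprocesamiento")

# Hypothetical suffix list, ordered longest first; a crude stand-in for
# a real stemmer such as Porter or Snowball.
SUFFIXES = ("aciones", "ación", "mente", "es", "s")


def clean(text: str) -> str:
    """Lowercase, then replace every non-letter (digits, punctuation,
    symbols such as '¡' or '¿') with a space. Accented Spanish letters are kept."""
    text = unicodedata.normalize("NFC", text.lower())
    return re.sub(r"[^a-záéíóúüñ\s]", " ", text)


def tokenize(text: str) -> list[str]:
    """Split on runs of whitespace."""
    return text.split()


def spell_check(tokens: list[str]) -> list[str]:
    """Replace each unknown token with its closest lexicon entry if one is
    similar enough (difflib ratio >= 0.8); otherwise keep it unchanged."""
    corrected = []
    for tok in tokens:
        if tok in LEXICON:
            corrected.append(tok)
            continue
        matches = difflib.get_close_matches(tok, LEXICON, n=1, cutoff=0.8)
        corrected.append(matches[0] if matches else tok)
    return corrected


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, use_spell_checker: bool = True) -> list[str]:
    """Clean -> tokenize -> (optional) spell check -> stem."""
    tokens = tokenize(clean(text))
    if use_spell_checker:
        tokens = spell_check(tokens)
    return [stem(t) for t in tokens]


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    sample = "Minería de txto: 2363 palabras!!"
    for flag in (False, True):
        tokens, secs = timed(preprocess, sample, use_spell_checker=flag)
        print(f"spell_checker={flag}: {tokens} ({secs:.6f} s)")
```

With the spell checker enabled, the misspelled token `txto` is corrected to `texto` before stemming; with it disabled, `txto` passes through unchanged. A realistic version would replace `stem` with a proper Spanish stemmer (for example NLTK's `SnowballStemmer("spanish")`) and `LEXICON` with a full dictionary. Timings from this toy input say nothing about the paper's reported result.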