
Article Information

  • Title: Text Normalization in Social Media: Progress, Problems and Applications for a Pre-Processing System of Casual English
  • Authors: Eleanor Clark ; Kenji Araki
  • Journal: Procedia - Social and Behavioral Sciences
  • Print ISSN: 1877-0428
  • Publication year: 2011
  • Volume: 27
  • Pages: 2-11
  • DOI: 10.1016/j.sbspro.2011.10.577
  • Language: English
  • Publisher: Elsevier
  • Abstract: The rapid expansion in user-generated content on the Web of the 2000s, characterized by social media, has led to Web content featuring somewhat less standardized language than the Web of the 1990s. User creativity and individuality of language creates problems on two levels. The first is that social media text is often unsuitable as data for Natural Language Processing tasks such as Machine Translation, Information Retrieval and Opinion Mining, due to the irregularity of the language featured. The second is that non-native speakers of English, older Internet users and non-members of the “in-group” often find such texts difficult to understand. This paper discusses problems involved in automatically normalizing social media English, various applications for its use, and our progress thus far in a rule-based approach to the issue. Particularly, we evaluate the performance of two leading open source spell checkers on data taken from the microblogging service Twitter, and measure the extent to which their accuracy is improved by pre-processing with our system. We also present our database rules and classification system, results of evaluation experiments, and plans for expansion of the project.
  • Keywords: Natural Language Processing; Machine Translation; Social Media; Twitter; Text Normalization