
Article Information

  • Title: GraWiTas: a Grammar-based Wikipedia Talk Page Parser
  • Authors: Benjamin Cabrera; Laura Steinert; Björn Ross
  • Journal: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year of publication: 2017
  • Volume: 2017
  • Pages: 21-24
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Wikipedia offers researchers unique insights into the collaboration and communication patterns of a large self-regulating community of editors. The main medium of direct communication between editors of an article is the article’s talk page. However, a talk page file is unstructured and therefore difficult to analyse automatically. A few parsers exist that enable its transformation into a structured data format. However, they are rarely open source, support only a limited subset of the talk page syntax – resulting in the loss of content – and usually support only one export format. Together with this article we offer a very fast, lightweight, open source parser with support for various output formats. In a preliminary evaluation it achieved a high accuracy. The parser uses a grammar-based approach – offering a transparent implementation and easy extensibility.