
Article information

  • Title: Comparison of Segmentable Units as Indicators of Two Texts Being Parallel (Short Paper)
  • Author: Afonso Xavier Canosa
  • Journal: OASIcs : OpenAccess Series in Informatics
  • Electronic ISSN: 2190-6807
  • Year of publication: 2018
  • Volume: 62
  • Pages: 16:1-16:7
  • DOI: 10.4230/OASIcs.SLATE.2018.16
  • Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • Abstract: A bitext produced from a Portuguese historical text, Fernão Mendes Pinto's Pilgrimage, and its English translation serves as a case study to describe the creation of a parallel corpus and to investigate which linguistic and textual units are the best indicators of alignability. Building the corpus involves preparing the transcriptions, annotation, segmentation and sentence alignment. Once the bitext is ready, the corpus is used to determine which units are most relevant for predicting that the two texts are parallel. Of the units compared, from the largest content units (chapters) down to sentences, word types, tokens and characters, the last, despite being the unit with the least textual and linguistic significance, was found to be the best indicator that the two texts are alignable.
  • Keywords: parallel corpora; text alignment; bitexts
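
As a rough illustration of the kind of comparison the abstract describes, the sketch below counts several segmentable units in two texts and reports how closely the counts agree. It is not the author's method: the function names `unit_counts` and `count_ratios`, the regex-based tokenizer and sentence splitter, and the smaller-over-larger ratio are all assumptions made for this example, and chapters are omitted because they depend on the corpus annotation.

```python
import re


def unit_counts(text):
    """Count a few segmentable units in a text (naive, regex-based)."""
    # Tokens: runs of word characters, lowercased so types ignore case.
    tokens = re.findall(r"\w+", text.lower())
    # Sentences: split on runs of ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        "word_types": len(set(tokens)),
        "characters": len(text),
    }


def count_ratios(source, target):
    """Ratio of the smaller to the larger count for each unit (1.0 = equal)."""
    a, b = unit_counts(source), unit_counts(target)
    ratios = {}
    for unit in a:
        larger = max(a[unit], b[unit])
        # Two empty texts are treated as agreeing perfectly on that unit.
        ratios[unit] = min(a[unit], b[unit]) / larger if larger else 1.0
    return ratios


if __name__ == "__main__":
    # Toy sentence pair invented for the example, not taken from the corpus.
    pt = "Cheguei ao porto. O navio partiu!"
    en = "I arrived at the port. The ship left!"
    for unit, ratio in count_ratios(pt, en).items():
        print(f"{unit}: {ratio:.2f}")
```

A ratio close to 1.0 for a given unit means the two texts contain a similar number of that unit. Which unit tracks alignability best is an empirical question that the paper answers for its own corpus; this sketch only shows how such counts could be gathered.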