Article Information

  • Title: Design and Annotation of MultiMedica – A Multilingual Text Corpus of the Biomedical Domain
  • Authors: Antonio Moreno-Sandoval; Leonardo Campillos-Llanos
  • Journal: Procedia - Social and Behavioral Sciences
  • Print ISSN: 1877-0428
  • Year: 2013
  • Volume: 95
  • Pages: 33-39
  • DOI: 10.1016/j.sbspro.2013.10.619
  • Language: English
  • Publisher: Elsevier
  • Abstract: This article describes the MultiMedica corpus, a multilingual collection of Spanish, Japanese, and Arabic texts from the biomedical domain. This novel combination of languages was chosen for two purposes: the contrastive study of three languages that are typologically and genetically different, and the creation of a gold standard to develop and evaluate an Automatic Term Recognition (ATR) system. A total of 51,476 documents have been collected from the Web, and the corpus contains over seven and a half million words. Most documents were written by medical doctors and edited by journalists for the general public. Each text has been tagged for Part-of-Speech and indexed in an Information Retrieval system and a concordance interface aimed at students of Translation, Medicine, and Medical Humanities.
  • Keywords: Biomedical discourse; Text corpus; Terminology; Spanish; Arabic; Japanese