Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The quality, length, and coverage of a parallel corpus are fundamental to the performance of a Statistical Machine Translation (SMT) system. For some language pairs there is a considerable lack of resources suitable for Natural Language Processing tasks. This paper introduces a technique for extracting medical information from Wikipedia pages using a medical ontological dictionary, and evaluates it on a Japanese-Spanish SMT system. The study shows an increase in the BLEU score.