Article Information

Title: Word prediction in computational historical linguistics
Authors: Peter Dekker; Willem Zuidema
Journal: Journal of Language Modelling
Print ISSN: 2299-856X
Electronic ISSN: 2299-8470
Year: 2020
Volume: 8
Issue: 2
Pages: 295-336
DOI: 10.15398/jlm.v8i2.268
Language: English
Publisher: Polish Academy of Sciences
Abstract: In this paper, we investigate how the prediction paradigm from machine learning and Natural Language Processing (NLP) can be put to use in computational historical linguistics. We propose word prediction as an intermediate task, where the forms of unseen words in some target language are predicted from the forms of the corresponding words in a source language. Word prediction allows us to develop algorithms for phylogenetic tree reconstruction, sound correspondence identification and cognate detection, in ways close to attested methods for linguistic reconstruction. We will discuss different factors, such as data representation and the choice of machine learning model, that have to be taken into account when applying prediction methods in historical linguistics. We present our own implementations and evaluate them on different tasks in historical linguistics.
Keywords: computational historical linguistics; machine learning; deep learning