Article Information

Title: Extraction of Turkish Semantic Relation Pairs using Corpus Analysis Tool
Authors: Gurkan Sahin; Banu Diri; Tugba Yildiz et al.
Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Year of publication: 2016
Volume: 5
Issue: 6
Pages: 491-499
Publisher: International Journal of Computer and Information Technology
Abstract: In this study, we have developed a Turkish semantic relation extraction tool. The tool takes an unparsed corpus as input and, for given target words, outputs hyponym, meronym and antonym words with their reliability scores. The corpus is parsed by a Turkish morphological parser called Zemberek, and word vectors are created with Word2Vec for each unique word in the corpus. To extract relation patterns, hyponymy, holonymy and antonymy pairs, called initial seeds, are prepared; then all possible relation patterns are extracted using these seeds. The reliability of the patterns is calculated using corpus statistics and various association metrics. Reliable patterns are selected to extract new semantic pairs from the parsed corpus. To determine the correctness of extracted pairs, total pattern frequency, different pattern frequency and Word2Vec vector cosine similarity are used. In experiments, we obtained average precisions of 83%, 63%-86%, and 85% for hyponymy, holonymy and antonymy relations, respectively.
Keywords: hyponymy; holonymy; antonymy; Word2Vec; semantic relation; pattern-based approach
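
Illustrative sketch: the abstract names three signals for judging an extracted pair: total pattern frequency, different pattern frequency, and Word2Vec cosine similarity. The Python below is a minimal sketch of how those three values could be computed for each candidate pair. It is not the authors' implementation. The function names, the `(pattern_id, word_a, word_b)` match format, and the toy two-dimensional vectors are all assumptions made for illustration; in the described tool the vectors would come from Word2Vec and the matches from reliable patterns applied to the parsed corpus. How the three signals are combined into a final decision is not stated in the abstract, so the sketch stops at computing them.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(matches, vectors):
    """Compute the three signals for every candidate pair.

    matches: iterable of (pattern_id, word_a, word_b), one tuple per pattern
             match found in the corpus (hypothetical format).
    vectors: dict mapping each word to its embedding vector.
    """
    total = defaultdict(int)      # how many times any pattern matched the pair
    patterns = defaultdict(set)   # which distinct patterns matched the pair
    for pattern_id, a, b in matches:
        total[(a, b)] += 1
        patterns[(a, b)].add(pattern_id)

    features = {}
    for (a, b), count in total.items():
        features[(a, b)] = {
            "total_pattern_frequency": count,
            "different_pattern_frequency": len(patterns[(a, b)]),
            "cosine_similarity": cosine_similarity(vectors[a], vectors[b]),
        }
    return features


# Toy data, invented for illustration only (not from the paper).
matches = [
    ("p1", "fruit", "apple"),
    ("p1", "fruit", "apple"),
    ("p2", "fruit", "apple"),
    ("p1", "fruit", "car"),
]
vectors = {"fruit": [1.0, 0.0], "apple": [1.0, 0.0], "car": [0.0, 1.0]}

features = pair_features(matches, vectors)
for pair, values in features.items():
    print(pair, values)
```

In this toy input the pair ("fruit", "apple") is matched three times by two distinct patterns and has parallel vectors, while ("fruit", "car") is matched once and has orthogonal vectors, so all three signals point the same way.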