Journal: Lecture Notes in Engineering and Computer Science
Print ISSN: 2078-0958
Electronic ISSN: 2078-0966
Year of publication: 2017
Volume: 2227 & 2228
Pages: 24-29
Publisher: Newswood and International Association of Engineers
Abstract: In this paper, we propose a method to automatically discover valuable keyphrases in Japanese documents and link them to related Chinese Wikipedia pages. The proposed method has four stages. First, we extract nouns from a Japanese document using a morphological analyzer and extract keyphrase candidates using a method called Top Consecutive Nouns Cohesion (TCNC) [1]; we then judge the degree of difficulty of the extracted keyphrases and tag them with different linguistic levels. Second, we translate the extracted Japanese keyphrases into Chinese using a combination of three translation methods. Third, we extract the Chinese Wikipedia articles corresponding to the translated keyphrases. Fourth, we translate the original Japanese document into Chinese, build a vector of noun frequencies, and calculate the cosine similarity between the translated document and each candidate Chinese Wikipedia article. Finally, we create links from the Japanese keyphrases to the top-ranking Chinese Wikipedia articles.
Keywords: Entity linking; keyphrase extraction; Wikipedia; Cross-language Link Discovery; linguistic difficulty level estimation
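
The fourth stage in the abstract (ranking candidate Chinese Wikipedia articles by the cosine similarity of noun-frequency vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `rank_candidates`, the sample nouns, and the assumption that nouns have already been extracted from the translated document and from each candidate article are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse noun-frequency vectors."""
    # Only nouns present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)


def rank_candidates(doc_nouns, candidates):
    """Rank candidate articles by similarity to the translated document.

    doc_nouns:  list of nouns from the translated document (assumed input).
    candidates: dict mapping an article title to its list of nouns
                (assumed input; noun extraction is not shown here).
    Returns (title, score) pairs, highest score first.
    """
    doc_vec = Counter(doc_nouns)
    scored = [
        (title, cosine_similarity(doc_vec, Counter(nouns)))
        for title, nouns in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: nouns from a translated document and two candidates.
doc = ["地震", "海啸", "地震"]
articles = {
    "article_a": ["地震", "海啸"],
    "article_b": ["音乐"],
}
ranking = rank_candidates(doc, articles)
best_title = ranking[0][0]  # the article a keyphrase link would point to
```

Representing each vector as a `Counter` keeps it sparse, so only nouns shared by both texts are multiplied; a real system would likely add weighting or normalization that the abstract does not specify.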