Journal title: Journal of King Saud University - Computer and Information Sciences
Print ISSN: 1319-1578
Publication year: 2022
Volume: 34
Issue: 8
Pages: 6092-6103
Language: English
Publisher: Elsevier
Abstract: Word Sense Disambiguation (WSD) is significant for improving the accuracy of the interpretation of natural language text. Various supervised learning-based and knowledge-based models have been developed in the literature for WSD. However, these models do not provide good results for low-resource languages, owing to the lack of labelled and tagged data. Several studies in the literature show that word embeddings have been utilized for WSD in a number of other languages; however, to the best of our knowledge, no such work exists for the Hindi language. Therefore, in this paper, we examine various existing word embedding techniques for WSD of Hindi text. Moreover, we create Hindi word embeddings from articles taken from Wikipedia and test the quality of the created embeddings using Pearson correlation. We perform different experiments and observe that the Word2Vec model gives the best performance among all the considered embeddings on the Hindi dataset used. In our method, the proposed model directly takes as input representations trained with word embedding methods and builds a sense inventory through clustering, which is then used to perform disambiguation. Experimental observations indicate that the performance of the proposed approach is moderate and competent in terms of accuracy. The paper thus presents how WSD can leverage these representations to encode rich semantic information.
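The abstract describes the pipeline only at a high level: embed words, cluster to build a sense inventory, then disambiguate. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy two-dimensional vectors in `EMB` (English words stand in for Hindi ones), the use of averaged context vectors, the choice of k-means with cosine similarity, and the helper names `context_vector`, `kmeans`, and `disambiguate`.

```python
import math

# Toy 2-d "embeddings" with made-up values (hypothetical, not from the paper).
# In practice these would come from a trained model such as Word2Vec.
EMB = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "shore": [0.8, 0.2],
    "money": [0.0, 1.0], "loan": [0.1, 0.9], "deposit": [0.2, 0.8],
}

def context_vector(words):
    """Average the embeddings of the context words found in EMB."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        raise ValueError("no context word has an embedding")
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def kmeans(points, k, iters=10):
    """Plain k-means using cosine similarity; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            best = max(range(k), key=lambda i: cosine(p, centroids[i]))
            groups[best].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def disambiguate(words, centroids):
    """Return the index of the sense cluster closest to the given context."""
    v = context_vector(words)
    return max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))

# Contexts in which an ambiguous word (e.g. "bank") was observed.
contexts = [
    ["river", "water"], ["money", "loan"],
    ["shore", "water"], ["deposit", "loan"],
]
points = [context_vector(c) for c in contexts]

# The induced "sense inventory": one centroid per sense cluster.
centroids = kmeans(points, k=2)

print(disambiguate(["river", "shore"], centroids))
print(disambiguate(["loan", "money"], centroids))
```

In this sketch each centroid plays the role of one entry in the sense inventory, and a new occurrence is assigned to the sense whose centroid is most similar to its context. The number of clusters is fixed by hand here; how the paper selects it is not stated in the abstract.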