Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Year: 2021
Volume: 12
Issue: 3
Pages: 765-778
DOI: 10.21817/indjcse/2021/v12i3/211203207
Publisher: Engg Journals Publications
Abstract: It is truly remarkable that human beings can easily comprehend the intended meaning of an ambiguous word. The meaning of an ambiguous word differs with its usage in different contexts, yet human beings can figure out the meaning with ease. We have machine translation (MT) systems that can translate from a source language to an equivalent target language. The main intention of these MT systems is to seamlessly transfer the intended meaning of the source text to the target text. But due to the ambiguous nature of natural language, MT systems suffer setbacks, and word sense disambiguation (WSD) is one of the greatest challenges to overcome. Researchers have contributed a number of WSD algorithms that operate over textual data. These algorithms were primarily developed to determine the exact meaning of an ambiguous word based on its context; context plays a decisive role in disambiguation. A section of the research community is of the opinion that the neighbouring words that appear along with an ambiguous word in a sentence may help in finding its meaning or sense. This is commonly known as distributional semantics. In this paper, we propose a novel technique to remove the ambiguity of a polysemous noun using a multimodal distributional semantics model (MDSM). The arduous task was to find a standard multimodal database for carrying out the desired experiments; we addressed this by using the ImageNet database. ImageNet is a large-scale database containing tens of millions of annotated images organized by the semantic hierarchy of WordNet. Our MDSM exploits both the image features and the textual features of the annotated images in ImageNet.

For both training and testing we used a total of 8 different synsets. A total of 800 images related to these synsets are used for training (each synset contributes a reduced set of 100 images), while only 8 images are used for testing (one image per synset). The 8 synsets that we considered are {(bat: word, mammal: sense), (bat: word, cricket: sense), (bass: word, guitar: sense), (bass: word, fish: sense), (mouse: word, animal: sense), (mouse: word, device: sense), (bank: word, piggy: sense), (bank: word, river: sense)}. These synsets were carefully selected from the voluminous ImageNet database so that each synset represents a polysemous noun. The training phase generates two co-occurrence matrices, namely (i) a reduced weighted word-synset matrix of size |n * 6|, where n is the total number of nouns, and (ii) a reduced weighted codeword-synset matrix of size |k * 6|, where k is the total number of visual codewords. The value of k is not fixed but varies over k = 50, 200 and 400. Each noun that appears in the weighted word-synset matrix is a vector vw. As per distributional semantics, neighbouring words that occur along with a polysemous word may help disambiguate it; keeping this in mind, all neighbouring nouns that occur along with the polysemous-noun vector in the weighted word-synset matrix are later used in the testing phase. Although the test data contains 8 annotated images related to the above-mentioned synsets, the textual data is entirely omitted during testing; only the images are considered. By applying image processing algorithms, m features are extracted from each test image. Each of the m image features is assigned a codeword label by measuring the Euclidean distance between the feature and the codewords and choosing the nearest cluster centre. Since the test image features thus yield m codewords, a single codeword vector cannot represent the image; instead, all m codeword vectors are summed up to obtain a single image vector vi.
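The codeword-assignment and pooling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy codebook, and the exact way codeword vectors (rows of the reduced weighted codeword-synset matrix) are combined are assumptions for illustration.

```python
import numpy as np

def image_vector(features, codebook, codeword_synset):
    """Sketch of the pooling step: each of the m image features is labelled
    with its nearest codeword (smallest Euclidean distance to a cluster
    centre), and the corresponding rows of the reduced weighted
    codeword-synset matrix are summed into a single image vector vi."""
    vi = np.zeros(codeword_synset.shape[1])
    for f in features:
        # Nearest cluster centre = codeword label for this feature
        # (assumption: plain Euclidean nearest-neighbour assignment).
        idx = np.argmin(np.linalg.norm(codebook - f, axis=1))
        vi += codeword_synset[idx]
    return vi
```

Here `codebook` is the k cluster centres in feature space and `codeword_synset` the |k * 6| matrix, so vi lives in the same joint semantic space as the word vectors vw.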
To measure the semantic relatedness between a test image I and a word (noun) w, e.g. chords, we simply compute the cosine similarity between vi and vw. From the experimental results, we conclude that the proposed algorithm can disambiguate a polysemous noun with the help of neighbouring words (nouns) and image features. We may say that our algorithm is based on distributional semantics and a joint semantic space of words (nouns) and images.
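The relatedness computation amounts to plain cosine similarity in the joint space; a minimal sketch follows, in which the ranking helper and the sample noun vectors are hypothetical additions, not taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between an image vector vi and a word vector vw
    in the joint semantic space of nouns and images."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_neighbours(vi, word_vectors):
    """Rank candidate neighbouring nouns by relatedness to the test image;
    the top-ranked noun points towards the intended sense. `word_vectors`
    maps each noun to its row of the weighted word-synset matrix
    (a hypothetical interface for illustration)."""
    return sorted(word_vectors, key=lambda w: cosine(vi, word_vectors[w]),
                  reverse=True)
```

Under this reading, an image of a guitar would score highest against neighbouring nouns such as "chords", selecting the (bass: word, guitar: sense) synset.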