期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2021
卷号:12
期号:11
DOI:10.14569/IJACSA.2021.0121142
语种:English
出版社:Science and Information Society (SAI)
摘要:Information retrieval has been an ever-going process for end users to fetch relevant data at one go. The problem intensifies more with unstructured data in a semantic web environment. It is also a promising area for researchers to dive in and refine it from time to time. Expanding the user query and reformulating it is one probable solution to increase the efficiency of the information retrieval system. In this paper we propose “WeOnto”, a novel two-level query expansion algorithm that utilizes the combination of web ontologies and word embeddings for similarity calculation. In the first level, the Real estate Ontology (REO) is created using Protégé and Sparql queries are passed to retrieve probable semantic words from the given ontology for each inputted user query. The first level gave significant results and improved the information retrieval by 18%. The second level of algorithm uses word embedding enhanced with the domain knowledge that helps to retrieve similar meaningful words based on cosine similarity for the same user query. Word embeddings are implemented using Word2Vec method that follows two architectures namely CBOW or Skip Gram. Most similar semantic words are retrieved using the CBOW word embeddings method in the proposed algorithm and concatenated with the semantic keywords generated from the real estate ontology to form a powerful reformulated query that gives promising relevant results. Finally, two topmost words as per their similarity index are taken to reformulate the original user query. Experimental results depict that proposed algorithm has given distinct results and has showcased significant improvement of 93% over the initial user query.
关键词:CBOW; Information retrieval; ontology; query reformulation; semantic web; skip gram; word embeddings; word2vec