Abstract: The vector space model computes a continuous degree of similarity between a query and each retrieved document, then ranks the documents in decreasing order of cosine (similarity) value. The cosine function derives this value from the weight of each term in the documents, which is assigned by a weighting scheme, but computing a weight for every term is a complex process. The model also fails to produce a similarity score in two cases: first, when the corpus contains only one document and the query terms match that document; second, when the number of documents containing the query terms equals the total number of documents retrieved. To address this problem and improve performance, we propose an enhanced approach to computing the cosine (similarity) value that extends the vector space model. We analyze and implement the proposed method in a performance evaluation of three search engines: Google, Yahoo, and MSN. To verify the method, we compared its results with manually computed relevance scores and found that our evaluations match the manual method.
Keywords: Information Retrieval; Term Frequency; Cosine Value; IDF; Vector Space Model
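
The two failure cases named in the abstract can be reproduced with a minimal sketch. The abstract does not state which weighting scheme is used, so the code assumes the common tf × log(N/df) weighting; the helper names (`tf_idf_vector`, `cosine`, `score`) and the sample texts are hypothetical and chosen only for illustration. This shows the baseline behavior only, not the enhanced approach proposed in the paper.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, vocab, doc_freq, n_docs):
    """Weight each vocabulary term as tf * log(N / df) (assumed scheme)."""
    counts = Counter(tokens)
    vector = []
    for term in vocab:
        df = doc_freq.get(term, 0)
        # A term that appears in no document gets weight 0.
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector


def cosine(u, v):
    """Cosine of two vectors, or None when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else None


def score(query, docs):
    """Return one cosine value (or None) per document for the query."""
    tokenized = [d.lower().split() for d in docs]
    q_tokens = query.lower().split()
    vocab = sorted(set(q_tokens).union(*tokenized))
    doc_freq = {t: sum(t in d for d in tokenized) for t in vocab}
    n_docs = len(docs)
    q_vec = tf_idf_vector(q_tokens, vocab, doc_freq, n_docs)
    return [cosine(q_vec, tf_idf_vector(d, vocab, doc_freq, n_docs))
            for d in tokenized]


# Case 1: a single-document corpus. N = df = 1, so every idf is log(1) = 0.
single_doc = score("vector space", ["vector space model"])

# Case 2: the query term occurs in every document. N = df = 2, so idf = 0.
all_match = score("vector", ["vector space", "vector model"])

# Contrast: the query term occurs in only one of two documents.
normal = score("space", ["vector space", "vector model"])
```

In both failure cases every query term has an inverse document frequency of log(1) = 0, so the query vector has zero length and the cosine is undefined; the sketch returns `None` rather than dividing by zero. In the contrast case the term "space" has a nonzero weight and the two documents receive distinct scores.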