Journal: International Journal of Computer Science and Network Solutions
Print ISSN: 2345-3397
Publication year: 2014
Volume: 2
Issue: 5
Pages: 91-97
Publisher: International Journal of Computer Science and Network Solutions
Abstract: In information retrieval systems, search quality is directly related to the number of relevant documents retrieved. The databases and data archives on which these systems are implemented can include a variety of documents of different sizes. In this case, longer documents can have a higher chance of being retrieved than shorter ones. To avoid this, to give all documents an equal chance of being retrieved, and to increase the quality and integrity of information retrieval, the length of documents must be normalized. In this study, both conventional and proposed normalization methods, Cosine Similarity normalization and Pivoted Unique normalization, are used to normalize the length of documents. Their performance is tested on the Wikipedia MM2008 data archive and compared. Finally, the best model is introduced.
Keywords: information retrieval; normalization; retrieval quality; Wikipedia MM; Cosine Similarity; Vector Space.
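
The two normalization schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so this uses the commonly cited textbook forms: the cosine factor is the Euclidean length of a document's weight vector, and the pivoted unique factor is (1 - slope) * pivot + slope * (number of unique terms), with the pivot taken as the average number of unique terms per document. The raw term-frequency weighting, the slope value of 0.2, the two toy documents, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Raw term frequencies for one document (an assumed weighting scheme)."""
    return dict(Counter(tokens))


def cosine_norm(weights):
    """Cosine normalization factor: Euclidean length of the weight vector."""
    return math.sqrt(sum(w * w for w in weights.values()))


def pivoted_unique_norm(weights, pivot, slope=0.2):
    """Pivoted unique factor: (1 - slope) * pivot + slope * unique term count.

    The slope of 0.2 is an illustrative choice, not a value from the paper.
    """
    return (1.0 - slope) * pivot + slope * len(weights)


def normalize(weights, factor):
    """Divide every term weight by the chosen normalization factor."""
    return {term: w / factor for term, w in weights.items()}


# Toy collection (hypothetical data, not the Wikipedia MM2008 archive).
docs = {
    "short": "cat sat mat".split(),
    "long": "cat sat on the mat while the dog sat by the door".split(),
}
weights = {name: term_weights(tokens) for name, tokens in docs.items()}

# Pivot: average number of unique terms per document in the collection.
pivot = sum(len(w) for w in weights.values()) / len(weights)

cosine = {n: normalize(w, cosine_norm(w)) for n, w in weights.items()}
pivoted = {
    n: normalize(w, pivoted_unique_norm(w, pivot)) for n, w in weights.items()
}
```

After cosine normalization every document vector has unit length regardless of its size. The pivoted factor instead stays close to the pivot, so under this formulation a document shorter than average is divided by a value larger than its unique term count and a longer one by a smaller value, with the slope controlling how strongly length is penalized.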